Designing Multi-Model AI Systems on AWS: Routing, RAG, and Inference Optimization

A team I worked with ran their entire product on Claude 3.5 Sonnet. Every request – from simple classification to complex document analysis – hit the same model at $3 per million input tokens and $15 per million output tokens. When they implemented Bedrock’s Intelligent Prompt Routing to split traffic between Haiku and Sonnet based […]

AI Safety Engineering: From Constitutional Classifiers to Circuit Tracing

Anthropic ran a public red teaming exercise against their Constitutional Classifiers system via HackerOne, offering up to $15,000 to anyone who could find a universal jailbreak. 405 invited participants spent over 3,000 hours (mean estimate: 4,720 hours) trying to answer ten targeted CBRN queries at a harmful threshold. No report succeeded. One apparent universal jailbreak

Agentic Frameworks in 2026: What Actually Works in Production

Six months ago, picking an agent framework felt like choosing a JavaScript framework in 2016 – new options every week, each claiming to be the production-ready one, none with enough real-world mileage to prove it. That changed faster than expected. LangGraph reached 1.0 in October 2025, CrewAI passed 450 million processed workflows, Amazon Bedrock AgentCore

AWS Optimization in 2026: Graviton4, Lambda Managed Instances, and the New Cost Playbook

A team I advise spent three months migrating their Java microservices to Graviton3. They finished in November 2025. Two weeks later, AWS announced Graviton4 with 30% better compute performance. Their reaction was predictable. But when we looked at the numbers, the real optimization opportunity wasn’t the chip upgrade at all. It was the combination of

Building Phone Call Agents on AWS With Nova 2 Sonic

If you want to build a production-grade voice agent, you will need a few things: a way to connect to phone networks, speech-to-text processing, a way to understand requests and execute actions, text-to-speech output, and observability so it’s not a black box. The hard part is not any single component. Managing voice streams, converting speech to text for processing,

Defense-in-Depth for Healthcare AI: Evaluating Architectural Approaches for Safety and Compliance

Your healthcare AI chatbot passed security review. It has Amazon Bedrock guardrails configured to block PII and sensitive medical topics. The web client connects directly to the Bedrock runtime endpoint. Everything works in testing. Then a patient asks: “I’m John Smith, SSN 123-45-6789, and I have stage 4 pancreatic cancer. What are my treatment options?”

What AWS Thinks You Should Know About Building GenAI Systems

AWS just released a new certification: Certified Generative AI Developer – Professional. If you work with GenAI on AWS or plan to, this exam outline doubles as a surprisingly useful roadmap for what you need to learn, regardless of whether you ever sit for the test. The blueprint maps what separates proof-of-concept GenAI from production-grade

Werner Vogels’ Last re:Invent Keynote and the Six Qualities That Define Developers in the AI Era

Werner Vogels stepped on stage for what he announced would be his final re:Invent keynote after roughly 14 years. After more than a decade of defining how AWS thinks about building software, he handed developers a framework for the next era: the Renaissance Developer. The timing makes sense. AI coding assistants are everywhere. Developers are

AI, Quotes and Vibecoding: How To Compare Software Proposals In 2025

Two quotes land on your desk for the same feature set. Both list similar features and both mention AI in their process. One includes architecture reviews, test suites, and hardening sprints. The other focuses on “AI-powered development, delivered in days.” On paper they can look comparable. In practice they are often based on very different

Trying ChatGPT Atlas? Here’s How to Run It Even If You Don’t Own a Mac with Apple Silicon

OpenAI just launched ChatGPT Atlas on October 21, 2025, a browser that brings AI-powered assistance directly into your web experience. There’s one catch: it’s macOS-only at launch, with Windows, iOS, and Android versions coming soon. If you’re on Windows or Linux today and want to try it, you’ll need access to a Mac environment. The
