Insights
Thoughts on building AI systems that actually work.
Agents need evals, not vibes
Most agent demos succeed because the human silently compensates for the agent. In production, nobody is there to nudge the model. Here's how to build agents that survive contact with reality.
LLM cost control is a product feature
If your unit economics are powered by tokens, you're running a software business and a commodities desk at the same time. Here's how to govern LLM spend without breaking quality.
Chaos-proof delivery: shipping AI with TDD + CI
AI moves too fast for discipline? Reality: AI moves too fast without discipline. Here's how TDD and CI actually work for AI systems.
Fine-tune, RAG, or prompt? A ruthless decision framework
Start cheap. Most teams fine-tune too early. Here's a ruthless decision framework for when to use prompting, RAG, or fine-tuning.
A production blueprint for one agentic workflow
Take one workflow and ship it as a reliable service. Here's the minimal architecture that actually works in production.