Blog
Notes on building with AI.
Short essays, lessons, and breakdowns on the AI space — agents, retrieval, evals, governance, and the engineering work that turns research into something a business can actually rely on.
The short leash is the sane way to use coding agents.
The backlash against AI-authored code is not really about banning coding agents. It is about refusing to merge code nobody owns. The short-leash pattern treats AI as a fast collaborator, not a mystery coworker with commit access and a suspiciously confident grin.
Read article →
Agents need a trust layer.
Agent demos feel magical until someone asks what they are allowed to do. The next serious agent platforms will compete on trust as much as intelligence: identity, permissions, audit trails, and revocation. Capability got us here. Boundaries are what make it usable.
Read article →
Fable 5 Didn't Fail. Our AI Governance Model Did.
The Fable 5 suspension wasn't a cloud outage or a bad deployment. It was a governance outage — one that reveals why frontier models can no longer be treated like ordinary software. Here's what enterprises and the industry need to reckon with.
Read article →
The merge test is the only eval that matters.
Most AI coding benchmarks ask "does this code pass the tests?" That's the wrong question. The right question is the one a senior engineer asks before hitting approve on a PR — and a handful of teams are finally starting to build evals around it. Here's why that framing shift matters more than where any tool sits on the leaderboard.
Read article →
Fable 5 Just Raised the Ceiling. The Most Interesting Part Isn't the Benchmarks.
Benchmark day again. The leaderboard moved, the numbers are big, and every headline is about the benchmarks. Fine. But the detail that actually stopped me mid-coffee was a single sentence about stamina — and it changes the question every agent builder should be asking about the systems they ship.
Read article →
The Data Was Always There.
Apple built a billion-device empire on a promise not to look inside it. Your photos, messages, health data, sleep patterns — all of it sat on hardware Apple deliberately chose not to mine. That was integrity. Then the AI era arrived and data became the whole game. Now Apple is playing catch-up without the training fuel its competitors spent a decade quietly collecting.
Read article →
The Modular Pipeline Had a Good Run.
The canonical multimodal AI pipeline — vision encoder, audio encoder, projection layer, language model — became so standard that it stopped feeling like a design choice. Google's Gemma 4 12B has no encoders at all. Vision and audio flow directly into the language backbone as tokens. That's not an optimization. That's a different theory of where perception belongs, and it's probably the direction the whole field is heading.
Read article →
The Cap is the Canary.
Uber capped AI coding tool spend at $1,500 per engineer per month after burning through their 2026 AI budget in four months. Everyone called it a cost story. I think it's something more revealing: the first honest public valuation of what enterprise AI is actually worth to the people paying for it. The number tells you more than the press release ever would.
Read article →
The evidence trail is the product.
Long context is useful, but it doesn't magically make an AI system trustworthy. The harder problem is knowing what evidence the system used, what it ignored, and whether the answer can be replayed. Evidence trails are becoming a core product feature. Nobody wants a confident junk drawer with a chat box.
Read article →
Retrieval
The case for Hermes.
Nous Research's Hermes Agent is open-source, self-hosted, and free — but the thing that matters is what it bets on: an agent that actually grows with you, not a stateless harness that forgets every session. Notes from a weekend with it (and why it made me leave OpenClaw).
Read article →
The agent reliability gap.
Every other post in the AI feed is an agent demo going viral. The gap between those demos and an agent that survives a Monday morning in production is wider than it looks, and closing it is the real story of AI in 2026.
Read article →
Welcome: what this blog is for.
A short intro to what I'm planning to write about here: practical AI engineering, agent architectures, the gap between demos and production, and the trends actually worth your attention.
Read article →