Blog

Notes on building with AI.

Short essays, lessons, and breakdowns on the AI space — agents, retrieval, evals, governance, and the engineering work that turns research into something a business can actually rely on.

July 04, 2026 · 8 min read

The short leash is the sane way to use coding agents.

The backlash against AI-authored code is not really about banning coding agents. It is about refusing to merge code nobody owns. The short-leash pattern treats AI as a fast collaborator, not a mystery coworker with commit access and a suspiciously confident grin.
- Agents
- Engineering
Read article →
June 30, 2026 · 5 min read

Agents need a trust layer.

Agent demos feel magical until someone asks what they are allowed to do. The next serious agent platforms will compete on trust as much as intelligence: identity, permissions, audit trails, and revocation. Capability got us here. Boundaries are what make it usable.
- Agents
- Governance
Read article →
June 13, 2026 · 16 min read

Fable 5 Didn't Fail. Our AI Governance Model Did.

The Fable 5 suspension wasn't a cloud outage or a bad deployment. It was a governance outage — one that reveals why frontier models can no longer be treated like ordinary software. Here's what enterprises and the industry need to reckon with.
- Governance
- Security
Read article →
June 13, 2026 · 5 min read

The merge test is the only eval that matters.

Most AI coding benchmarks ask "does this code pass the tests?" That's the wrong question. The right question is the one a senior engineer asks before hitting approve on a PR — and a handful of teams are finally starting to build evals around it. Here's why that framing shift matters more than where any tool sits on the leaderboard.
- Evals
- Production
Read article →
June 10, 2026 · 7 min read

Fable 5 Just Raised the Ceiling. The Most Interesting Part Isn't the Benchmarks.

Benchmark day again. The leaderboard moved, the numbers are big, and every headline is about the benchmarks. Fine. But the detail that actually stopped me mid-coffee was a single sentence about stamina — and it changes the question every agent builder should be asking about the systems they ship.
- Agents
- Production
Read article →
June 09, 2026 · 6 min read

The Data Was Always There.

Apple built a billion-device empire on a promise not to look inside it. Your photos, messages, health data, sleep patterns — all of it sat on hardware Apple deliberately chose not to mine. That was integrity. Then the AI era arrived and data became the whole game. Now Apple is playing catch-up without the training fuel its competitors spent a decade quietly collecting.
- Industry
- Meta
Read article →
June 08, 2026 · 4 min read

The Modular Pipeline Had a Good Run.

The canonical multimodal AI pipeline — vision encoder, audio encoder, projection layer, language model — became so standard that it stopped feeling like a design choice. Google's Gemma 4 12B has no encoders at all. Vision and audio flow directly into the language backbone as tokens. That's not an optimization. That's a different theory of where perception belongs, and it's probably the direction the whole field is heading.
- Architecture
- Models
Read article →
June 07, 2026 · 5 min read

The Cap is the Canary.

Uber capped AI coding tool spend at $1,500 per engineer per month after burning through its 2026 AI budget in four months. Everyone called it a cost story. I think it's something more revealing: the first honest public valuation of what enterprise AI is actually worth to the people paying for it. The number tells you more than the press release ever would.
- Production
- Enterprise
Read article →
June 01, 2026 · 6 min read

The evidence trail is the product.

Long context is useful, but it doesn't magically make an AI system trustworthy. The harder problem is knowing what evidence the system used, what it ignored, and whether the answer can be replayed. Evidence trails are becoming a core product feature. Nobody wants a confident junk drawer with a chat box.
- Evals
- Retrieval
Read article →

May 30, 2026 · 6 min read

The case for Hermes.

Nous Research's Hermes Agent is open-source, self-hosted, and free — but the thing that matters is what it bets on: an agent that actually grows with you, not a stateless harness that forgets every session. Notes from a weekend with it (and why it made me leave OpenClaw).

- Agents
- Tools

Read article →

May 15, 2026 · 5 min read

The agent reliability gap.

Every other AI feed is an agent demo going viral. The gap between those demos and an agent that survives a Monday morning in production is wider than it looks — and closing it is the real story of AI in 2026.

- Agents
- Evals

Read article →

May 15, 2026 · 3 min read

Welcome — what this blog is for.

A short intro to what I'm planning to write about here: practical AI engineering, agent architectures, the gap between demos and production, and the trends actually worth your attention.

- Meta

Read article →
