
The short leash is the sane way to use coding agents.

I keep seeing the same coding-agent debate walk into the room wearing a different hat.

One version says AI-authored code is the future and anyone resisting it is just protecting the candle industry from electricity. Another version says AI code is a liability grenade with syntax highlighting.

Both versions are too neat.

The useful question is not whether AI should write code. It already does. The useful question is whether a team can explain, test, maintain, and own the code after the agent leaves the chat window and wanders off to confidently solve someone else’s problem.

That is where the short leash comes in.

I do not mean a leash as punishment. I mean a leash as engineering hygiene. The kind where you love the dog, but you also know it will chase a squirrel into traffic if you let vibes run the household.

Agents need boundaries for the same reason production systems need alerts, deployment gates, and people who say “no” in meetings. Capability is not the same thing as judgment.

The backlash is really about unowned code

Godot recently drew a hard line against AI-authored code contributions, with the concern captured bluntly in a PC Gamer report: maintainers cannot trust heavy users of AI to understand their code well enough to fix it.

That sounds like an anti-AI position if you squint at it from a LinkedIn comment section. I read it differently.

It is an ownership position.

Open source maintainers are not running a code-shaped landfill. They inherit the consequences of every merged change: bugs, edge cases, platform quirks, confused users, security reports, build failures on an operating system nobody on earth admits to using, and the delightful little surprises that arrive three months later wearing a fake mustache.

If a contributor cannot explain why the code works, the maintainer effectively becomes the author.

That is not collaboration. That is a drive-by inheritance event.

The same problem shows up inside companies, just with different vocabulary. Instead of maintainers, you have product teams. Instead of pull requests from strangers, you have internal agent-generated patches. Instead of public issue trackers, you have incident reviews where someone eventually says, “Wait, who wrote this?” and the room performs a brief archaeological dig.

Somewhere in the sediment layer, between “quick fix” and “temporary helper,” there is a prompt.

Nobody wants to be the person maintaining a clever diff that arrived without context. Especially if the clever diff is clever in the way raccoons are clever. Technically impressive. Morally chaotic. Somehow inside the ductwork.

The short leash is a workflow, not a personality disorder

The short-leash approach, described well in the Short Leash AI Method, is not “never let the model touch code.” It is closer to pair programming with a very fast junior developer who has read every manual, occasionally invents a basement, and needs clear boundaries.

You keep the work close. You keep the scope small. You inspect the output before momentum turns into fog.

That usually means:

  • Small tasks. Ask the agent for one focused change, not an entire feature arc with database migrations, UI polish, three new abstractions, and a philosophical stance on CSS.
  • Frequent checkpoints. Review the diff early, while the blast radius is still measured in files, not regrets.
  • Human-owned decisions. Let the agent propose options, but keep architecture, tradeoffs, and acceptance criteria with the person accountable for the system.
  • Testable outputs. Every agent-produced change should move through the same tests, reviews, and deployment gates as human code. The compiler does not care who had the idea.
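One way to make "small tasks" and "frequent checkpoints" concrete is a size budget on each agent-produced diff. The sketch below is only an illustration, not a prescribed tool: it reads text in the format produced by `git diff --numstat` (added lines, deleted lines, path, separated by tabs) and checks it against a budget. The names `DiffBudget`, `summarize_numstat`, and `within_budget`, and the default limits of 5 files and 200 changed lines, are assumptions made for this example; a real team would pick its own thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffBudget:
    # Hypothetical defaults; tune these to your own review capacity.
    max_files: int = 5
    max_changed_lines: int = 200


def summarize_numstat(numstat_text):
    """Return (files_touched, lines_changed) from `git diff --numstat` text."""
    files = 0
    changed = 0
    for line in numstat_text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        added, deleted, _path = parts
        files += 1
        # Binary files are reported as "-" for both counts; count the file only.
        if added != "-":
            changed += int(added)
        if deleted != "-":
            changed += int(deleted)
    return files, changed


def within_budget(numstat_text, budget=DiffBudget()):
    """True if the diff is small enough to review at a checkpoint."""
    files, changed = summarize_numstat(numstat_text)
    return files <= budget.max_files and changed <= budget.max_changed_lines
```

A diff that fails the check is not necessarily wrong. It is a signal to stop, split the task, and review what exists before the agent continues.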

This is not anti-autonomy. It is pro-feedback.

Long-leash workflows feel magical when they work. “Build the feature” becomes a slot machine with a progress spinner. Sometimes you get a working implementation. Sometimes you get a confident tour of a solution that almost compiles and has somehow added a second logging framework.

The short leash gives the agent less room to wander into performance art.

It also keeps the human engaged at the moment engagement matters most: before the solution hardens. Once a large agent-generated branch exists, the review problem changes. You are no longer evaluating a proposed idea. You are excavating a small civilization. There are customs. There is architecture. There may be a temple to an unnecessary helper function.

Small changes keep the conversation alive.

The trap is confusing activity with leverage

Coding agents create motion. Lots of it. Files change. Tests appear. Comments multiply. The terminal scrolls like it is auditioning for a medical drama.

Motion feels like progress because, in normal software work, progress often leaves footprints. The problem is that agents are exceptionally good at leaving footprints. A lot of footprints. Occasionally in circles.

That is where teams get fooled.

An agent can produce a large diff faster than a human can form a mild concern about it. But a large diff is not automatically leverage. Sometimes it is just inventory. And inventory has carrying costs: review time, mental overhead, regression risk, future maintenance, and the tiny emotional tax of opening a file and whispering, “What are we doing here?”

The short leash forces a better question: did this change reduce the total burden on the team?

If the answer is yes, keep going. If the answer is no, the agent did not accelerate engineering. It generated homework with better marketing.

This matters because the ROI story around coding agents can get sloppy very quickly. It is easy to measure output. It is harder to measure rework. It is easy to count generated lines. It is harder to count the senior engineer quietly spending Friday afternoon convincing the codebase to stop being weird.

And Friday afternoon is where the truth lives.

The best teams will use agents more carefully, not less

There is a tempting but lazy binary here: either embrace coding agents fully or ban them before they turn the repo into soup.

Serious teams will do neither.

They will build working habits around the parts agents are good at: first drafts, narrow refactors, test scaffolding, API exploration, migration sketches, repetitive glue work, and “please explain this haunted function before I touch it.”

They will also put friction around the parts agents are bad at: unstated assumptions, broad architectural changes, security-sensitive flows, dependency churn, and anything where “looks right” is being asked to substitute for “is right.”

The pattern is not complicated. It is just annoyingly adult:

  • Use the agent to accelerate thought, not outsource judgment. Speed is helpful. Abdication is not.
  • Require the human to narrate the change. If the developer cannot explain the diff, the diff is not ready.
  • Prefer reversible steps. Small patches can be inspected, tested, and rolled back. Giant agent branches become weather systems.
  • Measure rework. If a team saves two hours generating code and spends three hours untangling it, that is not productivity. That is an incident report doing cardio.
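The rework arithmetic in that last point is simple enough to write down. This is a minimal sketch, assuming a team records two rough numbers per change: hours saved by generating the code and hours later spent untangling it. The function names `net_hours_saved` and `rework_ratio` are hypothetical and exist only for this example.

```python
def net_hours_saved(hours_saved_generating, hours_spent_reworking):
    """Hours actually gained once rework is subtracted. Negative means a loss."""
    return hours_saved_generating - hours_spent_reworking


def rework_ratio(hours_saved_generating, hours_spent_reworking):
    """Rework hours per hour saved. Anything above 1.0 is a net loss."""
    if hours_saved_generating <= 0:
        raise ValueError("hours_saved_generating must be positive")
    return hours_spent_reworking / hours_saved_generating


# The example from the list above: two hours saved, three hours untangling.
print(net_hours_saved(2, 3))  # -1
print(rework_ratio(2, 3))     # 1.5
```

The numbers will always be estimates. The point is to track both sides of the ledger, not only the generated output.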

This is where enterprise AI programs should pay attention. The difference between “AI coding is working” and “AI coding is making noise” will not show up in the demo. It will show up in review latency, defect rates, incident causes, onboarding confusion, and whether senior engineers quietly stop trusting the codebase.

Trust erodes in small diffs.

That should make leaders a little uncomfortable. Not scared. Uncomfortable. The useful kind. The kind that makes you ask whether your rollout has review patterns, ownership rules, and escalation paths, or whether you accidentally bought a fog machine and called it transformation.

Ownership is the unit of engineering

I do not think the future belongs to teams that proudly refuse AI help. That feels like refusing autocomplete because monks had better handwriting.

But I also do not think the future belongs to teams that let agents roam the repo like an unsupervised Roomba in a room full of cables.

The durable pattern is human-owned, agent-assisted development. The human sets the intent. The agent accelerates the work. The team keeps the standards. The tests still run. The reviewer still reviews. The person merging the code can still explain why it belongs.

That last part matters most.

Because production does not care that the code had an interesting origin story. Production does not read the prompt. Production receives the artifact, executes it with the warmth of a vending machine, and then reports whatever happened next.

Code is not done when an agent produces it. Code is done when a team can own it on a random Tuesday, under pressure, after the original prompt has vanished into the scrollback void.

The short leash is not a lack of ambition. It is the operating model for taking the tool seriously.

Because the real risk is not that AI writes code.

The real risk is that nobody owns it.
