Here’s a scene every working programmer has lived.
It’s 2 AM. Something is broken in production. You’re staring at a commit from eight months ago. The message says “refactor auth logic.” That’s it. The PR has two approvals—thumbs-up emoji, no comments. The Slack thread where the team debated the approach was in a channel that got archived when the org restructured. The person who wrote it left the company in June.
You don’t just need to understand what this code does. You need to understand why it exists in this form—what alternatives were considered, what constraints shaped it, what tradeoffs were accepted.
That information isn’t missing. It was never captured.
It lived in someone’s head. It passed through ephemeral media. It evaporated.
We’ve always treated code as the primary artifact of software development. Everything else—the reasoning, the tradeoffs, the rejected approaches—is secondary. Informal. Optional. We write it down when we feel like it, in whatever format is convenient, with no expectation that it will survive contact with time.
That was a defensible tradeoff when humans wrote all the code. The alternative—capturing every decision rigorously—was more expensive than the knowledge loss it prevented.
Humans optimized for shipping. Documentation was a tax.
It breaks completely when agents enter the picture.
The Real Artifact
When an agent writes code, something fundamental shifts.
The code is no longer where decisions are made. It’s where decisions show up.
The real work happens in the back-and-forth. You try something, it does something slightly wrong, you tighten the prompt, add a constraint, back it out, try again. That loop is the engineering.
The code is just what falls out of that loop—the way a compiled binary falls out of source.
Once you see it, you can’t unsee it:
The conversation isn’t attached to the commit.
The conversation is the commit.
The code is derived.
Which means the interesting part of any line of code isn’t the line itself. It’s the path that led there. The moment someone said “we can’t do it that way because of X,” or “we tried that last year and it blew up,” or “this only works if we assume Y.”
That’s the thing we’ve been throwing away.
Editing Compiled Binaries
If the conversation is the source and the code is the compiled output, what does it mean to manually edit the code?
You’re editing compiled binaries.
You’re bypassing the process that produced the artifact. You’re breaking the provenance chain. The code no longer traces back to a decision record—it traces back to a person’s fingers on a keyboard, with whatever reasoning they had in their head at the time.
That reasoning will be gone.
And if the conversation is gone, you’re left trying to reverse-engineer intent from what is basically obfuscated output.
Manual edits aren’t always wrong. Sometimes you need to patch a binary.
But you should recognize them for what they are:
An escape hatch, not a methodology.
In a system built on conversation-driven development, every manual edit introduces a new kind of debt:
something like provenance debt — code that just… appeared, with no way to ask why it’s there
The Constraint
If the conversation is the source and the code is just the output, then there’s an obvious question:
Why are humans allowed to edit the output at all?
If you care about provenance—about being able to trace every line of code back to a decision—then manual edits are a problem. They break the chain. They introduce code that has no record, no explanation, no origin you can inspect.
So you end up in an uncomfortable place:
You don’t just prefer that agents write the code.
You require it.
Not because agents are better programmers.
But because they are the only way to guarantee that every change passes through a process that can be captured, inspected, and replayed.
Humans don’t stop participating. They just move up a level.
They define intent. They shape constraints. They review outcomes.
But they don’t write the code directly anymore.
Because the moment they do, the system forgets how it got there.
Process Becomes Enforceable
Every software team has rules.
No code without tests.
No merge without review.
No decision without a written why.
At least, in theory.
In practice, those rules bend. Writing the “why” is a tax we stop paying the moment the coffee wears off or someone says “can we just ship this?” in Slack.
Each shortcut makes sense at the time.
Stack enough of them together and you get a system nobody really understands anymore.
When code has to go through an agent, that changes.
You can make it impossible to produce code without tying it back to something: a requirement, an explanation of what it’s supposed to do. You can require tests. You can require that every change has a reason, somewhere, that can be pointed to.
And because the agent is doing the work, you actually get it. Every time.
Of course, agents can fake this. They can generate tests that don’t test anything, explanations that don’t explain.
So you need a way to verify the constraints are actually being met.
The difference is: that verification can also be automated.
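The shape of such a gate is simple enough to sketch. What follows is a minimal, hypothetical check, assuming an invented `Decision-Id` commit trailer and an in-memory stand-in for the decision store; none of these names come from a real tool:

```python
import re

# Hypothetical convention: every commit message must carry a
# `Decision-Id:` trailer pointing at a recorded decision.
TRAILER = re.compile(r"^Decision-Id:\s*(\S+)$", re.MULTILINE)

# Stand-in for a durable decision store: id -> what was recorded.
DECISIONS = {
    "dec-4812": {"requirement": "invalidate sessions on logout", "has_tests": True},
}

def check_commit(message: str) -> tuple[bool, str]:
    """Refuse a commit unless it cites a decision record that exists
    and that record says tests were produced."""
    match = TRAILER.search(message)
    if not match:
        return False, "no Decision-Id trailer: code with no recorded reason"
    record = DECISIONS.get(match.group(1))
    if record is None:
        return False, f"unknown decision {match.group(1)!r}: trailer points nowhere"
    if not record["has_tests"]:
        return False, "decision recorded, but no tests attached"
    return True, "ok"
```

In practice this would run as a pre-receive hook or CI step; the point is only that the rule stops being a convention and becomes a gate the code cannot pass without satisfying.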
The Responsibility Inversion
At first, this sounds backwards.
Letting agents write all your code feels reckless. Taking humans out of the loop feels like giving up control.
But look at what you get if you don’t.
Code appears with no history. Decisions live in someone’s head. The reasoning is gone the moment it matters.
That’s the system we’ve been calling “safe.”
If agents are the only way to guarantee:
that every change has a reason
that the reason is recorded
that the process can be inspected and replayed
then letting humans write code directly isn’t caution.
It’s accepting that your system will drift out of understanding over time.
The safer system is the one that doesn’t allow untraceable changes at all.
Version Control for Intent
Git answers one question well: what changed?
It answers the important questions poorly:
why did it change?
what alternatives were considered?
what assumptions were active?
what would make this wrong?
When every commit is linked to the full conversation that produced it, version control starts to look less like a history of files and more like a history of decisions.
You can trace a piece of code back to the moment someone decided it had to exist. You can see what was tried before that. You can see what assumptions were in play.
You’re not just diffing code anymore. You’re diffing reasoning.
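To make “diffing reasoning” concrete, here is one possible shape it could take. The commit ids and assumption sets are invented stand-ins for real conversation records:

```python
# Sketch: each commit maps to the conversation that produced it, and a
# "diff" compares the recorded assumptions rather than the text.
conversations = {
    "a1b2c3": {"assumptions": {"sessions fit in one Redis node", "logout is rare"}},
    "d4e5f6": {"assumptions": {"logout is rare", "tokens are stateless"}},
}

def diff_reasoning(old_sha: str, new_sha: str) -> dict[str, set[str]]:
    """Compare the assumptions behind two commits instead of their lines."""
    old = conversations[old_sha]["assumptions"]
    new = conversations[new_sha]["assumptions"]
    return {"dropped": old - new, "added": new - old}
```

A diff like this answers “what would make this wrong?” directly: the dropped assumptions are the ones the old code silently depended on.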
There’s a wrinkle here.
Unlike source code, the mapping from conversation to code isn’t perfectly deterministic. Run the same prompt against a different model, or even the same model at a different time, and you may get a different result.
That doesn’t invalidate the idea. It just means the conversation alone isn’t enough.
The “source” becomes the conversation plus the execution context: the model, the tools, the constraints, the evaluation criteria. A serialized state, not just a transcript.
Which is another way of saying: we don’t just need version control for code. We need version control for the entire process that produces it.
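One way to pin that down is to treat the serialized state as a content-addressed record: same source state, same id. A sketch, where every field name is an assumption rather than an established format:

```python
import hashlib
import json
from dataclasses import asdict, dataclass

# Hypothetical "source" artifact: the conversation plus everything
# needed to reproduce (or at least account for) the run.
@dataclass(frozen=True)
class ExecutionContext:
    model: str                    # which model, which version
    tools: tuple[str, ...]        # tools the agent could call
    constraints: tuple[str, ...]  # rules the output had to satisfy
    transcript: str               # the conversation itself

    def content_id(self) -> str:
        """Stable id: the same serialized state always hashes the same."""
        blob = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()[:12]
```

Versioning these records, instead of just the code they emitted, is what “version control for the entire process” would mean mechanically.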
The Substrate Problem
If conversations are the source of truth, they need infrastructure worthy of that role.
Today’s tools—Slack, GitHub, email—are:
ephemeral
fragmented
structurally disconnected from the artifacts they produce
What this looks like when it works is less glamorous than it sounds.
On Monday morning, you don’t start by grepping through a 10,000-line file that hasn’t been touched in three years. You don’t start by guessing.
You start by asking the system: “Why did we stop using the Redis cache here?”
And it can answer. Not with a guess, but by pointing you to the conversation where that decision was made. What broke. What was tried. What didn’t work.
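That query is mechanically simple once the links exist. A toy version, with `blame` and `records` as stand-ins for `git blame` and a durable conversation store (the data here is invented for illustration):

```python
# (file, line) -> commit sha, as git blame would report it.
blame = {("cache.py", 42): "a1b2c3"}

# commit sha -> the recorded decision behind it.
records = {
    "a1b2c3": "Dropped the Redis cache: invalidation bugs after the "
              "sharding change; tried write-through first, too slow.",
}

def why(path: str, line: int) -> str:
    """Resolve a line of code to the conversation that produced it."""
    sha = blame.get((path, line))
    if sha is None or sha not in records:
        return "no recorded decision (provenance debt)"
    return records[sha]
```

The interesting failure mode is the fallback branch: every manual edit lands there, which is what makes provenance debt visible instead of invisible.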
We need a substrate where:
conversations stick around
you can point to them
they’re tied directly to the code they produced
agents operate inside that system, not bolted on the side
Where the whole process is:
replayable
inspectable
auditable
Without that, you’re relying on convention.
And convention is exactly what failed us.
The Shift
We’ve been treating code as the thing that matters, and everything that produced it as exhaust.
That worked, as long as humans were doing the work and we were willing to forget most of it.
It stops working the moment the process becomes the only place the real decisions are made.
At that point, the code isn’t the source of truth anymore.
It’s just what fell out the bottom.
If you want to understand a system, or trust it, or change it safely, you don’t start with the code.
You start with whatever produced it.
And increasingly, that’s the conversation.
In that world, writing code directly starts to feel less like craftsmanship and more like bypassing the system entirely.