Most of the conversation about AI-native development is about getting the AI to write better code. Better prompts, better context, better models. I spent the first stretch of building KarmaClock there too.

It was the wrong place to look. The code was never the hard part.

The hard part is what happens around the code. An AI will change a behavior in thirty seconds and leave every description of that behavior silently stale. The spec still says the old thing. The docs still say the old thing. The README still says the old thing. Nothing is broken. Nothing throws an error. The code runs. And the gap between what your software does and what your software says it does widens by one more notch.

The faster the code moves, the faster that gap opens. Speed isn't free. It's borrowed against the integrity of everything that describes the system.

I wrote a while back that code isn't the artifact engineers produce anymore; the spec is. That was the claim. This post is the obvious follow-up question: if the spec is the real artifact, do you actually treat it like one? Because if a stale spec is just an inconvenience you'll fix later, you don't believe your own thesis. A stale spec should be a regression. And regressions break the build.

So I built a gate to make that true.

The gate

The mechanism is a pre-commit hook, but not the ordinary kind that runs when you type git commit. It runs when my AI coding agent tries to commit. It intercepts the commit before it happens, reads what's staged, and asks one question: does this change touch behavior?

The logic is deliberately blunt. If the staged files include anything in src/services/, src/stores/, src/components/, the app routes, a dependency manifest, or a database migration (the kinds of files that change what the software actually does), then the spec files (README.md, CLAUDE.md) had better be staged in the same commit. If they are, the commit passes. If they aren't, the hook exits with a failure code and the commit does not happen.
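To make the shape of that check concrete, here is a minimal sketch in Python. It is an illustration, not the actual hook. The three src/ directories and the two spec file names come from the description above; the paths for app routes, the dependency manifest, and migrations are hypothetical stand-ins, and the sketch assumes that staging either spec file is enough to pass.

```python
from typing import Iterable

# Directories named in the post.
BEHAVIOR_PREFIXES = ("src/services/", "src/stores/", "src/components/")

# Hypothetical stand-ins: the post does not give exact paths for the
# app routes, the dependency manifest, or the migrations.
ASSUMED_PREFIXES = ("src/app/", "migrations/")
ASSUMED_MANIFESTS = ("package.json",)

# Spec files named in the post.
SPEC_FILES = ("README.md", "CLAUDE.md")


def touches_behavior(staged: Iterable[str]) -> bool:
    """True if any staged path is on a behavior-changing surface."""
    prefixes = BEHAVIOR_PREFIXES + ASSUMED_PREFIXES
    return any(
        path.startswith(prefixes) or path in ASSUMED_MANIFESTS
        for path in staged
    )


def spec_is_staged(staged: Iterable[str]) -> bool:
    """True if at least one spec file is staged.

    Assumption: one spec file is enough. Swap any() for all() to
    require every spec file in the same commit.
    """
    staged_set = set(staged)
    return any(spec in staged_set for spec in SPEC_FILES)


def check_commit(staged: Iterable[str]) -> int:
    """Return an exit code: 0 lets the commit through, 1 refuses it."""
    staged_list = list(staged)
    if touches_behavior(staged_list) and not spec_is_staged(staged_list):
        return 1
    return 0
```

In a real hook the staged paths would come from something like git diff --cached --name-only, and the return value would become the process exit code. How the agent's commit is intercepted depends on the agent tooling and is left out here.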

Not a warning. Not a yellow squiggle you can ignore on a deadline. The commit is refused.

There's one escape hatch, and it's there on purpose: a [skip-spec-check] flag you can put in the commit message to override. A gate with no override gets ripped out in a week, the first time it blocks something genuinely trivial like an internal refactor with zero visible change. But a gate with a deliberate, written-down override is one you actually live with, because skipping it becomes a choice you record rather than a discipline you quietly abandon. The override isn't a weakness in the gate. It's what keeps the gate alive.
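The override fits into the decision as one more branch. The sketch below is a hedged illustration in Python rather than the real hook: the flag text is the one described above, while the function name, the reason strings, and the assumption that the hook can read the commit message as a plain string are all mine.

```python
from typing import Tuple

SKIP_FLAG = "[skip-spec-check]"  # flag text from the post


def gate_decision(
    behavior_changed: bool,
    spec_staged: bool,
    commit_message: str,
) -> Tuple[int, str]:
    """Return (exit_code, reason). Exit code 0 allows the commit.

    Hypothetical helper: the two booleans are assumed to be computed
    elsewhere from the staged file list.
    """
    if not behavior_changed:
        return 0, "no behavior change"
    if spec_staged:
        return 0, "spec updated in the same commit"
    if SKIP_FLAG in commit_message:
        # The skip is a recorded choice: the flag stays in the history.
        return 0, "override recorded in commit message"
    return 1, "behavior changed without a spec update"
```

Because the flag lives in the commit message, every skip can be found later by searching the history for it, which is what makes the override a written-down decision rather than a silent one.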

That's the whole idea. Behavior changes ship with their spec, in the same commit, or they don't ship. Drift isn't allowed to accumulate, because the moment it tries to, the build stops.

I was pleased with it. I shouldn't have been, not yet.

The hole

A few releases later, I ran a separate audit across my public-facing documentation, the privacy policy and support pages the app links out to. I'd removed a feature two releases earlier: cloud sync. Users used to be able to sign in and have their data sync across devices. I'd pulled it. The data now lives only on the device.

The support page hadn't gotten the message.

It still had a FAQ answering "I'm signed in, does Reset also wipe my cloud data?", a question about a feature that no longer existed. It still told users that if they signed in, their data was "also stored on a server." That wasn't stale phrasing. That was my public-facing documentation making a false claim about where users' personal data lived, to users, and to any App Store reviewer who clicked the link.

And here's the part that mattered more than the bug: my gate hadn't caught it.

It couldn't have. Documentation files live in docs/. The hook watches src/, dependencies, and migrations: the code-shaped surfaces. It was never looking at docs/, because most edits there genuinely don't need a spec update, and a hook that fired on every doc typo would be noise. The gate was working exactly as designed. The design had a blind spot the size of my entire public documentation surface.

This is the thing nobody tells you about building discipline into a system: your first gate will be confident and incomplete. It will catch the class of drift you imagined when you built it, and it will wave through every class you didn't.

The second gate

So I stopped treating "the gate" as a thing and started treating it as a layer.

The hook is one layer: fast, deterministic, commit-time. It catches code that changes without its spec, the instant it tries to land. What it can't see is drift that lives entirely in prose, a doc that quietly describes a world that no longer exists.

That needs a different kind of check: not deterministic pattern-matching but read-and-compare. A pass that reads the documentation end-to-end against the current state of the code and asks, for each claim, is this still true? Does the privacy policy describe the data flow the code actually implements? Does the support page reference UI that still exists? Do the docs even agree with each other? It's slower. It requires judgment, not just a file-path match. And it runs before a release, not on every commit, because that's the cadence the slower check can afford.

One gate catches code lying about itself. The other catches docs lying about the code. They sit at different points, run at different speeds, and cover for each other's blind spots, because truth doesn't go stale in just one way, and a single gate that pretends otherwise is the most dangerous kind: the one you trust.

What I actually believe now

I didn't set out to build defense-in-depth. I set out to build a clever hook, congratulated myself, and then watched it sail a false privacy claim straight through to my users. The second layer exists because the first one wasn't enough, and I only know that because the first one failed somewhere I could see.

I'm not writing this as someone who has spec discipline figured out. I'm writing it as someone roughly a dozen releases into figuring it out, who has the scar to show for one specific gap and is fairly sure there are others he hasn't hit yet. That's the honest state of it.

But the principle underneath has held up every time I've tested it: if the spec is the artifact, drift is a regression, and regressions have to be able to stop the build. Not nag. Not warn. Stop.

Building one gate that does that taught me something I didn't expect, though. Once you've made drift expensive at the moment of commit, you start noticing all the other moments where speed quietly tempts you to skip the discipline, and you start wondering whether the same move works there too.

It does. That's the next post.
