Niklas Saers
Back to writing

Spec-Driven AI Development

You write a prompt, hit enter, and watch the console light up green. Code pours out. It feels like magic. Then you try to maintain the thing a week later, and you realize your shiny new AI agent just built a leaky bucket.

Marc Brooker nailed this dynamic on Software Engineering Radio Episode 710. AI compresses the time we spend actually typing logic. That sounds incredible, but it warps your working day.

If your old routine was spending 70% of your time typing and 30% thinking about architecture, AI completely inverts the ratio. Suddenly, you're spending half your week grading homework from a machine that hallucinated three different API wrappers for the exact same service.

To survive this, you have to write a rigid specification before you ever fire up the agent. Brooker calls it spec-driven development.

The spec isn't a loose memo. It's a machine-readable contract sitting right in your monorepo. It draws a hard boundary around the work. If it's not explicitly in the spec, the agent has to treat it as undefined territory and stop to ask.

You can't get away with sloppy paragraphs anymore. You need concrete flows. I write these in Gherkin syntax, stripping back the ambiguity: user connects, system drops a token, database records the audit entry.
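A flow like that, sketched as a Gherkin scenario (the feature and step names here are illustrative, not lifted from a real spec):

```gherkin
Feature: Session tokens
  Scenario: User connects and receives a token
    Given a registered user "alice"
    When "alice" connects
    Then the system issues a session token
    And the database records an audit entry for that token
```

Every step is a concrete, checkable event. There's no "the system should handle auth gracefully" for an agent to interpret creatively.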

You tie those flows to invariant rules. Think along the lines of property-based testing, where you define a universal truth the software must never break.
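Here's a minimal property-style sketch of that idea in plain Python, using a toy in-memory model of the connect-token-audit flow above (all names are hypothetical; a real setup would test the actual implementation, often with a library like Hypothesis):

```python
import random
import string

class AuthSystem:
    """Toy model of the spec'd flow: connect -> issue token -> audit entry."""

    def __init__(self):
        self.tokens = {}     # token -> user_id
        self.audit_log = []  # one entry appended per issued token

    def connect(self, user_id):
        token = "".join(random.choices(string.ascii_lowercase, k=16))
        self.tokens[token] = user_id
        self.audit_log.append(("token_issued", user_id, token))
        return token

def check_invariant(system):
    # Universal truth: every live token has exactly one matching audit entry.
    for token, user in system.tokens.items():
        matches = [e for e in system.audit_log
                   if e == ("token_issued", user, token)]
        assert len(matches) == 1, f"audit mismatch for {token}"

# Hammer the model with randomized sessions, then check the invariant.
system = AuthSystem()
for _ in range(1000):
    system.connect(f"user-{random.randrange(50)}")
check_invariant(system)
print("invariant held across 1000 randomized connects")
```

The point isn't this toy model; it's that the invariant is stated once, independent of any particular input, and then checked against thousands of generated cases.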

Once the spec is solid, you split the work into two parallel tracks. Your builder agent reads the Markdown file and writes the feature. Your verifier agent reads the exact same file and blasts the implementation with thousands of randomized inputs.
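The shape of that loop, as a hypothetical sketch (the agent functions here are stand-ins, not a real API):

```python
def builder_agent(spec: str) -> str:
    # Stand-in for the coding agent: would generate an implementation.
    return f"# implementation generated from {len(spec)} chars of spec"

def verifier_agent(spec: str, code: str) -> bool:
    # Stand-in for the verifier: would fuzz the code against the spec.
    return bool(spec) and bool(code)

spec_text = "Feature: Session tokens ..."  # both tracks read the SAME file

code = builder_agent(spec_text)       # track one: build from the spec
ok = verifier_agent(spec_text, code)  # track two: verify against the spec
print("verified" if ok else "spec violation")
```

The detail that matters is that neither agent reads the other's output as ground truth. Both are anchored to the same spec file, so a builder hallucination can't quietly become the verifier's reference.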

The spec becomes the center of gravity. It's a living prompt for the builder, a constraint map for the verifier, and a source of truth for the human reviewer. When a user trips over a bug, you update the spec. The spec breaks the tests, the builder fixes the code, and the loop closes.

But none of this works if you overwhelm the agent. Managing context windows is now the hardest part of our jobs. Dumping your entire codebase into an LLM is like handing a barista your autobiography before ordering an espresso.

You have to enforce strict anti-noise design. Build modular interfaces. Tie your spec to those clean, sanded-down edges. Give your agents the tools to semantically search the repo so they only pull the exact snippets they need.
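A deliberately naive sketch of that retrieval step, using bag-of-words cosine similarity over a few fake snippets (a real setup would use a proper embedding model; the file names and contents are invented):

```python
import math
from collections import Counter

def embed(text):
    # Naive bag-of-words "embedding" standing in for a real model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented repo snippets; in practice these come from chunking the codebase.
snippets = {
    "auth.py": "def issue_token(user): write audit log entry",
    "billing.py": "def charge_card(amount): call payment gateway",
    "search.py": "def find_docs(query): rank results",
}

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(snippets,
                    key=lambda f: cosine(q, embed(snippets[f])),
                    reverse=True)
    return ranked[:k]

print(retrieve("token audit entry"))  # → ['auth.py']
```

The agent gets one relevant snippet instead of the autobiography. Everything else stays out of the context window.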

Skipping the specification phase just automates the creation of technical debt at warp speed. But if you nail the spec, you get an artifact that prevents regressions and keeps your agents on a very short leash.

Margin for error? Zero.