Part 1: One Spec To Rule Them All

One Spec to rule them all, one Spec to find them, one Spec to bring them all, and in the darkness bind them.

This is the first post in a series about spec-driven development. Not which tool to use or how to get started, but what I learned after living with a spec long enough to hit the problems that nobody writes about yet. I do not have all the answers and I am not trying to be a guru. I am sharing what worked, what did not, and what I am still figuring out. If you have been down this road too, I would love to hear your experience in the comments.

Spec-driven development is having a moment. Microsoft shipped a spec-kit and wrote about it on their developer blog. JetBrains published a dedicated series on using a spec-driven approach with AI coding tools. Tools like Kiro and CodeSpeak are building entire development models around the idea that specs, not code, are the primary artifact. Martin Fowler's blog has a detailed breakdown comparing SDD tools. The term is everywhere.

Most of this content is useful. It explains what SDD is, compares approaches, and helps teams get started. But almost all of it is written from the outside looking in, by people who adopted the practice recently or are building tools around it. Very little comes from engineers who have been doing it long enough to hit the problems that only show up later.

I have been writing and maintaining a spec across nine SDKs for three years. I started before SDD had a name, before LLMs made it a topic of conversation, and before any of the current tooling existed. I have a lot of thoughts about what makes a spec genuinely useful over time and what makes it quietly fall apart. This series is my attempt to think through that in public, and hopefully start a conversation with people who are navigating the same problems.

What a Spec Actually Is (And What It Is Not)

A lot of the current SDD conversation frames the spec as a disposable implementation plan for an LLM. You write it, the agent consumes it, code comes out, job done. Some tools are explicitly built around this model. The spec is an intermediate artifact, a way of communicating intent to an AI before it disappears into the generated code.

That framing is not wrong for certain use cases. But it is a narrow way to think about something with much broader value.

The definition I find more useful, and the one this series is built around, is this:

A spec is a contract between implementations.

It does not describe code. It defines behavior: what a feature should do, how it should respond to edge cases, what a developer can rely on regardless of which language or platform they are using. The moment you have more than one implementation of the same thing, you need something that sits above all of them and answers the question: what does correct actually mean here?

Tests do not answer this. Tests verify that your code behaves the way you wrote it. They say nothing about whether the behavior you wrote was the right one. A test can pass in nine SDKs while each of them does something subtly different, and nothing in your CI pipeline will flag it.

Code reviews do not answer it either. A reviewer working inside a single codebase has no way to know whether this implementation matches what the mobile client does, or what the desktop client does, or what a developer reading your docs will reasonably expect.

The spec is the only artifact that exists at the level of behavior rather than implementation, which makes it the one thing that can act as referee when two implementations disagree.
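To make the point about tests concrete, here is a small Python sketch. Everything in it is invented for illustration: the two "SDKs", the channel-name rule, and the function names are hypothetical and not taken from any real codebase. Each implementation passes its own tests, yet they disagree on an edge case that only a shared, spec-level table of expected behavior exposes.

```python
# Two hypothetical SDKs implementing the same hypothetical rule:
# "a channel name is valid if it is not blank".

def sdk_a_is_valid_name(name: str) -> bool:
    # SDK A reads "blank" as "empty after trimming whitespace".
    return len(name.strip()) > 0


def sdk_b_is_valid_name(name: str) -> bool:
    # SDK B reads "blank" as "zero characters long".
    return len(name) > 0


# Each SDK ships its own tests, written against its own reading of the rule.
def test_sdk_a() -> None:
    assert sdk_a_is_valid_name("chat")
    assert not sdk_a_is_valid_name("")


def test_sdk_b() -> None:
    assert sdk_b_is_valid_name("chat")
    assert not sdk_b_is_valid_name("")


# Spec-level cases: one table of inputs and expected results shared by every SDK.
SPEC_CASES = [
    ("chat", True),
    ("", False),
    ("   ", False),  # the edge case neither SDK's own tests cover
]


def conformance_failures(is_valid) -> list[str]:
    """Return the inputs for which an implementation disagrees with the spec table."""
    return [name for name, expected in SPEC_CASES if is_valid(name) != expected]


if __name__ == "__main__":
    test_sdk_a()
    test_sdk_b()  # both suites pass
    print("SDK A failures:", conformance_failures(sdk_a_is_valid_name))
    print("SDK B failures:", conformance_failures(sdk_b_is_valid_name))
```

Both test functions pass, but only the shared table reveals that the second implementation accepts a whitespace-only name. The table plays the role of the spec here: it is the one place where "correct" is defined independently of either implementation.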

Tips for Keeping Your Spec Useful Over Time

Writing a spec is the easy part. Keeping it in step with reality over months and years is where most teams quietly struggle. Before closing this first post, I want to share some practical foundations that have helped us keep our spec useful over time.

Keep your spec in version control, close to the code

A spec that lives in a wiki or a shared document will drift. It needs to be versioned alongside the code it describes, treated with the same discipline as a codebase. If a spec change does not go through a pull request, it will quietly stop reflecting reality. Nobody intends this to happen. It happens anyway, gradually, and by the time you notice, the spec has become historical fiction.

Give every behavior a unique stable ID

This is the single most practical thing I can pass on. Every spec entry describing a distinct behavior should have a unique identifier. At Ably we use abbreviations of feature areas combined with alternating numbers and letters for nesting levels. So RTP is Realtime Presence, RSP is REST Presence, and a deeply nested entry might look like RTP2a5c6b7. These IDs let you reference spec entries directly from tests, from code comments, from pull requests, from conversations. Instead of describing a behavior in prose every time, you point to the ID. Anyone reading the code can trace it back to the contract it implements. This traceability is what separates a spec that is woven into everyday work from one that sits to the side and is only occasionally consulted.
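To show how such IDs can be handled mechanically, here is a minimal Python sketch that splits an ID into its feature area and nesting levels and lists its parent entries. The grammar is an assumption inferred from the single example above (an uppercase prefix followed by alternating numbers and lowercase letters), and `parse_spec_id` and `ancestors` are hypothetical helper names, not part of any real tooling.

```python
import re

# Assumed shape: uppercase feature prefix, then a tail that starts with a digit
# and contains only digits and lowercase letters.
_ID_RE = re.compile(r"^([A-Z]+)(\d[0-9a-z]*)$")
# A nesting level is either a run of digits or a run of lowercase letters.
_LEVEL_RE = re.compile(r"\d+|[a-z]+")


def parse_spec_id(spec_id: str) -> tuple[str, list[str]]:
    """Split an ID such as 'RTP2a5c6b7' into ('RTP', ['2', 'a', '5', ...])."""
    match = _ID_RE.match(spec_id)
    if match is None:
        raise ValueError(f"not a valid spec ID: {spec_id!r}")
    area, tail = match.groups()
    return area, _LEVEL_RE.findall(tail)


def ancestors(spec_id: str) -> list[str]:
    """Return the IDs of every enclosing entry, outermost first."""
    area, levels = parse_spec_id(spec_id)
    return [area + "".join(levels[:depth]) for depth in range(1, len(levels))]


if __name__ == "__main__":
    print(parse_spec_id("RTP2a5c6b7"))
    print(ancestors("RTP2a5c6b7"))
```

With a parser like this, a script could scan test files and code comments for IDs and report which spec entries have no test pointing at them.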

Use cross-references instead of repeating logic

Duplication in a spec is as dangerous as duplication in code. When the same behavior is described in two places, the two descriptions will eventually diverge, and you will have two sources of truth instead of one. The solution is the same as in code: do not repeat yourself. When one spec entry depends on or extends another, reference it by ID rather than restating the logic. This keeps each behavior defined exactly once, makes the spec easier to maintain, and means that when something changes you update it in one place and the rest of the spec stays coherent.
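Cross-references only help if they point at something, so it is worth checking them automatically. Below is a minimal Python sketch of such a check, assuming the spec has already been loaded into a dictionary of ID to entry text. The `dangling_references` helper, the ID pattern, and the entry wording are all invented for illustration and are not taken from any real specification.

```python
import re

# Assumed ID pattern: two or more uppercase letters, a digit, then digits or
# lowercase letters. Other tokens of that shape (for example "HTTP2") would
# also match, so a real check would need an allow-list.
SPEC_ID = re.compile(r"\b[A-Z]{2,}\d[0-9a-z]*\b")


def dangling_references(entries: dict[str, str]) -> dict[str, list[str]]:
    """Map each entry ID to the IDs it mentions that are not defined anywhere."""
    problems: dict[str, list[str]] = {}
    for entry_id, text in entries.items():
        missing = sorted({ref for ref in SPEC_ID.findall(text) if ref not in entries})
        if missing:
            problems[entry_id] = missing
    return problems


if __name__ == "__main__":
    # Invented entries, purely for illustration.
    spec = {
        "RTP2": "The client MUST maintain a map of present members.",
        "RTP2a": "On attach, the map described in RTP2 MUST be cleared.",
        "RSP3": "Results are paginated as described in RSC99.",
    }
    print(dangling_references(spec))
```

Run in CI on every spec pull request, a check like this turns a broken reference into a failed build rather than a surprise months later.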

Give changed behavior a new ID and deprecate the old one

If you change what a behavior does and keep the same ID, you silently break the traceability chain. Tests referencing the old ID now verify the wrong thing, code comments become misleading, and the history becomes unreadable. The discipline of assigning a new ID when behavior changes forces an explicit acknowledgement: this is not a correction, it is a new contract. Old entries are marked as superseded rather than deleted, leaving a paper trail. For example: "This entry has been superseded by RSC25 as of specification version 4.0.0".
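One way to keep that paper trail machine-readable is to record the supersession on the entry itself. The Python sketch below is an assumption about how this could be modelled, not a description of how any real spec is stored: the `Entry` class, its field names, and the ID `RSC24` are hypothetical. Only the pairing of RSC25 with version 4.0.0 comes from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    spec_id: str
    text: str
    superseded_by: Optional[str] = None  # ID of the replacing entry, if any
    since_version: Optional[str] = None  # spec version where that happened


def resolve_current(spec: dict[str, Entry], spec_id: str) -> str:
    """Follow the supersession chain until reaching an entry still in force."""
    seen: set[str] = set()
    current = spec_id
    while spec[current].superseded_by is not None:
        if current in seen:
            raise ValueError(f"supersession cycle involving {current}")
        seen.add(current)
        current = spec[current].superseded_by
    return current


def stale_references(spec: dict[str, Entry], referenced_ids: list[str]) -> dict[str, str]:
    """Map every referenced ID that has been superseded to its current replacement."""
    return {
        ref: resolve_current(spec, ref)
        for ref in referenced_ids
        if spec[ref].superseded_by is not None
    }


if __name__ == "__main__":
    spec = {
        # "RSC24" and both entry texts are hypothetical.
        "RSC24": Entry("RSC24", "Old behavior.", superseded_by="RSC25", since_version="4.0.0"),
        "RSC25": Entry("RSC25", "New behavior."),
    }
    print(stale_references(spec, ["RSC24", "RSC25"]))
```

Feeding this the IDs found in a test suite gives a list of tests that still verify an old contract, which is exactly the situation a silent in-place edit would have hidden.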

Use RFC 2119 requirement language

Ambiguous language in a spec is a slow poison. Words like "should", "must" and "may" mean different things to different people, and when an LLM or a new engineer reads your spec, those differences matter. RFC 2119 solves this cleanly: MUST means an absolute requirement, SHOULD means recommended unless there is a valid reason to deviate, and MAY means truly optional. Adopting this convention costs nothing and eliminates an entire category of misinterpretation. When someone asks "is this behavior required or just a suggestion", the spec answers the question without needing a conversation.
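A convention like this is easy to lint for. The Python sketch below pulls the requirement keywords out of an entry and flags lowercase uses that a reader could mistake for normative language. It covers only the keywords discussed above plus their negations, not the full RFC 2119 list, and the function names and sample sentences are hypothetical.

```python
import re

# Longer phrases come first so "MUST NOT" is not reported as a bare "MUST".
_NORMATIVE = re.compile(r"\b(MUST NOT|SHOULD NOT|MUST|SHOULD|MAY)\b")
_ANY_CASE = re.compile(r"\b(must|should|may)\b", re.IGNORECASE)


def requirement_levels(text: str) -> list[str]:
    """Return the uppercase requirement keywords in the order they appear."""
    return _NORMATIVE.findall(text)


def ambiguous_keywords(text: str) -> list[str]:
    """Return keyword-like words that are not fully uppercase."""
    return [word for word in _ANY_CASE.findall(text) if not word.isupper()]


if __name__ == "__main__":
    entry = "The client MUST NOT retry, but MAY log the failure."
    print(requirement_levels(entry))
    print(ambiguous_keywords("The client should retry and MUST log the attempt."))
```

An entry with no uppercase keyword at all, or one that mixes "should" and "SHOULD", is usually worth a second look in review.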