Technology Apr 23, 2026 · 13 min read

Spec Kit vs BMAD vs OpenSpec: Choosing an SDD Framework in 2026


DEV Community
by William Schnaider Torres Bermon


If the AI writes the code, the spec is the artifact. That's the entire thesis. Everything else is tooling.

TL;DR

Pick based on your codebase:

  • Existing codebase, adding features → OpenSpec
  • New project from scratch → Spec Kit
  • Compliance, audit trails, regulated → BMAD
  • Unsure? → OpenSpec. It tends to minimize adoption friction compared to the others, works on both greenfield and brownfield, and won't lock you in.

If that's all you needed, stop here. The rest is the reasoning.

Disclosure

I haven't run all three of these in production. This is structural analysis: docs, case studies, design choices, and community reports — not a veteran's field guide. I'll flag where I'm extrapolating versus citing something documented. If you've shipped with any of these, your experience outranks this article.

What SDD actually is (and isn't)

Spec-Driven Development isn't a 2025 invention. BDD, formal requirements docs, ICDs — all versions of the same idea. What changed is that LLMs turned natural-language specs into something you can execute. A Markdown file plus Claude or GPT produces working code. No custom DSL, no code generator, no parser.

The workflow, across all frameworks, is roughly:

  1. Constitution — standards that apply to every change (tests, stack, security).
  2. Specification — what and why.
  3. Design — how, architecture decisions.
  4. Tasks — ordered implementation units.
  5. Implementation — the agent executes; you review.

Steps 1–4 used to fit in a three-line Jira ticket because writing them properly cost more than the code itself. That calculation flipped. AI generates a draft spec in minutes. But "draft" is doing work in that sentence — catching missing edge cases, validating assumptions, and detecting hallucinations still costs real human time. LLMs collapsed the cost of drafting, not the cost of quality. The difference matters.

The economic shift

Old pattern: planning is compressed. Tickets are thin. The real spec is in the developer's head, in Confluence pages nobody updates, in Slack threads from two sprints ago. Code is the expensive part, so you optimize for coding time.

New pattern: code is cheap. AI writes it. The expensive thing is now intent — making sure the AI builds what you actually need. Suddenly an exhaustive spec with acceptance criteria, Gherkin scenarios, error-handling sections, and architectural constraints is worth producing because the AI uses it and the cost of generating the draft is trivial.

Spec is the source of truth. Code is the build output. That's the inversion. The frameworks below are different implementations of the same idea.

The frameworks

Spec Kit (GitHub)

GitHub's open-source SDD toolkit, built around a CLI called specify, with 90K+ GitHub stars at the time of writing. Integrates with a broad range of AI coding agents (the project lists 30+), including GitHub Copilot, Claude Code, Cursor, and Gemini CLI.

The workflow uses slash commands:

/speckit.constitution → project principles
/speckit.specify      → feature specification
/speckit.plan         → technical design
/speckit.tasks        → implementation breakdown
/speckit.implement    → agent executes

The constitution.md is the piece worth understanding. It's not just a rules file — it's the document every subsequent spec references. Your testing strategy, your security posture, your stack constraints, your error-handling conventions. Write it well once and it multiplies across every feature. Write it badly and you get exactly the chaos documented below.

Spec Kit is greenfield-optimized. Its branch-per-spec model treats specs as change artifacts, not long-lived capability contracts. On a mature codebase, that means every feature starts with reverse-engineering and the artifacts don't compound into system-level documentation. Microsoft Learn now has a brownfield module for Spec Kit, and presets help, but the underlying model is still change-scoped. If your codebase is 3 years old and you want specs that describe the system, not just the next PR, this is a friction point.

Getting started. Requires Python 3.11+ and uv. Pin a release tag for stability (check Releases for the latest):

uv tool install specify-cli --from git+https://github.com/github/spec-kit.git@v0.7.2
specify init my-project --ai claude

BMAD-METHOD

BMAD ("Breakthrough Method for Agile AI-Driven Development") is a different animal. It's a multi-agent framework with 43K+ stars at the time of writing — 12+ AI personas (Analyst, PM, Architect, Scrum Master, Developer, QA, UX Designer...) modeled as Markdown "Agent-as-Code" files. v6 hit stable recently after an extended alpha, with features like Scale Adaptive workflows, BMad-CORE engine, and a builder toolkit for custom agents.

The pipeline:

Analyst → PM (PRD) → Architect → Scrum Master (stories) → Developer → QA

Each handoff is a versioned artifact. Audit trail out of the box. Every decision is traceable from requirement to PR.

That structure is impressive when your deployment target is a SOC 2 audit, a consulting deliverable, or a multi-team platform. For a two-person startup, it's a trap. Here's why: BMAD is a process multiplier, not a process creator. If your team already thinks in PRDs, architecture docs, and sprint stories, BMAD will accelerate that and make it auditable. If your team doesn't have structured processes, BMAD won't conjure them — it'll reproduce your chaos across seven agents and you'll spend more time debugging agent coordination than writing code.

Concrete costs people forget about: more agents means more tokens per cycle. Handoff failures between personas are a real debugging surface. And when the Architect agent makes an assumption the PM agent didn't document, the Scrum Master propagates it into stories, and the Developer implements it confidently. You find out in QA, or worse, in production. The pipeline is only as good as the weakest handoff.

Getting started. Requires Node.js v20+. The interactive installer handles module selection and IDE-specific file generation:

npx bmad-method install

OpenSpec (Fission-AI)

Lightweight, brownfield-first SDD. Distributed as an npm package (@fission-ai/openspec, 77K+ downloads), with the source on GitHub. Works with 25+ AI assistants via slash commands and an AGENTS.md file that acts as a "README for robots."

OpenSpec's core idea is change-centric specs:

openspec/
  project.md                  ← project context
  specs/                      ← current system behavior
  changes/
    add-dark-mode/
      proposal.md             ← what's changing and why
      design.md               ← technical approach
      tasks.md                ← checklist
      specs/                  ← deltas: ADDED / MODIFIED / REMOVED

The delta markers are the thing that makes this work for existing codebases. You're forced to categorize every change as ADDED, MODIFIED, or REMOVED relative to what exists. That discipline prevents the agent from hallucinating new requirements onto existing behavior. When the change ships, the deltas merge into the main specs, so your system-level documentation compounds over time. That's the right model for brownfield, and it's a model Spec Kit doesn't natively emphasize.
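For illustration, a delta file for the add-dark-mode change above might read like this. The heading conventions here are a sketch, not OpenSpec's canonical syntax — check the project's docs for the exact format:

```markdown
<!-- openspec/changes/add-dark-mode/specs/ui/spec.md (illustrative) -->
## ADDED Requirements
### Requirement: Theme toggle
The settings page SHALL offer a light/dark toggle, persisted per user.

## MODIFIED Requirements
### Requirement: Default theme
Previously: the UI always rendered in light mode.
Now: the UI follows the OS preference until the user picks a theme.

## REMOVED Requirements
### Requirement: Hard-coded light palette
Replaced by the theme tokens introduced above.
```

The point is that every section forces a claim about what already exists, which is exactly the context an agent otherwise invents.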

Limitations are real: specs don't self-update during implementation. If the agent drifts (and it will — more on that below), you resync manually. There's no multi-agent orchestration; a single agent runs the whole flow. And for simple tasks — a bug fix, a copy change — the overhead of a full proposal-design-tasks cycle can feel like performing surgery with a forklift.

Getting started. Requires Node.js >= 20.19.0. Install globally and initialize inside your project:

npm install -g @fission-ai/openspec
openspec init

A different category: SDD as a product, not a framework

The three frameworks above are CLI tools you bolt onto your existing editor. There's another approach: products that bake SDD directly into their own environment. Two worth tracking:

Kiro (AWS) is a VS Code fork with spec-driven development built into the IDE itself. You describe a feature, Kiro generates requirements in EARS notation, produces a technical design, and breaks it into trackable tasks — all inside the editor, no CLI involved. Powered by Claude Sonnet via Amazon Bedrock, $20/month. If you're AWS-native and want the tightest possible integration between specs and implementation, Kiro removes the seams. The tradeoff is vendor lock: you adopt their IDE, their model pipeline, their ecosystem.

Augment Intent is a standalone desktop app (Mac, public beta as of early 2026) built around "living specs" — specifications that update themselves as agents work, solving the drift problem the CLI frameworks leave manual. Intent uses a coordinator/specialist/verifier agent architecture where multiple agents execute in parallel on isolated git worktrees, all sharing the same evolving spec. Pricing is credit-based ($20–200/month depending on tier), and it supports BYOA (Bring Your Own Agent — Claude Code, Codex, OpenCode) alongside Augment's own agents. The living-spec approach is the most interesting architectural bet in this space right now: if it works reliably, it makes the manual reconciliation step described later in this article unnecessary. It's still beta, though, and independent production validation is limited.

These aren't competitors to Spec Kit, BMAD, or OpenSpec — they're a different layer. A CLI framework gives you a spec workflow inside the tools you already use. Kiro and Intent ask you to move into their environment. Whether that trade is worth it depends on how much friction you're willing to accept for tighter integration.

Roll your own

If you already have project context documented (stack, standards, workflows), four custom slash commands and an AGENTS.md get you surprisingly far:

.claude/commands/
  plan-feature.md       → produce spec + design from intent
  break-into-tasks.md   → decompose spec into tasks
  implement.md          → execute one task within your conventions
  review-spec.md        → critique the spec for gaps
AGENTS.md               → project rules

About 80% of what the established frameworks do, shaped to your workflow. The other 20% — Spec Kit's presets, OpenSpec's delta markers, BMAD's agent handoffs — is the reason people use frameworks. Start custom if your workflow is idiosyncratic enough that the frameworks fight you. Otherwise, pick a framework and extend it.
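As a sketch of what one of those command files might contain — the file name and the $ARGUMENTS placeholder follow Claude Code's custom-command convention; the wording is mine, not any framework's:

```markdown
<!-- .claude/commands/plan-feature.md (illustrative) -->
Given the feature described in $ARGUMENTS:

1. Read AGENTS.md and restate every constraint that applies here.
2. Draft a spec: problem statement, acceptance criteria, and edge
   cases (malformed input, downstream failures, expired sessions).
3. Draft a design: affected modules, data changes, rollout notes.
4. List open questions explicitly instead of guessing answers.

Do not write implementation code in this step.
```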

The spec drift problem (and what to do about it)

This gets its own section because it's the single most common failure mode and none of the current frameworks handle it well.

Here's what happens: you write a spec. The agent starts implementing. Partway through task 3 of 8, the agent encounters something the spec didn't anticipate — a library API that doesn't work as expected, a database constraint that forces a different approach, an edge case the spec didn't cover. The agent adapts. It writes working code that solves the real problem. But the spec still describes the planned approach, not the actual one.

Now the spec is fiction. The next engineer who reads it (or the next agent that uses it as context) gets misled. As Amelia Wattenberger put it: a stale design doc misleads the next engineer who happens to read it; a stale spec misleads agents that don't know any better, and they'll execute a plan that no longer matches reality without flagging anything wrong.

This isn't a corner case. It's the default behavior.

What to do about it. There's no automation that fully solves this today. The practical approach is a post-implementation reconciliation step:

  1. After the agent finishes implementation, run a comparison pass: "Read the spec. Read the code. List every place where they diverge."
  2. For each divergence, decide: was the agent's adaptation correct? If yes, update the spec. If no, fix the code.
  3. Commit the updated spec alongside the code diff.
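Step 1 works well as a reusable prompt. A hypothetical version, not any framework's built-in command:

```markdown
<!-- illustrative reconciliation prompt -->
Read the spec for this change and the code in the current diff.
List every divergence in a table:

| Spec says | Code does | Verdict: update spec, or fix code? |

Do not edit anything yet. Output the table only; we decide per row.
```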

OpenSpec has /opsx:sync for this. Spec Kit recently added a drift reconciliation extension (/speckit.reconcile). In BMAD, you'd do it manually via a QA agent review. None of these are automatic — you have to trigger them, and you have to review the output. That's overhead, and it's the overhead that most teams skip until their specs are six months out of date.

The emerging approach — what Augment Intent is built around — is bidirectional spec updates: agents write changes back to the spec as they work. That closes the loop in theory. Whether it holds up reliably across complex codebases is the open question, and it's the single biggest feature gap separating the CLI frameworks from the next generation of SDD tooling.

When the constitution fails: a real example

EPAM published a detailed case study of using Spec Kit on a brownfield codebase. One finding stands out. Their constitution.md contained an explicit rule: "NO try-catch blocks in route handlers — use global middleware." The rule was unambiguous. The agent ignored it and added try-catch blocks in route handlers anyway.

This isn't a Spec Kit bug. It's a model behavior issue: the agent was pattern-matching against what it had seen in millions of codebases where try-catch in handlers is the norm, and the constitution's single-line prohibition wasn't enough to override that prior. The fix was obvious in hindsight — reinforce the rule in the constitution with context explaining why (middleware-based error handling enables centralized logging and consistent error responses), not just what. Models follow "why" better than "don't."

The deeper lesson: a constitution isn't a config file. Writing "don't do X" isn't enough. You need "don't do X because Y, and instead do Z." The constitution that works is the one written as if you were onboarding a smart but literal-minded junior developer who has never seen your codebase. Because that's exactly what the agent is.
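Concretely, a rule rewritten in that "don't do X because Y, instead do Z" shape might look like the following (illustrative wording, not the case study's actual file):

```markdown
<!-- constitution.md excerpt (illustrative) -->
## Error handling
- NO try-catch blocks in route handlers.
  Why: errors must propagate to the global error middleware, which
  owns centralized logging and consistent error response shapes.
  Instead: throw (or call next(err)) and let the middleware respond.
```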

Mistakes that will cost you a sprint

SDD as waterfall. Gojko Adzic flagged this when Spec Kit launched. He's right. A 50-page spec you freeze before implementation is not SDD — it's BDUF with Markdown. Specs should change during implementation. The iterative loop is the point.

The three-page spec with no edge cases. Looks thorough. Covers the happy path beautifully. Says nothing about what happens when the input is malformed, the downstream service 500s, or the user's session expires mid-request. The agent implements exactly what's specified. You ship a demo. It breaks the first day in production.

Green tests, wrong behavior. Every acceptance criterion passes. Tests are green. But the solution doesn't actually solve the user's problem. Acceptance criteria are a proxy for intent, not intent itself. Add a "Why this matters" and "Non-goals" section to every spec so the agent stays grounded in the problem, not just the checklist.
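Those two sections can be short. For a hypothetical session-expiry fix, something like:

```markdown
## Why this matters
Users lose unsaved dashboard edits when a session expires
mid-request; this is the top support complaint, not a cosmetic bug.

## Non-goals
- Not changing session length or the auth provider.
- Not redesigning the dashboard's save flow.
```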

Framework shopping. You can burn a sprint evaluating four frameworks. You will learn nothing that four weeks of actual use on real tickets wouldn't teach you faster. Pick from the TL;DR. Start. Reconsider in a month if you need to.

Closing

The cost of drafting specifications has collapsed. Tests, tickets, architecture docs, ADRs — artifacts that used to get skipped because they cost too much time are now cheap to produce in draft form. The review work didn't go away — but the activation energy for producing the document in the first place did. That's the change, and it's permanent regardless of which framework wins.

One thing worth saying plainly: SDD tooling is still early-stage. Patterns are emerging, not standardized, and most teams are still figuring out what "good" looks like in practice. The frameworks in this article are the best available answers right now — not settled ones.

If in doubt, start with OpenSpec. Invest an hour in your constitution. Wire up your MCP servers so the agent can open PRs, update tickets, and run tests. And when the spec drifts from the code — not if, when — take the thirty minutes to reconcile them. That's where SDD succeeds or fails in practice, and it's the part no framework will do for you.

If you've shipped with any of these, the war stories are more useful than the docs. What broke?

Source

This article was originally published by DEV Community and written by William Schnaider Torres Bermon.
