Specification-Driven Development
The methodology at the heart of dark factory development. Instead of writing code, engineers write specifications — precise, machine-readable descriptions of what software should do. The bottleneck shifts from implementation speed to specification quality. Requires a fundamentally different engineering skill set.
Specification-driven development is the practice that separates Level 3+ from Level 0–2. Instead of engineers writing code, engineers write specifications — and agents implement them.
At Level 5, the specification is everything: the spec is the job, and implementation is automated.
What a Specification Looks Like
StrongDM’s approach: markdown files that describe the desired behavior, architecture constraints, and acceptance criteria.
Example structure:
# Feature: User session timeout
## Context
Our session management system currently has no timeout mechanism.
Users who leave sessions open indefinitely can create security risks.
## Desired behavior
- Sessions should expire after 24 hours of inactivity
- Expiry should be tracked server-side, not client-side
- Expired sessions should return 401 on next request
- Users should receive a notification 30 minutes before expiry (via existing notification service)
## Constraints
- Must work with existing PostgreSQL session store
- Cannot modify the session table schema (migration risk)
- Must not break existing session tests
- Performance: expiry check < 5ms per request
## Acceptance criteria
- [ ] Session expires after 24h inactivity
- [ ] 30-min warning notification fires correctly
- [ ] Expired sessions return 401, not 200 or 403
- [ ] Existing session tests all pass
- [ ] Load test shows < 5ms overhead per request
The agent reads this, implements it across multiple files, runs the acceptance tests. The engineer evaluates whether the criteria were met.
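The acceptance criteria above can be made executable. Below is a minimal sketch in Python of how the expiry rule might be encoded as tests; `SessionStore` and its methods are hypothetical stand-ins for the real session service, and only the timeout behavior from the spec is modeled.

```python
import time

TIMEOUT_SECONDS = 24 * 60 * 60  # "Sessions should expire after 24 hours of inactivity"

class SessionStore:
    """Hypothetical server-side session tracker (per the spec, never client-side)."""
    def __init__(self):
        self._last_active = {}

    def touch(self, session_id, now=None):
        # Record activity; `now` is injectable so tests don't sleep for 24 hours
        self._last_active[session_id] = now if now is not None else time.time()

    def status(self, session_id, now=None):
        """Return the HTTP status the next request on this session would get."""
        now = now if now is not None else time.time()
        last = self._last_active.get(session_id)
        if last is None or now - last > TIMEOUT_SECONDS:
            return 401  # expired sessions return 401, not 200 or 403
        return 200

def test_session_expires_after_24h_inactivity():
    store = SessionStore()
    store.touch("s1", now=0)
    assert store.status("s1", now=TIMEOUT_SECONDS + 1) == 401

def test_active_session_stays_valid():
    store = SessionStore()
    store.touch("s1", now=0)
    assert store.status("s1", now=60) == 200
```

Injecting the clock is what makes a 24-hour rule testable in milliseconds, which matters when an agent is running these checks on every iteration.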
The New Bottleneck
At Level 0–2, the bottleneck is typing speed, cognitive load, and knowledge of APIs. At Level 4–5, the bottleneck is specification quality.
A bad specification produces bad software — not because the agent is bad at coding, but because the agent is good at implementing exactly what you said rather than what you meant.
The skills that matter:
- Domain understanding: You can only specify what you understand
- Precision: Ambiguous specs produce ambiguous software
- Edge case enumeration: Agents implement the happy path unless you specify the unhappy path
- Systems thinking: Cross-cutting concerns must be explicitly specified
- Customer understanding: The spec must capture user intent, not just technical behavior
Specification vs. Testing
The external scenario testing methodology (see External Scenario Testing) complements specification-driven development:
- The spec says what the software should do
- The scenarios verify that it actually does it, from the outside, in ways the agent didn’t see during development
Together, they create a feedback loop that agents can run autonomously: implement the spec, run the scenarios, iterate until green.
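That feedback loop can be sketched as a driver function. Here `implement` and `run_scenarios` are hypothetical hooks standing in for the agent invocation and the external scenario harness; this illustrates the control flow, not any vendor's API.

```python
def iterate_until_green(spec, implement, run_scenarios, max_rounds=5):
    """Drive the agent until all external scenarios pass or attempts run out."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        implement(spec, feedback)      # agent writes code against the spec
        failures = run_scenarios()     # black-box checks the agent never saw
        if not failures:
            return round_no            # green: acceptance criteria met
        feedback = failures            # feed failures back into the next attempt
    raise RuntimeError(f"still failing after {max_rounds} rounds: {failures}")
```

The important design choice is that `run_scenarios` is outside the agent's reach: the agent only ever sees its failures, never the scenario definitions themselves.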
Learning to Specify
The hardest transition for experienced engineers: you’ve spent your career building the judgment to make implementation decisions in real-time. That judgment doesn’t disappear — it moves upstream, into specification.
The senior engineer who can write a specification that an agent can implement without clarification is worth dramatically more than the senior engineer who can implement anything but can’t articulate it.
Tools for Specification
- CLAUDE.md / AGENTS.md: Project-level context files that establish architectural constraints and conventions for the agent
- Acceptance criteria in markdown: Straightforward but effective
- BDD (Gherkin) format: Given/When/Then structure forces precision and maps directly to test scenarios
- SPARC framework: Specification → Pseudocode → Architecture → Refinement → Completion — a structured approach to specification that decomposes complexity
- Kiro: AWS spec-first IDE with canonical three-file structure (requirements.md, design.md, tasks.md)
Writing Specifications That Work: The Practitioner Guide
Addy Osmani’s “How to Write a Good Spec for AI Agents” is the most widely referenced practical guide. Key principles:
Structure Like a PRD
A good specification has distinct sections, not a prose description:
| Section | Purpose |
|---|---|
| Commands | How to run, test, build, lint |
| Testing framework | What test runner, what conventions |
| Project structure | What lives where — before the agent touches anything |
| Code style | One concrete example beats a paragraph of prose |
| Git workflow | Branch naming, commit format, PR expectations |
| Boundaries | What the agent must NEVER touch |
Goal-Oriented Language, Not Implementation-Prescriptive
Tell the agent what, not how:
# GOOD
Sessions should expire after 24 hours of inactivity.
Expired sessions must return 401 on the next request.
# BAD
Add a cron job that runs every hour to delete sessions
older than 86400 seconds from the sessions table.
The second form precludes better solutions the agent might find. The first form expresses intent — the agent chooses the mechanism.
The Three-Tier Permission Model
Osmani’s framework: every specification has three tiers of agent behavior.
| Tier | Language | Example |
|---|---|---|
| Always do | "Always", "Must" | "Always run the test suite before committing" |
| Ask first | "Check with me before…", "Confirm before…" | "Confirm before modifying the auth module" |
| Never do | "Never", "Do not", "Never modify" | "Never modify the database schema without a migration" |
Boundaries (the “Never” tier) are the most important section of any spec. Agents optimize aggressively. If you don’t draw a boundary, they will cross it.
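One way to make the "Never" tier mechanically enforceable rather than purely textual is to check an agent's proposed changes against declared boundaries before anything lands. A minimal sketch, with illustrative path patterns:

```python
from fnmatch import fnmatch

# The "Never" tier, as data. These patterns are illustrative, not from the source.
NEVER_TOUCH = [
    "migrations/*",         # "Never modify the database schema without a migration"
    "auth/*",               # "Confirm before modifying the auth module"
    ".github/workflows/*",  # CI config stays human-owned
]

def boundary_violations(changed_files):
    """Return every changed file that crosses a declared boundary."""
    return [f for f in changed_files
            if any(fnmatch(f, pattern) for pattern in NEVER_TOUCH)]
```

A check like this can run in CI or a pre-commit hook, rejecting an agent's diff before review rather than relying on the agent to honor prose.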
Build In Self-Checks
A spec without self-validation produces code that satisfies the letter, not the spirit:
## After implementing
- Compare implementation against each requirement in this spec
- Confirm all acceptance criteria are met
- List any requirements you were unable to fulfill and explain why
This forces the agent to audit its own output against the original intent — a lightweight version of LLM-as-judge evaluation.
LLM-as-Judge for Soft Criteria
Some requirements can’t be expressed as unit tests: “the API should feel intuitive,” “error messages should be helpful.” For these, a separate evaluation agent can review the output against the criteria — the same principle StrongDM uses for external scenario validation.
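Wiring this up can be as simple as a prompt, a model call, and a strict verdict parser. A sketch, where `call_model` is a hypothetical hook for whatever LLM client is in use, and the PASS/FAIL protocol is an assumption rather than a documented API:

```python
def build_judge_prompt(criterion, artifact):
    # Ask the evaluation agent for a machine-parseable verdict on a soft criterion
    return (
        "You are reviewing generated output against a soft requirement.\n"
        f"Requirement: {criterion}\n"
        f"Output under review:\n{artifact}\n"
        "Answer with PASS or FAIL on the first line, then one sentence of reasoning."
    )

def parse_verdict(response_text):
    """Extract PASS/FAIL from the judge's reply; unclear replies count as FAIL."""
    stripped = response_text.strip()
    first_line = stripped.splitlines()[0].upper() if stripped else ""
    return first_line.startswith("PASS")

def judge(criterion, artifact, call_model):
    return parse_verdict(call_model(build_judge_prompt(criterion, artifact)))
```

Treating an ambiguous reply as FAIL is the same conservative default used for boundaries: the agent must earn a pass, not be granted one.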
Conformance Suites
Advanced specification pattern: YAML-based language-independent test contracts that define expected behavior across multiple implementations.
# conformance/session-timeout.yaml
- case: "Session expires after inactivity"
  given:
    session_idle_for: 86401s
  expect:
    response_status: 401
- case: "Active session does not expire"
  given:
    session_last_active: 1s_ago
  expect:
    response_status: 200
These live outside the codebase — the same architectural choice StrongDM makes with external scenarios. The agent implements; the harness validates.
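A harness for such cases can be a short loop. The sketch below inlines the cases as Python dicts to stay self-contained; a real harness would load them from the YAML files kept outside the codebase, and `handler` stands in for the implementation under test.

```python
# Conformance cases, inlined for illustration (mirrors the YAML above)
CASES = [
    {"case": "Session expires after inactivity",
     "given": {"idle_seconds": 86401}, "expect": {"status": 401}},
    {"case": "Active session does not expire",
     "given": {"idle_seconds": 1}, "expect": {"status": 200}},
]

def run_conformance(handler, cases=CASES):
    """Run every case through `handler` and collect failures.

    `handler` is any implementation under test: a function from the
    `given` conditions to the observed status code."""
    failures = []
    for c in cases:
        got = handler(c["given"])
        if got != c["expect"]["status"]:
            failures.append(f'{c["case"]}: expected {c["expect"]["status"]}, got {got}')
    return failures
```

Because the harness only speaks in `given`/`expect` terms, the same suite can validate a Python implementation today and a Go rewrite tomorrow, which is the point of language-independent contracts.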
CLAUDE.md and AGENTS.md Best Practices
For project-level context files that persist across agent invocations:
- Keep under 300 lines — longer files get ignored or skimmed
- WHY/WHAT/HOW structure: Why this project exists, what it does, how it’s built
- Progressive disclosure: The most critical constraints first, details later
- Domain vocabulary: Define terms the agent won’t know from training (“in this codebase, ‘session’ means X, not Y”)
- Anti-patterns list: Common mistakes the agent makes, documented once and never repeated
The AGENTS.md standard extends this to a cross-vendor format that works with Claude Code, Cursor, Codex, Cline, and others.
Kiro: Specification-First IDE
Kiro (AWS) is the first IDE built around specification-first development as a first-class workflow. Its canonical structure:
- requirements.md: User stories in EARS format (Easy Approach to Requirements Syntax)
- design.md: Architecture decisions, component design, data flows
- tasks.md: Implementation checklist generated from requirements + design
Kiro adds Agent Hooks — triggers that fire on file save events — creating a specification-to-implementation loop that runs on every change.
The Brownfield Problem
Legacy systems without documented behavior cannot be specification-driven until you reverse-engineer their implicit specs. This is unglamorous work with no shortcut:
- Instrument the existing system to capture behavior
- Write specifications that describe what you observe
- Build test coverage against the specifications
- Only then introduce agents that implement against the specs
The specification is not a design document for new behavior — it’s a capture of existing behavior that the agent is allowed to modify within defined constraints.
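The capture step can start as plain instrumentation. A minimal sketch, where `record_behavior` and `legacy_discount` are illustrative names: wrap the legacy code, log what it actually does, and mine the log for spec lines.

```python
import functools

OBSERVED = []  # (name, args, kwargs, result) records: raw material for a spec

def record_behavior(fn):
    """Log every call so implicit behavior can become explicit spec lines."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        OBSERVED.append((fn.__name__, args, kwargs, result))
        return result
    return wrapper

@record_behavior
def legacy_discount(total):
    # Undocumented legacy rule, discovered only by observation:
    # orders over 100 get 10% off
    return total * 0.9 if total > 100 else total
```

Each observed record becomes a candidate line in the reverse-engineered spec ("orders over 100 receive a 10% discount") and, from there, a test the agent must keep green.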