Specification-Driven Development

The methodology at the heart of dark factory development. Instead of writing code, engineers write specifications — precise, machine-readable descriptions of what software should do. The bottleneck shifts from implementation speed to specification quality. Requires a fundamentally different engineering skill set.

Specification-driven development is the practice that separates Level 3+ from Level 0–2. Instead of engineers writing code, engineers write specifications — and agents implement them.

At Level 5, the specification is everything. The spec is the job. Implementation is automated.

What a Specification Looks Like

StrongDM’s approach: markdown files that describe the desired behavior, architecture constraints, and acceptance criteria.

Example structure:

```markdown
# Feature: User session timeout

## Context
Our session management system currently has no timeout mechanism.
Users who leave sessions open indefinitely can create security risks.

## Desired behavior
- Sessions should expire after 24 hours of inactivity
- Expiry should be tracked server-side, not client-side
- Expired sessions should return 401 on next request
- Users should receive a notification 30 minutes before expiry (via existing notification service)

## Constraints
- Must work with existing PostgreSQL session store
- Cannot modify the session table schema (migration risk)
- Must not break existing session tests
- Performance: expiry check < 5ms per request

## Acceptance criteria
- [ ] Session expires after 24h inactivity
- [ ] 30-min warning notification fires correctly
- [ ] Expired sessions return 401, not 200 or 403
- [ ] Existing session tests all pass
- [ ] Load test shows < 5ms overhead per request
```

The agent reads this, implements it across multiple files, runs the acceptance tests. The engineer evaluates whether the criteria were met.

The New Bottleneck

At Level 0–2, the bottleneck is typing speed, cognitive load, and knowledge of APIs. At Level 4–5, the bottleneck is specification quality.

A bad specification produces bad software — not because the agent is bad at coding, but because the agent is good at implementing exactly what you said rather than what you meant.

The skills that matter:

  • Domain understanding: You can only specify what you understand
  • Precision: Ambiguous specs produce ambiguous software
  • Edge case enumeration: Agents implement the happy path unless you specify the unhappy path
  • Systems thinking: Cross-cutting concerns must be explicitly specified
  • Customer understanding: The spec must capture user intent, not just technical behavior

Specification vs. Testing

The external scenario testing methodology (see External Scenario Testing) complements specification-driven development:

  • The spec says what the software should do
  • The scenarios verify that it actually does it, from the outside, in ways the agent didn’t see during development

Together, they create a feedback loop that agents can run autonomously: implement the spec, run the scenarios, iterate until green.
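That loop can be sketched in a few lines. Everything here is illustrative: `implement` stands in for the agent and `run_scenarios` for an external scenario harness, neither of which is a real API.

```python
# A minimal sketch of the autonomous spec loop, assuming hypothetical
# implement() and run_scenarios() callables; all names are illustrative.
from typing import Callable

def spec_loop(
    implement: Callable[[list[str]], None],   # agent step: (re)implement, given failing scenarios
    run_scenarios: Callable[[], list[str]],   # harness step: return names of failing scenarios
    max_iterations: int = 5,                  # budget so the loop cannot spin forever
) -> bool:
    """Implement the spec, run the scenarios, iterate until green or out of budget."""
    failures: list[str] = []
    for _ in range(max_iterations):
        implement(failures)
        failures = run_scenarios()
        if not failures:
            return True   # scenario suite is green
    return False

# Toy demonstration: a defect that gets fixed once a failure is reported back.
state = {"fixed": False}

def implement(failures: list[str]) -> None:
    if failures:                      # the agent "repairs" whatever failed
        state["fixed"] = True

def run_scenarios() -> list[str]:
    return [] if state["fixed"] else ["session expires after 24h inactivity"]

print(spec_loop(implement, run_scenarios))  # True
```

The budget matters: an agent that cannot converge should stop and escalate, not iterate forever.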

Learning to Specify

The hardest transition for experienced engineers: you’ve spent your career building the judgment to make implementation decisions in real-time. That judgment doesn’t disappear — it moves upstream, into specification.

The senior engineer who can write a specification that an agent can implement without clarification is worth dramatically more than the senior engineer who can implement anything but can’t articulate it.

Tools for Specification

  • CLAUDE.md / AGENTS.md: Project-level context files that establish architectural constraints and conventions for the agent
  • Acceptance criteria in markdown: Straightforward but effective
  • BDD (Gherkin) format: Given/When/Then structure forces precision and maps directly to test scenarios
  • SPARC framework: Specification → Pseudocode → Architecture → Refinement → Completion — a structured approach to specification that decomposes complexity
  • Kiro: AWS spec-first IDE with canonical three-file structure (requirements.md, design.md, tasks.md)
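As an illustration of the Gherkin option, the session-timeout behavior from earlier could be restated like this (scenario wording is illustrative):

```gherkin
Feature: User session timeout

  Scenario: Session expires after inactivity
    Given a session that has been idle for more than 24 hours
    When the user makes any request
    Then the response status is 401

  Scenario: Warning before expiry
    Given a session that has been idle for 23 hours and 30 minutes
    When the expiry check runs
    Then a warning is sent via the existing notification service
```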

Writing Specifications That Work: The Practitioner Guide

Addy Osmani’s “How to Write a Good Spec for AI Agents” is the most widely referenced practical guide. Key principles:

Structure Like a PRD

A good specification has distinct sections, not a prose description:

| Section | Purpose |
| --- | --- |
| Commands | How to run, test, build, lint |
| Testing framework | What test runner, what conventions |
| Project structure | What lives where — before the agent touches anything |
| Code style | One concrete example beats a paragraph of prose |
| Git workflow | Branch naming, commit format, PR expectations |
| Boundaries | What the agent must NEVER touch |

Goal-Oriented Language, Not Implementation-Prescriptive

Tell the agent what, not how:

```
# GOOD
Sessions should expire after 24 hours of inactivity.
Expired sessions must return 401 on the next request.

# BAD
Add a cron job that runs every hour to delete sessions
older than 86400 seconds from the sessions table.
```

The second form precludes better solutions the agent might find. The first form expresses intent — the agent chooses the mechanism.

The Three-Tier Permission Model

Osmani’s framework: every specification has three tiers of agent behavior.

| Tier | Language | Example |
| --- | --- | --- |
| Always do | "Always", "Must" | "Always run the test suite before committing" |
| Ask first | "Check with me before…", "Confirm before…" | "Confirm before modifying the auth module" |
| Never do | "Never", "Do not", "Never modify" | "Never modify the database schema without a migration" |

Boundaries (the “Never” tier) are the most important section of any spec. Agents optimize aggressively. If you don’t draw a boundary, they will cross it.
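In a spec file the tiers usually appear as plain sections rather than a table; a minimal layout, with section names and contents illustrative, might be:

```markdown
## Always
- Run the full test suite before committing

## Ask first
- Confirm before modifying the auth module

## Never
- Never modify the database schema without a migration
```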

Build In Self-Checks

A spec without self-validation produces code that satisfies the letter, not the spirit:

```markdown
## After implementing
- Compare implementation against each requirement in this spec
- Confirm all acceptance criteria are met
- List any requirements you were unable to fulfill and explain why
```

This forces the agent to audit its own output against the original intent — a lightweight version of LLM-as-judge evaluation.

LLM-as-Judge for Soft Criteria

Some requirements can’t be expressed as unit tests: “the API should feel intuitive,” “error messages should be helpful.” For these, a separate evaluation agent can review the output against the criteria — the same principle StrongDM uses for external scenario validation.
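A judge can be sketched as a thin wrapper around a model call. `ask_model`, the prompt, and the PASS/FAIL convention below are all assumptions for illustration, not any vendor's API.

```python
# Sketch of LLM-as-judge for soft criteria. ask_model is a pluggable stand-in
# for a real LLM call; the prompt and helper names are illustrative.
from typing import Callable

JUDGE_PROMPT = (
    "You are reviewing software output against a soft requirement.\n"
    "Requirement: {criterion}\n"
    "Output under review:\n{output}\n"
    "Answer PASS or FAIL, then one sentence of justification."
)

def judge(criterion: str, output: str, ask_model: Callable[[str], str]) -> bool:
    """Return True when the evaluation model says the criterion is satisfied."""
    verdict = ask_model(JUDGE_PROMPT.format(criterion=criterion, output=output))
    return verdict.strip().upper().startswith("PASS")

# Toy stand-in model for demonstration; a production judge would call an LLM.
fake_model = lambda prompt: "PASS - the message names the field and how to fix it."
print(judge("Error messages should be helpful",
            "email: must contain '@' (e.g. user@example.com)", fake_model))  # True
```

Keeping the judge a separate agent from the implementer preserves the independence that makes the verdict meaningful.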

Conformance Suites

Advanced specification pattern: YAML-based language-independent test contracts that define expected behavior across multiple implementations.

```yaml
# conformance/session-timeout.yaml
- case: "Session expires after inactivity"
  given:
    session_idle_for: 86401s
  expect:
    response_status: 401
- case: "Active session does not expire"
  given:
    session_last_active: 1s_ago
  expect:
    response_status: 200
```

These live outside the codebase — the same architectural choice StrongDM makes with external scenarios. The agent implements; the harness validates.
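A harness for such a suite can be very small. In this sketch the cases are inlined as dicts (a real harness would parse the YAML files), and `handle_request` is a hypothetical implementation under test:

```python
# Sketch of a conformance harness: run each contract case against the
# implementation under test; case data and function names are illustrative.
from typing import Callable

CASES = [
    {"case": "Session expires after inactivity",
     "given": {"session_idle_seconds": 86401}, "expect": {"response_status": 401}},
    {"case": "Active session does not expire",
     "given": {"session_idle_seconds": 1}, "expect": {"response_status": 200}},
]

def run_conformance(handle_request: Callable[[dict], int]) -> list[str]:
    """Return the names of failing cases; an empty list means conformant."""
    failures = []
    for case in CASES:
        status = handle_request(case["given"])
        if status != case["expect"]["response_status"]:
            failures.append(case["case"])
    return failures

# Reference implementation of the 24h (86400s) timeout rule, for illustration.
def handle_request(given: dict) -> int:
    return 401 if given["session_idle_seconds"] > 86400 else 200

print(run_conformance(handle_request))  # []
```

Because the harness only sees inputs and outputs, the same case file can validate a Go service and a TypeScript rewrite alike.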

CLAUDE.md and AGENTS.md Best Practices

For project-level context files that persist across agent invocations:

  • Keep under 300 lines — longer files get ignored or skimmed
  • WHY/WHAT/HOW structure: Why this project exists, what it does, how it’s built
  • Progressive disclosure: The most critical constraints first, details later
  • Domain vocabulary: Define terms the agent won’t know from training (“in this codebase, ‘session’ means X, not Y”)
  • Anti-patterns list: Common mistakes the agent makes, documented once and never repeated

The AGENTS.md standard extends this to a cross-vendor format that works with Claude Code, Cursor, Codex, Cline, and others.
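A skeleton following these guidelines might look like this (project details are illustrative, reusing the session-service example from above):

```markdown
# CLAUDE.md

## Why
Internal session-management service; security and backward compatibility
outrank feature velocity.

## What
REST API backed by the existing PostgreSQL session store.

## How
- Commands: `make test`, `make lint`, `make run`
- "Session" in this codebase means a server-side record, not a browser cookie
- Never modify the session table schema without a migration

## Anti-patterns
- Do not add client-side expiry logic
```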

Kiro: Specification-First IDE

Kiro (AWS) is the first IDE built around specification-first development as a first-class workflow. Its canonical structure:

  • requirements.md: User stories in EARS format (Easy Approach to Requirements Syntax)
  • design.md: Architecture decisions, component design, data flows
  • tasks.md: Implementation checklist generated from requirements + design
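EARS constrains each requirement to a fixed sentence template (for example, "While <precondition>, when <trigger>, the <system> shall <response>"). Restating the session-timeout feature in that shape (wording illustrative):

```markdown
# requirements.md
- When 24 hours pass without activity on a session,
  the system shall mark the session expired.
- When a request arrives carrying an expired session,
  the system shall respond with HTTP 401.
- While a session is within 30 minutes of expiry, when the expiry check runs,
  the system shall send a warning via the notification service.
```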

Kiro adds Agent Hooks — triggers that fire on file save events — creating a specification-to-implementation loop that runs on every change.

The Brownfield Problem

Legacy systems without documented behavior cannot be specification-driven until you reverse-engineer their implicit specs. This is unglamorous work with no shortcut:

  1. Instrument the existing system to capture behavior
  2. Write specifications that describe what you observe
  3. Build test coverage against the specifications
  4. Only then introduce agents that implement against the specs
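Step 1 can be as simple as wrapping the legacy entry points and tallying what actually happens. This sketch assumes a hypothetical request-handler shape; the decorator and names are illustrative:

```python
# Sketch of instrumentation: record observed (path, status) pairs so they can
# later be written up as specifications. All names here are illustrative.
from collections import Counter
from typing import Callable

observations: Counter = Counter()

def instrument(handler: Callable[[dict], int]) -> Callable[[dict], int]:
    """Wrap a request handler and tally each (path, status) pair it produces."""
    def wrapped(request: dict) -> int:
        status = handler(request)
        observations[(request["path"], status)] += 1
        return status
    return wrapped

@instrument
def legacy_handler(request: dict) -> int:
    # Stand-in for undocumented legacy behavior.
    return 404 if request["path"].startswith("/old") else 200

legacy_handler({"path": "/old/report"})
legacy_handler({"path": "/health"})
print(dict(observations))  # {('/old/report', 404): 1, ('/health', 200): 1}
```

The tallies become the raw material for step 2: each frequently observed pair is a candidate line in the implicit spec.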

The specification is not a design document for new behavior — it’s a capture of existing behavior that the agent is allowed to modify within defined constraints.