Harness-driven, spec-to-code execution for Claude Code.
ADP is a Claude Code skill that turns a spec file into shipped, committed code through four adaptive phases — Specify → Design → Tasks → Execute — with feedforward guides (generated from your codebase) and feedback sensors (lint, typecheck, test) enforced at every boundary.
- Skill layer (`SKILL.md`): the methodology the agent follows.
- Runtime layer (`src/`): TypeScript helpers for loading guides, running sensors, and persisting pipeline state.
Built on TLC Spec-Driven (four-phase methodology) and Karpathy's agent coding skills (execution discipline). ADP adds a computational harness — live sensors, sprint scoring, stuck detection, and a feature-branch → PR workflow.
- Install
- Quick Start
- Methodology
- Directory Layout
- Commands
- Architecture
- Templates
- Development
- Influences
Install once per machine. The installer copies skill files to ~/.claude/skills/adp/
and installs the adp CLI globally via npm. Claude Code picks up the skill
automatically in every project.

Shell:

curl -fsSL https://raw.githubusercontent.com/0xPuncker/adp/main/bin/install.sh | bash

PowerShell:

iwr -useb https://raw.githubusercontent.com/0xPuncker/adp/main/bin/install.ps1 | iex

PowerShell execution policy: if `iex` is blocked, run `Set-ExecutionPolicy -Scope Process Bypass` first, or invoke via `powershell -ExecutionPolicy Bypass -Command "iwr ... | iex"`.

If you don't have Node 22+ and only want the skill methodology files:

curl -fsSL https://raw.githubusercontent.com/0xPuncker/adp/main/bin/install.sh | ADP_SKILL_ONLY=1 bash

PowerShell:

$env:ADP_SKILL_ONLY = "1"
iwr -useb https://raw.githubusercontent.com/0xPuncker/adp/main/bin/install.ps1 | iex

If Node 22+ isn't available, build a standalone adp binary from a checkout:

git clone https://github.com/0xPuncker/adp.git
cd adp && npm install && npm run build
npm run build:standalone   # produces dist/adp-<plat>-<arch>[.exe]

The standalone binary excludes the TUI (adp tui/adp-i). All other commands work normally.

To upgrade an existing install:
adp update # main branch
adp update --branch feat/foo   # specific branch

This re-runs the platform-appropriate installer. The adp command picks
PowerShell on native Windows and bash everywhere else.

To remove ADP completely:
adp uninstall # confirms before removing
adp uninstall -y   # skip confirmation

Removes:

- `~/.claude/skills/adp/`: skill files and templates
- The global `adp` CLI (via `npm uninstall -g adp`)
- Any standalone binary at `~/.claude/skills/adp/bin/adp[.exe]`

To verify the install:

ls ~/.claude/skills/adp/SKILL.md && echo "ok"

Then open Claude Code in any project and say adp init.

Both installers honour these environment variables:

| Variable | Default | Purpose |
|---|---|---|
| `CLAUDE_SKILLS_DIR` | `~/.claude/skills` | Alternate skills root |
| `ADP_BRANCH` | `main` | Branch, tag, or commit to install |
| `ADP_FORCE` | `0` | `1` overwrites an existing install without prompting |
| `ADP_SKILL_ONLY` | `0` | `1` installs skill files only, skips the CLI |
| `ADP_DRY_RUN` | `0` | `1` prints actions without executing |

Requirements:

- `git` and `curl` on `PATH` (for the shell installer); `git` and `iwr` (PowerShell)
- Node.js ≥ 22 + `npm` (for the CLI install; skill-only mode skips this)
- Claude Code with skill support (for the skill side)

git clone https://github.com/0xPuncker/adp.git
cd adp
npm install
npm run build
# Symlink your local copy into Claude Code's skills dir:
ln -s "$(pwd)" ~/.claude/skills/adp

Inside any target project:

You > adp init
Claude > detects stack, creates .adp/ + .specs/, writes harness.yaml, runs adp map
You > adp run payments
Claude > Specify → clarifying questions → spec.md
→ Design → design.md
→ Tasks → tasks.md (atomic, parallel-marked, REQ-traced)
→ Execute → build → sensors → commit, per task
→ Validate → REQ coverage + UAT
State persists between sessions. Stop with adp pause, continue with adp resume.
flowchart LR
req([feature request]) --> size{complexity?}
size -->|Small| quick[Quick Mode]
size -->|Medium| specM[Specify]
size -->|Large / Complex| specL[Specify]
specM --> execM[Execute]
specL --> design[Design]
design --> tasks[Tasks]
tasks --> execL[Execute]
quick --> validate[Validate]
execM --> validate
execL --> validate
validate --> done([shipped])
Phases auto-size to the scope of the work:
| Scope | Criteria | Phases |
|---|---|---|
| Small | ≤3 files, ≤1h, no new deps | Quick Mode only |
| Medium | Clear feature, <10 tasks | Specify → Execute → Validate |
| Large | Multi-component, 10+ tasks | All phases |
| Complex | Ambiguous / new domain | All + gray-area discussion + interactive UAT |
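
The sizing table above can be read as a small decision function. The sketch below is only an illustration: in ADP the agent applies these rules from SKILL.md, and the `WorkEstimate` fields, the check order, and the function names are assumptions made for this example.

```typescript
// Hypothetical types; ADP does not necessarily expose anything like this.
type Scope = "small" | "medium" | "large" | "complex";
type Phase = "quick" | "specify" | "design" | "tasks" | "execute" | "validate";

interface WorkEstimate {
  files: number;      // files expected to change
  hours: number;      // rough effort estimate
  newDeps: boolean;   // does the work add dependencies?
  tasks: number;      // expected number of atomic tasks
  ambiguous: boolean; // unclear requirements or an unfamiliar domain
}

// Assumed precedence: ambiguity wins, then the Small criteria, then task count.
function classifyScope(w: WorkEstimate): Scope {
  if (w.ambiguous) return "complex";
  if (w.files <= 3 && w.hours <= 1 && !w.newDeps) return "small";
  if (w.tasks < 10) return "medium";
  return "large";
}

// Phase lists follow the table. The flowchart also routes Quick Mode through
// Validate, so a real implementation might append "validate" for small work.
const PHASES_BY_SCOPE: Record<Scope, Phase[]> = {
  small: ["quick"],
  medium: ["specify", "execute", "validate"],
  large: ["specify", "design", "tasks", "execute", "validate"],
  complex: ["specify", "design", "tasks", "execute", "validate"],
};

const exampleEstimate: WorkEstimate = {
  files: 8, hours: 6, newDeps: true, tasks: 6, ambiguous: false,
};
console.log(classifyScope(exampleEstimate), PHASES_BY_SCOPE[classifyScope(exampleEstimate)]);
```

Complex work uses the same phases as Large; the extra gray-area discussion and interactive UAT happen inside those phases rather than as separate ones.
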
Every piece of work is traceable end-to-end:
flowchart TD
req["<b>REQ-01.2</b> <i>spec.md</i><br/>WHEN invalid email THEN 422"]
task["<b>TASK-05</b> <i>tasks.md</i><br/>Requirement: REQ-01.2<br/>Files: src/routes/auth.ts"]
sprint["<b>Sprint</b> <i>execution</i><br/>contract → build → sensors → score"]
commit["<b>commit</b><br/>feat(auth): validate email format<br/>SHA recorded in state.json"]
val["<b>validation.md</b><br/>REQ-01.2 ✓ covered by TASK-05"]
req --> task --> sprint --> commit --> val
Breaking any link in this chain is a validation failure.
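
As a rough illustration of the chain check, the sketch below reports which requirements have no committed task behind them. The `TracedTask` shape and `checkCoverage` are hypothetical names for this example; the real `tasks.md`, `state.json`, and `validation.md` formats are defined by ADP and are not reproduced here.

```typescript
// Hypothetical, simplified view of one task after Execute.
interface TracedTask {
  id: string;          // e.g. "TASK-05"
  requirement: string; // e.g. "REQ-01.2"
  commitSha?: string;  // recorded once the sprint commits
}

interface CoverageReport {
  covered: Record<string, string[]>; // REQ id -> task ids with a recorded commit
  uncovered: string[];               // REQ ids with no committed task
}

function checkCoverage(requirements: string[], tasks: TracedTask[]): CoverageReport {
  const covered: Record<string, string[]> = {};
  for (const task of tasks) {
    if (!task.commitSha) continue; // no commit recorded: the chain is broken here
    const taskIds = covered[task.requirement] ?? [];
    taskIds.push(task.id);
    covered[task.requirement] = taskIds;
  }
  const uncovered = requirements.filter((req) => !(req in covered));
  return { covered, uncovered };
}

const coverageReport = checkCoverage(
  ["REQ-01.1", "REQ-01.2"],
  [{ id: "TASK-05", requirement: "REQ-01.2", commitSha: "abc1234" }],
);
console.log(coverageReport.uncovered); // REQ-01.1 has no committed task
```
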
Two layers protect every task:
- Feedforward: guides (`.adp/guides/*.md`) are generated by `adp map` from your codebase. They are injected into context before each phase so the agent sees this project's conventions, not a generic model prior.
- Feedback: sensors (`.adp/harness.yaml`) are real shell commands (typecheck, lint, test) run after every build. No commit until they pass. 3 failures on the same error ⇒ stuck detection ⇒ halt and ask the user.
Every task inside Execute flows through the same gated loop:
stateDiagram-v2
[*] --> Contract: sprint_start
Contract --> Build: state goal + verification
Build --> Sensors: code written
Sensors --> Score: typecheck ✓ lint ✓ test ✓
Sensors --> Fix: any sensor failed
Fix --> Sensors: retry (≤3x)
Fix --> Blocker: same error 3x
Score --> Commit: score recorded
Commit --> [*]: sprint_end
Blocker --> [*]: halt + log STATE.md
A failing sensor never auto-merges — the pipeline either retries, escalates, or halts and asks the user.
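
One way to read the Sensors → Fix → Blocker part of the diagram is sketched below. It is an interpretation, not ADP's implementation: `runSprintLoop`, the callback signatures, and the way "same error" is keyed are all assumptions, and the sensor and fix steps are injected so the loop stays pure.

```typescript
interface SensorFailure {
  sensor: string; // e.g. "typecheck", "lint", "test"
  error: string;  // normalized error text used to detect repeats
}

type SprintOutcome =
  | { kind: "commit"; attempts: number }
  | { kind: "blocker"; attempts: number; failure: SensorFailure };

function runSprintLoop(
  runSensors: () => SensorFailure | null, // null means every sensor passed
  fix: (failure: SensorFailure) => void,
  maxRetries = 3,   // "retry (≤3x)" in the diagram
  maxSameError = 3, // "same error 3x" in the diagram
): SprintOutcome {
  let lastKey = "";
  let sameErrorCount = 0;
  let attempts = 0;
  while (true) {
    attempts += 1;
    const failure = runSensors();
    if (failure === null) return { kind: "commit", attempts };

    // Count consecutive occurrences of the same sensor + error pair.
    const key = `${failure.sensor}: ${failure.error}`;
    sameErrorCount = key === lastKey ? sameErrorCount + 1 : 1;
    lastKey = key;

    const retriesUsed = attempts - 1;
    if (sameErrorCount >= maxSameError || retriesUsed >= maxRetries) {
      return { kind: "blocker", attempts, failure }; // halt and ask the user
    }
    fix(failure);
  }
}
```

A caller would commit only on the `commit` outcome and log the failure to STATE.md on `blocker`; nothing in the loop can turn a failing sensor into a commit.
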
Autonomy is scoped to code, not infrastructure. Every shell command falls into one of three zones; the zone decides whether the agent may run it unprompted:
sequenceDiagram
participant A as Agent
participant U as You
participant S as Shell
Note over A: 🟢 Free — code + sensors + local git
A->>S: tsc --noEmit / eslint / vitest
S-->>A: pass/fail
A->>S: git add / git commit (local)
Note over A,U: 🟡 Gated — declared in harness.yaml actions:
A->>U: "run 'docker compose up -d postgres'?"
U-->>A: approve (once per session)
A->>S: docker compose up -d postgres
Note over A,U: 🔴 Always ask — destructive or externally visible
A->>U: "run 'flyctl deploy'?"
U-->>A: approve (every call)
A->>S: flyctl deploy
See SKILL.md → Methodology Rules → Action Zones for the full policy.
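
The three zones can be sketched as a small approval gate. Everything named here (`ZonePolicy`, `createActionGate`, the pattern list) is hypothetical, and treating undeclared commands as always-ask is a conservative assumption of this example rather than the documented policy; SKILL.md remains the authority.

```typescript
type Zone = "free" | "gated" | "always_ask";

interface ZonePolicy {
  free: RegExp[];  // code, sensors, local git
  gated: string[]; // commands declared under actions in harness.yaml
}

function createActionGate(policy: ZonePolicy, askUser: (command: string) => boolean) {
  const approvedThisSession = new Set<string>();

  function zoneOf(command: string): Zone {
    if (policy.free.some((pattern) => pattern.test(command))) return "free";
    if (policy.gated.includes(command)) return "gated";
    return "always_ask"; // assumption: anything undeclared needs approval
  }

  function mayRun(command: string): boolean {
    const zone = zoneOf(command);
    if (zone === "free") return true;
    if (zone === "gated" && approvedThisSession.has(command)) return true;
    const approved = askUser(command);
    // Gated approvals are remembered for the session; always-ask never is.
    if (approved && zone === "gated") approvedThisSession.add(command);
    return approved;
  }

  return { zoneOf, mayRun };
}

const examplePolicy: ZonePolicy = {
  free: [/^tsc\b/, /^eslint\b/, /^vitest\b/, /^git (add|commit)\b/],
  gated: ["docker compose up -d postgres"],
};
```
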
- Never fabricate. Resolve facts via Knowledge Verification Chain: codebase → project docs → Context7 MCP → web → flag uncertain.
- Scope lock. Touch only files listed in the current task. Out-of-scope findings → `STATE.md` → Deferred Ideas.
- Fresh context per task. Re-read what the next task needs; drop history.
- Conventional Commits 1.0.0. No proprietary trailers; traceability via `state.json`.
- Don't skip sensors. Never disable a check to make it pass; fix the code.
- Action zones. Free for code, gated for infra, always-ask for destructive state.
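
For the Conventional Commits rule, a header check could look like the sketch below. It covers only the header shape from the 1.0.0 specification (`<type>[optional scope][!]: <description>`); it is not the hook shipped in templates/hooks, and `isConventionalHeader` is a name invented for this example.

```typescript
// Header shape per Conventional Commits 1.0.0:
//   <type>[optional scope][!]: <description>
// The spec does not treat types as case sensitive, so letters of either case are accepted.
const CONVENTIONAL_HEADER = /^[A-Za-z]+(\([^()\s]+\))?!?: \S.*$/;

function isConventionalHeader(message: string): boolean {
  const header = message.split("\n")[0] ?? "";
  return CONVENTIONAL_HEADER.test(header);
}

console.log(isConventionalHeader("feat(auth): validate email format")); // true
console.log(isConventionalHeader("updated stuff"));                     // false
```
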
.adp/
├── state.json # Pipeline runtime state (machine-readable)
├── harness.yaml # Sensor commands (typecheck / lint / test)
└── guides/ # 7 feedforward guides, generated by `adp map`
├── stack.md
├── architecture.md
├── structure.md
├── conventions.md
├── testing.md
├── integrations.md
└── concerns.md
.specs/
├── HANDOFF.md # created by `adp pause` — resume pointer
├── project/
│ ├── PROJECT.md # Vision, goals, constraints
│ ├── ROADMAP.md # Milestones, features, status
│ └── STATE.md # Decisions, blockers, deferred ideas
├── features/
│ └── {feature-name}/
│ ├── spec.md # Requirements (REQ-NN with User Stories)
│ ├── context.md # Gray-area decisions (only if needed)
│ ├── design.md # Architecture (Large/Complex only)
│ ├── tasks.md # Atomic tasks (Medium+ only)
│ └── validation.md # REQ coverage check after Execute
└── quick/
└── NNN-slug/
├── TASK.md # Quick-mode task
└── SUMMARY.md # Quick-mode result

| Command | Purpose |
|---|---|
| `adp init` | Detect stack, create `.adp/` + `.specs/`, write `harness.yaml`, run `adp map` |
| `adp map` | Analyze codebase, generate the 7 feedforward guides |
| `adp feature <request>` | Create `feat/<slug>`, seed `.specs/features/<slug>/spec.md`, and start Specify |
| `adp run <feature>` | Execute full pipeline for a feature |
| `adp auto-mode <feature>` | Maximum-autonomy variant of `run`: runtime + e2e sensors, adaptive 3-attempt retry, gated push/PR at the end |
| `adp status` | Show current sprint, phase, recent activity |
| `adp verify` | Run all sensors; report pass/fail |
| `adp pause` | Snapshot to `HANDOFF.md`; stop gracefully |
| `adp resume` | Read handoff + state; continue from the exact stopping point |
| `adp tui` | Open the live dashboard (sprint table, activity log, live agent panel) |
| `adp completions <shell>` | Print bash / zsh / fish completion script to stdout |

All commands are triggered in natural conversation with Claude Code — the agent
reads SKILL.md and executes them using its built-in tools (Read, Write, Edit,
Bash, Glob, Grep). There is no standalone CLI binary required.
The optional runtime library (src/) is exported for programmatic use.
adp tui includes a Live Agents panel that tails the current Claude Code
session's subagents/ JSONL files in real time (~100ms latency via chokidar).
Each sub-agent the orchestrator spawns — evaluator, contract reviewer, parallel
worktree workers — is classified, scored against your harness.yaml thresholds,
and rendered with elapsed time and prompt snippet.
- Wide terminal (≥120 cols): three-column dashboard (sprints | activity | live).
- Medium (90–119 cols): live panel hidden on the dashboard; press `4` or run `/live` to focus the panel.
- Narrow (<90 cols): live panel hidden entirely.
If the active session JSONL can't be located, the panel renders a degraded banner and the rest of the dashboard keeps working.
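
The width breakpoints above amount to a three-way switch on terminal columns. A minimal sketch follows, with `layoutFor` and the layout names invented for illustration; the actual TUI code in `src/ui/` may be structured differently.

```typescript
type DashboardLayout = "three-column" | "live-on-demand" | "live-hidden";

function layoutFor(cols: number): DashboardLayout {
  if (cols >= 120) return "three-column";  // sprints | activity | live
  if (cols >= 90) return "live-on-demand"; // live panel reachable via 4 or /live
  return "live-hidden";                    // live panel not shown at all
}

console.log(layoutFor(100)); // "live-on-demand"
```
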
adp auto-mode <feature> is the unattended variant of adp run. It detects
the project stack (TypeScript / Python / Rust / Go), installs the matching
reference harness from templates/harness/auto-mode-<stack>.yaml if missing,
overrides autonomy.clarify=never and output=minimal for the duration of
the run, and applies an adaptive 3-attempt retry policy on sensor and
evaluator failures (re-diagnose, target the cause, re-run — instead of
identical retries). Push and PR remain gated at the end of the run; the
user clicks once.
Halt conditions:
- A sensor fails 3 adaptive attempts on the same task.
- A gated action is denied.
- A git conflict cannot be auto-resolved.
- The evaluator scores below `min_score` after one fix-up sprint.
Auto-mode does not bypass the commit-msg hook, force-push, merge PRs, or
flip always_ask actions to auto-approve. Autonomy without a quality gate
is just fast wrong code — if evaluator.enabled: false, auto-mode refuses
to start.
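
The start-up guard, the temporary overrides, and the halt conditions can be summarized in code. This is a sketch under assumptions: the key names `evaluator.enabled`, `min_score`, `autonomy.clarify`, and `output` come from the text above, but where they sit in harness.yaml, and the `TaskSignals` shape, are guesses made for illustration.

```typescript
// Assumed config layout; the real harness.yaml schema may nest these differently.
interface AutoModeConfig {
  evaluator: { enabled: boolean; min_score: number };
  autonomy: { clarify: string };
  output: string;
}

// Refuse to start without a quality gate, then apply the run-scoped overrides.
function prepareAutoMode(config: AutoModeConfig): AutoModeConfig {
  if (!config.evaluator.enabled) {
    throw new Error("auto-mode refuses to start: evaluator.enabled is false");
  }
  return {
    ...config,
    autonomy: { ...config.autonomy, clarify: "never" },
    output: "minimal",
  };
}

// Hypothetical per-task signals an orchestrator might track.
interface TaskSignals {
  failedSensorAttempts: number;
  gatedActionDenied: boolean;
  unresolvedGitConflict: boolean;
  evaluatorScore: number | null; // null until the evaluator has run
  fixUpSprintsUsed: number;
}

function haltReason(s: TaskSignals, minScore: number): string | null {
  if (s.failedSensorAttempts >= 3) return "sensor failed 3 adaptive attempts";
  if (s.gatedActionDenied) return "gated action denied";
  if (s.unresolvedGitConflict) return "git conflict could not be auto-resolved";
  if (s.evaluatorScore !== null && s.evaluatorScore < minScore && s.fixUpSprintsUsed >= 1) {
    return "evaluator below min_score after one fix-up sprint";
  }
  return null; // keep going
}
```

`prepareAutoMode` returns a copy, so the overrides last only as long as the caller keeps that copy, which matches "for the duration of the run".
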
Tab-complete commands, subcommands, feature slugs from .specs/features/,
template names from the installed skill, and flags. Install:
# bash
adp completions bash > /usr/local/etc/bash_completion.d/adp
# zsh
adp completions zsh > "${fpath[1]}/_adp"
# fish
adp completions fish > ~/.config/fish/completions/adp.fish

See completions/README.md for per-shell install paths and reload notes.

adp/
├── SKILL.md # Methodology the agent follows
├── README.md # You are here
├── templates/
│ ├── SPEC.md # Copy into .specs/features/{name}/spec.md
│ ├── harness/ # Reference harness.yaml per stack
│ │ ├── auto-mode-typescript.yaml
│ │ ├── auto-mode-python.yaml
│ │ ├── auto-mode-rust.yaml
│ │ └── auto-mode-go.yaml
│ ├── hooks/ # git hooks (commit-msg enforcer)
│ └── agents/ # evaluator / contract-reviewer / worktree
├── completions/ # bash / zsh / fish shell completions
├── src/
│ ├── index.ts # Public exports
│ ├── types.ts # Domain types (Sprint, Activity, PipelineState…)
│ ├── cli.ts # CLI entry (adp sensors / status / guides…)
│ ├── interactive.ts # Interactive REPL
│ ├── ui/ # Ink/React status TUI
│ ├── harness/
│ │ ├── engine.ts # Runs sensor commands, reports pass/fail
│ │ ├── config.ts # Loads .adp/harness.yaml
│ │ └── engine.test.ts
│ ├── context/
│ │ └── loader.ts # Loads guides + specs from .adp/ and .specs/
│ └── state/
│ ├── manager.ts # Reads/writes .adp/state.json
│ └── manager.test.ts
├── package.json
├── tsconfig.json
└── vitest.config.ts
- `harness/` executes sensors. `engine.ts` spawns the shell commands listed in `harness.yaml` in configured order, captures stdout/stderr/exit code, and returns a structured result the agent can act on.
- `context/loader.ts` reads `.adp/guides/` and `.specs/` into an object the agent can pass to a sub-agent, enabling targeted context injection instead of loading the whole project.
- `state/manager.ts` owns `.adp/state.json`: sprint lifecycle, activity log, blockers. All writes go through it for consistency.
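
To make the engine's contract concrete, here is a sketch of ordered sensor execution with a structured result. It does not mirror the real `engine.ts`: the `Sensor` fields, `runHarness`, and the injected `exec` function are assumptions. A real engine would wrap a process spawner behind `exec`; injecting it here keeps the example runnable without touching the shell.

```typescript
// Hypothetical shapes; the real harness.yaml schema is not shown in this README.
interface Sensor { name: string; command: string; order: number }
interface ExecResult { exitCode: number; stdout: string; stderr: string }
interface SensorResult extends ExecResult { name: string; passed: boolean }

type Exec = (command: string) => ExecResult;

function runHarness(sensors: Sensor[], exec: Exec): { passed: boolean; results: SensorResult[] } {
  // Copy before sorting so the caller's array keeps its original order.
  const ordered = [...sensors].sort((a, b) => a.order - b.order);
  // This sketch runs every sensor; whether ADP stops at the first failure is not stated.
  const results = ordered.map((sensor) => {
    const output = exec(sensor.command);
    return { name: sensor.name, passed: output.exitCode === 0, ...output };
  });
  return { passed: results.every((r) => r.passed), results };
}

// Fake executor for demonstration: only the lint command fails.
const fakeExec: Exec = (command) =>
  command.startsWith("eslint")
    ? { exitCode: 1, stdout: "", stderr: "1 problem" }
    : { exitCode: 0, stdout: "ok", stderr: "" };

const harnessRun = runHarness(
  [
    { name: "test", command: "vitest run", order: 3 },
    { name: "typecheck", command: "tsc --noEmit", order: 1 },
    { name: "lint", command: "eslint .", order: 2 },
  ],
  fakeExec,
);
console.log(harnessRun.passed, harnessRun.results.map((r) => r.name));
```
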
| Layer | Tells agent | Executes | File |
|---|---|---|---|
| Skill | what to do | agent itself (Read/Write/Bash/…) | SKILL.md |
| Runtime | how to do it reliably | Node process | src/*.ts |
The skill is authoritative. The runtime is a convenience.
templates/ contains pre-filled scaffolds for every artifact ADP expects.
Copy them when bootstrapping, or let the skill create them for you.

| Template | Copies to | Purpose |
|---|---|---|
| `PROJECT.md` | `.specs/project/PROJECT.md` | Vision, goals, non-goals, personas, stack, constraints |
| `ROADMAP.md` | `.specs/project/ROADMAP.md` | Now / Next / Later / Done milestones with status legend |
| `STATE.md` | `.specs/project/STATE.md` | Decisions, Blockers, Learnings, Deferred Ideas, Todos |
| `SPEC.md` | `.specs/features/{name}/spec.md` | Feature spec with REQ-NN User Stories + WHEN/THEN criteria |
| `tasks.md` | `.specs/features/{name}/tasks.md` | Atomic tasks with Requirement / Files / Reuses / Parallel / Commit |
| `HANDOFF.md` | `.specs/HANDOFF.md` | Pause/resume snapshot: progress, sensors, next steps |
| `harness/auto-mode-typescript.yaml` | `.adp/harness.yaml` | Auto-mode harness for TS/Node: start-server-and-test smoke + Playwright e2e |
| `harness/auto-mode-python.yaml` | `.adp/harness.yaml` | Auto-mode harness for Python: pytest with FastAPI TestClient for smoke, pytest-playwright for e2e |
| `harness/auto-mode-rust.yaml` | `.adp/harness.yaml` | Auto-mode harness for Rust: `cargo test --test smoke` spawns the server in a tokio task |
| `harness/auto-mode-go.yaml` | `.adp/harness.yaml` | Auto-mode harness for Go: `httptest.NewServer` inside a Go test for smoke |

Bootstrap a new feature manually:
mkdir -p .specs/features/my-feature
cp adp/templates/SPEC.md .specs/features/my-feature/spec.md
cp adp/templates/tasks.md .specs/features/my-feature/tasks.md

Or take the recommended path: let adp run my-feature generate them with the spec
filled in from your clarifying answers.

npm run build # tsc → dist/
npm run typecheck # tsc --noEmit
npm run lint # eslint
npm test # vitest run
npm run test:watch  # vitest in watch mode

Single test:

npx vitest run src/harness/engine.test.ts
npx vitest run -t "passes on exit code 0"

ADP is assembled from two complementary bodies of methodology, each contributing a distinct layer.

TLC Spec-Driven establishes the four-phase pipeline — Specify → Design → Tasks → Execute — that structures all ADP feature runs. It introduces the REQ-ID chain (requirements → tasks → commits → validation) and the feedforward guide pattern: generate context from your own codebase before each phase, not from a generic prior.
ADP extends it with a computational harness: live sensors (typecheck / lint / test) enforced after every sprint, evaluator sub-agents that grade the output with fresh context, stuck detection when the same error repeats three times, and a feature-branch → PR workflow that makes the methodology machine-enforceable rather than advisory.
Andrej Karpathy's coding guidelines encode four failure modes that LLMs consistently exhibit and the rules that prevent them. ADP adopts all four as hard execution-time constraints:
| Principle | Failure it prevents | Where ADP enforces it |
|---|---|---|
| Think before coding | Wrong assumptions baked in before checking | Clarification Gate — resolves from codebase → docs → industry standard before asking; at most one question per run |
| Simplicity first | Over-engineering beyond what the spec requires | Code Minimalism rule — no speculative features, no single-use abstractions, no impossible-scenario error handling |
| Surgical changes | Drive-by improvements that corrupt diffs and break review | Scope Lock + Code Minimalism — touch only task files; match existing style; clean up only your own mess |
| Goal-driven execution | Stopping at "it seems to work" | Sprint Contract acceptance criteria + sensor gate — build until every criterion is verifiable, not until effort is expended |
- TLC Spec-Driven — the four-phase spec-driven methodology (Specify → Design → Tasks → Execute) that ADP is built on.
- Karpathy's Agent Coding Skills — four LLM coding discipline rules (think before coding, simplicity first, surgical changes, goal-driven execution) adopted by ADP as hard constraints.
- Conventional Commits 1.0.0 — commit message convention used by ADP.
- Anthropic — Harness Design for Long-Running Apps — the evaluator-as-separate-agent principle that underpins ADP's QA layer.