A multi-agent system that produces high-quality software specifications through adversarial review. Eight specialised Claude agents collaborate and compete — drafting, reviewing through multiple lenses, revising, and judging convergence — while human gates ensure alignment at critical decision points.
It automates the manual loop of repeatedly running `plan-spec` and then `grill-spec`. The design is grounded in research showing that adversarial multi-agent review overcomes the "Degeneration-of-Thought" problem inherent in single-agent self-reflection.
```
Source Documents
    |
    v
[ DISCOVERY ] ──> Extract actors, scope, constraints, requirements
    |
    v
[ HUMAN GATE 1 ] ──> Confirm / correct requirements (up to 3 corrections)
    |
    v
[ DRAFTING ] ──> Produce spec + holdout test dataset from templates
    |
    v
[ HUMAN GATE 2 ] ──> Resolve ambiguity warnings (1 redraft allowed)
    |
    v
┌─────────────────────────────────┐
│          [ REVIEWING ]          │
│   4 parallel reviewer agents    │   Adversarial review loop
│   8 lenses across 4 groups      │   (2-5 rounds, configurable)
│             |                   │
│          [ REVISING ]           │
│        Address findings         │
│             |                   │
│           [ JUDGING ]           │
│       Convergence check         │
│     Anti-gaming pre-checks      │
└─────────────┬───────────────────┘
              |
              v
[ HUMAN GATE FINAL ] ──> Only if critical findings remain
              |
              v
        [ FINALIZED ]
```
The dashboard displays this as a visual pipeline stepper showing all stages, with completed stages in green, the current stage pulsing, and future stages grayed out.
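The stage sequence can also be read as a state machine with guarded transitions (`statemachine.go`). The Go sketch below is illustrative only: the gate state names (`GATE_1`, `GATE_2`, `GATE_FINAL`), the `ESCALATED` edge out of judging, and the transition table itself are assumptions drawn from the diagram, not the project's actual definitions.

```go
package main

import "fmt"

// State is a workflow stage. Gate state names are hypothetical.
type State string

const (
	Discovery State = "DISCOVERY"
	Gate1     State = "GATE_1"
	Drafting  State = "DRAFTING"
	Gate2     State = "GATE_2"
	Reviewing State = "REVIEWING"
	Revising  State = "REVISING"
	Judging   State = "JUDGING"
	GateFinal State = "GATE_FINAL"
	Finalized State = "FINALIZED"
	Escalated State = "ESCALATED"
)

// allowed lists the legal successors of each state (assumed, from the diagram).
var allowed = map[State][]State{
	Discovery: {Gate1},
	Gate1:     {Drafting},
	Drafting:  {Gate2},
	Gate2:     {Reviewing, Drafting}, // Drafting = the single permitted redraft
	Reviewing: {Revising},
	Revising:  {Judging},
	// PASS -> FINALIZED or GATE_FINAL; REVISE -> next REVIEWING round;
	// BLOCK -> ESCALATED (assumed).
	Judging:   {Reviewing, GateFinal, Finalized, Escalated},
	GateFinal: {Finalized},
}

// Machine holds the current state and rejects transitions not in allowed.
type Machine struct{ Current State }

// Transition moves to the target state only if the edge is allowed;
// on rejection the current state is left unchanged.
func (m *Machine) Transition(to State) error {
	for _, s := range allowed[m.Current] {
		if s == to {
			m.Current = to
			return nil
		}
	}
	return fmt.Errorf("illegal transition %s -> %s", m.Current, to)
}
```

Keeping the table as data makes the guard a single lookup, so an illegal jump (for example from `GATE_1` straight to `REVIEWING`) fails loudly instead of silently skipping a stage.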
| Agent | Role | Lenses |
|---|---|---|
| Discovery | Extracts requirements from source documents | — |
| Drafter | Produces specification and holdout test data | — |
| Reviewer (Clarity) | Ambiguity, Incompleteness | AMB, INC |
| Reviewer (Consistency) | Consistency, Feasibility | CON, FEA |
| Reviewer (Security) | Security, Operability | SEC, OPS |
| Reviewer (Correctness) | Correctness, Complexity | COR, CPX |
| Reviser | Addresses findings from reviewers | — |
| Judge | Evaluates convergence, renders PASS/REVISE/BLOCK verdict | — |
The judge's PASS verdict is subject to deterministic anti-gaming checks:
- All CRITICAL findings must be closed or dismissed
- Revision change logs must reference every CRITICAL and MAJOR finding
- Minimum round count must be met
- Authority limits per round: max 2 severity downgrades, max 3 dismissals
- Cumulative escalation: more than 5 downgrades and dismissals in total, across all rounds, triggers escalation
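Because the pre-checks are deterministic, they can be expressed as plain code. This sketch assumes hypothetical `Finding` and `RoundStats` types and a set of finding IDs collected from all revision change logs; it treats exceeding a per-round authority limit as blocking PASS, which is an assumption about how `convergence.go` actually reacts.

```go
package main

import "fmt"

// Finding is a hypothetical, minimal view of a tracked issue.
type Finding struct {
	ID       string
	Severity string // e.g. "CRITICAL", "MAJOR"
	Status   string // e.g. "open", "closed", "dismissed"
}

// RoundStats counts the judge's authority actions in one round.
type RoundStats struct {
	Downgrades int
	Dismissals int
}

// PreCheck is the outcome: Pass means the judge's PASS may stand.
type PreCheck struct {
	Pass     bool
	Escalate bool
	Reasons  []string
}

// CheckPass applies the anti-gaming rules listed above.
func CheckPass(findings []Finding, changeLogIDs map[string]bool, round, minRounds int, history []RoundStats) PreCheck {
	var r PreCheck
	for _, f := range findings {
		if f.Severity == "CRITICAL" && f.Status != "closed" && f.Status != "dismissed" {
			r.Reasons = append(r.Reasons, fmt.Sprintf("%s: CRITICAL finding still open", f.ID))
		}
		if (f.Severity == "CRITICAL" || f.Severity == "MAJOR") && !changeLogIDs[f.ID] {
			r.Reasons = append(r.Reasons, fmt.Sprintf("%s: not referenced in any change log", f.ID))
		}
	}
	if round < minRounds {
		r.Reasons = append(r.Reasons, fmt.Sprintf("round %d below minimum %d", round, minRounds))
	}
	total := 0
	for i, s := range history {
		if s.Downgrades > 2 || s.Dismissals > 3 {
			r.Reasons = append(r.Reasons, fmt.Sprintf("round %d exceeds authority limits", i+1))
		}
		total += s.Downgrades + s.Dismissals
	}
	r.Escalate = total > 5 // cumulative limit across all rounds
	r.Pass = len(r.Reasons) == 0 && !r.Escalate
	return r
}
```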
The workflow halts automatically when any limit is exceeded:
- Max rounds — round count exceeds configured maximum (default: 5)
- Max findings — cumulative finding count exceeds threshold (default: 60)
- Staleness — CRITICAL/MAJOR findings stuck for N consecutive rounds (default: 2)
- Wall clock — elapsed time exceeds budget (default: 60 minutes)
- Cost — cumulative API cost exceeds budget (default: $50)
- Go 1.21+ (built with 1.25)
- Claude CLI installed and authenticated
- `plan-spec` and `grill-spec` skill directories (see Configuration)
```sh
go build -o specworkflow ./cmd/specworkflow
./specworkflow --config config.yaml --workspace ./workspace
```

Open http://localhost:8080 for the dashboard.
| Flag | Default | Description |
|---|---|---|
| `--port` | `8080` | HTTP listen port |
| `--workspace` | `./workspace` | Directory for spec files, uploads, and metrics |
| `--config` | (none) | Path to YAML configuration file |
| `--otel-port` | `4317` | gRPC OTLP receiver port for Claude Code telemetry (`0` to disable) |
Create a `config.yaml`:

```yaml
# Required: paths to skill directories
skill_paths:
  plan_spec: "/path/to/.claude/skills/plan-spec"
  grill_spec: "/path/to/.claude/skills/grill-spec"

# Optional: workflow limits (defaults shown)
max_rounds: 5                 # Maximum review/revise iterations
min_rounds: 2                 # Minimum iterations before acceptance
max_total_findings: 60        # Upper bound on cumulative findings
staleness_threshold: 2        # Consecutive stale rounds before halt
max_wall_clock_minutes: 60    # Time budget
max_cost_usd: 50.0            # Cost budget
max_gate_corrections: 3       # Max human corrections at Gate 1
max_gate2_redrafts: 1         # Max redrafts at Gate 2
max_retries: 2                # Agent retry attempts for transient failures
```

The system requires two Claude skill directories containing the templates that govern spec structure and review criteria:
- `plan-spec`: must contain `spec-template.md`, `bdd-template.md`, `test-dataset-template.md`
- `grill-spec`: must contain `review-constitution.md`, `report-template.md`
The web dashboard provides real-time visibility into workflow execution. Multiple workflows can run concurrently, each tracked independently.
- Controls — Active workflow list, start new workflows, upload source documents, assign documents to workflows, manage workspace
- Spec — View and diff spec versions as they evolve through rounds
- Issues — Track findings with severity/status/lens filtering and lifecycle management
- Convergence — Monitor review/revision convergence metrics and round history
- Messages — Filtered workflow log (OTEL, Orchestrator, Claude Runner, Agent Events, State Transitions)
A persistent top panel shows aggregate metrics updated in real-time via SSE:
- Pipeline stepper — visual chain of all workflow stages with progress indication
- Feature name, round number, workflow state badge
- Cost (from OTEL telemetry), elapsed wall clock time
- Token usage (input, output, cache read), API call count, agent cost
- Activity feed of individual tool and API events
Multiple workflows can execute concurrently, each processing a different feature:
- The active workflow list shows all running workflows with state badges
- Notification badges appear on workflows needing attention (at gate states)
- Click a workflow to switch context — the status panel, gates, and all tabs update
- Each workflow has isolated source documents, state, and artefacts
Gate panels appear when the workflow requires human input:
- Gate 1 — Review discovery output, answer open questions, provide corrections (editable inline fields), add reviewer comments
- Gate 2 — Resolve ambiguity warnings (accept/answer/defer per warning), add reviewer comments
- Gate Final — Approve or reject when critical findings persist after convergence
Workflows can be rewound to any previous stage (DISCOVERY, DRAFTING, REVIEWING, REVISING, JUDGING) while preserving prerequisite artefacts: downstream outputs are deleted and the workflow resumes from the target stage. Rewind is available from the controls on each workflow card.
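One way to decide which artefacts a rewind removes is to tag each file pattern from the workspace layout with the stage that writes it, then delete everything produced at or after the target stage (the target stage re-runs, so its own output goes too). This is a hypothetical sketch: the pattern-to-stage mapping is inferred from file names, and it ignores the target round that the rewind endpoint also accepts.

```go
package main

import (
	"path/filepath"
	"sort"
)

// stageOrder ranks the rewindable stages.
var stageOrder = map[string]int{"DISCOVERY": 0, "DRAFTING": 1, "REVIEWING": 2, "REVISING": 3, "JUDGING": 4}

// producers maps artefact name patterns to the stage that writes them (assumed).
var producers = []struct {
	pattern string
	stage   string
}{
	{"discovery-output.json", "DISCOVERY"},
	{"drafter-output.json", "DRAFTING"},
	{"spec-v0.md", "DRAFTING"},
	{"*-holdouts.md", "DRAFTING"},
	{"review-*-round-*.json", "REVIEWING"},
	{"merged-findings-round-*.json", "REVIEWING"},
	{"revision-round-*.json", "REVISING"},
	{"spec-v[1-9]*.md", "REVISING"}, // revised specs; spec-v0.md is the draft
	{"judge-round-*.json", "JUDGING"},
}

// ToDelete returns the files written by the target stage or any later stage.
// Files matching no pattern (state, logs, human input) are always kept.
func ToDelete(target string, files []string) []string {
	var out []string
	for _, f := range files {
		for _, p := range producers {
			if ok, _ := filepath.Match(p.pattern, f); ok {
				if stageOrder[p.stage] >= stageOrder[target] {
					out = append(out, f)
				}
				break
			}
		}
	}
	sort.Strings(out)
	return out
}
```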
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/workflow/start` | Start new workflow with feature name and source docs |
| POST | `/api/workflow/cancel` | Cancel running workflow |
| GET | `/api/workflow/status` | Poll workflow status (single or all) |
| POST | `/api/workflow/resume` | Resume from ESCALATED/ERROR/paused state |
| POST | `/api/workflow/rewind` | Rewind to target state and round |
| POST | `/api/workflow/finalize` | Force transition to FINALIZED |
| POST | `/api/workflow/reset` | Delete feature directory entirely |
| POST | `/api/workflow/restart` | Stop, delete, and restart workflow |
| POST | `/api/workflow/retry` | Clear stale state file |
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/upload` | Upload source documents to global library |
| GET | `/api/uploads` | List uploaded files |
| POST | `/api/workflow/{feature}/source-docs` | Assign documents to a workflow |
| GET | `/api/workflow/{feature}/source-docs` | List documents assigned to a workflow |
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/tasks/{id}/approve` | Approve gate (with corrections/resolutions) |
| POST | `/api/tasks/{id}/reject` | Reject gate (cancel workflow) |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/workspace/features` | List all features with metadata |
| GET | `/api/workspace/features/{name}/discovery` | Feature discovery output |
| GET | `/api/workspace/features/{name}/state` | Feature workflow state |
| GET | `/api/workspace/features/{name}/files/{f}` | Specific feature file |
| GET | `/api/spec/*` | Spec versions, diffs, issues, convergence |
| GET | `/api/metrics` | Persisted OTEL telemetry |
| GET | `/api/messages` | Workflow log messages |
| GET | `/api/logs/server` | Server log ring buffer |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/ws` | WebSocket event stream |
```
cmd/specworkflow/main.go        CLI entry point, HTTP routing
internal/api/
  workflow_handler.go           HTTP handlers, WorkflowManager (concurrent map)
  otel_receiver.go              OTLP gRPC receiver for Claude telemetry
  metrics_store.go              SQLite persistence for telemetry
  websocket.go                  WebSocket hub and broadcasting
  spec_endpoints.go             Spec/issue/convergence REST endpoints
  upload.go                     Source document upload
  log_stream.go                 Server log and message streaming
internal/specworkflow/
  orchestrator.go               Main workflow loop and state coordination
  statemachine.go               State machine with guarded transitions
  orchestrator_discovery.go     Discovery phase + Gate 1 handling
  orchestrator_drafting.go      Drafting phase + Gate 2 handling
  orchestrator_review.go        Review dispatch + revision + judging
  orchestrator_finalize.go      Finalization and output assembly
  claude_runner.go              Claude CLI subprocess execution
  review_dispatch.go            Parallel reviewer dispatch with retry
  prompts.go                    Prompt construction for all agents
  convergence.go                Anti-gaming pre-checks and convergence
  breakers.go                   Circuit breaker evaluation
  issues.go                     Issue tracker with lifecycle transitions
  skills.go                     Skill template loading and caching
  persistence.go                Atomic state persistence
  recovery.go                   Agent failure detection and retry
  resume.go                     Crash/restart recovery
  rewind.go                     Workflow rewind to previous stages
  security.go                   Prompt injection mitigation
  config.go                     Configuration parsing and validation
  types.go                      Core type definitions
  events.go                     Event system (14 event types)
  team.go                       Agent team definition
static/
  index.html                    Dashboard HTML
  app.js                        Dashboard JavaScript (SPA)
  style.css                     Dashboard styles
```
State is persisted to workspace/specs/{feature}/workflow-state.json via atomic write (temp file + rename). On server restart, the system resumes from the persisted state:
- Gate states — Restored automatically; gate panels reappear in the dashboard
- Agent states — If agent output exists on disk, the step is skipped (crash recovery); otherwise the agent is re-dispatched
OTEL telemetry from Claude Code is persisted to workspace/metrics.db (SQLite, WAL mode):
- Aggregate counters per feature: tokens, cost, API calls — upserted on every OTEL update
- Individual events: tool invocations and API calls with duration, cost, timestamp
- 90-day retention with automatic cleanup on startup
- Survives browser refresh and server restart — in-memory accumulators restored from SQLite
```
workspace/
  metrics.db                          SQLite telemetry database
  source-docs/                        Uploaded reference documents (global library)
  specs/{feature}/
    source-docs/                      Per-workflow document copies
    workflow-state.json               Persisted workflow state
    workflow-log.jsonl                Structured workflow log
    discovery-output.json             Discovery agent output
    gate1-corrections.json            Human corrections (if any)
    user-answers.json                 Human answers to open questions
    human-comments.json               Free-text reviewer comments
    drafter-output.json               Drafter agent output
    spec-v0.md                        Initial spec draft
    spec-v{N}.md                      Revised spec (per round)
    {feature}-holdouts.md             Holdout test dataset
    review-{a,b,c,d}-round-{N}.json   Reviewer outputs per round
    merged-findings-round-{N}.json    Merged findings per round
    revision-round-{N}.json           Revision output per round
    judge-round-{N}.json              Judge output per round
```
```sh
go test ./...
```

32 test files cover all major components, including the state machine, orchestrator, convergence protocol, circuit breakers, issue lifecycle, agent output validation, prompt construction, persistence, recovery, resume, rewind, security, configuration, and all HTTP/WebSocket handlers.
- `internal/specworkflow/`: core workflow engine (pure Go, no HTTP dependencies)
- `internal/api/`: HTTP/WebSocket/gRPC layer (depends on `specworkflow`)
- `cmd/specworkflow/`: CLI entry point (depends on `api`)
- `static/`: dashboard frontend (vanilla JS, no build step)
- Add the lens code to `lensGroupMap` in `prompts.go`
- Add the lens description to `review-constitution.md` in the grill-spec skill
- Assign the lens to a reviewer group (or create a new reviewer in `team.go`)
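A sketch of what the first step might look like, assuming `lensGroupMap` maps lens codes to reviewer-group names; its real key and value types in `prompts.go` are not documented here. The groups mirror the agent table above.

```go
package main

import "sort"

// lensGroupMap mirrors the reviewer/lens table (assumed shape: lens -> group).
var lensGroupMap = map[string]string{
	"AMB": "clarity", "INC": "clarity",
	"CON": "consistency", "FEA": "consistency",
	"SEC": "security", "OPS": "security",
	"COR": "correctness", "CPX": "correctness",
}

// LensesFor returns the sorted lens codes a reviewer group covers,
// e.g. for building that reviewer's prompt.
func LensesFor(group string) []string {
	var out []string
	for lens, g := range lensGroupMap {
		if g == group {
			out = append(out, lens)
		}
	}
	sort.Strings(out)
	return out
}
```

Under this assumption, giving the Correctness reviewer a hypothetical performance lens is one new entry such as `"PRF": "correctness"`, plus its description in `review-constitution.md`.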
The dashboard receives real-time updates via 14 WebSocket event types:
`spec_version`, `issue_update`, `convergence_update`, `gate_request`, `gate_response`, `circuit_breaker`, `agent_error`, `state_transition`, `agent_dispatch`, `agent_complete`, `workflow_status`, `agent_metrics`, `agent_tool_event`, `agent_api_event`
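A client-side sketch for handling these events. The event names come from the list above; the `Event` envelope (`type`, `feature`, `data`) is an assumption about the wire format, not the documented message schema.

```go
package main

import "encoding/json"

// EventType values are the 14 names listed above.
type EventType string

const (
	SpecVersion       EventType = "spec_version"
	IssueUpdate       EventType = "issue_update"
	ConvergenceUpdate EventType = "convergence_update"
	GateRequest       EventType = "gate_request"
	GateResponse      EventType = "gate_response"
	CircuitBreaker    EventType = "circuit_breaker"
	AgentError        EventType = "agent_error"
	StateTransition   EventType = "state_transition"
	AgentDispatch     EventType = "agent_dispatch"
	AgentComplete     EventType = "agent_complete"
	WorkflowStatus    EventType = "workflow_status"
	AgentMetrics      EventType = "agent_metrics"
	AgentToolEvent    EventType = "agent_tool_event"
	AgentAPIEvent     EventType = "agent_api_event"
)

// AllEventTypes is used to reject unknown message types.
var AllEventTypes = []EventType{
	SpecVersion, IssueUpdate, ConvergenceUpdate, GateRequest, GateResponse,
	CircuitBreaker, AgentError, StateTransition, AgentDispatch, AgentComplete,
	WorkflowStatus, AgentMetrics, AgentToolEvent, AgentAPIEvent,
}

// Event is a HYPOTHETICAL envelope; Data is decoded later per type.
type Event struct {
	Type    EventType       `json:"type"`
	Feature string          `json:"feature"`
	Data    json.RawMessage `json:"data"`
}

// IsKnown reports whether t is one of the 14 documented event types.
func IsKnown(t EventType) bool {
	for _, k := range AllEventTypes {
		if k == t {
			return true
		}
	}
	return false
}

// Decode parses one WebSocket message into an Event.
func Decode(msg []byte) (Event, error) {
	var e Event
	err := json.Unmarshal(msg, &e)
	return e, err
}
```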
Proprietary. All rights reserved.