Adversarial Spec System

A multi-agent system that produces high-quality software specifications through adversarial review. Eight specialised Claude agents collaborate and compete — drafting, reviewing through multiple lenses, revising, and judging convergence — while human gates ensure alignment at critical decision points.

The system automates the manual loop of running plan-spec and then grill-spec over and over. It is grounded in research showing that adversarial multi-agent review overcomes the "Degeneration-of-Thought" problem inherent in single-agent self-reflection.

How It Works

Source Documents
       |
       v
  [ DISCOVERY ]  ──>  Extract actors, scope, constraints, requirements
       |
       v
  [ HUMAN GATE 1 ]  ──>  Confirm / correct requirements (up to 3 corrections)
       |
       v
  [ DRAFTING ]  ──>  Produce spec + holdout test dataset from templates
       |
       v
  [ HUMAN GATE 2 ]  ──>  Resolve ambiguity warnings (1 redraft allowed)
       |
       v
  ┌─────────────────────────────────┐
  │  [ REVIEWING ]                  │
  │    4 parallel reviewer agents   │  Adversarial review loop
  │    8 lenses across 4 groups     │  (2-5 rounds, configurable)
  │            |                    │
  │  [ REVISING ]                   │
  │    Address findings             │
  │            |                    │
  │  [ JUDGING ]                    │
  │    Convergence check            │
  │    Anti-gaming pre-checks       │
  └─────────────┬───────────────────┘
                |
                v
  [ HUMAN GATE FINAL ]  ──>  Only if critical findings remain
                |
                v
          [ FINALIZED ]
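The pipeline above behaves like a state machine with guarded transitions (statemachine.go). The Go sketch below shows one way to encode the diagram's edges as a transition table. The state identifiers, the `Transition` function, and the Gate 1 correction loop back to discovery are assumptions made for illustration, and the ESCALATED/ERROR states are left out.

```go
package main

import "fmt"

// State is a hypothetical name for a workflow stage; the constants mirror the
// pipeline diagram, not necessarily the identifiers used in statemachine.go.
type State string

const (
	Discovery State = "DISCOVERY"
	Gate1     State = "HUMAN_GATE_1"
	Drafting  State = "DRAFTING"
	Gate2     State = "HUMAN_GATE_2"
	Reviewing State = "REVIEWING"
	Revising  State = "REVISING"
	Judging   State = "JUDGING"
	GateFinal State = "HUMAN_GATE_FINAL"
	Finalized State = "FINALIZED"
)

// allowed lists the edges implied by the diagram. JUDGING may loop back to
// REVIEWING for another round, stop at the final gate, or finish.
var allowed = map[State][]State{
	Discovery: {Gate1},
	Gate1:     {Drafting, Discovery}, // assumed: corrections re-run discovery
	Drafting:  {Gate2},
	Gate2:     {Reviewing, Drafting}, // one redraft allowed at Gate 2
	Reviewing: {Revising},
	Revising:  {Judging},
	Judging:   {Reviewing, GateFinal, Finalized},
	GateFinal: {Finalized},
}

// Transition returns an error unless to is a permitted successor of from.
func Transition(from, to State) error {
	for _, next := range allowed[from] {
		if next == to {
			return nil
		}
	}
	return fmt.Errorf("illegal transition %s -> %s", from, to)
}

func main() {
	fmt.Println(Transition(Judging, Reviewing)) // <nil>
	fmt.Println(Transition(Discovery, Judging)) // illegal transition DISCOVERY -> JUDGING
}
```

Keeping the edges in a single table means a guard can reject out-of-order jumps, such as skipping review entirely, before any agent is dispatched.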

The dashboard displays this as a visual pipeline stepper showing all stages, with completed stages in green, the current stage pulsing, and future stages grayed out.

Agents

Agent	Role	Lenses
Discovery	Extracts requirements from source documents	—
Drafter	Produces specification and holdout test data	—
Reviewer (Clarity)	Ambiguity, Incompleteness	AMB, INC
Reviewer (Consistency)	Consistency, Feasibility	CON, FEA
Reviewer (Security)	Security, Operability	SEC, OPS
Reviewer (Correctness)	Correctness, Complexity	COR, CPX
Reviser	Addresses findings from reviewers	—
Judge	Evaluates convergence, renders PASS/REVISE/BLOCK verdict	—

Convergence Protocol

The judge's PASS verdict is subject to deterministic anti-gaming checks:

All CRITICAL findings must be closed or dismissed
Revision change logs must reference every CRITICAL and MAJOR finding
Minimum round count must be met
Authority limits per round: max 2 severity downgrades, max 3 dismissals
Cumulative escalation: more than 5 downgrades and dismissals in total, across all rounds, triggers escalation

Circuit Breakers

The workflow halts automatically when any limit is exceeded:

Max rounds — round count exceeds configured maximum (default: 5)
Max findings — cumulative finding count exceeds threshold (default: 60)
Staleness — CRITICAL/MAJOR findings stuck for N consecutive rounds (default: 2)
Wall clock — elapsed time exceeds budget (default: 60 minutes)
Cost — cumulative API cost exceeds budget (default: $50)

Quick Start

Prerequisites

Go 1.21+ (built with 1.25)
Claude CLI installed and authenticated
plan-spec and grill-spec skill directories (see Configuration)

Build

go build -o specworkflow ./cmd/specworkflow

Run

./specworkflow --config config.yaml --workspace ./workspace

Open http://localhost:8080 in a browser for the dashboard (or use the port given with --port).

CLI Flags

Flag	Default	Description
`--port`	`8080`	HTTP listen port
`--workspace`	`./workspace`	Directory for spec files, uploads, and metrics
`--config`	(none)	Path to YAML configuration file
`--otel-port`	`4317`	gRPC OTLP receiver port for Claude Code telemetry (0 to disable)

Configuration

Create a config.yaml:

# Required: paths to skill directories
skill_paths:
  plan_spec: "/path/to/.claude/skills/plan-spec"
  grill_spec: "/path/to/.claude/skills/grill-spec"

# Optional: workflow limits (defaults shown)
max_rounds: 5              # Maximum review/revise iterations
min_rounds: 2              # Minimum iterations before acceptance
max_total_findings: 60     # Upper bound on cumulative findings
staleness_threshold: 2     # Consecutive stale rounds before halt
max_wall_clock_minutes: 60 # Time budget
max_cost_usd: 50.0         # Cost budget
max_gate_corrections: 3    # Max human corrections at Gate 1
max_gate2_redrafts: 1      # Max redrafts at Gate 2
max_retries: 2             # Agent retry attempts for transient failures
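One way to model this file in Go is a struct whose yaml tags match the keys above, plus a step that fills in the documented defaults and rejects impossible combinations. The sketch assumes decoding is done elsewhere by a YAML library such as gopkg.in/yaml.v3; the type, field, and method names are hypothetical and may not match config.go.

```go
package main

import (
	"errors"
	"fmt"
)

// SkillPaths and Config mirror the keys in config.yaml. The yaml tags would be
// read by a YAML decoder (not shown); field names here are assumptions.
type SkillPaths struct {
	PlanSpec  string `yaml:"plan_spec"`
	GrillSpec string `yaml:"grill_spec"`
}

type Config struct {
	SkillPaths          SkillPaths `yaml:"skill_paths"`
	MaxRounds           int        `yaml:"max_rounds"`
	MinRounds           int        `yaml:"min_rounds"`
	MaxTotalFindings    int        `yaml:"max_total_findings"`
	StalenessThreshold  int        `yaml:"staleness_threshold"`
	MaxWallClockMinutes int        `yaml:"max_wall_clock_minutes"`
	MaxCostUSD          float64    `yaml:"max_cost_usd"`
	MaxGateCorrections  int        `yaml:"max_gate_corrections"`
	MaxGate2Redrafts    int        `yaml:"max_gate2_redrafts"`
	MaxRetries          int        `yaml:"max_retries"`
}

// ApplyDefaults fills zero-valued optional fields with the documented defaults.
func (c *Config) ApplyDefaults() {
	setInt := func(p *int, v int) {
		if *p == 0 {
			*p = v
		}
	}
	setInt(&c.MaxRounds, 5)
	setInt(&c.MinRounds, 2)
	setInt(&c.MaxTotalFindings, 60)
	setInt(&c.StalenessThreshold, 2)
	setInt(&c.MaxWallClockMinutes, 60)
	setInt(&c.MaxGateCorrections, 3)
	setInt(&c.MaxGate2Redrafts, 1)
	setInt(&c.MaxRetries, 2)
	if c.MaxCostUSD == 0 {
		c.MaxCostUSD = 50.0
	}
}

// Validate rejects configurations the workflow could never satisfy.
func (c *Config) Validate() error {
	if c.SkillPaths.PlanSpec == "" || c.SkillPaths.GrillSpec == "" {
		return errors.New("skill_paths.plan_spec and skill_paths.grill_spec are required")
	}
	if c.MinRounds > c.MaxRounds {
		return fmt.Errorf("min_rounds (%d) exceeds max_rounds (%d)", c.MinRounds, c.MaxRounds)
	}
	return nil
}

func main() {
	cfg := Config{SkillPaths: SkillPaths{PlanSpec: "/skills/plan-spec", GrillSpec: "/skills/grill-spec"}, MinRounds: 3}
	cfg.ApplyDefaults()
	fmt.Println(cfg.MaxRounds, cfg.MinRounds, cfg.MaxCostUSD, cfg.Validate()) // 5 3 50 <nil>
}
```

Treating zero as "unset" keeps the sketch short, but it means a value such as max_retries: 0 cannot be expressed; pointer fields (for example `*int`) would distinguish an explicit zero from an omitted key.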

Skill Directories

The system requires two Claude skill directories containing the templates that govern spec structure and review criteria:

plan-spec: Must contain spec-template.md, bdd-template.md, test-dataset-template.md
grill-spec: Must contain review-constitution.md, report-template.md

Dashboard

The web dashboard provides real-time visibility into workflow execution. Multiple workflows can run concurrently, each tracked independently.

Tabs

Controls — Active workflow list, start new workflows, upload source documents, assign documents to workflows, manage workspace
Spec — View and diff spec versions as they evolve through rounds
Issues — Track findings with severity/status/lens filtering and lifecycle management
Convergence — Monitor review/revision convergence metrics and round history
Messages — Filtered workflow log (OTEL, Orchestrator, Claude Runner, Agent Events, State Transitions)

Workflow Status Panel

A persistent top panel shows aggregate metrics, updated in real time via SSE:

Pipeline stepper — visual chain of all workflow stages with progress indication
Feature name, round number, workflow state badge
Cost (from OTEL telemetry), elapsed wall-clock time
Token usage (input, output, cache read), API call count, agent cost
Activity feed of individual tool and API events

Multi-Workflow Support

Multiple workflows can execute concurrently, each processing a different feature:

The active workflow list shows all running workflows with state badges
Notification badges appear on workflows that need attention (those waiting at a human gate)
Click a workflow to switch context — the status panel, gates, and all tabs update
Each workflow has isolated source documents, state, and artefacts

Human Gates

Gate panels appear when the workflow requires human input:

Gate 1 — Review discovery output, answer open questions, provide corrections (editable inline fields), add reviewer comments
Gate 2 — Resolve ambiguity warnings (accept/answer/defer per warning), add reviewer comments
Gate Final — Approve or reject when critical findings persist after convergence

Workflow Rewind

Workflows can be rewound to any previous stage (DISCOVERY, DRAFTING, REVIEWING, REVISING, JUDGING) while preserving the artefacts that stage depends on. Outputs produced downstream of the target stage are deleted, and the workflow resumes from the target stage. Rewind is available from the rewind controls on each workflow card.

API Reference

Workflow Lifecycle

Method	Endpoint	Description
POST	`/api/workflow/start`	Start new workflow with feature name and source docs
POST	`/api/workflow/cancel`	Cancel running workflow
GET	`/api/workflow/status`	Poll workflow status (single or all)
POST	`/api/workflow/resume`	Resume from ESCALATED/ERROR/paused state
POST	`/api/workflow/rewind`	Rewind to target state and round
POST	`/api/workflow/finalize`	Force transition to FINALIZED
POST	`/api/workflow/reset`	Delete feature directory entirely
POST	`/api/workflow/restart`	Stop, delete, and restart workflow
POST	`/api/workflow/retry`	Clear stale state file

Source Documents

Method	Endpoint	Description
POST	`/api/upload`	Upload source documents to global library
GET	`/api/uploads`	List uploaded files
POST	`/api/workflow/{feature}/source-docs`	Assign documents to a workflow
GET	`/api/workflow/{feature}/source-docs`	List documents assigned to a workflow

Gates

Method	Endpoint	Description
POST	`/api/tasks/{id}/approve`	Approve gate (with corrections/resolutions)
POST	`/api/tasks/{id}/reject`	Reject gate (cancel workflow)

Data Access

Method	Endpoint	Description
GET	`/api/workspace/features`	List all features with metadata
GET	`/api/workspace/features/{name}/discovery`	Feature discovery output
GET	`/api/workspace/features/{name}/state`	Feature workflow state
GET	`/api/workspace/features/{name}/files/{f}`	Specific feature file
GET	`/api/spec/*`	Spec versions, diffs, issues, convergence
GET	`/api/metrics`	Persisted OTEL telemetry
GET	`/api/messages`	Workflow log messages
GET	`/api/logs/server`	Server log ring buffer

Real-Time

Method	Endpoint	Description
GET	`/ws`	WebSocket event stream

Architecture

cmd/specworkflow/main.go          CLI entry point, HTTP routing
internal/api/
  workflow_handler.go             HTTP handlers, WorkflowManager (concurrent map)
  otel_receiver.go                OTLP gRPC receiver for Claude telemetry
  metrics_store.go                SQLite persistence for telemetry
  websocket.go                    WebSocket hub and broadcasting
  spec_endpoints.go               Spec/issue/convergence REST endpoints
  upload.go                       Source document upload
  log_stream.go                   Server log and message streaming
internal/specworkflow/
  orchestrator.go                 Main workflow loop and state coordination
  statemachine.go                 State machine with guarded transitions
  orchestrator_discovery.go       Discovery phase + Gate 1 handling
  orchestrator_drafting.go        Drafting phase + Gate 2 handling
  orchestrator_review.go          Review dispatch + revision + judging
  orchestrator_finalize.go        Finalization and output assembly
  claude_runner.go                Claude CLI subprocess execution
  review_dispatch.go              Parallel reviewer dispatch with retry
  prompts.go                      Prompt construction for all agents
  convergence.go                  Anti-gaming pre-checks and convergence
  breakers.go                     Circuit breaker evaluation
  issues.go                       Issue tracker with lifecycle transitions
  skills.go                       Skill template loading and caching
  persistence.go                  Atomic state persistence
  recovery.go                     Agent failure detection and retry
  resume.go                       Crash/restart recovery
  rewind.go                       Workflow rewind to previous stages
  security.go                     Prompt injection mitigation
  config.go                       Configuration parsing and validation
  types.go                        Core type definitions
  events.go                       Event system (14 event types)
  team.go                         Agent team definition
static/
  index.html                      Dashboard HTML
  app.js                          Dashboard JavaScript (SPA)
  style.css                       Dashboard styles

Persistence

Workflow State

State is persisted to workspace/specs/{feature}/workflow-state.json using an atomic write (write to a temp file, then rename). On server restart, the system resumes each workflow from its persisted state:

Gate states — Restored automatically; gate panels reappear in the dashboard
Agent states — If agent output exists on disk, the step is skipped (crash recovery); otherwise the agent is re-dispatched

Telemetry Metrics

OTEL telemetry from Claude Code is persisted to workspace/metrics.db (SQLite, WAL mode):

Aggregate counters per feature: tokens, cost, API calls — upserted on every OTEL update
Individual events: tool invocations and API calls with duration, cost, timestamp
90-day retention with automatic cleanup on startup
Survives browser refresh and server restart — in-memory accumulators restored from SQLite

Workspace Layout

workspace/
  metrics.db                       SQLite telemetry database
  source-docs/                     Uploaded reference documents (global library)
  specs/{feature}/
    source-docs/                   Per-workflow document copies
    workflow-state.json            Persisted workflow state
    workflow-log.jsonl             Structured workflow log
    discovery-output.json          Discovery agent output
    gate1-corrections.json         Human corrections (if any)
    user-answers.json              Human answers to open questions
    human-comments.json            Free-text reviewer comments
    drafter-output.json            Drafter agent output
    spec-v0.md                     Initial spec draft
    spec-v{N}.md                   Revised spec (per round)
    {feature}-holdouts.md          Holdout test dataset
    review-{a,b,c,d}-round-{N}.json  Reviewer outputs per round
    merged-findings-round-{N}.json    Merged findings per round
    revision-round-{N}.json        Revision output per round
    judge-round-{N}.json           Judge output per round

Testing

go test ./...

32 test files cover all major components, including the state machine, orchestrator, convergence protocol, circuit breakers, issue lifecycle, agent output validation, prompt construction, persistence, recovery, resume, rewind, security, configuration, and all HTTP/WebSocket handlers.

Development

Project Structure

internal/specworkflow/ — Core workflow engine (pure Go, no HTTP dependencies)
internal/api/ — HTTP/WebSocket/gRPC layer (depends on specworkflow)
cmd/specworkflow/ — CLI entry point (depends on api)
static/ — Dashboard frontend (vanilla JS, no build step)

Adding a Review Lens

Add the lens code to lensGroupMap in prompts.go
Add the lens description to review-constitution.md in the grill-spec skill
Assign the lens to a reviewer group (or create a new reviewer in team.go)

WebSocket Events

The dashboard receives real-time updates via 14 WebSocket event types:

spec_version, issue_update, convergence_update, gate_request, gate_response, circuit_breaker, agent_error, state_transition, agent_dispatch, agent_complete, workflow_status, agent_metrics, agent_tool_event, agent_api_event
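A consumer of /ws has to decode each frame and route it by type. The sketch below covers only that decoding step, using the 14 event names listed above. The envelope shape (`type`, `feature`, `payload`) is an assumption, and opening the WebSocket connection itself would need a third-party library, which is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EventTypes holds the 14 event types listed in the README.
var EventTypes = map[string]bool{
	"spec_version": true, "issue_update": true, "convergence_update": true,
	"gate_request": true, "gate_response": true, "circuit_breaker": true,
	"agent_error": true, "state_transition": true, "agent_dispatch": true,
	"agent_complete": true, "workflow_status": true, "agent_metrics": true,
	"agent_tool_event": true, "agent_api_event": true,
}

// Event is an assumed envelope; the real field names may differ. Payload is
// kept raw so each handler can decode its own event-specific shape.
type Event struct {
	Type    string          `json:"type"`
	Feature string          `json:"feature"`
	Payload json.RawMessage `json:"payload"`
}

// ParseEvent decodes one WebSocket text frame and rejects unknown types.
func ParseEvent(frame []byte) (Event, error) {
	var ev Event
	if err := json.Unmarshal(frame, &ev); err != nil {
		return Event{}, err
	}
	if !EventTypes[ev.Type] {
		return Event{}, fmt.Errorf("unknown event type %q", ev.Type)
	}
	return ev, nil
}

func main() {
	ev, err := ParseEvent([]byte(`{"type":"gate_request","feature":"checkout","payload":{"gate":"gate1"}}`))
	fmt.Println(ev.Type, ev.Feature, string(ev.Payload), err)
}
```

The feature field is what lets one connection serve several concurrent workflows: the dashboard can update only the workflow the event belongs to.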
