
feat: standalone CLI + test harness + Diátaxis docs + benchmarks #2

Merged: mabry1985 merged 1 commit into main from feat/cli-tests-docs on May 17, 2026

mabry1985 commented May 17, 2026

Summary

  • Standalone CLI: decision-record runs the full planning pipeline against any OpenAI-compatible endpoint. A phase state machine drives the sub-agents (scoping, deciding, lens-rotating skeptic, decomposer), pauses at human sign-off gates, and hands off to Linear or the filesystem.
  • Test harness: a reusable MCP stdio client, a tmp-project helper, and a script-replay mock OpenAI client. 50 tests green in 210ms (48 unit + 2 full-pipeline flow).
  • Diátaxis docs: docs/ reorganized into tutorials / how-to / reference / explanation. The first-user tutorial walks the roguelike benchmark prompt end-to-end. Joel's upstream canon is preserved as explanation.
  • Benchmark harness: benchmarks/roguelike-ai-poc/ with the canonical prompt, reference artifacts, and a run.sh for regression checks as the system evolves.
  • CI: GitHub Actions runs typecheck + build + tests on Node 20 + 22 for every push and PR.
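The skeptic is described only as "lens-rotating". A minimal sketch of one way the rotation could work, assuming a round-robin over a fixed lens list: the lens names, the `judge` callback standing in for the LLM call, and the one-lens-per-decision policy are all hypothetical, not taken from server/src/cli/agents/skeptic.ts.

```typescript
// Hypothetical lens set: the PR does not list the skeptic's actual lenses.
const LENSES = ["feasibility", "reversibility", "scope", "risk"] as const;
type Lens = (typeof LENSES)[number];
type Verdict = "pass" | "block";

interface SkepticReview {
  decisionId: string;
  lens: Lens;
  verdict: Verdict;
}

// Round-robin assumption: the i-th decision is reviewed through lens (i mod N),
// so consecutive decisions are challenged from different angles.
function lensFor(index: number): Lens {
  return LENSES[index % LENSES.length];
}

// `judge` is a stand-in for the LLM call that returns a block/pass verdict.
function rotateReviews(
  decisionIds: string[],
  judge: (decisionId: string, lens: Lens) => Verdict,
): SkepticReview[] {
  return decisionIds.map((decisionId, i) => {
    const lens = lensFor(i);
    return { decisionId, lens, verdict: judge(decisionId, lens) };
  });
}

// Blocked decisions are the ones a human would be asked to override
// (the skeptic-block flow test covers the "user denies override" path).
function blockedDecisions(reviews: SkepticReview[]): string[] {
  return reviews.filter((r) => r.verdict === "block").map((r) => r.decisionId);
}
```

Round-robin is simply the smallest policy that guarantees no single lens reviews every decision; choosing a lens from each decision's content would be an equally valid reading of "lens-rotating".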

What you can do now

```bash
cd server && npm install && npm run build

export OPENAI_API_KEY=sk-…
node dist/cli.js --idea "your idea here" --effort poc --yes
```

Or use it as a Claude Code plugin via the existing .claude-plugin/plugin.json.
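The CLI command combines environment configuration with command-line flags. A sketch of how the flags listed in this PR (--idea, --prd, --effort, --resume, --yes) and the OPENAI_API_KEY / OPENAI_BASE_URL variables might be resolved. The function names, treating --resume as a boolean, and falling back to OpenAI's public endpoint are assumptions; the model-override flags are left out because their names are not given here.

```typescript
// Effort presets named in the gate tests (poc / mvp / full).
type Effort = "poc" | "mvp" | "full";
const EFFORTS: readonly string[] = ["poc", "mvp", "full"];

function isEffort(value: string): value is Effort {
  return EFFORTS.includes(value);
}

interface CliOptions {
  idea?: string;
  prd?: string;
  effort?: Effort;
  resume: boolean; // assumption: a plain switch, not a path
  yes: boolean;
}

// Hypothetical hand-rolled parser; value-taking flags consume the next token.
function parseCliArgs(argv: string[]): CliOptions {
  const opts: CliOptions = { resume: false, yes: false };
  let i = 0;
  const takeValue = (flag: string): string => {
    const value = argv[i++];
    if (value === undefined || value.startsWith("--")) throw new Error(`${flag} requires a value`);
    return value;
  };
  while (i < argv.length) {
    const flag = argv[i++];
    switch (flag) {
      case "--idea":
        opts.idea = takeValue(flag);
        break;
      case "--prd":
        opts.prd = takeValue(flag);
        break;
      case "--effort": {
        const value = takeValue(flag);
        if (!isEffort(value)) throw new Error(`--effort must be one of ${EFFORTS.join(", ")}`);
        opts.effort = value;
        break;
      }
      case "--resume":
        opts.resume = true;
        break;
      case "--yes":
        opts.yes = true;
        break;
      default:
        throw new Error(`unknown argument: ${flag}`);
    }
  }
  return opts;
}

interface LlmConfig {
  apiKey: string;
  baseURL: string;
}

// Explicit overrides win, then the environment, then OpenAI's public endpoint.
function resolveLlmConfig(
  env: Record<string, string | undefined>,
  overrides: Partial<LlmConfig> = {},
): LlmConfig {
  const apiKey = overrides.apiKey ?? env["OPENAI_API_KEY"];
  if (!apiKey) throw new Error("OPENAI_API_KEY is not set");
  const baseURL = overrides.baseURL ?? env["OPENAI_BASE_URL"] ?? "https://api.openai.com/v1";
  return { apiKey, baseURL };
}
```

In an entry point these would be called as parseCliArgs(process.argv.slice(2)) and resolveLlmConfig(process.env). Setting OPENAI_BASE_URL to http://localhost:11434/v1 would target a local Ollama server through its OpenAI-compatible API.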

Test plan

  • npm run typecheck: clean
  • npm test: 50/50 pass in 210ms
  • Manually dogfooded against the roguelike-ai-poc benchmark (artifacts in benchmarks/)
  • CI runs green on Node 20 + 22
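The flow tests run against a script-replay mock OpenAI client instead of a live endpoint. A minimal sketch of that idea, assuming the mock mirrors the `chat.completions.create` method of the openai Node SDK and returns only the response fields an agent loop reads. The class name ScriptedOpenAI and the trimmed response types are hypothetical, not the code in server/tests/helpers/mock-openai.ts.

```typescript
// Only the response fields an agent loop typically reads; the real SDK types are much larger.
interface ScriptedToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface ScriptedMessage {
  role: "assistant";
  content: string | null;
  tool_calls?: ScriptedToolCall[];
}

interface ScriptedCompletion {
  choices: { index: number; message: ScriptedMessage; finish_reason: "stop" | "tool_calls" }[];
}

// Hypothetical mock: replays `script` in order, one entry per create() call,
// and records every request so tests can assert on prompts and tool lists.
class ScriptedOpenAI {
  readonly requests: unknown[] = [];
  private cursor = 0;

  constructor(private readonly script: ScriptedMessage[]) {}

  readonly chat = {
    completions: {
      create: async (params: unknown): Promise<ScriptedCompletion> => {
        this.requests.push(params);
        const message = this.script[this.cursor];
        if (message === undefined) {
          throw new Error(`script exhausted after ${this.script.length} responses`);
        }
        this.cursor += 1;
        const finish_reason = message.tool_calls?.length ? "tool_calls" : "stop";
        return { choices: [{ index: 0, message, finish_reason }] };
      },
    },
  };

  // True once every scripted response has been consumed.
  get exhausted(): boolean {
    return this.cursor >= this.script.length;
  }
}
```

Failing loudly when the script runs out keeps a test from silently passing if the pipeline makes more LLM calls than the scenario expects.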

Known follow-ups (not in this PR)

  • Marketplace publishing of the plugin
  • Live Linear export test against a real workspace
  • Reconciliation logic for interrupted Linear exports
  • Duplicate sign-off entry on handoff (cosmetic; noted during dogfooding)
  • Per-knob gate-override CLI flags (currently requires editing project.json)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a standalone CLI tool for generating decision records and project plans through an automated pipeline.
    • Added a benchmarking framework with a roguelike AI POC reference implementation.
    • Added GitHub Actions workflow for continuous testing.
  • Documentation

    • Reorganized documentation using the Diátaxis framework with tutorials, how-to guides, reference materials, and explanations.
    • Added comprehensive guides for installation, CLI usage, provider configuration, and Linear handoff workflows.
  • Tests

    • Added end-to-end pipeline flow tests and unit tests for gate logic and schemas.
    • Added test helpers for mock LLM interactions and temporary project management.
  • Chores

    • Updated CLI entry point configuration and build setup.
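The temporary-project helpers also read the pipeline's JSONL event log. A sketch of that parsing step, assuming one JSON object per line with a string `type` field; the field name and the event names used with it are hypothetical. It works on text rather than touching the filesystem, so a tmp-project helper could feed it fs.readFileSync(path, "utf8").

```typescript
// Hypothetical event shape: only a string `type` is required; other fields pass through.
interface PipelineEvent {
  type: string;
  [field: string]: unknown;
}

// Parse the text of an events.jsonl file: one JSON object per line.
// Blank lines (including the trailing newline) are skipped; malformed lines
// fail loudly with a 1-based line number so a broken log is easy to locate.
function parseEventLog(text: string): PipelineEvent[] {
  const events: PipelineEvent[] = [];
  text.split("\n").forEach((raw, index) => {
    const line = raw.trim();
    if (line === "") return;
    let value: unknown;
    try {
      value = JSON.parse(line);
    } catch {
      throw new Error(`events.jsonl line ${index + 1}: invalid JSON`);
    }
    if (typeof value !== "object" || value === null || typeof (value as { type?: unknown }).type !== "string") {
      throw new Error(`events.jsonl line ${index + 1}: expected an object with a string "type"`);
    }
    events.push(value as PipelineEvent);
  });
  return events;
}

// Convenience filter for assertions such as "exactly one sign-off was recorded".
function eventsOfType(events: PipelineEvent[], type: string): PipelineEvent[] {
  return events.filter((event) => event.type === type);
}
```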


- Add `decision-record` CLI (second bin alongside MCP server) that drives the
  full planning pipeline against any OpenAI-compatible endpoint
  (OPENAI_API_KEY + OPENAI_BASE_URL — works with OpenAI, OpenRouter, Ollama,
  vLLM, LiteLLM). Phase state machine, sub-agents (scoping, deciding,
  lens-rotating skeptic, decomposer), checkpointed control flow, PRD ingestion,
  resume support.

- Add reusable test harness: event-driven MCP stdio client, disposable
  tmp-project helper, script-replay mock OpenAI client. 50 tests across unit
  (gate eval + schemas, 48 tests) and flow (full pipeline + skeptic-block path,
  2 tests). All green in 210ms.

- Reorganize docs/ into Diátaxis quadrants: tutorials/, how-to/, reference/,
  explanation/. New first-user tutorial walks the roguelike benchmark prompt
  end-to-end. Five how-to guides cover install, run, providers, Linear handoff,
  and gate calibration. Four reference pages document the CLI, MCP tools, data
  model, and gate matrix. Three explanation pages cover design rationale, the
  five-phase pipeline, and Joel's canonical material (preserved from
  upstream-canon).

- Add benchmarks/ with the roguelike-ai-poc canonical prompt + reference
  artifacts + a run.sh for regression checks as the system evolves.

- Add GitHub Actions CI (.github/workflows/test.yml) that runs typecheck,
  build, and the test matrix on Node 20 + 22 for every push and PR.

- Minor fixes: add a semver regex to PipelineState.schema_version; add the cli.ts
  entry to the tsup config; fix an orchestrator bug where the pre-advance gate
  check treated sign-off as a blocker.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
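The orchestrator fix in this commit separates two kinds of unmet gate. A minimal sketch of that distinction, assuming each gate result carries a kind of either "hard" or "signoff": a failed hard gate blocks the transition, while an unsigned sign-off gate is routed to the human checkpoint instead of being reported as a blocker. The type names and gate IDs are hypothetical; the actual gate matrix is documented in docs/reference/gates.md.

```typescript
// Hypothetical gate model: "hard" gates are machine-checked preconditions,
// "signoff" gates wait on a human checkpoint.
type GateKind = "hard" | "signoff";

interface GateResult {
  id: string;
  kind: GateKind;
  passed: boolean;
}

type AdvanceDecision =
  | { action: "advance" }
  | { action: "await-signoff"; gates: string[] }
  | { action: "blocked"; gates: string[] };

// Pre-advance check: only failed hard gates block. Pending sign-off is a
// separate outcome, so it is never reported as a blocker.
function preAdvanceCheck(results: GateResult[]): AdvanceDecision {
  const blockers = results.filter((r) => r.kind === "hard" && !r.passed).map((r) => r.id);
  if (blockers.length > 0) return { action: "blocked", gates: blockers };

  const pending = results.filter((r) => r.kind === "signoff" && !r.passed).map((r) => r.id);
  if (pending.length > 0) return { action: "await-signoff", gates: pending };

  return { action: "advance" };
}
```

Checking hard gates first means a human is only prompted for sign-off once the machine-checkable preconditions already hold.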
mabry1985 merged commit 08f763e into main on May 17, 2026
2 of 3 checks passed

coderabbitai (bot) commented May 17, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 068a6616-d5eb-485c-ac29-90f605c4ce1a

📥 Commits

Reviewing files that changed from the base of the PR and between d8b8887 and 63d661a.

⛔ Files ignored due to path filters (3)
  • CONTRIBUTING.md is excluded by !*.md
  • README.md is excluded by !*.md
  • server/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (71)
  • .github/workflows/test.yml
  • CITATION.cff
  • LICENSE
  • benchmarks/README.md
  • benchmarks/roguelike-ai-poc/prompt.md
  • benchmarks/roguelike-ai-poc/reference/decisions/0001-choose-the-implementation-language.json
  • benchmarks/roguelike-ai-poc/reference/decisions/0001-choose-the-implementation-language.md
  • benchmarks/roguelike-ai-poc/reference/decisions/0002-define-the-world-representation-and-renderer.json
  • benchmarks/roguelike-ai-poc/reference/decisions/0002-define-the-world-representation-and-renderer.md
  • benchmarks/roguelike-ai-poc/reference/decisions/0003-define-the-agent-action-contract.json
  • benchmarks/roguelike-ai-poc/reference/decisions/0003-define-the-agent-action-contract.md
  • benchmarks/roguelike-ai-poc/reference/decisions/0004-define-the-tick-loop-and-termination-conditions.json
  • benchmarks/roguelike-ai-poc/reference/decisions/0004-define-the-tick-loop-and-termination-conditions.md
  • benchmarks/roguelike-ai-poc/reference/events.jsonl
  • benchmarks/roguelike-ai-poc/reference/index.html
  • benchmarks/roguelike-ai-poc/reference/project.json
  • benchmarks/roguelike-ai-poc/reference/project.md
  • benchmarks/roguelike-ai-poc/reference/tasks/T0001-bootstrap-repository.json
  • benchmarks/roguelike-ai-poc/reference/tasks/T0001-bootstrap-repository.md
  • benchmarks/roguelike-ai-poc/reference/tasks/T0002-implement-world-module-tile-grid-entity-dict.json
  • benchmarks/roguelike-ai-poc/reference/tasks/T0002-implement-world-module-tile-grid-entity-dict.md
  • benchmarks/roguelike-ai-poc/reference/tasks/T0003-implement-frame-renderer.json
  • benchmarks/roguelike-ai-poc/reference/tasks/T0003-implement-frame-renderer.md
  • benchmarks/roguelike-ai-poc/reference/tasks/T0004-implement-openai-agent-client.json
  • benchmarks/roguelike-ai-poc/reference/tasks/T0004-implement-openai-agent-client.md
  • benchmarks/roguelike-ai-poc/reference/tasks/T0005-implement-action-handlers-and-termination-checks.json
  • benchmarks/roguelike-ai-poc/reference/tasks/T0005-implement-action-handlers-and-termination-checks.md
  • benchmarks/roguelike-ai-poc/reference/tasks/T0006-implement-the-tick-based-game-loop.json
  • benchmarks/roguelike-ai-poc/reference/tasks/T0006-implement-the-tick-based-game-loop.md
  • benchmarks/roguelike-ai-poc/reference/tasks/T0007-implement-cli-entry-script.json
  • benchmarks/roguelike-ai-poc/reference/tasks/T0007-implement-cli-entry-script.md
  • benchmarks/roguelike-ai-poc/run.sh
  • docs/README.md
  • docs/architecture.md
  • docs/explanation/design-rationale.md
  • docs/explanation/the-five-phases.md
  • docs/explanation/why-decision-records.md
  • docs/how-to/calibrate-gates.md
  • docs/how-to/configure-providers.md
  • docs/how-to/handoff-to-linear.md
  • docs/how-to/install.md
  • docs/how-to/run-the-cli.md
  • docs/quickstart.md
  • docs/reference/cli.md
  • docs/reference/data-model.md
  • docs/reference/gates.md
  • docs/reference/mcp-tools.md
  • docs/tutorials/your-first-plan.md
  • docs/usage.md
  • server/package.json
  • server/src/cli.ts
  • server/src/cli/agents/deciding.ts
  • server/src/cli/agents/decomposer.ts
  • server/src/cli/agents/scoping.ts
  • server/src/cli/agents/skeptic.ts
  • server/src/cli/checkpoints.ts
  • server/src/cli/index.ts
  • server/src/cli/orchestrator.ts
  • server/src/cli/prd.ts
  • server/src/llm/agent.ts
  • server/src/llm/client.ts
  • server/src/llm/tools.ts
  • server/src/schemas/index.ts
  • server/tests/flow-poc-pipeline.test.ts
  • server/tests/helpers/index.ts
  • server/tests/helpers/mcp-client.ts
  • server/tests/helpers/mock-openai.ts
  • server/tests/helpers/tmp-project.ts
  • server/tests/unit-gate.test.ts
  • server/tests/unit-schemas.test.ts
  • server/tsup.config.ts

Walkthrough

This PR implements a complete CLI-based decision-record planning tool that orchestrates a five-phase LLM-driven pipeline (intake → scoping → deciding → decomposing → handing-off), with persistent JSON state, hard gates for phase transitions, antagonistic lens-based decision review, and support for handoffs to Linear or filesystem targets. It includes comprehensive documentation, a roguelike AI POC benchmark, and end-to-end tests.
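The phase loop in this walkthrough can be sketched as a linear state machine. This is a simplified, synchronous illustration: the PhaseHooks interface is hypothetical and stands in for the phase sub-agents, gate evaluation, and sign-off prompt, and stopping on a failed gate or declined sign-off (rather than retrying) is an assumption.

```typescript
// Phase order from the walkthrough; "handed-off" is the terminal state.
const PHASES = ["intake", "scoping", "deciding", "decomposing", "handing-off", "handed-off"] as const;
type Phase = (typeof PHASES)[number];

function nextPhase(phase: Phase): Phase | null {
  const i = PHASES.indexOf(phase);
  return i >= 0 && i < PHASES.length - 1 ? PHASES[i + 1] : null;
}

// Hypothetical hooks. The real orchestrator is async and persists state to JSON.
interface PhaseHooks {
  runAgent(phase: Phase): void;
  gatesPass(phase: Phase): boolean; // hard gates for leaving `phase`
  signOff(phase: Phase): boolean; // return true for phases without a sign-off gate
}

interface PipelineRun {
  phase: Phase; // where the run stopped; a resume would start here
  trail: Phase[];
}

function runPipeline(start: Phase, hooks: PhaseHooks): PipelineRun {
  let phase: Phase = start;
  const trail: Phase[] = [phase];
  while (phase !== "handed-off") {
    hooks.runAgent(phase);
    // Stop without advancing if a hard gate fails or the human declines.
    if (!hooks.gatesPass(phase) || !hooks.signOff(phase)) break;
    const next = nextPhase(phase);
    if (next === null) break;
    phase = next;
    trail.push(phase);
  }
  return { phase, trail };
}
```

Because the loop only ever stops on a named phase, a resume can simply call runPipeline again with the persisted phase as the start.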

Changes

Decision-Record Planning Pipeline

| Layer | File(s) | Summary |
|---|---|---|
| Documentation and benchmark reference | docs/README.md, docs/explanation/, docs/how-to/, docs/reference/, docs/tutorials/, benchmarks/ | Reorganized docs following Diátaxis framework with tutorials, how-to guides, reference, and explanation sections. Added complete roguelike AI POC benchmark including four decision records, seven task specifications, project manifest, events log, and HTML rendering that serves as a reference for the planning pipeline output. |
| LLM client infrastructure and agent runner | server/src/llm/client.ts, server/src/llm/agent.ts, server/src/llm/tools.ts | OpenAI client configuration that resolves from environment/overrides, agent runner that executes tool-calling loops up to max iterations, records tool calls and results, and tool listing/execution utilities that filter available tools, validate inputs with Zod, inject cwd context, and parse results. |
| CLI entry point and argument parsing | server/src/cli.ts, server/src/cli/index.ts, server/src/cli/checkpoints.ts, server/src/cli/prd.ts | Main CLI entrypoint that parses arguments (--idea, --prd, --effort, model overrides, --resume, --yes), loads PRD hints, resolves LLM config, and invokes the pipeline. Checkpoint helpers provide confirm/ask user interaction and colored stderr output for progress display. |
| Pipeline orchestration | server/src/cli/orchestrator.ts | Orchestrator that reads initial project state, loops through phases (intake → scoping → deciding → decomposing → handing-off) until handed-off or error. Dispatches phase agents, evaluates gate passability, manages human sign-off prompts, and coordinates Linear or filesystem exports with dry-run preview. |
| Phase agents | server/src/cli/agents/scoping.ts, server/src/cli/agents/deciding.ts, server/src/cli/agents/decomposer.ts, server/src/cli/agents/skeptic.ts | Specialized LLM agents: scoping synthesizes MVP boundary from project/PRD via dr_update_scope; deciding proposes and selects positions via tool calls; decomposing builds task graph and validates with dr_validate_graph; skeptic performs antagonistic lens-based review of decisions with block/pass verdicts. |
| Schema validation and build config | server/src/schemas/index.ts, server/package.json, server/tsup.config.ts | Zod schemas for projects, decisions, tasks, pipeline state, and events with semver validation. Updated package.json to add decision-record CLI binary, split test scripts (unit/flow), and added openai ^6.38.0. Updated tsup to build CLI as separate bundle alongside MCP server. |
| Test infrastructure | server/tests/helpers/ | MCP server subprocess client with JSON-RPC 2.0 wrapper for test tool invocation, mock OpenAI SDK that consumes scripted responses, and temporary project directory helpers with JSONL event log reading. |
| Gate and schema unit tests | server/tests/unit-gate.test.ts, server/tests/unit-schemas.test.ts | Comprehensive unit tests validating preset-based gate configs (poc/mvp/full), all five phase transitions with gate pass/fail cases, and JSON schema parsing for all entity types including nested structures and validation constraints. |
| End-to-end integration tests | server/tests/flow-poc-pipeline.test.ts | Two flow tests: (1) full happy-path pipeline execution with mock LLM through all five phases, verifying artifact creation, decision reviews, and event log; (2) decision rejection when skeptic lens blocks and user denies override. |
| GitHub Actions CI | .github/workflows/test.yml | Automated test runner on push/PR to main with Node 20/22 matrix, npm caching, typecheck, build, and test execution (unit and flow). |
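The agent-runner row describes a tool-calling loop capped at a maximum number of iterations that records each tool call and its result. A self-contained sketch of such a loop, assuming a chat function that returns either final text or a list of JSON-encoded tool calls. The ChatFn/ToolFn types, the default cap of 8, and feeding tool errors back to the model are hypothetical choices, and the Zod input validation attributed to server/src/llm/tools.ts is reduced here to JSON.parse.

```typescript
interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON-encoded, as chat-completion tool calls deliver them
}

interface AssistantTurn {
  content: string | null;
  toolCalls: ToolCall[];
}

type Message =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; toolCalls: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

type ChatFn = (messages: Message[]) => Promise<AssistantTurn>;
type ToolFn = (args: unknown) => unknown; // may return a value or a Promise

interface ToolRecord {
  name: string;
  args: unknown;
  result: unknown;
}

interface AgentRun {
  finalText: string | null;
  records: ToolRecord[];
  iterations: number;
}

async function runAgentLoop(
  chat: ChatFn,
  tools: Map<string, ToolFn>,
  initial: Message[],
  maxIterations = 8,
): Promise<AgentRun> {
  const messages = [...initial];
  const records: ToolRecord[] = [];
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const turn = await chat(messages);
    messages.push({ role: "assistant", content: turn.content, toolCalls: turn.toolCalls });
    // No tool calls means the model has produced its final answer.
    if (turn.toolCalls.length === 0) return { finalText: turn.content, records, iterations: iteration };
    for (const call of turn.toolCalls) {
      let args: unknown;
      let result: unknown;
      try {
        const tool = tools.get(call.name);
        if (!tool) throw new Error(`unknown tool: ${call.name}`);
        args = JSON.parse(call.arguments);
        result = await tool(args);
      } catch (err) {
        // Report failures back to the model instead of aborting the loop.
        result = { error: err instanceof Error ? err.message : String(err) };
      }
      records.push({ name: call.name, args, result });
      messages.push({ role: "tool", toolCallId: call.id, content: JSON.stringify(result ?? null) });
    }
  }
  throw new Error(`agent did not finish within ${maxIterations} iterations`);
}
```

A Map is used for the tool registry so that a model-supplied name such as "toString" cannot resolve to an inherited object property.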

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes
