feat: standalone CLI + test harness + Diátaxis docs + benchmarks by mabry1985 · Pull Request #2 · protoLabsAI/protoLedger

mabry1985 · 2026-05-17T04:29:59Z

Summary

Standalone CLI — decision-record runs the full planning pipeline against any OpenAI-compatible endpoint. Phase state machine drives sub-agents (scoping, deciding, lens-rotating skeptic, decomposer), pauses at human sign-off gates, hands off to Linear or filesystem.
Test harness — reusable MCP stdio client + tmp-project helper + script-replay mock OpenAI client. 50 tests green in 210ms (48 unit + 2 full-pipeline flow).
Diátaxis docs — docs/ reorganized into tutorials / how-to / reference / explanation. First-user tutorial walks the roguelike benchmark prompt end-to-end. Joel's upstream canon preserved as explanation.
Benchmark harness — benchmarks/roguelike-ai-poc/ with the canonical prompt, reference artifacts, and a run.sh for regression checks as the system evolves.
CI — GitHub Actions runs typecheck + build + tests on Node 20 + 22 for push and PR.
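For readers unfamiliar with the script-replay idea in the test-harness item above, here is a minimal sketch of how such a mock could work. Every name in it (`ScriptReplayClient`, `ScriptedReply`, `nextReply`) is hypothetical and chosen for illustration; the actual helper in `server/tests/helpers/mock-openai.ts` may be shaped quite differently.

```typescript
// Hypothetical shapes for illustration only; not the PR's real helper API.
interface ScriptedReply {
  content?: string;
  toolCalls?: { name: string; args: Record<string, unknown> }[];
}

interface ChatRequest {
  model: string;
  messages: { role: string; content: string }[];
}

class ScriptReplayClient {
  private cursor = 0;
  private readonly script: ScriptedReply[];
  readonly requests: ChatRequest[] = [];

  constructor(script: ScriptedReply[]) {
    this.script = script;
  }

  // Records the request, then returns the next scripted reply in order.
  // Throws when the script runs out, so an unexpected extra call fails loudly.
  nextReply(request: ChatRequest): ScriptedReply {
    this.requests.push(request);
    const reply = this.script[this.cursor];
    if (reply === undefined) {
      throw new Error(`script exhausted after ${this.script.length} replies`);
    }
    this.cursor += 1;
    return reply;
  }

  // Async wrapper, since a real chat-completion call would return a promise.
  async complete(request: ChatRequest): Promise<ScriptedReply> {
    return this.nextReply(request);
  }

  // True once every scripted reply has been consumed.
  get done(): boolean {
    return this.cursor === this.script.length;
  }
}
```

Because the replies are fixed in advance, a flow test built this way needs no network access, which is one plausible reason the whole suite can finish in a few hundred milliseconds.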

What you can do now

cd server && npm install && npm run build

export OPENAI_API_KEY=sk-…
node dist/cli.js --idea "your idea here" --effort poc --yes

Or use as a Claude Code plugin via the existing .claude-plugin/plugin.json.
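The CLI invocation above takes its credentials from the environment. As a rough sketch of what resolving that configuration might look like, assuming explicit overrides win over environment variables and that the public OpenAI URL is the fallback: `resolveLlmConfig` and `LlmConfig` are hypothetical names, and the real logic in `server/src/llm/client.ts` may differ.

```typescript
// Hypothetical config shape; the PR's client.ts is not reproduced here.
interface LlmConfig {
  apiKey: string;
  baseUrl: string;
}

type Env = Record<string, string | undefined>;

// Assumption: overrides take precedence over the environment, and the
// public OpenAI endpoint is used when OPENAI_BASE_URL is absent.
function resolveLlmConfig(env: Env, overrides: Partial<LlmConfig> = {}): LlmConfig {
  const apiKey = overrides.apiKey ?? env["OPENAI_API_KEY"];
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is not set");
  }
  const baseUrl =
    overrides.baseUrl ?? env["OPENAI_BASE_URL"] ?? "https://api.openai.com/v1";
  return { apiKey, baseUrl };
}
```

In a Node process the first argument would typically be `process.env`. Pointing `OPENAI_BASE_URL` at a local OpenAI-compatible server (for example Ollama's default `http://localhost:11434/v1`) is what lets the same code path talk to a different provider.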

Test plan

npm run typecheck clean
npm test — 50/50 pass in 210ms
Manually dogfooded against the roguelike-ai-poc benchmark (artifacts in benchmarks/)
CI runs green on Node 20 + 22

Known follow-ups (not in this PR)

Marketplace publishing of the plugin
Live Linear export test against a real workspace
Reconciliation logic for interrupted Linear exports
Duplicate sign-off entry on handoff (cosmetic; noted during dogfood)
Per-knob gate-override CLI flags (currently must edit project.json)

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added a standalone CLI tool for generating decision records and project plans through an automated pipeline.
- Added a benchmarking framework with a roguelike AI POC reference implementation.
- Added GitHub Actions workflow for continuous testing.
Documentation
- Reorganized documentation using the Diátaxis framework with tutorials, how-to guides, reference materials, and explanations.
- Added comprehensive guides for installation, CLI usage, provider configuration, and Linear handoff workflows.
Tests
- Added end-to-end pipeline flow tests and unit tests for gate logic and schemas.
- Added test helpers for mock LLM interactions and temporary project management.
Chores
- Updated CLI entry point configuration and build setup.

- Add `decision-record` CLI (second bin alongside MCP server) that drives the full planning pipeline against any OpenAI-compatible endpoint (OPENAI_API_KEY + OPENAI_BASE_URL; works with OpenAI, OpenRouter, Ollama, vLLM, LiteLLM). Phase state machine, sub-agents (scoping, deciding, lens-rotating skeptic, decomposer), checkpointed control flow, PRD ingestion, resume support.
- Add reusable test harness: event-driven MCP stdio client, disposable tmp-project helper, script-replay mock OpenAI client. 50 tests across unit (gate eval + schemas, 48 tests) and flow (full pipeline + skeptic-block path, 2 tests). All green in 210ms.
- Reorganize docs/ into Diátaxis quadrants: tutorials/, how-to/, reference/, explanation/. New first-user tutorial walks the roguelike benchmark prompt end-to-end. Five how-to guides cover install, run, providers, Linear handoff, and gate calibration. Four reference pages document the CLI, MCP tools, data model, and gate matrix. Three explanation pages cover design rationale, the five-phase pipeline, and Joel's canonical material (preserved from upstream-canon).
- Add benchmarks/ with the roguelike-ai-poc canonical prompt + reference artifacts + a run.sh for regression checks as the system evolves.
- Add GitHub Actions CI (.github/workflows/test.yml) that runs typecheck, build, and the test matrix on Node 20 + 22 for every push and PR.
- Minor fixes: add semver regex to PipelineState.schema_version; add cli.ts entry to tsup config; bug fix in orchestrator where pre-advance gate check treated sign-off as a blocker.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
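To illustrate the "semver regex" fix named in the commit message, the following sketch shows one way a `schema_version` string could be checked. The pattern is a simplified one written for this example, not the regex from the PR, and `isSemver` is a hypothetical helper name. It accepts `MAJOR.MINOR.PATCH` with optional pre-release and build suffixes, but unlike the full Semantic Versioning grammar it does not reject leading zeros inside numeric pre-release identifiers.

```typescript
// Simplified semver pattern (assumption: not the PR's actual regex).
// Core version: three dot-separated integers without leading zeros.
// Optional "-prerelease" and "+build" parts made of dot-separated identifiers.
const SEMVER_PATTERN =
  /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?$/;

// Hypothetical helper: true when the value looks like a semantic version.
function isSemver(value: string): boolean {
  return SEMVER_PATTERN.test(value);
}
```

In a Zod schema the same pattern could be attached with `z.string().regex(SEMVER_PATTERN)`, though whether the PR does exactly that is not shown in this page.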

coderabbitai · 2026-05-17T04:30:13Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 068a6616-d5eb-485c-ac29-90f605c4ce1a

📥 Commits

Reviewing files that changed from the base of the PR and between d8b8887 and 63d661a.

⛔ Files ignored due to path filters (3)

CONTRIBUTING.md is excluded by !*.md
README.md is excluded by !*.md
server/package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (71)

.github/workflows/test.yml
CITATION.cff
LICENSE
benchmarks/README.md
benchmarks/roguelike-ai-poc/prompt.md
benchmarks/roguelike-ai-poc/reference/decisions/0001-choose-the-implementation-language.json
benchmarks/roguelike-ai-poc/reference/decisions/0001-choose-the-implementation-language.md
benchmarks/roguelike-ai-poc/reference/decisions/0002-define-the-world-representation-and-renderer.json
benchmarks/roguelike-ai-poc/reference/decisions/0002-define-the-world-representation-and-renderer.md
benchmarks/roguelike-ai-poc/reference/decisions/0003-define-the-agent-action-contract.json
benchmarks/roguelike-ai-poc/reference/decisions/0003-define-the-agent-action-contract.md
benchmarks/roguelike-ai-poc/reference/decisions/0004-define-the-tick-loop-and-termination-conditions.json
benchmarks/roguelike-ai-poc/reference/decisions/0004-define-the-tick-loop-and-termination-conditions.md
benchmarks/roguelike-ai-poc/reference/events.jsonl
benchmarks/roguelike-ai-poc/reference/index.html
benchmarks/roguelike-ai-poc/reference/project.json
benchmarks/roguelike-ai-poc/reference/project.md
benchmarks/roguelike-ai-poc/reference/tasks/T0001-bootstrap-repository.json
benchmarks/roguelike-ai-poc/reference/tasks/T0001-bootstrap-repository.md
benchmarks/roguelike-ai-poc/reference/tasks/T0002-implement-world-module-tile-grid-entity-dict.json
benchmarks/roguelike-ai-poc/reference/tasks/T0002-implement-world-module-tile-grid-entity-dict.md
benchmarks/roguelike-ai-poc/reference/tasks/T0003-implement-frame-renderer.json
benchmarks/roguelike-ai-poc/reference/tasks/T0003-implement-frame-renderer.md
benchmarks/roguelike-ai-poc/reference/tasks/T0004-implement-openai-agent-client.json
benchmarks/roguelike-ai-poc/reference/tasks/T0004-implement-openai-agent-client.md
benchmarks/roguelike-ai-poc/reference/tasks/T0005-implement-action-handlers-and-termination-checks.json
benchmarks/roguelike-ai-poc/reference/tasks/T0005-implement-action-handlers-and-termination-checks.md
benchmarks/roguelike-ai-poc/reference/tasks/T0006-implement-the-tick-based-game-loop.json
benchmarks/roguelike-ai-poc/reference/tasks/T0006-implement-the-tick-based-game-loop.md
benchmarks/roguelike-ai-poc/reference/tasks/T0007-implement-cli-entry-script.json
benchmarks/roguelike-ai-poc/reference/tasks/T0007-implement-cli-entry-script.md
benchmarks/roguelike-ai-poc/run.sh
docs/README.md
docs/architecture.md
docs/explanation/design-rationale.md
docs/explanation/the-five-phases.md
docs/explanation/why-decision-records.md
docs/how-to/calibrate-gates.md
docs/how-to/configure-providers.md
docs/how-to/handoff-to-linear.md
docs/how-to/install.md
docs/how-to/run-the-cli.md
docs/quickstart.md
docs/reference/cli.md
docs/reference/data-model.md
docs/reference/gates.md
docs/reference/mcp-tools.md
docs/tutorials/your-first-plan.md
docs/usage.md
server/package.json
server/src/cli.ts
server/src/cli/agents/deciding.ts
server/src/cli/agents/decomposer.ts
server/src/cli/agents/scoping.ts
server/src/cli/agents/skeptic.ts
server/src/cli/checkpoints.ts
server/src/cli/index.ts
server/src/cli/orchestrator.ts
server/src/cli/prd.ts
server/src/llm/agent.ts
server/src/llm/client.ts
server/src/llm/tools.ts
server/src/schemas/index.ts
server/tests/flow-poc-pipeline.test.ts
server/tests/helpers/index.ts
server/tests/helpers/mcp-client.ts
server/tests/helpers/mock-openai.ts
server/tests/helpers/tmp-project.ts
server/tests/unit-gate.test.ts
server/tests/unit-schemas.test.ts
server/tsup.config.ts

Walkthrough

This PR implements a complete CLI-based decision-record planning tool that orchestrates a five-phase LLM-driven pipeline (intake → scoping → deciding → decomposing → handing-off), with persistent JSON state, hard gates for phase transitions, antagonistic lens-based decision review, and support for handoffs to Linear or filesystem targets. It includes comprehensive documentation, a roguelike AI POC benchmark, and end-to-end tests.
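The phase loop described in the walkthrough can be sketched roughly as follows. The phase names and the terminal `handed-off` state come from the text above; everything else (`GateResult`, `stepPhase`, `runPipeline`, and the rule that a pending sign-off is handled separately from hard blockers) is an assumption made for illustration and not the orchestrator's real interface.

```typescript
const PHASES = ["intake", "scoping", "deciding", "decomposing", "handing-off"] as const;
type Phase = (typeof PHASES)[number];
type PipelineStatus = Phase | "handed-off";

// Hypothetical gate result: hard blockers are kept apart from sign-off,
// so a pending human sign-off is never counted as a blocker.
interface GateResult {
  blockers: string[];
  needsSignOff: boolean;
}

type GateCheck = (phase: Phase) => GateResult;
type SignOff = (phase: Phase) => boolean;

// The state that follows a phase; the last phase leads to "handed-off".
function nextStatus(current: Phase): PipelineStatus {
  const next: Phase | undefined = PHASES[PHASES.indexOf(current) + 1];
  return next ?? "handed-off";
}

// One transition attempt: stay put on a blocker or a denied sign-off.
function stepPhase(current: Phase, gate: GateCheck, signOff: SignOff): PipelineStatus {
  const result = gate(current);
  if (result.blockers.length > 0) return current;
  if (result.needsSignOff && !signOff(current)) return current;
  return nextStatus(current);
}

// Advances until the pipeline is handed off or stalls at a gate.
function runPipeline(start: Phase, gate: GateCheck, signOff: SignOff): PipelineStatus {
  let status: PipelineStatus = start;
  while (status !== "handed-off") {
    const next = stepPhase(status, gate, signOff);
    if (next === status) break; // stalled: gate blocked or sign-off denied
    status = next;
  }
  return status;
}
```

With a gate that always passes, `runPipeline("intake", ...)` walks all five phases and ends in `handed-off`; with a blocker at `deciding`, it stops there and returns `deciding`.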

Changes

Decision-Record Planning Pipeline

| Layer | File(s) | Summary |
| --- | --- | --- |
| Documentation and benchmark reference | `docs/README.md`, `docs/explanation/`, `docs/how-to/`, `docs/reference/`, `docs/tutorials/`, `benchmarks/` | Reorganized docs following Diátaxis framework with tutorials, how-to guides, reference, and explanation sections. Added complete roguelike AI POC benchmark including four decision records, seven task specifications, project manifest, events log, and HTML rendering that serves as a reference for the planning pipeline output. |
| LLM client infrastructure and agent runner | `server/src/llm/client.ts`, `server/src/llm/agent.ts`, `server/src/llm/tools.ts` | OpenAI client configuration that resolves from environment/overrides, agent runner that executes tool-calling loops up to max iterations, records tool calls and results, and tool listing/execution utilities that filter available tools, validate inputs with Zod, inject cwd context, and parse results. |
| CLI entry point and argument parsing | `server/src/cli.ts`, `server/src/cli/index.ts`, `server/src/cli/checkpoints.ts`, `server/src/cli/prd.ts` | Main CLI entrypoint that parses arguments (--idea, --prd, --effort, model overrides, --resume, --yes), loads PRD hints, resolves LLM config, and invokes the pipeline. Checkpoint helpers provide confirm/ask user interaction and colored stderr output for progress display. |
| Pipeline orchestration | `server/src/cli/orchestrator.ts` | Orchestrator that reads initial project state, loops through phases (intake → scoping → deciding → decomposing → handing-off) until handed-off or error. Dispatches phase agents, evaluates gate passability, manages human sign-off prompts, and coordinates Linear or filesystem exports with dry-run preview. |
| Phase agents | `server/src/cli/agents/scoping.ts`, `server/src/cli/agents/deciding.ts`, `server/src/cli/agents/decomposer.ts`, `server/src/cli/agents/skeptic.ts` | Specialized LLM agents: scoping synthesizes MVP boundary from project/PRD via `dr_update_scope`; deciding proposes and selects positions via tool calls; decomposing builds task graph and validates with `dr_validate_graph`; skeptic performs antagonistic lens-based review of decisions with block/pass verdicts. |
| Schema validation and build config | `server/src/schemas/index.ts`, `server/package.json`, `server/tsup.config.ts` | Zod schemas for projects, decisions, tasks, pipeline state, and events with semver validation. Updated package.json to add `decision-record` CLI binary, split test scripts (unit/flow), and added `openai ^6.38.0`. Updated tsup to build CLI as separate bundle alongside MCP server. |
| Test infrastructure | `server/tests/helpers/` | MCP server subprocess client with JSON-RPC 2.0 wrapper for test tool invocation, mock OpenAI SDK that consumes scripted responses, and temporary project directory helpers with JSONL event log reading. |
| Gate and schema unit tests | `server/tests/unit-gate.test.ts`, `server/tests/unit-schemas.test.ts` | Comprehensive unit tests validating preset-based gate configs (poc/mvp/full), all five phase transitions with gate pass/fail cases, and JSON schema parsing for all entity types including nested structures and validation constraints. |
| End-to-end integration tests | `server/tests/flow-poc-pipeline.test.ts` | Two flow tests: (1) full happy-path pipeline execution with mock LLM through all five phases, verifying artifact creation, decision reviews, and event log; (2) decision rejection when skeptic lens blocks and user denies override. |
| GitHub Actions CI | `.github/workflows/test.yml` | Automated test runner on push/PR to main with Node 20/22 matrix, npm caching, typecheck, build, and test execution (unit and flow). |
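The "tool-calling loops up to max iterations" behaviour attributed to the agent runner in the table above can be pictured with a small sketch. It is synchronous and uses a plain function in place of a real model call so that it runs without network access; the names `runAgent`, `ModelTurn`, and `ToolRecord` are hypothetical, and the arguments passed to `dr_update_scope` in any usage are made up, so the real `server/src/llm/agent.ts` should be consulted for the actual interface.

```typescript
// All names here are hypothetical; the PR's agent.ts and tools.ts may differ.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface ModelTurn {
  content: string;
  toolCalls: ToolCall[];
}

interface ToolRecord {
  call: ToolCall;
  result: unknown;
}

interface AgentOutcome {
  records: ToolRecord[];
  final: string | null;
  exhausted: boolean;
}

// Stand-in for an LLM call: sees the tool history, returns the next turn.
type Model = (history: readonly ToolRecord[]) => ModelTurn;
type Tool = (args: Record<string, unknown>) => unknown;

function runAgent(model: Model, tools: Map<string, Tool>, maxIterations: number): AgentOutcome {
  const records: ToolRecord[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const turn = model(records);
    // A turn with no tool calls is treated as the final answer.
    if (turn.toolCalls.length === 0) {
      return { records, final: turn.content, exhausted: false };
    }
    for (const call of turn.toolCalls) {
      const tool = tools.get(call.name);
      // Unknown tools are recorded as errors instead of aborting the loop.
      const result = tool ? tool(call.args) : { error: `unknown tool: ${call.name}` };
      records.push({ call, result });
    }
  }
  // The iteration budget ran out before the model produced a final answer.
  return { records, final: null, exhausted: true };
}
```

The iteration cap is the important design point: a model that keeps requesting tools can never spin forever, and the caller can tell a finished run from an exhausted one by the `exhausted` flag.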

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

mabry1985 merged commit 08f763e into main May 17, 2026
2 of 3 checks passed

mabry1985 merged 1 commit into main from feat/cli-tests-docs
