feat: standalone CLI + test harness + Diátaxis docs + benchmarks #2
Merged
- Add `decision-record` CLI (second bin alongside MCP server) that drives the full planning pipeline against any OpenAI-compatible endpoint (OPENAI_API_KEY + OPENAI_BASE_URL; works with OpenAI, OpenRouter, Ollama, vLLM, LiteLLM). Phase state machine, sub-agents (scoping, deciding, lens-rotating skeptic, decomposer), checkpointed control flow, PRD ingestion, resume support.
- Add reusable test harness: event-driven MCP stdio client, disposable tmp-project helper, script-replay mock OpenAI client. 50 tests across unit (gate eval + schemas, 48 tests) and flow (full pipeline + skeptic-block path, 2 tests). All green in 210ms.
- Reorganize docs/ into Diátaxis quadrants: tutorials/, how-to/, reference/, explanation/. New first-user tutorial walks the roguelike benchmark prompt end-to-end. Five how-to guides cover install, run, providers, Linear handoff, and gate calibration. Four reference pages document the CLI, MCP tools, data model, and gate matrix. Three explanation pages cover design rationale, the five-phase pipeline, and Joel's canonical material (preserved from upstream-canon).
- Add benchmarks/ with the roguelike-ai-poc canonical prompt + reference artifacts + a run.sh for regression checks as the system evolves.
- Add GitHub Actions CI (.github/workflows/test.yml) that runs typecheck, build, and the test matrix on Node 20 + 22 for every push and PR.
- Minor fixes: add semver regex to PipelineState.schema_version; add cli.ts entry to tsup config; fix orchestrator bug where the pre-advance gate check treated sign-off as a blocker.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
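The script-replay mock mentioned in the test-harness item can be pictured roughly as follows. This is a minimal sketch, not the harness from this PR: `ChatMessage`, `ChatClient`, and `ScriptReplayClient` are hypothetical names, and the real client interface may differ. The idea is that a test supplies the assistant replies in order, and the mock hands them back one per call while recording what it was asked.

```typescript
// Hypothetical shapes for illustration; the PR's actual harness types are not shown here.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatClient {
  complete(messages: ChatMessage[]): Promise<string>;
}

// Replays a fixed list of replies, one per call, and records every request.
class ScriptReplayClient implements ChatClient {
  readonly calls: ChatMessage[][] = [];
  private readonly script: readonly string[];
  private cursor = 0;

  constructor(script: readonly string[]) {
    this.script = script;
  }

  async complete(messages: ChatMessage[]): Promise<string> {
    this.calls.push(messages);
    const reply = this.script[this.cursor];
    if (reply === undefined) {
      // Failing loudly keeps a test from silently running past its script.
      throw new Error(`script exhausted after ${this.script.length} replies`);
    }
    this.cursor += 1;
    return reply;
  }

  // Number of scripted replies not yet consumed.
  get remaining(): number {
    return this.script.length - this.cursor;
  }
}
```

A flow test would construct the mock with the replies each sub-agent is expected to receive, run the pipeline against it, and then assert that `remaining` is zero so unused script entries are caught too.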
Walkthrough
This PR implements a complete CLI-based decision-record planning tool that orchestrates a five-phase LLM-driven pipeline (intake → scoping → deciding → decomposing → handing-off), with persistent JSON state, hard gates for phase transitions, antagonistic lens-based decision review, and support for handoffs to Linear or filesystem targets. It includes comprehensive documentation, a roguelike AI POC benchmark, and end-to-end tests.
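The gated phase transitions described above could be modeled along these lines. This is a sketch under stated assumptions rather than the orchestrator in this PR: the phase names come from the walkthrough, while `GateResult`, `AdvanceOutcome`, and `advance` are hypothetical, as is the choice of which gates require sign-off. It treats outstanding blockers and a pending human sign-off as two separate outcomes, so waiting on a person is a pause and not a failure.

```typescript
// Phase order as listed in the walkthrough.
const PHASES = ["intake", "scoping", "deciding", "decomposing", "handing-off"] as const;
type Phase = (typeof PHASES)[number];

// Hypothetical gate result: hard blockers plus a flag for human sign-off.
interface GateResult {
  blockers: string[];
  needsSignOff: boolean;
}

type AdvanceOutcome =
  | { kind: "advanced"; phase: Phase }
  | { kind: "blocked"; phase: Phase; blockers: string[] }
  | { kind: "awaiting-sign-off"; phase: Phase }
  | { kind: "done"; phase: Phase };

function advance(current: Phase, gate: GateResult, signedOff: boolean): AdvanceOutcome {
  // Hard blockers always stop the transition.
  if (gate.blockers.length > 0) {
    return { kind: "blocked", phase: current, blockers: gate.blockers };
  }
  // A missing sign-off pauses the pipeline; it is not reported as a blocker.
  if (gate.needsSignOff && !signedOff) {
    return { kind: "awaiting-sign-off", phase: current };
  }
  const next = PHASES[PHASES.indexOf(current) + 1];
  if (next === undefined) {
    return { kind: "done", phase: current };
  }
  return { kind: "advanced", phase: next };
}
```

Keeping the outcome a tagged union means a caller can persist the current phase on `awaiting-sign-off` and resume later with `signedOff` set to `true`, without re-running the gate's blocker checks differently.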
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Summary
- `decision-record` runs the full planning pipeline against any OpenAI-compatible endpoint. Phase state machine drives sub-agents (scoping, deciding, lens-rotating skeptic, decomposer), pauses at human sign-off gates, hands off to Linear or filesystem.
- `docs/` reorganized into tutorials / how-to / reference / explanation. First-user tutorial walks the roguelike benchmark prompt end-to-end. Joel's upstream canon preserved as explanation.
- `benchmarks/roguelike-ai-poc/` with the canonical prompt, reference artifacts, and a `run.sh` for regression checks as the system evolves.
What you can do now
Or use as a Claude Code plugin via the existing `.claude-plugin/plugin.json`.
Test plan
- `npm run typecheck` clean
- `npm test`: 50/50 pass in 210ms
- benchmarks/)
Known follow-ups (not in this PR)
🤖 Generated with Claude Code