diff --git a/.claude/commands/pr-review-loop.md b/.claude/commands/pr-review-loop.md new file mode 100644 index 000000000..3ee17dc7c --- /dev/null +++ b/.claude/commands/pr-review-loop.md @@ -0,0 +1,683 @@ +# PR Review Loop Command + +**Command**: `/pr-review-loop` +**Description**: Automated continuous PR review monitoring and fixing workflow +**Category**: Development Workflow +**Model**: sonnet (for plan mode compatibility) + +--- + +## Overview + +This command implements an automated loop that: +1. Monitors a PR for new review comments +2. Validates comments and creates implementation plans +3. Implements approved fixes +4. Commits and pushes changes +5. Waits 10 minutes before checking again + +The loop continues indefinitely until manually stopped or the PR is closed. + +--- + +## Usage + +``` +/pr-review-loop [pr-number] +``` + +**Arguments**: +- `pr-number` (optional): The PR number to monitor. If not provided, uses the current branch's PR. + +**Examples**: +``` +/pr-review-loop 9 +/pr-review-loop +``` + +--- + +## Workflow Steps + +### Phase 1: Retrieve Latest Comments (5-10 seconds) + +1. **Fetch PR data** using `gh pr view [number] --json comments,reviews,url,title,body,updatedAt` +2. **Create review snapshot** in workspace: + - File: `.ii/workspaces/{workspace_id}/pr-review-{number}-{timestamp}.md` + - Include: All comments, reviews, URLs, timestamps + - Pipe raw JSON output directly into file for reference + +3. **Parse and validate comments**: + - Extract actionable feedback + - Identify security issues (P1) + - Identify bugs and improvements (P2) + - Filter out non-actionable comments (questions, praise, etc.) + +4. **Create prioritized issue list**: + - Mark each comment as: `CRITICAL`, `HIGH`, `MEDIUM`, `LOW`, or `INFO_ONLY` + - Add validity assessment: `VALID`, `NEEDS_CLARIFICATION`, `INVALID`, `ALREADY_FIXED` + - Add action status: `TODO`, `IN_PROGRESS`, `DONE`, `WONT_FIX` + +5. **Update review document** with analysis: + ```markdown + # PR Review Analysis: PR #{number} + + **Retrieved**: {timestamp} + **Last Updated**: {pr.updatedAt} + **Total Comments**: {count} + **Actionable Issues**: {count} + + ## Summary + - Critical Issues: {count} + - High Priority: {count} + - Medium Priority: {count} + - Low Priority: {count} + - Informational: {count} + + ## Actionable Issues + + ### CRITICAL - Must Fix Immediately + 1. [VALID] [TODO] Comment by @user at file.ts:123 + - Issue: {description} + - Impact: {impact} + - Fix Required: {yes/no} + + ### HIGH - Should Fix Soon + ... + + ### MEDIUM - Can Fix Later + ... + + ### LOW - Nice to Have + ... + + ## Non-Actionable Comments + - [INFO_ONLY] Comment by @user: {text} + - [ALREADY_FIXED] Comment by @user: Fixed in commit {sha} + ``` + +### Phase 2: Enter Plan Mode (User Approval Required) + +1. **Present findings to user**: + ``` + 📋 Found {count} actionable issues in PR #{number}: + - {critical_count} CRITICAL + - {high_count} HIGH PRIORITY + - {medium_count} MEDIUM PRIORITY + - {low_count} LOW PRIORITY + + Review document created: pr-review-{number}-{timestamp}.md + + Entering plan mode to design fixes... + ``` + +2. **Enter plan mode** using `EnterPlanMode` tool + +3. 
**In plan mode**: + - Read the review document + - Validate each issue's legitimacy + - Design implementation approach for valid issues + - Create phased implementation plan: + - Phase 1: Critical security issues + - Phase 2: High-priority bugs + - Phase 3: Medium-priority improvements + - Phase 4: Low-priority enhancements + - Estimate effort and complexity + - Identify dependencies between fixes + - Write plan to plan file + +4. **Exit plan mode** and present plan to user for approval + +5. **User decision point**: + - ✅ **Approve**: Proceed to Phase 3 (Implementation) + - ❌ **Reject**: Skip to Phase 4 (Wait and repeat) + - 🔄 **Revise**: Return to plan mode with feedback + +### Phase 3: Implementation (After User Approval) + +1. **Create todo list** from approved plan: + ```typescript + TodoWrite({ + todos: [ + { content: "Fix P1-1: Security issue in file.ts", status: "pending", activeForm: "Fixing P1-1" }, + { content: "Fix P1-2: Another security issue", status: "pending", activeForm: "Fixing P1-2" }, + // ... all planned fixes + { content: "Commit and push changes", status: "pending", activeForm: "Committing changes" }, + ] + }) + ``` + +2. **Execute implementation**: + - Work through todos in priority order + - Mark each as `in_progress` before starting + - Mark as `completed` after finishing + - Handle errors gracefully (mark as `pending` with notes) + +3. **Run implementation** (same approach as previous session): + - Use multiple parallel agents for independent fixes + - Group related fixes by file/feature + - Ensure TypeScript compilation passes + - Preserve old code as comments for rollback + +4. **Verify changes**: + - Run `git status` to see changed files + - Run `git diff` to review changes + - Ensure no unintended modifications + +### Phase 4: Commit and Push + +1. **Stage changes**: + ```bash + git add [files] + ``` + +2. **Create descriptive commit message**: + ``` + fix(scope): address PR #{number} review comments + + This commit addresses {count} issues identified in PR review: + + ## Critical Fixes (P1) + - Fix 1: {description} + - Fix 2: {description} + + ## High Priority Fixes (P2) + - Fix 3: {description} + - Fix 4: {description} + + ## Related + - PR: #{number} + - Review: pr-review-{number}-{timestamp}.md + + Co-Authored-By: Claude Sonnet 4.5 + ``` + +3. **Commit and push**: + ```bash + git commit -m "{message}" + git push origin {branch} + ``` + +4. **Create follow-up comment on PR** (optional): + ```bash + gh pr comment {number} --body "✅ Addressed {count} review comments in commit {sha}: + - Fixed: {list of issues} + + Changes pushed to {branch}. Ready for re-review." + ``` + +### Phase 5: Wait and Monitor + +1. **Inform user**: + ``` + ✅ Changes committed and pushed. + + Waiting 10 minutes before checking for new comments... + Press Ctrl+C to stop the loop. + ``` + +2. **Sleep for 10 minutes**: + ```bash + sleep 600 + ``` + - Use timeout of 610 seconds (10 min 10 sec) to allow for graceful completion + - This allows manual interruption if needed + +3. **After sleep, check for updates**: + ```bash + gh pr view {number} --json updatedAt + ``` + - Compare `updatedAt` with previous timestamp + - If changed, proceed to Phase 1 + - If unchanged, sleep again (exponential backoff optional) + +4. **Loop back to Phase 1** + +--- + +## Implementation Instructions + +### For Claude Agent + +When this command is invoked, you should: + +1. 
**Parse the PR number**: + - If provided as argument, use it + - If not provided, detect from current branch: `gh pr list --head $(git branch --show-current) --json number` + +2. **Initialize loop state**: + - Store PR number, branch name, workspace ID + - Create initial review document + - Track last check timestamp + +3. **Execute loop**: + ``` + LOOP: + Phase 1: Retrieve and analyze comments + Phase 2: Enter plan mode (wait for user approval) + IF user approves plan: + Phase 3: Implement fixes + Phase 4: Commit and push + Phase 5: Sleep 10 minutes + GOTO LOOP + ``` + +4. **Handle interruptions**: + - If user cancels (Ctrl+C during sleep), exit gracefully + - If PR is closed/merged, exit with message + - If git push fails, ask user for guidance + +5. **State management**: + - Each iteration creates new review document with timestamp + - Track which comments have been addressed (by commit SHA) + - Don't re-fix already fixed issues + +### Review Document Format + +**File**: `.ii/workspaces/{workspace_id}/pr-review-{number}-{timestamp}.md` + +```markdown +# PR Review: #{number} - {title} + +**Retrieved**: {ISO timestamp} +**PR Last Updated**: {ISO timestamp} +**Branch**: {branch} +**URL**: {url} + +--- + +## Raw Data (for reference) + +
+<details>
+<summary>Click to expand raw JSON</summary>
+
+```json
+{raw gh pr view output}
+```
+
+</details>
+ +--- + +## Analysis + +### Summary + +| Category | Count | +|----------|-------| +| Total Comments | {n} | +| Critical Issues | {n} | +| High Priority | {n} | +| Medium Priority | {n} | +| Low Priority | {n} | +| Informational | {n} | + +### Issues by Priority + +#### CRITICAL (P1) - {count} + +##### Issue 1: {title} +- **File**: {file}:{line} +- **Author**: @{username} +- **Posted**: {timestamp} +- **Validity**: VALID | NEEDS_CLARIFICATION | INVALID | ALREADY_FIXED +- **Status**: TODO | IN_PROGRESS | DONE | WONT_FIX +- **Priority**: CRITICAL +- **Impact**: {description} + +**Comment**: +> {original comment text} + +**Analysis**: +{your analysis of whether this needs fixing, why, and how} + +**Fix Required**: YES | NO | MAYBE + +**Fix Approach** (if applicable): +{brief description of how to fix} + +--- + +#### HIGH (P2) - {count} + +... same format ... + +--- + +#### MEDIUM (P3) - {count} + +... same format ... + +--- + +#### LOW (P4) - {count} + +... same format ... + +--- + +### Non-Actionable Comments + +#### Informational +- Comment by @user: {text} + +#### Already Fixed +- Comment by @user: Fixed in commit {sha} + +#### Invalid/Not Applicable +- Comment by @user: {reason why not applicable} + +--- + +## Previous Iterations + +### Iteration 1 - {timestamp} +- Issues Found: {n} +- Issues Fixed: {n} +- Commit: {sha} + +### Iteration 2 - {timestamp} +... + +``` + +--- + +## Safety Mechanisms + +### Rate Limiting +- Minimum 10 minutes between iterations +- Maximum 6 iterations per hour +- Exponential backoff if no changes detected + +### Error Handling +- If `gh` command fails, retry once, then skip iteration +- If TypeScript compilation fails, stop and ask user +- If git push fails, stop and ask user +- If plan is rejected, skip implementation and wait + +### User Control +- User can stop loop at any time (Ctrl+C) +- User must approve each plan before implementation +- User can choose to skip iterations + +### Conflict Detection +- Before each iteration, check for local uncommitted changes +- If found, ask user to commit or stash before continuing +- Check if branch has diverged from remote + +--- + +## Example Session + +``` +User: /pr-review-loop 9 +Agent: 🔄 Starting PR Review Loop for PR #9 + +[Phase 1: Retrieve Comments] +Fetching PR data... +✅ Retrieved 36 comments and 2 reviews +Created review document: pr-review-9-2026-02-02T12-30-00.md + +Analysis: +- 3 CRITICAL issues (P1) +- 10 HIGH priority issues (P2) +- 15 MEDIUM priority issues (P3) +- 8 LOW priority issues (P4) + +[Phase 2: Plan Mode] +📋 Entering plan mode to design fixes for 28 actionable issues... + +[Plan created - waiting for user approval] + +User: Approve + +Agent: ✅ Plan approved. Proceeding with implementation... + +[Phase 3: Implementation] +Starting implementation with 5 parallel agents... +- Agent 1: P1 security fixes (files: security-utils.ts, command-executor.ts) +- Agent 2: P1 path traversal (files: file-utils.ts) +- Agent 3: P2 backend fixes (files: command-executor.ts, speckit.ts) +- Agent 4: P2 frontend memory leaks (files: use-command-output.ts) +- Agent 5: P2 frontend state bugs (files: 8 component files) + +✅ All agents completed successfully +✅ TypeScript compilation: PASSING + +[Phase 4: Commit and Push] +Staging 12 files... +Creating commit message... +✅ Committed: 035ba37 +✅ Pushed to origin/001-speckit-ui-integration + +💬 Posted comment on PR #9: "Addressed 28 review comments in commit 035ba37" + +[Phase 5: Wait] +✅ Iteration 1 complete. Fixed 28 issues. 
+ +⏳ Waiting 10 minutes before next check... +Press Ctrl+C to stop the loop. + +[10 minutes later...] + +[Phase 1: Retrieve Comments] +Checking for new comments... +✅ PR updated: 2 new comments found +Created review document: pr-review-9-2026-02-02T12-40-00.md + +Analysis: +- 0 CRITICAL issues +- 1 HIGH priority issue (P2) +- 1 MEDIUM priority issue (P3) + +[Phase 2: Plan Mode] +📋 Entering plan mode... + +...and so on... +``` + +--- + +## Configuration Options + +The command supports these optional configurations: + +### Environment Variables + +Set in your shell before running: + +```bash +export PR_REVIEW_LOOP_INTERVAL=600 # Seconds between checks (default: 600) +export PR_REVIEW_LOOP_MAX_ITERATIONS=100 # Max iterations before auto-stop (default: unlimited) +export PR_REVIEW_LOOP_AUTO_APPROVE=false # Skip plan approval (DANGEROUS, default: false) +export PR_REVIEW_LOOP_VERBOSE=true # Detailed logging (default: false) +``` + +### Inline Options + +Pass options with the command: + +``` +/pr-review-loop 9 --interval=300 --max-iterations=10 --verbose +``` + +--- + +## Stopping the Loop + +### Manual Stop +- Press `Ctrl+C` during the sleep phase +- Send `SIGTERM` to the process +- Close the terminal/session + +### Automatic Stop Conditions +- PR is closed or merged +- Maximum iterations reached (if configured) +- Consecutive errors exceed threshold (3) +- No changes detected for 1 hour (exponential backoff) + +### Graceful Shutdown +When stopped, the agent will: +1. Finish current phase (if mid-implementation) +2. Commit any pending changes (if user approves) +3. Save state to resume later +4. Create final report summarizing all iterations + +--- + +## State Persistence + +The loop maintains state across sessions in: + +``` +.ii/workspaces/{workspace_id}/pr-review-loop-state.json +``` + +**State includes**: +- PR number and branch +- Last check timestamp +- Iteration count +- Fixed issues (by commit SHA) +- Pending issues +- Error history + +**Resume command**: +``` +/pr-review-loop-resume 9 +``` + +This continues from the last saved state. + +--- + +## Advanced Features + +### Smart Comment Tracking + +The agent tracks which comments have been addressed: + +```json +{ + "pr": 9, + "comments": { + "IC_kwDOQ9wyV87kgEht": { + "status": "fixed", + "commit": "035ba37", + "iteration": 1, + "timestamp": "2026-02-02T12:35:00Z" + } + } +} +``` + +This prevents re-fixing already addressed issues. + +### Exponential Backoff + +If no new comments are found: +- Iteration 1-3: 10 minutes +- Iteration 4-6: 20 minutes +- Iteration 7-9: 30 minutes +- Iteration 10+: 60 minutes (max) + +Reset to 10 minutes when new comments appear. + +### Batch Processing + +If many issues are found (>20), offer to batch them: + +``` +Found 45 issues. Process as: +1. All at once (may take 2-3 hours) +2. Batch by priority (P1 first, then P2, etc.) +3. Batch by file/feature (related changes together) + +Choose option: +``` + +--- + +## Error Recovery + +### Compilation Errors +If TypeScript compilation fails: +1. Show errors to user +2. Ask: Retry fix | Skip this iteration | Stop loop +3. If retry: Create new plan focusing on errors + +### Git Conflicts +If push fails due to conflicts: +1. Fetch latest changes +2. Attempt rebase +3. If conflicts, ask user to resolve manually +4. Pause loop until resolved + +### API Rate Limits +If GitHub API rate limit hit: +1. Calculate time until reset +2. Sleep until reset + 1 minute +3. 
Resume loop + +--- + +## Best Practices + +### When to Use +✅ During active code review phase +✅ When expecting multiple review rounds +✅ For large PRs with many reviewers +✅ When working across timezones + +### When NOT to Use +❌ For PRs with <5 comments +❌ When rapid iteration is needed (use manual workflow) +❌ During active development (conflicts likely) +❌ When PR is nearly ready to merge + +### Tips +- Run in a dedicated terminal window +- Use tmux/screen for persistence +- Monitor the first 2-3 iterations manually +- Set reasonable max iterations +- Keep PR focused to minimize comment churn + +--- + +## Troubleshooting + +### Loop Not Starting +- Check PR number exists: `gh pr view 9` +- Verify gh CLI authentication: `gh auth status` +- Ensure on correct branch: `git branch --show-current` + +### Changes Not Pushing +- Check git credentials +- Verify branch is not protected +- Check for uncommitted local changes + +### Too Many Iterations +- Increase sleep interval +- Set max iterations limit +- Check if comments are being properly marked as fixed + +### Plan Always Rejected +- Review plan quality +- Check if issues are actually valid +- Consider manual workflow for complex scenarios + +--- + +## Related Commands + +- `/pr-review` - One-time PR review (no loop) +- `/fix-review-comments` - Implement specific comment fixes +- `/pr-status` - Check current PR status +- `/pr-review-loop-resume` - Resume stopped loop + +--- + +**Created**: 2026-02-02 +**Version**: 1.0 +**Model Required**: sonnet (for plan mode support) +**Experimental**: Yes - monitor first few iterations closely diff --git a/.claude/commands/speckit.analyze.md b/.claude/commands/speckit.analyze.md new file mode 100644 index 000000000..98b04b0c8 --- /dev/null +++ b/.claude/commands/speckit.analyze.md @@ -0,0 +1,184 @@ +--- +description: Perform a non-destructive cross-artifact consistency and quality analysis across spec.md, plan.md, and tasks.md after task generation. +--- + +## User Input + +```text +$ARGUMENTS +``` + +You **MUST** consider the user input before proceeding (if not empty). + +## Goal + +Identify inconsistencies, duplications, ambiguities, and underspecified items across the three core artifacts (`spec.md`, `plan.md`, `tasks.md`) before implementation. This command MUST run only after `/speckit.tasks` has successfully produced a complete `tasks.md`. + +## Operating Constraints + +**STRICTLY READ-ONLY**: Do **not** modify any files. Output a structured analysis report. Offer an optional remediation plan (user must explicitly approve before any follow-up editing commands would be invoked manually). + +**Constitution Authority**: The project constitution (`.specify/memory/constitution.md`) is **non-negotiable** within this analysis scope. Constitution conflicts are automatically CRITICAL and require adjustment of the spec, plan, or tasks—not dilution, reinterpretation, or silent ignoring of the principle. If a principle itself needs to change, that must occur in a separate, explicit constitution update outside `/speckit.analyze`. + +## Execution Steps + +### 1. Initialize Analysis Context + +Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks` once from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS. Derive absolute paths: + +- SPEC = FEATURE_DIR/spec.md +- PLAN = FEATURE_DIR/plan.md +- TASKS = FEATURE_DIR/tasks.md + +Abort with an error message if any required file is missing (instruct the user to run missing prerequisite command). 
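+A minimal shell sketch of this initialization step, for illustration only (the JSON field name and the use of `jq` are assumptions about the script's output, not a documented contract):
+
+```bash
+#!/usr/bin/env bash
+# Sketch: assumes check-prerequisites.sh prints JSON such as
+#   {"FEATURE_DIR": "/abs/path/specs/001-feature", "AVAILABLE_DOCS": [...]}
+set -euo pipefail
+
+json="$(bash .specify/scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks)"
+FEATURE_DIR="$(printf '%s' "$json" | jq -r '.FEATURE_DIR')"
+
+SPEC="$FEATURE_DIR/spec.md"
+PLAN="$FEATURE_DIR/plan.md"
+TASKS="$FEATURE_DIR/tasks.md"
+
+# Abort early if any required artifact is missing.
+for f in "$SPEC" "$PLAN" "$TASKS"; do
+  if [[ ! -f "$f" ]]; then
+    echo "Missing required artifact: $f (run the prerequisite speckit command first)" >&2
+    exit 1
+  fi
+done
+```
+
+The same pattern applies to the `--paths-only` mode used by other speckit commands, which returns explicit `FEATURE_SPEC`-style paths instead.
+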
+For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). + +### 2. Load Artifacts (Progressive Disclosure) + +Load only the minimal necessary context from each artifact: + +**From spec.md:** + +- Overview/Context +- Functional Requirements +- Non-Functional Requirements +- User Stories +- Edge Cases (if present) + +**From plan.md:** + +- Architecture/stack choices +- Data Model references +- Phases +- Technical constraints + +**From tasks.md:** + +- Task IDs +- Descriptions +- Phase grouping +- Parallel markers [P] +- Referenced file paths + +**From constitution:** + +- Load `.specify/memory/constitution.md` for principle validation + +### 3. Build Semantic Models + +Create internal representations (do not include raw artifacts in output): + +- **Requirements inventory**: Each functional + non-functional requirement with a stable key (derive slug based on imperative phrase; e.g., "User can upload file" → `user-can-upload-file`) +- **User story/action inventory**: Discrete user actions with acceptance criteria +- **Task coverage mapping**: Map each task to one or more requirements or stories (inference by keyword / explicit reference patterns like IDs or key phrases) +- **Constitution rule set**: Extract principle names and MUST/SHOULD normative statements + +### 4. Detection Passes (Token-Efficient Analysis) + +Focus on high-signal findings. Limit to 50 findings total; aggregate remainder in overflow summary. + +#### A. Duplication Detection + +- Identify near-duplicate requirements +- Mark lower-quality phrasing for consolidation + +#### B. Ambiguity Detection + +- Flag vague adjectives (fast, scalable, secure, intuitive, robust) lacking measurable criteria +- Flag unresolved placeholders (TODO, TKTK, ???, ``, etc.) + +#### C. Underspecification + +- Requirements with verbs but missing object or measurable outcome +- User stories missing acceptance criteria alignment +- Tasks referencing files or components not defined in spec/plan + +#### D. Constitution Alignment + +- Any requirement or plan element conflicting with a MUST principle +- Missing mandated sections or quality gates from constitution + +#### E. Coverage Gaps + +- Requirements with zero associated tasks +- Tasks with no mapped requirement/story +- Non-functional requirements not reflected in tasks (e.g., performance, security) + +#### F. Inconsistency + +- Terminology drift (same concept named differently across files) +- Data entities referenced in plan but absent in spec (or vice versa) +- Task ordering contradictions (e.g., integration tasks before foundational setup tasks without dependency note) +- Conflicting requirements (e.g., one requires Next.js while other specifies Vue) + +### 5. Severity Assignment + +Use this heuristic to prioritize findings: + +- **CRITICAL**: Violates constitution MUST, missing core spec artifact, or requirement with zero coverage that blocks baseline functionality +- **HIGH**: Duplicate or conflicting requirement, ambiguous security/performance attribute, untestable acceptance criterion +- **MEDIUM**: Terminology drift, missing non-functional task coverage, underspecified edge case +- **LOW**: Style/wording improvements, minor redundancy not affecting execution order + +### 6. 
Produce Compact Analysis Report + +Output a Markdown report (no file writes) with the following structure: + +## Specification Analysis Report + +| ID | Category | Severity | Location(s) | Summary | Recommendation | +|----|----------|----------|-------------|---------|----------------| +| A1 | Duplication | HIGH | spec.md:L120-134 | Two similar requirements ... | Merge phrasing; keep clearer version | + +(Add one row per finding; generate stable IDs prefixed by category initial.) + +**Coverage Summary Table:** + +| Requirement Key | Has Task? | Task IDs | Notes | +|-----------------|-----------|----------|-------| + +**Constitution Alignment Issues:** (if any) + +**Unmapped Tasks:** (if any) + +**Metrics:** + +- Total Requirements +- Total Tasks +- Coverage % (requirements with >=1 task) +- Ambiguity Count +- Duplication Count +- Critical Issues Count + +### 7. Provide Next Actions + +At end of report, output a concise Next Actions block: + +- If CRITICAL issues exist: Recommend resolving before `/speckit.implement` +- If only LOW/MEDIUM: User may proceed, but provide improvement suggestions +- Provide explicit command suggestions: e.g., "Run /speckit.specify with refinement", "Run /speckit.plan to adjust architecture", "Manually edit tasks.md to add coverage for 'performance-metrics'" + +### 8. Offer Remediation + +Ask the user: "Would you like me to suggest concrete remediation edits for the top N issues?" (Do NOT apply them automatically.) + +## Operating Principles + +### Context Efficiency + +- **Minimal high-signal tokens**: Focus on actionable findings, not exhaustive documentation +- **Progressive disclosure**: Load artifacts incrementally; don't dump all content into analysis +- **Token-efficient output**: Limit findings table to 50 rows; summarize overflow +- **Deterministic results**: Rerunning without changes should produce consistent IDs and counts + +### Analysis Guidelines + +- **NEVER modify files** (this is read-only analysis) +- **NEVER hallucinate missing sections** (if absent, report them accurately) +- **Prioritize constitution violations** (these are always CRITICAL) +- **Use examples over exhaustive rules** (cite specific instances, not generic patterns) +- **Report zero issues gracefully** (emit success report with coverage statistics) + +## Context + +$ARGUMENTS diff --git a/.claude/commands/speckit.checklist.md b/.claude/commands/speckit.checklist.md new file mode 100644 index 000000000..970e6c9ed --- /dev/null +++ b/.claude/commands/speckit.checklist.md @@ -0,0 +1,294 @@ +--- +description: Generate a custom checklist for the current feature based on user requirements. +--- + +## Checklist Purpose: "Unit Tests for English" + +**CRITICAL CONCEPT**: Checklists are **UNIT TESTS FOR REQUIREMENTS WRITING** - they validate the quality, clarity, and completeness of requirements in a given domain. + +**NOT for verification/testing**: + +- ❌ NOT "Verify the button clicks correctly" +- ❌ NOT "Test error handling works" +- ❌ NOT "Confirm the API returns 200" +- ❌ NOT checking if code/implementation matches the spec + +**FOR requirements quality validation**: + +- ✅ "Are visual hierarchy requirements defined for all card types?" (completeness) +- ✅ "Is 'prominent display' quantified with specific sizing/positioning?" (clarity) +- ✅ "Are hover state requirements consistent across all interactive elements?" (consistency) +- ✅ "Are accessibility requirements defined for keyboard navigation?" (coverage) +- ✅ "Does the spec define what happens when logo image fails to load?" 
(edge cases) + +**Metaphor**: If your spec is code written in English, the checklist is its unit test suite. You're testing whether the requirements are well-written, complete, unambiguous, and ready for implementation - NOT whether the implementation works. + +## User Input + +```text +$ARGUMENTS +``` + +You **MUST** consider the user input before proceeding (if not empty). + +## Execution Steps + +1. **Setup**: Run `.specify/scripts/bash/check-prerequisites.sh --json` from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS list. + - All file paths must be absolute. + - For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). + +2. **Clarify intent (dynamic)**: Derive up to THREE initial contextual clarifying questions (no pre-baked catalog). They MUST: + - Be generated from the user's phrasing + extracted signals from spec/plan/tasks + - Only ask about information that materially changes checklist content + - Be skipped individually if already unambiguous in `$ARGUMENTS` + - Prefer precision over breadth + + Generation algorithm: + 1. Extract signals: feature domain keywords (e.g., auth, latency, UX, API), risk indicators ("critical", "must", "compliance"), stakeholder hints ("QA", "review", "security team"), and explicit deliverables ("a11y", "rollback", "contracts"). + 2. Cluster signals into candidate focus areas (max 4) ranked by relevance. + 3. Identify probable audience & timing (author, reviewer, QA, release) if not explicit. + 4. Detect missing dimensions: scope breadth, depth/rigor, risk emphasis, exclusion boundaries, measurable acceptance criteria. + 5. Formulate questions chosen from these archetypes: + - Scope refinement (e.g., "Should this include integration touchpoints with X and Y or stay limited to local module correctness?") + - Risk prioritization (e.g., "Which of these potential risk areas should receive mandatory gating checks?") + - Depth calibration (e.g., "Is this a lightweight pre-commit sanity list or a formal release gate?") + - Audience framing (e.g., "Will this be used by the author only or peers during PR review?") + - Boundary exclusion (e.g., "Should we explicitly exclude performance tuning items this round?") + - Scenario class gap (e.g., "No recovery flows detected—are rollback / partial failure paths in scope?") + + Question formatting rules: + - If presenting options, generate a compact table with columns: Option | Candidate | Why It Matters + - Limit to A–E options maximum; omit table if a free-form answer is clearer + - Never ask the user to restate what they already said + - Avoid speculative categories (no hallucination). If uncertain, ask explicitly: "Confirm whether X belongs in scope." + + Defaults when interaction impossible: + - Depth: Standard + - Audience: Reviewer (PR) if code-related; Author otherwise + - Focus: Top 2 relevance clusters + + Output the questions (label Q1/Q2/Q3). After answers: if ≥2 scenario classes (Alternate / Exception / Recovery / Non-Functional domain) remain unclear, you MAY ask up to TWO more targeted follow‑ups (Q4/Q5) with a one-line justification each (e.g., "Unresolved recovery path risk"). Do not exceed five total questions. Skip escalation if user explicitly declines more. + +3. 
**Understand user request**: Combine `$ARGUMENTS` + clarifying answers: + - Derive checklist theme (e.g., security, review, deploy, ux) + - Consolidate explicit must-have items mentioned by user + - Map focus selections to category scaffolding + - Infer any missing context from spec/plan/tasks (do NOT hallucinate) + +4. **Load feature context**: Read from FEATURE_DIR: + - spec.md: Feature requirements and scope + - plan.md (if exists): Technical details, dependencies + - tasks.md (if exists): Implementation tasks + + **Context Loading Strategy**: + - Load only necessary portions relevant to active focus areas (avoid full-file dumping) + - Prefer summarizing long sections into concise scenario/requirement bullets + - Use progressive disclosure: add follow-on retrieval only if gaps detected + - If source docs are large, generate interim summary items instead of embedding raw text + +5. **Generate checklist** - Create "Unit Tests for Requirements": + - Create `FEATURE_DIR/checklists/` directory if it doesn't exist + - Generate unique checklist filename: + - Use short, descriptive name based on domain (e.g., `ux.md`, `api.md`, `security.md`) + - Format: `[domain].md` + - If file exists, append to existing file + - Number items sequentially starting from CHK001 + - Each `/speckit.checklist` run creates a NEW file (never overwrites existing checklists) + + **CORE PRINCIPLE - Test the Requirements, Not the Implementation**: + Every checklist item MUST evaluate the REQUIREMENTS THEMSELVES for: + - **Completeness**: Are all necessary requirements present? + - **Clarity**: Are requirements unambiguous and specific? + - **Consistency**: Do requirements align with each other? + - **Measurability**: Can requirements be objectively verified? + - **Coverage**: Are all scenarios/edge cases addressed? + + **Category Structure** - Group items by requirement quality dimensions: + - **Requirement Completeness** (Are all necessary requirements documented?) + - **Requirement Clarity** (Are requirements specific and unambiguous?) + - **Requirement Consistency** (Do requirements align without conflicts?) + - **Acceptance Criteria Quality** (Are success criteria measurable?) + - **Scenario Coverage** (Are all flows/cases addressed?) + - **Edge Case Coverage** (Are boundary conditions defined?) + - **Non-Functional Requirements** (Performance, Security, Accessibility, etc. - are they specified?) + - **Dependencies & Assumptions** (Are they documented and validated?) + - **Ambiguities & Conflicts** (What needs clarification?) + + **HOW TO WRITE CHECKLIST ITEMS - "Unit Tests for English"**: + + ❌ **WRONG** (Testing implementation): + - "Verify landing page displays 3 episode cards" + - "Test hover states work on desktop" + - "Confirm logo click navigates home" + + ✅ **CORRECT** (Testing requirements quality): + - "Are the exact number and layout of featured episodes specified?" [Completeness] + - "Is 'prominent display' quantified with specific sizing/positioning?" [Clarity] + - "Are hover state requirements consistent across all interactive elements?" [Consistency] + - "Are keyboard navigation requirements defined for all interactive UI?" [Coverage] + - "Is the fallback behavior specified when logo image fails to load?" [Edge Cases] + - "Are loading states defined for asynchronous episode data?" [Completeness] + - "Does the spec define visual hierarchy for competing UI elements?" 
[Clarity] + + **ITEM STRUCTURE**: + Each item should follow this pattern: + - Question format asking about requirement quality + - Focus on what's WRITTEN (or not written) in the spec/plan + - Include quality dimension in brackets [Completeness/Clarity/Consistency/etc.] + - Reference spec section `[Spec §X.Y]` when checking existing requirements + - Use `[Gap]` marker when checking for missing requirements + + **EXAMPLES BY QUALITY DIMENSION**: + + Completeness: + - "Are error handling requirements defined for all API failure modes? [Gap]" + - "Are accessibility requirements specified for all interactive elements? [Completeness]" + - "Are mobile breakpoint requirements defined for responsive layouts? [Gap]" + + Clarity: + - "Is 'fast loading' quantified with specific timing thresholds? [Clarity, Spec §NFR-2]" + - "Are 'related episodes' selection criteria explicitly defined? [Clarity, Spec §FR-5]" + - "Is 'prominent' defined with measurable visual properties? [Ambiguity, Spec §FR-4]" + + Consistency: + - "Do navigation requirements align across all pages? [Consistency, Spec §FR-10]" + - "Are card component requirements consistent between landing and detail pages? [Consistency]" + + Coverage: + - "Are requirements defined for zero-state scenarios (no episodes)? [Coverage, Edge Case]" + - "Are concurrent user interaction scenarios addressed? [Coverage, Gap]" + - "Are requirements specified for partial data loading failures? [Coverage, Exception Flow]" + + Measurability: + - "Are visual hierarchy requirements measurable/testable? [Acceptance Criteria, Spec §FR-1]" + - "Can 'balanced visual weight' be objectively verified? [Measurability, Spec §FR-2]" + + **Scenario Classification & Coverage** (Requirements Quality Focus): + - Check if requirements exist for: Primary, Alternate, Exception/Error, Recovery, Non-Functional scenarios + - For each scenario class, ask: "Are [scenario type] requirements complete, clear, and consistent?" + - If scenario class missing: "Are [scenario type] requirements intentionally excluded or missing? [Gap]" + - Include resilience/rollback when state mutation occurs: "Are rollback requirements defined for migration failures? [Gap]" + + **Traceability Requirements**: + - MINIMUM: ≥80% of items MUST include at least one traceability reference + - Each item should reference: spec section `[Spec §X.Y]`, or use markers: `[Gap]`, `[Ambiguity]`, `[Conflict]`, `[Assumption]` + - If no ID system exists: "Is a requirement & acceptance criteria ID scheme established? [Traceability]" + + **Surface & Resolve Issues** (Requirements Quality Problems): + Ask questions about the requirements themselves: + - Ambiguities: "Is the term 'fast' quantified with specific metrics? [Ambiguity, Spec §NFR-1]" + - Conflicts: "Do navigation requirements conflict between §FR-10 and §FR-10a? [Conflict]" + - Assumptions: "Is the assumption of 'always available podcast API' validated? [Assumption]" + - Dependencies: "Are external podcast API requirements documented? [Dependency, Gap]" + - Missing definitions: "Is 'visual hierarchy' defined with measurable criteria? [Gap]" + + **Content Consolidation**: + - Soft cap: If raw candidate items > 40, prioritize by risk/impact + - Merge near-duplicates checking the same requirement aspect + - If >5 low-impact edge cases, create one item: "Are edge cases X, Y, Z addressed in requirements? 
[Coverage]" + + **🚫 ABSOLUTELY PROHIBITED** - These make it an implementation test, not a requirements test: + - ❌ Any item starting with "Verify", "Test", "Confirm", "Check" + implementation behavior + - ❌ References to code execution, user actions, system behavior + - ❌ "Displays correctly", "works properly", "functions as expected" + - ❌ "Click", "navigate", "render", "load", "execute" + - ❌ Test cases, test plans, QA procedures + - ❌ Implementation details (frameworks, APIs, algorithms) + + **✅ REQUIRED PATTERNS** - These test requirements quality: + - ✅ "Are [requirement type] defined/specified/documented for [scenario]?" + - ✅ "Is [vague term] quantified/clarified with specific criteria?" + - ✅ "Are requirements consistent between [section A] and [section B]?" + - ✅ "Can [requirement] be objectively measured/verified?" + - ✅ "Are [edge cases/scenarios] addressed in requirements?" + - ✅ "Does the spec define [missing aspect]?" + +6. **Structure Reference**: Generate the checklist following the canonical template in `.specify/templates/checklist-template.md` for title, meta section, category headings, and ID formatting. If template is unavailable, use: H1 title, purpose/created meta lines, `##` category sections containing `- [ ] CHK### ` lines with globally incrementing IDs starting at CHK001. + +7. **Report**: Output full path to created checklist, item count, and remind user that each run creates a new file. Summarize: + - Focus areas selected + - Depth level + - Actor/timing + - Any explicit user-specified must-have items incorporated + +**Important**: Each `/speckit.checklist` command invocation creates a checklist file using short, descriptive names unless file already exists. This allows: + +- Multiple checklists of different types (e.g., `ux.md`, `test.md`, `security.md`) +- Simple, memorable filenames that indicate checklist purpose +- Easy identification and navigation in the `checklists/` folder + +To avoid clutter, use descriptive types and clean up obsolete checklists when done. + +## Example Checklist Types & Sample Items + +**UX Requirements Quality:** `ux.md` + +Sample items (testing the requirements, NOT the implementation): + +- "Are visual hierarchy requirements defined with measurable criteria? [Clarity, Spec §FR-1]" +- "Is the number and positioning of UI elements explicitly specified? [Completeness, Spec §FR-1]" +- "Are interaction state requirements (hover, focus, active) consistently defined? [Consistency]" +- "Are accessibility requirements specified for all interactive elements? [Coverage, Gap]" +- "Is fallback behavior defined when images fail to load? [Edge Case, Gap]" +- "Can 'prominent display' be objectively measured? [Measurability, Spec §FR-4]" + +**API Requirements Quality:** `api.md` + +Sample items: + +- "Are error response formats specified for all failure scenarios? [Completeness]" +- "Are rate limiting requirements quantified with specific thresholds? [Clarity]" +- "Are authentication requirements consistent across all endpoints? [Consistency]" +- "Are retry/timeout requirements defined for external dependencies? [Coverage, Gap]" +- "Is versioning strategy documented in requirements? [Gap]" + +**Performance Requirements Quality:** `performance.md` + +Sample items: + +- "Are performance requirements quantified with specific metrics? [Clarity]" +- "Are performance targets defined for all critical user journeys? [Coverage]" +- "Are performance requirements under different load conditions specified? 
[Completeness]" +- "Can performance requirements be objectively measured? [Measurability]" +- "Are degradation requirements defined for high-load scenarios? [Edge Case, Gap]" + +**Security Requirements Quality:** `security.md` + +Sample items: + +- "Are authentication requirements specified for all protected resources? [Coverage]" +- "Are data protection requirements defined for sensitive information? [Completeness]" +- "Is the threat model documented and requirements aligned to it? [Traceability]" +- "Are security requirements consistent with compliance obligations? [Consistency]" +- "Are security failure/breach response requirements defined? [Gap, Exception Flow]" + +## Anti-Examples: What NOT To Do + +**❌ WRONG - These test implementation, not requirements:** + +```markdown +- [ ] CHK001 - Verify landing page displays 3 episode cards [Spec §FR-001] +- [ ] CHK002 - Test hover states work correctly on desktop [Spec §FR-003] +- [ ] CHK003 - Confirm logo click navigates to home page [Spec §FR-010] +- [ ] CHK004 - Check that related episodes section shows 3-5 items [Spec §FR-005] +``` + +**✅ CORRECT - These test requirements quality:** + +```markdown +- [ ] CHK001 - Are the number and layout of featured episodes explicitly specified? [Completeness, Spec §FR-001] +- [ ] CHK002 - Are hover state requirements consistently defined for all interactive elements? [Consistency, Spec §FR-003] +- [ ] CHK003 - Are navigation requirements clear for all clickable brand elements? [Clarity, Spec §FR-010] +- [ ] CHK004 - Is the selection criteria for related episodes documented? [Gap, Spec §FR-005] +- [ ] CHK005 - Are loading state requirements defined for asynchronous episode data? [Gap] +- [ ] CHK006 - Can "visual hierarchy" requirements be objectively measured? [Measurability, Spec §FR-001] +``` + +**Key Differences:** + +- Wrong: Tests if the system works correctly +- Correct: Tests if the requirements are written correctly +- Wrong: Verification of behavior +- Correct: Validation of requirement quality +- Wrong: "Does it do X?" +- Correct: "Is X clearly specified?" diff --git a/.claude/commands/speckit.clarify.md b/.claude/commands/speckit.clarify.md new file mode 100644 index 000000000..6b28dae10 --- /dev/null +++ b/.claude/commands/speckit.clarify.md @@ -0,0 +1,181 @@ +--- +description: Identify underspecified areas in the current feature spec by asking up to 5 highly targeted clarification questions and encoding answers back into the spec. +handoffs: + - label: Build Technical Plan + agent: speckit.plan + prompt: Create a plan for the spec. I am building with... +--- + +## User Input + +```text +$ARGUMENTS +``` + +You **MUST** consider the user input before proceeding (if not empty). + +## Outline + +Goal: Detect and reduce ambiguity or missing decision points in the active feature specification and record the clarifications directly in the spec file. + +Note: This clarification workflow is expected to run (and be completed) BEFORE invoking `/speckit.plan`. If the user explicitly states they are skipping clarification (e.g., exploratory spike), you may proceed, but must warn that downstream rework risk increases. + +Execution steps: + +1. Run `.specify/scripts/bash/check-prerequisites.sh --json --paths-only` from repo root **once** (combined `--json --paths-only` mode / `-Json -PathsOnly`). Parse minimal JSON payload fields: + - `FEATURE_DIR` + - `FEATURE_SPEC` + - (Optionally capture `IMPL_PLAN`, `TASKS` for future chained flows.) 
+ - If JSON parsing fails, abort and instruct user to re-run `/speckit.specify` or verify feature branch environment. + - For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). + +2. Load the current spec file. Perform a structured ambiguity & coverage scan using this taxonomy. For each category, mark status: Clear / Partial / Missing. Produce an internal coverage map used for prioritization (do not output raw map unless no questions will be asked). + + Functional Scope & Behavior: + - Core user goals & success criteria + - Explicit out-of-scope declarations + - User roles / personas differentiation + + Domain & Data Model: + - Entities, attributes, relationships + - Identity & uniqueness rules + - Lifecycle/state transitions + - Data volume / scale assumptions + + Interaction & UX Flow: + - Critical user journeys / sequences + - Error/empty/loading states + - Accessibility or localization notes + + Non-Functional Quality Attributes: + - Performance (latency, throughput targets) + - Scalability (horizontal/vertical, limits) + - Reliability & availability (uptime, recovery expectations) + - Observability (logging, metrics, tracing signals) + - Security & privacy (authN/Z, data protection, threat assumptions) + - Compliance / regulatory constraints (if any) + + Integration & External Dependencies: + - External services/APIs and failure modes + - Data import/export formats + - Protocol/versioning assumptions + + Edge Cases & Failure Handling: + - Negative scenarios + - Rate limiting / throttling + - Conflict resolution (e.g., concurrent edits) + + Constraints & Tradeoffs: + - Technical constraints (language, storage, hosting) + - Explicit tradeoffs or rejected alternatives + + Terminology & Consistency: + - Canonical glossary terms + - Avoided synonyms / deprecated terms + + Completion Signals: + - Acceptance criteria testability + - Measurable Definition of Done style indicators + + Misc / Placeholders: + - TODO markers / unresolved decisions + - Ambiguous adjectives ("robust", "intuitive") lacking quantification + + For each category with Partial or Missing status, add a candidate question opportunity unless: + - Clarification would not materially change implementation or validation strategy + - Information is better deferred to planning phase (note internally) + +3. Generate (internally) a prioritized queue of candidate clarification questions (maximum 5). Do NOT output them all at once. Apply these constraints: + - Maximum of 10 total questions across the whole session. + - Each question must be answerable with EITHER: + - A short multiple‑choice selection (2–5 distinct, mutually exclusive options), OR + - A one-word / short‑phrase answer (explicitly constrain: "Answer in <=5 words"). + - Only include questions whose answers materially impact architecture, data modeling, task decomposition, test design, UX behavior, operational readiness, or compliance validation. + - Ensure category coverage balance: attempt to cover the highest impact unresolved categories first; avoid asking two low-impact questions when a single high-impact area (e.g., security posture) is unresolved. + - Exclude questions already answered, trivial stylistic preferences, or plan-level execution details (unless blocking correctness). + - Favor clarifications that reduce downstream rework risk or prevent misaligned acceptance tests. + - If more than 5 categories remain unresolved, select the top 5 by (Impact * Uncertainty) heuristic. + +4. 
Sequential questioning loop (interactive): + - Present EXACTLY ONE question at a time. + - For multiple‑choice questions: + - **Analyze all options** and determine the **most suitable option** based on: + - Best practices for the project type + - Common patterns in similar implementations + - Risk reduction (security, performance, maintainability) + - Alignment with any explicit project goals or constraints visible in the spec + - Present your **recommended option prominently** at the top with clear reasoning (1-2 sentences explaining why this is the best choice). + - Format as: `**Recommended:** Option [X] - ` + - Then render all options as a Markdown table: + + | Option | Description | + |--------|-------------| + | A |