feat: split action plan into Refactors + Human Review sections

VibeWriter User · claude · VibeWriter User · commit 8ca433ce9862 · 2026-03-18T07:56:49.000-04:00
The final report's action plan now produces two distinct lists:
1. Recommended Refactors - tiered by priority, automatable without judgment
2. Requires Human Review - features/UI/UX needing human decisions

The cap of 5 items per tier (20 total) is removed: every recommendation is included so the full list can be copy-pasted.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
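
Note on heading levels: the updated tests expect the standalone plan (`# NightyTidy Action Plan` / `## Recommended Refactors` / `### Critical`) to be shifted down one level when it is inlined into the report. A minimal sketch of that shift follows. It assumes a simple regex over ATX headings; `downgradeHeadings` is a hypothetical name used only for illustration and is not taken from `src/consolidation.js`.

```javascript
// Hypothetical helper, for illustration only (not the project's code).
// Adds one '#' to every ATX heading of level 1-5 so a standalone plan
// can sit under the report's own top-level heading. Level-6 headings
// are left alone because markdown has no level 7.
function downgradeHeadings(markdown) {
  return markdown.replace(/^(#{1,5}) /gm, '#$1 ');
}

const standalone =
  '# NightyTidy Action Plan\n\n## Recommended Refactors\n\n### Critical';
const inline = downgradeHeadings(standalone);

console.log(inline);
// ## NightyTidy Action Plan
//
// ### Recommended Refactors
//
// #### Critical
```

A line-anchored regex like this would also shift `#`-prefixed lines inside fenced code blocks, so it only suits plans that contain no fences, which matches the "no code fences" rule in the consolidation prompt.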
diff --git a/.claude/memory/report-generation.md b/.claude/memory/report-generation.md
@@ -32,7 +32,8 @@ Assumes CLAUDE.md loaded. Report logic in `src/report.js`, action plan logic in
 - Error, Attempts, Suggestion
 
 ## NightyTidy Action Plan ← Inline, only if generated (headings downgraded from consolidation.js output)
-### Critical / High / Medium / Low
+### Recommended Refactors (Critical / High / Medium / Low tiers)
+### Requires Human Review (features, UI/UX changes needing human judgment)
 
 ## How to Undo This Run
 - Claude Code instruction + git command
diff --git a/src/prompts/specials/consolidation.md b/src/prompts/specials/consolidation.md
@@ -1,14 +1,19 @@
 You just completed a multi-step automated codebase improvement run. Below are the outputs from each step — what was analyzed, changed, and recommended.
 
-Your task is to produce a **consolidated, prioritized action plan** of recommendations that still need to be done.
+Your task is to produce a **consolidated, prioritized action plan** split into two sections:
+1. **Refactors** — improvements the AI can do automatically without human judgment
+2. **Human Review** — features, UI/UX changes, and product decisions requiring human input
 
 ## Instructions
 
 1. Review each step's output to extract actionable recommendations, suggestions, and identified issues.
 2. **Check the current codebase** — read the relevant files to determine which recommendations have ALREADY been implemented by previous steps in this run.
 3. **Deduplicate** — if multiple steps flagged the same issue, consolidate into one recommendation.
-4. **Tier** the remaining (not-yet-implemented) items by importance.
-5. Output the action plan in the exact format below.
+4. **Categorize** each item:
+   - **Refactors**: Code cleanup, bug fixes, security patches, performance improvements, test additions, error handling, architectural improvements — anything that has a clear "right answer" and can be implemented without product decisions.
+   - **Human Review**: New features, UI/UX changes, workflow modifications, user-facing behavior changes, product strategy suggestions — anything that requires understanding user needs or making trade-offs that affect the product direction.
+5. **Prioritize** within each section by importance (Critical → High → Medium → Low).
+6. Output the action plan in the exact format below.
 
 ## Output Format
 
@@ -17,45 +22,65 @@ Your task is to produce a **consolidated, prioritized action plan** of recommend
 
 > Generated from a {N}-step improvement run. Items below have been verified as **not yet implemented** in the current codebase.
 
-## Critical
+## Recommended Refactors
 
-<!-- Security vulnerabilities, data loss risks, breaking bugs, blocking issues -->
+These improvements have clear implementations and can be done automatically in a future run.
 
-### [Short, specific title]
-- **What**: [Concrete action — reference specific files, functions, or patterns]
-- **Value**: [Why this matters — plain language, one sentence]
-- **Impact**: [Which files/modules/areas are affected]
-- **Risk**: [Low / Medium / High — risk of implementing this change, and why]
+### Critical
+<!-- Security vulnerabilities, data loss risks, breaking bugs -->
+(items or "No items at this priority level.")
 
-## High
+### High
+<!-- Reliability, performance, error handling, code quality gaps -->
+(items)
 
-<!-- Reliability, performance, error handling, significant code quality gaps -->
+### Medium
+<!-- Maintainability, test coverage, architectural improvements -->
+(items)
 
-(same item format)
+### Low
+<!-- Polish, style, minor optimizations -->
+(items)
 
-## Medium
+---
 
-<!-- Maintainability, test coverage gaps, refactoring opportunities, minor UX issues -->
+## Requires Human Review
 
-(same item format)
+These suggestions involve product decisions, user experience changes, or feature additions that need human judgment.
 
-## Low
+### [Short, specific title]
+- **What**: [Concrete suggestion — reference specific areas or user flows]
+- **Why**: [The problem this solves or opportunity it creates]
+- **Trade-offs**: [What considerations or decisions are involved]
+- **Effort**: [Small / Medium / Large — rough implementation scope]
 
-<!-- Polish, style improvements, nice-to-haves, minor optimizations -->
+(repeat for each item, ordered by potential value)
 
-(same item format)
+---
 
 ## Summary
 
-[One sentence on overall codebase health. One sentence on the single highest-value next action.]
+[One sentence on overall codebase health. One sentence on the top refactor priority. One sentence on the most valuable human-review item.]
 ```
 
+## Item Formats
+
+**For Refactors** (each item):
+- **[Short, specific title]**: [Concrete action — reference specific files, functions, or patterns]. Value: [Why this matters]. Impact: [Which areas affected]. Risk: [Low/Medium/High].
+
+**For Human Review** (each item):
+### [Short, specific title]
+- **What**: [Concrete suggestion]
+- **Why**: [Problem or opportunity]
+- **Trade-offs**: [Decisions involved]
+- **Effort**: [Small/Medium/Large]
+
 ## Rules
 
 - Do NOT include anything already implemented in the codebase — verify by reading files.
 - Do NOT include vague advice like "add more tests" — be specific about WHAT to test and WHERE.
 - Each recommendation MUST reference specific files, functions, or code patterns.
 - Deduplicate ruthlessly — one item per distinct issue, even if multiple steps found it.
-- Maximum **5 items per tier** (20 items total). Prioritize ruthlessly.
-- If a tier has zero items, include the heading with a note: *No items at this priority level.*
+- Include ALL items — no limits. The human needs the complete list for easy copy-paste.
+- If a section has zero items, include the heading with a note: *No items in this category.*
 - Output ONLY the markdown document. No preamble, no commentary, no code fences wrapping the whole document.
diff --git a/src/prompts/specials/report.md b/src/prompts/specials/report.md
@@ -25,12 +25,20 @@ Instead of technical terms, describe what the change DOES for the person: "I mad
 
 ## Part 2: Action Plan
 
-Review the step outputs provided below to extract actionable recommendations that still need to be done.
+Review the step outputs provided below to extract actionable recommendations that still need to be done. Split them into two categories:
+
+1. **Recommended Refactors** — improvements with clear implementations that can be automated
+2. **Requires Human Review** — features, UI/UX changes, and product decisions needing human input
+
+### Instructions
 
 1. Review each step's output to extract actionable recommendations, suggestions, and identified issues.
 2. **Check the current codebase** — read the relevant files to determine which recommendations have ALREADY been implemented by previous steps in this run.
 3. **Deduplicate** — if multiple steps flagged the same issue, consolidate into one recommendation.
-4. **Tier** the remaining (not-yet-implemented) items by importance.
+4. **Categorize** each item:
+   - **Refactors**: Code cleanup, bug fixes, security patches, performance improvements, test additions, error handling, architectural improvements — anything with a clear "right answer" that can be implemented without product decisions.
+   - **Human Review**: New features, UI/UX changes, workflow modifications, user-facing behavior changes, product strategy suggestions — anything requiring understanding user needs or making trade-offs.
+5. **Prioritize** refactors by importance (Critical → High → Medium → Low). Order human review items by potential value.
 
 Structure the action plan as:
 
@@ -39,33 +47,50 @@ Structure the action plan as:
 
 > Generated from a {N}-step improvement run. Items below have been verified as **not yet implemented** in the current codebase.
 
-### Critical
-<!-- Security vulnerabilities, data loss risks, breaking bugs, blocking issues -->
+### Recommended Refactors
+
+These improvements have clear implementations and can be done automatically in a future run.
+
+#### Critical
+<!-- Security vulnerabilities, data loss risks, breaking bugs -->
 (items or "No items at this priority level.")
 
-### High
-<!-- Reliability, performance, error handling, significant code quality gaps -->
+#### High
+<!-- Reliability, performance, error handling, code quality gaps -->
 (items)
 
-### Medium
-<!-- Maintainability, test coverage gaps, refactoring opportunities, minor UX issues -->
+#### Medium
+<!-- Maintainability, test coverage, architectural improvements -->
 (items)
 
-### Low
-<!-- Polish, style improvements, nice-to-haves, minor optimizations -->
+#### Low
+<!-- Polish, style, minor optimizations -->
 (items)
 
+---
+
+### Requires Human Review
+
+These suggestions involve product decisions, user experience changes, or feature additions that need human judgment.
+
+(items ordered by potential value)
+
+---
+
 ### Summary
-[One sentence on overall codebase health. One sentence on the single highest-value next action.]
+[One sentence on overall codebase health. One sentence on the top refactor priority. One sentence on the most valuable human-review item.]
 ```
 
-Each item uses this format:
+**Refactor item format:**
 - **[Short, specific title]**: [Concrete action — reference specific files, functions, or patterns]. Value: [Why this matters]. Impact: [Which areas affected]. Risk: [Low/Medium/High].
 
+**Human Review item format:**
+- **[Short, specific title]**: [Concrete suggestion]. Why: [Problem or opportunity]. Trade-offs: [Decisions involved]. Effort: [Small/Medium/Large].
+
 Rules:
 - Do NOT include anything already implemented — verify by reading files
 - Be specific — reference files, functions, patterns. No vague advice like "add more tests"
-- Maximum 5 items per tier (20 total)
+- Include ALL items — no limits. The human needs the complete list for easy copy-paste
 - Deduplicate ruthlessly
 
 ## Part 3: Write the Report File
diff --git a/test/consolidation.test.js b/test/consolidation.test.js
@@ -15,7 +15,7 @@ vi.mock('../src/executor.js', () => ({
 }));
 
 vi.mock('../src/prompts/loader.js', () => ({
-  CONSOLIDATION_PROMPT: 'Mock consolidation prompt template with Critical High Medium Low tiers and consolidated, prioritized action plan instructions.',
+  CONSOLIDATION_PROMPT: 'Mock consolidation prompt template with Recommended Refactors (Critical High Medium Low tiers) and Requires Human Review sections for consolidated, prioritized action plan instructions.',
   reloadSteps: vi.fn(),
 }));
 
@@ -87,10 +87,8 @@ describe('buildConsolidationPrompt', () => {
     const prompt = buildConsolidationPrompt(results);
 
     expect(prompt).toContain('consolidated, prioritized action plan');
-    expect(prompt).toContain('Critical');
-    expect(prompt).toContain('High');
-    expect(prompt).toContain('Medium');
-    expect(prompt).toContain('Low');
+    expect(prompt).toContain('Recommended Refactors');
+    expect(prompt).toContain('Human Review');
   });
 
   it('handles null/undefined output gracefully', () => {
@@ -110,30 +108,30 @@ describe('generateActionPlan', () => {
     const mockCost = { costUSD: 0.05, inputTokens: 1000, outputTokens: 500 };
     runPrompt.mockResolvedValue({
       success: true,
-      output: '# NightyTidy Action Plan\n\n## Critical\n\nNo items.',
+      output: '# NightyTidy Action Plan\n\n## Recommended Refactors\n\nNo items.',
       cost: mockCost,
     });
 
     const results = makeResults({ completedCount: 2, failedCount: 0 });
     const { text, cost } = await generateActionPlan(results, '/fake/project', {});
 
     expect(text).toContain('## NightyTidy Action Plan');
-    expect(text).toContain('### Critical');
+    expect(text).toContain('### Recommended Refactors');
     expect(text).toContain('No items.');
     expect(cost).toEqual(mockCost);
   });
 
   it('downgrades heading levels in returned text', async () => {
     runPrompt.mockResolvedValue({
       success: true,
-      output: '# NightyTidy Action Plan\n\n## Critical\n\n### 1. Some item',
+      output: '# NightyTidy Action Plan\n\n## Recommended Refactors\n\n### Critical',
       cost: null,
     });
 
     const results = makeResults({ completedCount: 1, failedCount: 0 });
     const { text } = await generateActionPlan(results, '/fake/project', {});
 
-    expect(text).toBe('## NightyTidy Action Plan\n\n### Critical\n\n#### 1. Some item');
+    expect(text).toBe('## NightyTidy Action Plan\n\n### Recommended Refactors\n\n#### Critical');
   });
 
   it('returns null text when Claude returns failure', async () => {