NightyTidy — Audit Recommendations Backlog

Generated 2026-03-09 from 52 audit reports spanning documentation, testing, security, performance, code quality, architecture, DevOps, frontend, and strategic analysis. 45 of ~68 unique recommendations were already implemented. This document contains the remaining actionable items.

How to Use This Document

Each recommendation includes enough context to implement without reading the original audit report. Work through tiers in order. Mark items done by changing [ ] to [x] and noting the commit hash.

Tier 1: High Codebase Impact

These directly improve code quality, reliability, or correctness. All are worth the effort regardless of size.

1. Track last-attempt output in `runPrompt()` on retry exhaustion

File: src/claude.js
Problem: When all retry attempts are exhausted, runPrompt() returns output: '' (empty string) instead of the last attempt's actual output. This loses diagnostic information — the NIGHTYTIDY-REPORT.md for a failed step shows nothing useful.
Fix: Before the final return after the retry loop, preserve lastResult.output instead of returning empty string. The result object already has the output field — just stop overwriting it with ''.
Impact: Failed steps become debuggable from the report instead of requiring log diving.
Tests: Update claude.test.js — the "all retries exhausted" test should assert that output contains the last attempt's stdout, not empty string.

2. Upgrade Vitest to v3 or v4

File: package.json, vitest.config.js, potentially test files
Problem: Vitest v2 depends on vite which depends on esbuild <=0.24.2 (GHSA-67mh-4wv8-2f99). This produces 6 moderate npm audit findings. Every audit report and every npm audit run flags it. Dev-only but noisy.
Fix: npm install vitest@latest and fix any breaking changes. The custom strip-shebang plugin in vitest.config.js may need adjustment. Run the full test suite and npm run test:ci to verify coverage thresholds still pass.
Impact: Clean npm audit. Removes recurring noise from every security review.
Tests: All existing tests must pass. Coverage thresholds (90% statements, 80% branches, 80% functions) must hold.

3. Continue decomposing `cli.js` `run()`

File: src/cli.js
Problem: run() is ~450 lines and the single largest function in the codebase. It handles the full lifecycle: argument parsing, pre-checks, step selection, git setup, execution, abort handling, reporting, and merging. Shared mutable state (spinner, runStarted, tagName, runBranch, originalBranch) makes it hard to test individual phases.
Fix: Extract three phase functions:
- setupGitAndPreChecks(projectDir, git, options) — lock acquisition through pre-check completion
- executeRunFlow(steps, git, options, callbacks) — step execution loop with spinner management
- finalizeRun(results, git, metadata) — report generation, merge, cleanup
Pass shared state as a context object rather than relying on closure variables. The handleAbortedRun function already exists as a partial extraction — continue this pattern.
Impact: Each phase becomes independently testable. The main run() becomes a readable pipeline of 5-6 function calls.
Tests: Existing cli.test.js and cli-extended.test.js tests (58 total) must continue passing. Add unit tests for each extracted function.
Constraint: Do NOT change the error contract — run() must remain the top-level try/catch that catches everything.

4. Encapsulate `dashboard.js` module state

File: src/dashboard.js
Problem: 8 module-level let variables (server, sseClients, currentState, tuiProcess, csrfToken, progressFilePath, urlFilePath, outputBuffer) are scattered throughout the file and mutated by 4+ functions. This makes state transitions hard to reason about and debug.

Fix: Consolidate into a single dashboardState object:

let dashboardState = {
  server: null,
  sseClients: [],
  currentState: null,
  tuiProcess: null,
  csrfToken: null,
  progressFilePath: null,
  urlFilePath: null,
  outputBuffer: '',
};

Update all reads/writes to use dashboardState.server, etc. Add a resetDashboardState() function for test cleanup.

Impact: Single place to inspect/log all dashboard state. Easier mocking in tests. State transitions become explicit.
Tests: Update dashboard.test.js and dashboard-extended.test.js mock patterns to work with the state object.

5. Extract `createLoggerMock()` helper for tests

File: test/helpers/mocks.js (add to existing file)
Problem: The identical logger mock shape ({ info: vi.fn(), warn: vi.fn(), error: vi.fn(), debug: vi.fn(), initLogger: vi.fn() }) is copy-pasted across 16+ test files. When the logger API changes (e.g., adding a new level), every test file needs updating.
Fix: Add to test/helpers/mocks.js:
```
export function createLoggerMock() {
  return {
    initLogger: vi.fn(),
    info: vi.fn(),
    warn: vi.fn(),
    error: vi.fn(),
    debug: vi.fn(),
  };
}
```
Each test file still needs its own vi.mock('../src/logger.js', ...) call at the top level (Vitest requirement — cannot be shared), but the factory ensures the shape is consistent. Update test files to use createLoggerMock() inside their vi.mock factory functions.
Impact: DRY. Single source of truth for mock logger shape. Less breakage surface when logger API evolves.
Tests: All existing tests must pass unchanged in behavior.

6. Read progress JSON synchronously on `dashboard-standalone` startup

File: src/dashboard-standalone.js
Problem: When a client connects to the standalone dashboard before the first poll interval fires (within ~500ms of server start), currentState is null and the client sees a blank dashboard. The progress JSON file already exists (written by --init-run before spawning the dashboard process).

Fix: In the server.listen() callback, add a synchronous read of the progress file before the first setInterval poll:

server.listen(port, '127.0.0.1', () => {
  // Populate state immediately so early-connecting clients get data
  try {
    const raw = readFileSync(progressFile, 'utf8');
    currentState = JSON.parse(raw);
    lastRawJson = raw;
  } catch { /* file may not exist yet in edge cases */ }
  // ... existing setInterval poll
});

Impact: Eliminates the blank-dashboard flash on connect. Zero-risk change.
Tests: Add a test in dashboard-extended.test.js or similar verifying that an SSE client connecting immediately after server start receives state data.

7. Add Gitleaks to CI

File: .github/workflows/ci.yml
Problem: No automated secret detection in git history. If someone accidentally commits an API key or credential, nothing catches it until a human reviews.

Fix: Add a job to ci.yml:

secret-scan:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
    - uses: gitleaks/gitleaks-action@v2
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Impact: Automated safety net. Catches credential leaks on every push/PR.
Tests: No code tests needed. Verify the CI job runs green on a clean repo.

8. Add characterization tests for `checkDiskSpace()`

File: test/checks-extended.test.js (or new test/checks-diskspace.test.js)
Problem: checkDiskSpace() in src/checks.js is the deepest-nested function in the codebase (4-5 levels: Windows PowerShell path, wmic fallback path, Unix df path). It has no direct unit tests — only the integration-level runPreChecks tests cover it indirectly. Refactoring this function without characterization tests is risky.
Fix: Write tests that mock child_process.execSync to return known stdout for each platform path:
- PowerShell Get-PSDrive happy path (sufficient space)
- PowerShell Get-PSDrive low space (triggers warning)
- PowerShell Get-PSDrive critical space (throws)
- PowerShell failure → wmic fallback path
- Unix df happy path
- Non-numeric output handling (graceful skip)
Impact: Enables safe refactoring of the deepest-nested, most platform-specific code. Tests document the actual behavior.
Tests: These ARE the tests. Aim for 6-8 test cases covering all branches.

9. Split integration tests into `test:fast`

File: package.json
Problem: git.test.js, git-extended.test.js, and integration.test.js create real git repos in temp directories, adding ~30 seconds to every test run. During development, this slows the feedback loop.

Fix: Add to package.json scripts:

"test:fast": "vitest run --exclude='**/git.test.js' --exclude='**/git-extended.test.js' --exclude='**/integration.test.js' --exclude='**/integration-extended.test.js'"

Keep npm test running everything. test:fast is for rapid iteration.

Impact: Sub-5-second unit test feedback loop during development. Full suite still runs in CI and before commits.
Tests: Verify test:fast excludes the right files and all remaining tests pass.

Tier 2: Medium Codebase Impact

Real improvements that won't cause problems if deferred.

10. Validate `id` parameter in `handleKillProcess`

File: gui/server.js
Problem: handleKillProcess reads id from the request body but doesn't validate its presence. If id is undefined, processes.get(undefined) returns undefined, and the function returns { ok: true } (the "already dead" path). Silent success on bad input.
Fix: Add if (!id) return sendJson(res, 400, { ok: false, error: 'No process id provided' }); before the processes.get(id) call.
Impact: Correct HTTP semantics. Prevents silent bugs in GUI client code.
Tests: Add a test case in gui-server.test.js for POST /api/kill-process with empty body.

11. Allowlist environment variables for Claude subprocess

File: src/env.js, src/claude.js
Problem: cleanEnv() uses a blocklist approach — it copies all environment variables and deletes specific ones (CLAUDECODE, CLAUDE_CODE_*). This forwards potentially sensitive env vars to the Claude Code subprocess. An allowlist is more defensible.

Fix: Change cleanEnv() to build a new env object from a whitelist:

const ALLOWED_ENV_KEYS = ['PATH', 'HOME', 'USERPROFILE', 'TEMP', 'TMP', 'NIGHTYTIDY_LOG_LEVEL', 'SHELL', 'TERM', 'LANG', 'SystemRoot', 'APPDATA', 'LOCALAPPDATA'];
export function cleanEnv() {
  const env = {};
  for (const key of ALLOWED_ENV_KEYS) {
    if (process.env[key] !== undefined) env[key] = process.env[key];
  }
  return env;
}

Impact: Defense-in-depth. Prevents accidental leakage of AWS_SECRET_ACCESS_KEY, database URLs, etc. to AI subprocess.
Risk: May break Claude Code if it needs specific env vars not in the allowlist. Test thoroughly. Consider a hybrid approach: allowlist + log any unknown vars that were excluded.
Tests: Update claude.test.js env-related tests. Add tests verifying sensitive vars are excluded.

12. Complete remaining accessibility fixes

Files: gui/resources/index.html, src/dashboard-html.js
Problem: Some ARIA attributes were added (progressbar role, heading fixes) but several remain:
- Error message elements (.error-msg) lack role="alert"
- Loading spinner (#setup-loading) lacks role="status"
- Status badge (#running-status-badge) lacks role="status" and aria-live="polite"
- Screen sections lack aria-labelledby pointing to their headings
- Dashboard reconnecting banner lacks role="alert"
Fix: Add the missing ARIA attributes to each element. All are simple attribute additions.
Impact: Screen reader users get proper announcements for errors, loading states, and status changes.
Tests: No functional tests needed. Consider a snapshot test or manual screen reader verification.

13. Lazy `import()` for `@inquirer/checkbox`

File: src/cli.js
Problem: @inquirer/checkbox is statically imported at the top of cli.js but only used in selectSteps() (interactive mode). Non-interactive commands (--list, --setup, --init-run, --run-step, --finish-run) pay the import cost unnecessarily.

Fix: Change the static import to a dynamic import inside selectSteps():

// Remove: import checkbox from '@inquirer/checkbox';
// In selectSteps():
const { default: checkbox } = await import('@inquirer/checkbox');

Impact: ~50-100ms faster startup for non-interactive commands.
Risk: Low. selectSteps() is already async. Test that interactive mode still works.
Tests: Existing cli-extended.test.js tests for --list, --steps, etc. should pass. Manual test of interactive selection.

14. Add `:active` state to buttons CSS

File: gui/resources/styles.css
Problem: No visual press feedback on any button in the GUI. Buttons have hover states but no active states.

Fix: Add:

.btn:active, .btn-secondary:active, .link-btn:active {
  transform: scale(0.98);
  opacity: 0.85;
}
.stop-btn:active {
  transform: scale(0.98);
  opacity: 0.85;
}

Impact: Tactile feedback. Users feel the click registered.
Tests: Visual verification only.

15. Decompose `runOnce()` in `claude.js`

File: src/claude.js
Problem: runOnce() contains duplicated spawn-and-watch logic for the child process (normal path and potential fallback). Extracting a watchChild(child, timeoutMs, signal) helper would reduce the function size and clarify intent.
Fix: Extract the stdout/stderr collection, timeout handling, close event listener, and error event listener into a helper that returns a Promise resolving to { code, stdout, stderr, timedOut }. Call it from runOnce().
Impact: Smaller, more readable function. Easier to add future spawn options.
Constraint: Do NOT change the error contract — runPrompt() must still never throw and must return { success, output, error, exitCode, duration, attempts }.
Tests: All claude.test.js tests (25) must pass.

Feature Additions

These are product roadmap items, not code quality fixes. They add new functionality for end users. Sorted by product impact.

F1. Publish to npm registry + publish workflow

Impact: Critical for distribution. Cannot acquire users without npm install -g nightytidy. The package.json is already configured with bin, name, and description. Needs a release.yml GitHub Actions workflow triggered on v* tags that runs npm publish.
Prerequisite: None. Ready to do now.

F2. Quick Sweep presets

Impact: High. A 4-8 hour run is too big a commitment for first-time users. Define named presets: "Quick Clean" (8 high-impact steps, ~1h), "Deep Dive" (all 33, ~8h), "Security Focus" (steps 8,9,21,22), "Test Hardening" (steps 2-6). Add --preset quick flag.
Prerequisite: None. Can use existing --steps mechanism internally.

F3. `.nightytidyrc` configuration file

Impact: Medium. Standard expectation for CLI tools. Support per-project defaults: preferred steps, timeout, preset, log level. Load from project root.
Prerequisite: None. Documented as known tech debt in CLAUDE.md.

F4. Step categories/tags in manifest

Impact: Medium. Add a category field to manifest.json entries (e.g., "testing", "security", "performance", "architecture", "documentation", "devops", "ux"). Enable --category security flag and grouped display in interactive selection and --list output.
Prerequisite: None. manifest.json and loader.js are straightforward to extend.

F5. `--resume` for interrupted runs

Impact: Medium. When a run is Ctrl+C'd or a step fails, users must re-run all completed steps. A --resume flag that reads nightytidy-run-state.json and skips already-completed steps saves hours. The orchestrator mode already has this data structure — interactive mode just needs to leverage it.
Prerequisite: Orchestrator state file infrastructure (already exists).

F6. Run history + trend tracking

Impact: Medium. Persist run results to nightytidy-history.json (step pass/fail, duration, date). Enable a --history command showing trend data. Powers future features: health scoring, step recommendations, time estimation.
Prerequisite: None.

F7. GitHub Actions workflow template

Impact: Medium. A reusable workflow or action that runs NightyTidy on a schedule (weekly cron) or on PR merge. Opens team adoption. The orchestrator mode's JSON API makes this straightforward.
Prerequisite: npm publish (F1).

F8. Custom prompt packs

Impact: Medium. Allow loading prompts from a user-specified directory (e.g., .nightytidy/prompts/) in addition to the built-in 33. Enables domain-specific improvements (React, Django, Rust) and community contribution.
Prerequisite: Configuration file (F3) for specifying custom prompt directory.

F9. Cost estimation before run

Impact: Medium. Show estimated Claude Code API cost before a run starts. Use historical per-step durations from run history, multiply by approximate token rate. "Estimated cost: $15-25 for 12 steps (~2h)."
Prerequisite: Run history (F6).

F10. Per-step diffs in report

Impact: Medium. Capture git diff after each step and include in NIGHTYTIDY-REPORT.md. Shows exactly what each step changed. Makes the report 10x more useful for review.
Prerequisite: None.

F11. Slack/Discord webhook notifications

Impact: Low. Extend notifications beyond desktop (which may be missed for overnight runs). Add a --webhook-url flag or config option that POSTs run results to a webhook endpoint.
Prerequisite: Configuration file (F3).

F12. MCP server

Impact: Low (future platform play). Expose NightyTidy's step library and run capabilities as MCP tools. Any AI assistant (Claude, Codex, Cline) could discover and invoke NightyTidy.
Prerequisite: npm publish (F1), stable API.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NightyTidy — Audit Recommendations Backlog

How to Use This Document

Tier 1: High Codebase Impact

1. Track last-attempt output in `runPrompt()` on retry exhaustion

2. Upgrade Vitest to v3 or v4

3. Continue decomposing `cli.js` `run()`

4. Encapsulate `dashboard.js` module state

5. Extract `createLoggerMock()` helper for tests

6. Read progress JSON synchronously on `dashboard-standalone` startup

7. Add Gitleaks to CI

8. Add characterization tests for `checkDiskSpace()`

9. Split integration tests into `test:fast`

Tier 2: Medium Codebase Impact

10. Validate `id` parameter in `handleKillProcess`

11. Allowlist environment variables for Claude subprocess

12. Complete remaining accessibility fixes

13. Lazy `import()` for `@inquirer/checkbox`

14. Add `:active` state to buttons CSS

15. Decompose `runOnce()` in `claude.js`

Feature Additions

F1. Publish to npm registry + publish workflow

F2. Quick Sweep presets

F3. `.nightytidyrc` configuration file

F4. Step categories/tags in manifest

F5. `--resume` for interrupted runs

F6. Run history + trend tracking

F7. GitHub Actions workflow template

F8. Custom prompt packs

F9. Cost estimation before run

F10. Per-step diffs in report

F11. Slack/Discord webhook notifications

F12. MCP server

FilesExpand file tree

AUDIT-RECOMMENDATIONS.md

Latest commit

History

AUDIT-RECOMMENDATIONS.md

File metadata and controls

NightyTidy — Audit Recommendations Backlog

How to Use This Document

Tier 1: High Codebase Impact

1. Track last-attempt output in runPrompt() on retry exhaustion

2. Upgrade Vitest to v3 or v4

3. Continue decomposing cli.js run()

4. Encapsulate dashboard.js module state

5. Extract createLoggerMock() helper for tests

6. Read progress JSON synchronously on dashboard-standalone startup

7. Add Gitleaks to CI

8. Add characterization tests for checkDiskSpace()

9. Split integration tests into test:fast

Tier 2: Medium Codebase Impact

10. Validate id parameter in handleKillProcess

11. Allowlist environment variables for Claude subprocess

12. Complete remaining accessibility fixes

13. Lazy import() for @inquirer/checkbox

14. Add :active state to buttons CSS

15. Decompose runOnce() in claude.js

Feature Additions

F1. Publish to npm registry + publish workflow

F2. Quick Sweep presets

F3. .nightytidyrc configuration file

F4. Step categories/tags in manifest

F5. --resume for interrupted runs

F6. Run history + trend tracking

F7. GitHub Actions workflow template

F8. Custom prompt packs

F9. Cost estimation before run

F10. Per-step diffs in report

F11. Slack/Discord webhook notifications

F12. MCP server

1. Track last-attempt output in `runPrompt()` on retry exhaustion

3. Continue decomposing `cli.js` `run()`

4. Encapsulate `dashboard.js` module state

5. Extract `createLoggerMock()` helper for tests

6. Read progress JSON synchronously on `dashboard-standalone` startup

8. Add characterization tests for `checkDiskSpace()`

9. Split integration tests into `test:fast`

10. Validate `id` parameter in `handleKillProcess`

13. Lazy `import()` for `@inquirer/checkbox`

14. Add `:active` state to buttons CSS

15. Decompose `runOnce()` in `claude.js`

F3. `.nightytidyrc` configuration file

F5. `--resume` for interrupted runs