Generated 2026-03-09 from 52 audit reports spanning documentation, testing, security, performance, code quality, architecture, DevOps, frontend, and strategic analysis. 45 of ~68 unique recommendations were already implemented. This document contains the remaining actionable items.
Each recommendation includes enough context to implement without reading the original audit report. Work through tiers in order. Mark items done by changing [ ] to [x] and noting the commit hash.
These directly improve code quality, reliability, or correctness. All are worth the effort regardless of size.
- File:
src/claude.js - Problem: When all retry attempts are exhausted,
runPrompt()returnsoutput: ''(empty string) instead of the last attempt's actual output. This loses diagnostic information — the NIGHTYTIDY-REPORT.md for a failed step shows nothing useful. - Fix: Before the final return after the retry loop, preserve
lastResult.outputinstead of returning empty string. The result object already has theoutputfield — just stop overwriting it with''. - Impact: Failed steps become debuggable from the report instead of requiring log diving.
- Tests: Update
claude.test.js— the "all retries exhausted" test should assert that output contains the last attempt's stdout, not empty string.
- File:
package.json,vitest.config.js, potentially test files - Problem: Vitest v2 depends on
vitewhich depends onesbuild <=0.24.2(GHSA-67mh-4wv8-2f99). This produces 6 moderate npm audit findings. Every audit report and everynpm auditrun flags it. Dev-only but noisy. - Fix:
npm install vitest@latestand fix any breaking changes. The customstrip-shebangplugin invitest.config.jsmay need adjustment. Run the full test suite andnpm run test:cito verify coverage thresholds still pass. - Impact: Clean
npm audit. Removes recurring noise from every security review. - Tests: All existing tests must pass. Coverage thresholds (90% statements, 80% branches, 80% functions) must hold.
-
File:
src/cli.js -
Problem:
run()is ~450 lines and the single largest function in the codebase. It handles the full lifecycle: argument parsing, pre-checks, step selection, git setup, execution, abort handling, reporting, and merging. Shared mutable state (spinner,runStarted,tagName,runBranch,originalBranch) makes it hard to test individual phases. -
Fix: Extract three phase functions:
setupGitAndPreChecks(projectDir, git, options)— lock acquisition through pre-check completionexecuteRunFlow(steps, git, options, callbacks)— step execution loop with spinner managementfinalizeRun(results, git, metadata)— report generation, merge, cleanup
Pass shared state as a context object rather than relying on closure variables. The
handleAbortedRunfunction already exists as a partial extraction — continue this pattern. -
Impact: Each phase becomes independently testable. The main
run()becomes a readable pipeline of 5-6 function calls. -
Tests: Existing
cli.test.jsandcli-extended.test.jstests (58 total) must continue passing. Add unit tests for each extracted function. -
Constraint: Do NOT change the error contract —
run()must remain the top-level try/catch that catches everything.
- File:
src/dashboard.js - Problem: 8 module-level
letvariables (server,sseClients,currentState,tuiProcess,csrfToken,progressFilePath,urlFilePath,outputBuffer) are scattered throughout the file and mutated by 4+ functions. This makes state transitions hard to reason about and debug. - Fix: Consolidate into a single
dashboardStateobject:Update all reads/writes to uselet dashboardState = { server: null, sseClients: [], currentState: null, tuiProcess: null, csrfToken: null, progressFilePath: null, urlFilePath: null, outputBuffer: '', };
dashboardState.server, etc. Add aresetDashboardState()function for test cleanup. - Impact: Single place to inspect/log all dashboard state. Easier mocking in tests. State transitions become explicit.
- Tests: Update
dashboard.test.jsanddashboard-extended.test.jsmock patterns to work with the state object.
- File:
test/helpers/mocks.js(add to existing file) - Problem: The identical logger mock shape (
{ info: vi.fn(), warn: vi.fn(), error: vi.fn(), debug: vi.fn(), initLogger: vi.fn() }) is copy-pasted across 16+ test files. When the logger API changes (e.g., adding a new level), every test file needs updating. - Fix: Add to
test/helpers/mocks.js:Each test file still needs its ownexport function createLoggerMock() { return { initLogger: vi.fn(), info: vi.fn(), warn: vi.fn(), error: vi.fn(), debug: vi.fn(), }; }
vi.mock('../src/logger.js', ...)call at the top level (Vitest requirement — cannot be shared), but the factory ensures the shape is consistent. Update test files to usecreateLoggerMock()inside theirvi.mockfactory functions. - Impact: DRY. Single source of truth for mock logger shape. Less breakage surface when logger API evolves.
- Tests: All existing tests must pass unchanged in behavior.
- File:
src/dashboard-standalone.js - Problem: When a client connects to the standalone dashboard before the first poll interval fires (within ~500ms of server start),
currentStateisnulland the client sees a blank dashboard. The progress JSON file already exists (written by--init-runbefore spawning the dashboard process). - Fix: In the
server.listen()callback, add a synchronous read of the progress file before the firstsetIntervalpoll:server.listen(port, '127.0.0.1', () => { // Populate state immediately so early-connecting clients get data try { const raw = readFileSync(progressFile, 'utf8'); currentState = JSON.parse(raw); lastRawJson = raw; } catch { /* file may not exist yet in edge cases */ } // ... existing setInterval poll });
- Impact: Eliminates the blank-dashboard flash on connect. Zero-risk change.
- Tests: Add a test in
dashboard-extended.test.jsor similar verifying that an SSE client connecting immediately after server start receives state data.
- File:
.github/workflows/ci.yml - Problem: No automated secret detection in git history. If someone accidentally commits an API key or credential, nothing catches it until a human reviews.
- Fix: Add a job to
ci.yml:secret-scan: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 with: fetch-depth: 0 - uses: gitleaks/gitleaks-action@v2 env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- Impact: Automated safety net. Catches credential leaks on every push/PR.
- Tests: No code tests needed. Verify the CI job runs green on a clean repo.
- File:
test/checks-extended.test.js(or newtest/checks-diskspace.test.js) - Problem:
checkDiskSpace()insrc/checks.jsis the deepest-nested function in the codebase (4-5 levels: Windows PowerShell path, wmic fallback path, Unix df path). It has no direct unit tests — only the integration-levelrunPreCheckstests cover it indirectly. Refactoring this function without characterization tests is risky. - Fix: Write tests that mock
child_process.execSyncto return known stdout for each platform path:- PowerShell
Get-PSDrivehappy path (sufficient space) - PowerShell
Get-PSDrivelow space (triggers warning) - PowerShell
Get-PSDrivecritical space (throws) - PowerShell failure → wmic fallback path
- Unix
dfhappy path - Non-numeric output handling (graceful skip)
- PowerShell
- Impact: Enables safe refactoring of the deepest-nested, most platform-specific code. Tests document the actual behavior.
- Tests: These ARE the tests. Aim for 6-8 test cases covering all branches.
- File:
package.json - Problem:
git.test.js,git-extended.test.js, andintegration.test.jscreate real git repos in temp directories, adding ~30 seconds to every test run. During development, this slows the feedback loop. - Fix: Add to
package.jsonscripts:Keep"test:fast": "vitest run --exclude='**/git.test.js' --exclude='**/git-extended.test.js' --exclude='**/integration.test.js' --exclude='**/integration-extended.test.js'"
npm testrunning everything.test:fastis for rapid iteration. - Impact: Sub-5-second unit test feedback loop during development. Full suite still runs in CI and before commits.
- Tests: Verify
test:fastexcludes the right files and all remaining tests pass.
Real improvements that won't cause problems if deferred.
- File:
gui/server.js - Problem:
handleKillProcessreadsidfrom the request body but doesn't validate its presence. Ifidis undefined,processes.get(undefined)returnsundefined, and the function returns{ ok: true }(the "already dead" path). Silent success on bad input. - Fix: Add
if (!id) return sendJson(res, 400, { ok: false, error: 'No process id provided' });before theprocesses.get(id)call. - Impact: Correct HTTP semantics. Prevents silent bugs in GUI client code.
- Tests: Add a test case in
gui-server.test.jsfor POST/api/kill-processwith empty body.
- File:
src/env.js,src/claude.js - Problem:
cleanEnv()uses a blocklist approach — it copies all environment variables and deletes specific ones (CLAUDECODE,CLAUDE_CODE_*). This forwards potentially sensitive env vars to the Claude Code subprocess. An allowlist is more defensible. - Fix: Change
cleanEnv()to build a new env object from a whitelist:const ALLOWED_ENV_KEYS = ['PATH', 'HOME', 'USERPROFILE', 'TEMP', 'TMP', 'NIGHTYTIDY_LOG_LEVEL', 'SHELL', 'TERM', 'LANG', 'SystemRoot', 'APPDATA', 'LOCALAPPDATA']; export function cleanEnv() { const env = {}; for (const key of ALLOWED_ENV_KEYS) { if (process.env[key] !== undefined) env[key] = process.env[key]; } return env; }
- Impact: Defense-in-depth. Prevents accidental leakage of
AWS_SECRET_ACCESS_KEY, database URLs, etc. to AI subprocess. - Risk: May break Claude Code if it needs specific env vars not in the allowlist. Test thoroughly. Consider a hybrid approach: allowlist + log any unknown vars that were excluded.
- Tests: Update
claude.test.jsenv-related tests. Add tests verifying sensitive vars are excluded.
- Files:
gui/resources/index.html,src/dashboard-html.js - Problem: Some ARIA attributes were added (progressbar role, heading fixes) but several remain:
- Error message elements (
.error-msg) lackrole="alert" - Loading spinner (
#setup-loading) lacksrole="status" - Status badge (
#running-status-badge) lacksrole="status"andaria-live="polite" - Screen sections lack
aria-labelledbypointing to their headings - Dashboard reconnecting banner lacks
role="alert"
- Error message elements (
- Fix: Add the missing ARIA attributes to each element. All are simple attribute additions.
- Impact: Screen reader users get proper announcements for errors, loading states, and status changes.
- Tests: No functional tests needed. Consider a snapshot test or manual screen reader verification.
- File:
src/cli.js - Problem:
@inquirer/checkboxis statically imported at the top ofcli.jsbut only used inselectSteps()(interactive mode). Non-interactive commands (--list,--setup,--init-run,--run-step,--finish-run) pay the import cost unnecessarily. - Fix: Change the static import to a dynamic import inside
selectSteps():// Remove: import checkbox from '@inquirer/checkbox'; // In selectSteps(): const { default: checkbox } = await import('@inquirer/checkbox');
- Impact: ~50-100ms faster startup for non-interactive commands.
- Risk: Low.
selectSteps()is already async. Test that interactive mode still works. - Tests: Existing
cli-extended.test.jstests for--list,--steps, etc. should pass. Manual test of interactive selection.
- File:
gui/resources/styles.css - Problem: No visual press feedback on any button in the GUI. Buttons have hover states but no active states.
- Fix: Add:
.btn:active, .btn-secondary:active, .link-btn:active { transform: scale(0.98); opacity: 0.85; } .stop-btn:active { transform: scale(0.98); opacity: 0.85; }
- Impact: Tactile feedback. Users feel the click registered.
- Tests: Visual verification only.
- File:
src/claude.js - Problem:
runOnce()contains duplicated spawn-and-watch logic for the child process (normal path and potential fallback). Extracting awatchChild(child, timeoutMs, signal)helper would reduce the function size and clarify intent. - Fix: Extract the stdout/stderr collection, timeout handling, close event listener, and error event listener into a helper that returns a Promise resolving to
{ code, stdout, stderr, timedOut }. Call it fromrunOnce(). - Impact: Smaller, more readable function. Easier to add future spawn options.
- Constraint: Do NOT change the error contract —
runPrompt()must still never throw and must return{ success, output, error, exitCode, duration, attempts }. - Tests: All
claude.test.jstests (25) must pass.
These are product roadmap items, not code quality fixes. They add new functionality for end users. Sorted by product impact.
- Impact: Critical for distribution. Cannot acquire users without
npm install -g nightytidy. The package.json is already configured withbin,name, anddescription. Needs arelease.ymlGitHub Actions workflow triggered onv*tags that runsnpm publish. - Prerequisite: None. Ready to do now.
- Impact: High. A 4-8 hour run is too big a commitment for first-time users. Define named presets: "Quick Clean" (8 high-impact steps, ~1h), "Deep Dive" (all 33, ~8h), "Security Focus" (steps 8,9,21,22), "Test Hardening" (steps 2-6). Add
--preset quickflag. - Prerequisite: None. Can use existing
--stepsmechanism internally.
- Impact: Medium. Standard expectation for CLI tools. Support per-project defaults: preferred steps, timeout, preset, log level. Load from project root.
- Prerequisite: None. Documented as known tech debt in CLAUDE.md.
- Impact: Medium. Add a
categoryfield tomanifest.jsonentries (e.g., "testing", "security", "performance", "architecture", "documentation", "devops", "ux"). Enable--category securityflag and grouped display in interactive selection and--listoutput. - Prerequisite: None.
manifest.jsonandloader.jsare straightforward to extend.
- Impact: Medium. When a run is Ctrl+C'd or a step fails, users must re-run all completed steps. A
--resumeflag that readsnightytidy-run-state.jsonand skips already-completed steps saves hours. The orchestrator mode already has this data structure — interactive mode just needs to leverage it. - Prerequisite: Orchestrator state file infrastructure (already exists).
- Impact: Medium. Persist run results to
nightytidy-history.json(step pass/fail, duration, date). Enable a--historycommand showing trend data. Powers future features: health scoring, step recommendations, time estimation. - Prerequisite: None.
- Impact: Medium. A reusable workflow or action that runs NightyTidy on a schedule (weekly cron) or on PR merge. Opens team adoption. The orchestrator mode's JSON API makes this straightforward.
- Prerequisite: npm publish (F1).
- Impact: Medium. Allow loading prompts from a user-specified directory (e.g.,
.nightytidy/prompts/) in addition to the built-in 33. Enables domain-specific improvements (React, Django, Rust) and community contribution. - Prerequisite: Configuration file (F3) for specifying custom prompt directory.
- Impact: Medium. Show estimated Claude Code API cost before a run starts. Use historical per-step durations from run history, multiply by approximate token rate. "Estimated cost: $15-25 for 12 steps (~2h)."
- Prerequisite: Run history (F6).
- Impact: Medium. Capture
git diffafter each step and include in NIGHTYTIDY-REPORT.md. Shows exactly what each step changed. Makes the report 10x more useful for review. - Prerequisite: None.
- Impact: Low. Extend notifications beyond desktop (which may be missed for overnight runs). Add a
--webhook-urlflag or config option that POSTs run results to a webhook endpoint. - Prerequisite: Configuration file (F3).
- Impact: Low (future platform play). Expose NightyTidy's step library and run capabilities as MCP tools. Any AI assistant (Claude, Codex, Cline) could discover and invoke NightyTidy.
- Prerequisite: npm publish (F1), stable API.