
feat(audit): /pre-apply-audit + /polish-audit skills + agents (F125, Theme 3 PR 1 of 2)#322

Draft
mitwilli-create wants to merge 1 commit into main from feat/audit-skills-pre-apply-polish-2026-05-29

@mitwilli-create
Owner

Summary

Theme 3 of the 2026-05-29 task-audit chain — the F125 net-new asks. Ships two diagnostic skills with full agent backing:

  • /pre-apply-audit: audits the calibration of the 3 sub-checks (HM match 50% / ATS 30% / voice retention 20%) composed by lib/pre-apply-orchestrator.mjs. 5 deterministic dimensions + Sonnet 4.6 narrative synthesis. ~$0.05-0.15 per run.
  • /polish-audit — interrogates the polish loop: convergence health, rounds-to-converge, ATS posture, voice retention, spend, optional AI-detection bands. Same cost profile.
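Read as a weighted score, the 50/30/20 split composes roughly like this. It is a minimal illustration, assuming each sub-check yields a score in [0, 1] and that a missing sub-check's weight is redistributed over the others; `composeWeightedScore` and the `WEIGHTS` object are hypothetical, and the real orchestrator may combine statuses with hard rules rather than a plain weighted average.

```javascript
// Hypothetical weights mirroring the HM 50% / ATS 30% / voice 20% split;
// the real CHECK_WEIGHTS shape in lib/pre-apply-orchestrator.mjs may differ.
const WEIGHTS = { hm: 0.5, ats: 0.3, voice: 0.2 };

// Combine per-sub-check scores (each assumed to be in [0, 1]) into one
// weighted score. Missing sub-checks (null/undefined) are skipped and the
// remaining weights are renormalized so they still sum to 1.
function composeWeightedScore(scores, weights = WEIGHTS) {
  let weightedSum = 0;
  let weightUsed = 0;
  for (const [name, weight] of Object.entries(weights)) {
    const score = scores[name];
    if (score === null || score === undefined) continue;
    weightedSum += weight * score;
    weightUsed += weight;
  }
  // No sub-check reported anything: there is nothing to compose.
  return weightUsed === 0 ? null : weightedSum / weightUsed;
}

const full = composeWeightedScore({ hm: 0.8, ats: 0.6, voice: 1.0 }); // ≈ 0.78
const noVoice = composeWeightedScore({ hm: 1, ats: 0 }); // ≈ 0.5 / 0.8 = 0.625
```

Renormalizing keeps a row with an unavailable sub-check comparable to fully scored rows, at the cost of leaning harder on the checks that did run.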

Both are diagnostic only — no orchestrator code mutation, no apply-pack writes. Cross-fork-leak hardened: [PERSONAL — DO NOT PUBLISH] frontmatter, hash-only citations, assertNoInlineQuotesFromSensitivePaths() guard at render time, audit outputs gitignored.
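A render-time guard of this kind can be pictured as a verbatim-substring check. A minimal sketch, assuming the guard has the text of each sensitive source in hand and treats any sufficiently long line from it as a forbidden quote; `assertNoInlineQuotes` and the 40-character threshold are hypothetical and are not the PR's assertNoInlineQuotesFromSensitivePaths().

```javascript
// Hypothetical guard illustrating the idea behind a render-time leak check.
// It is NOT the PR's assertNoInlineQuotesFromSensitivePaths().
// `sensitiveSources` maps a sensitive path (e.g. "cv.md") to its text.
function assertNoInlineQuotes(report, sensitiveSources, minLength = 40) {
  for (const [sourcePath, text] of Object.entries(sensitiveSources)) {
    for (const rawLine of text.split("\n")) {
      const line = rawLine.trim();
      // Skip short lines ("- Python", headings) that would match by accident.
      if (line.length < minLength) continue;
      if (report.includes(line)) {
        // Name only the source and length, never the quoted text itself.
        throw new Error(`rendered report quotes a ${line.length}-char line from ${sourcePath}`);
      }
    }
  }
}

const sources = {
  "cv.md": "Led a team of five engineers shipping the billing platform rewrite.\nPython",
};
// Passes: the long CV line is absent and "Python" is below the length threshold.
assertNoInlineQuotes("Voice retention looks healthy; Python appears in both drafts.", sources);
```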

Live validation

$ node scripts/agents/pre-apply-audit.mjs --row 48
[D1] hm-intel coverage: 100% (21/21) — PASS
[D2] subcheck availability: PASS (sample=1)
[D3] readiness distribution: CAVEATS
[D4] threshold calibration: CAVEATS
[D5] spend health: SKIPPED
[synthesis] ok (Sonnet 4.6, $0.024, 21s)

Synthesis surfaced a real miscalibration: the HM sub-check showed 0% agreement with overall_status on the sampled trace (D2 sample=1), versus the ≥75% agreement the orchestrator's hard-rule design expects. This is the kind of signal the skill was scoped to produce.
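The agreement figure can be computed as the share of traces where a sub-check's status matches overall_status. A minimal sketch, assuming a hypothetical trace shape `{ overall_status, subchecks: { hm: { status } } }` and hypothetical status strings; the 0.75 default mirrors the ≥75% expectation, and the function names are illustrative. Returning the sample size alongside the rate matters: with one trace, agreement can only be 0% or 100%.

```javascript
// Share of traces whose `subcheck` status equals overall_status.
// Traces where that sub-check produced no status are excluded from the sample.
function agreementRate(traces, subcheck) {
  const comparable = traces.filter((t) => t.subchecks?.[subcheck]?.status != null);
  if (comparable.length === 0) return { rate: null, sample: 0 };
  const agreeing = comparable.filter(
    (t) => t.subchecks[subcheck].status === t.overall_status,
  ).length;
  return { rate: agreeing / comparable.length, sample: comparable.length };
}

// Flag a sub-check whose observed agreement falls below the expected floor.
function checkCalibration(traces, subcheck, expectedMin = 0.75) {
  const { rate, sample } = agreementRate(traces, subcheck);
  return { subcheck, rate, sample, miscalibrated: rate !== null && rate < expectedMin };
}

// Hypothetical single trace: HM disagrees with the overall verdict, ATS agrees.
const traces = [
  { overall_status: "ready", subchecks: { hm: { status: "not_ready" }, ats: { status: "ready" } } },
];
const hm = checkCalibration(traces, "hm"); // rate 0, sample 1, miscalibrated: true
```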

polish-audit --row 48 ran cleanly ($0.024 spend), with D1 correctly reporting FAIL: row 48 has no polish-orchestrator-summary.json yet because Mitchell hasn't polished it.

Locked decisions (interview 2026-05-29)

| # | Decision |
|---|---|
| Q1 | Scope: build all 6 audit skills (PR 2 follows with corpus / scan / batch / eval-ux) |
| Q2 | Depth: full agent scripts |
| Q3 | PR strategy: F125-first split (this PR) |
| Q4 | Validation: live invocation on one row |

Memory drift surfaced + fixed

~/.claude/projects/.../memory/project_eval_ux_audit_prompt.md (dated 2026-05-26) claimed that all 6 audit skills (/corpus-audit, /scan-audit, /batch-audit, /eval-ux-audit, /pre-apply-audit, /polish-audit) existed at .claude/skills/<name>/SKILL.md. None did: a classic documented-but-unbuilt pattern. This PR ships 2 of them (/pre-apply-audit, /polish-audit); PR 2 ships the remaining 4.
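Drift like this is cheap to catch mechanically by checking each claimed skill path on disk. A minimal sketch in CommonJS form so it runs standalone (the repo's .mjs scripts would use `import` instead); `findUnbuiltSkills` is hypothetical, and the skill list is copied from the memory file's claim.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Skills the memory file claimed already existed.
const CLAIMED_SKILLS = [
  "corpus-audit",
  "scan-audit",
  "batch-audit",
  "eval-ux-audit",
  "pre-apply-audit",
  "polish-audit",
];

// Return the claimed skills whose .claude/skills/<name>/SKILL.md is absent
// under `repoRoot`.
function findUnbuiltSkills(repoRoot, names = CLAIMED_SKILLS) {
  return names.filter(
    (name) => !fs.existsSync(path.join(repoRoot, ".claude", "skills", name, "SKILL.md")),
  );
}

console.log(findUnbuiltSkills(process.cwd()));
```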

Bucket-B classification

Locked Q4 routes anything that reads personal-data corpora (cv.md, applications.md, hm-intel, apply-pack) to bucket-B (DRAFT + manual review), even when the writers ARE committable code. These agents only READ from those surfaces (their OUTPUTS land in gitignored .claude/audit/<DATE>/), but that dependency graph is sensitive enough to keep them in bucket-B per the locked rule.
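The routing rule keys on what a script reads, not on where it writes. A minimal sketch of that rule, assuming a script's read set is known up front; the regex list and `classifyBucket` are hypothetical stand-ins built from the four corpora named in the rule, not the repo's SENSITIVE_PATH_PATTERNS.

```javascript
// Hypothetical patterns for the personal-data corpora named in the rule.
const PERSONAL_DATA_PATTERNS = [
  /(^|\/)cv\.md$/,
  /(^|\/)applications\.md$/,
  /hm-intel/,
  /apply-pack/,
];

// Bucket-B if ANY read touches a personal-data corpus. Only reads are
// considered: writing to a gitignored directory does not lift a script out.
function classifyBucket(reads) {
  const sensitiveReads = reads.filter((p) => PERSONAL_DATA_PATTERNS.some((re) => re.test(p)));
  return { bucketB: sensitiveReads.length > 0, sensitiveReads };
}

const auditScript = classifyBucket(["lib/pre-apply-orchestrator.mjs", "data/hm-intel/row-48.md"]);
// auditScript.bucketB is true even though its report lands under .claude/audit/.
```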

Test plan

  • node scripts/agents/pre-apply-audit.mjs --help prints usage
  • node scripts/agents/pre-apply-audit.mjs --dry-run reports packs=19/19, no synthesis, no spend
  • node scripts/agents/pre-apply-audit.mjs --row 48 produces a synthesis section in clean markdown (not raw JSON)
  • node scripts/agents/polish-audit.mjs --row 48 runs without error
  • git check-ignore -v .claude/audit/2026-05-29/pre-apply-audit-2026-05-29.md reports the matching .gitignore:62 rule
  • /pre-apply-audit and /polish-audit show up in Skill tool discovery (verified — both appear in the available-skills list)

🤖 Generated with Claude Code

…Theme 3)

Two diagnostic audit skills (the F125 net-new asks from the 2026-05-29
task-audit) ship with full agent backing. Both are cross-fork-leak-safe
and diagnostic-only (no orchestrator code mutation, no apply-pack writes).

scripts/agents/pre-apply-audit.mjs (~480 LOC)
  Audits the /api/pre-apply-check pipeline across 5 deterministic dimensions:
    D1 hm-intel coverage     — apply-now rows with hm-intel files
    D2 subcheck availability — composePreApplyCheck status distribution
    D3 readiness distribution — H/M/L band histogram
    D4 threshold calibration — observed sub-check agreement vs CHECK_WEIGHTS
    D5 spend health          — pre-apply-spend ledger trends
  Sonnet 4.6 narrative synthesis on top, ~$0.05-0.15 typical. Opt-in
  --council adds 7-model adjudication.
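D1 and D3 reduce to a coverage ratio and a band count. A minimal sketch, assuming rows are identified by number and each pre-apply result carries a hypothetical `readiness` field of "H", "M" or "L"; the function names are illustrative, and the coverage label only mimics the `100% (21/21)` line from the live run.

```javascript
// D1-style coverage: how many apply-now rows have an hm-intel file.
// `rowsWithIntel` is a Set of row numbers that have one.
function hmIntelCoverage(applyNowRows, rowsWithIntel) {
  const covered = applyNowRows.filter((row) => rowsWithIntel.has(row)).length;
  const total = applyNowRows.length;
  const pct = total === 0 ? 0 : Math.round((100 * covered) / total);
  return { covered, total, label: `${pct}% (${covered}/${total})` };
}

// D3-style histogram over H/M/L readiness bands; anything else is "unknown".
function readinessHistogram(results) {
  const hist = { H: 0, M: 0, L: 0, unknown: 0 };
  for (const result of results) {
    const band = ["H", "M", "L"].includes(result.readiness) ? result.readiness : "unknown";
    hist[band] += 1;
  }
  return hist;
}

const d1 = hmIntelCoverage([46, 47, 48, 49], new Set([46, 47, 48])); // d1.label === "75% (3/4)"
```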

scripts/agents/polish-audit.mjs (~500 LOC)
  Audits the apply-pack polish loop across 6 dimensions:
    D1 convergence health    — % packs all-converged vs has-abandoned
    D2 rounds-to-converge    — distribution; flags packs at POLISH_MAX_ROUNDS
    D3 ATS impact            — absolute keyword score on polished CV/CL
    D4 voice-retention pressure — % packs triggering voice-rule warnings
    D5 spend distribution    — per-pack polish cost p50/p95/max
    D6 (opt-in) AI-detection bands via Pangram
  Same synthesis + --council pattern.
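The D5 percentiles and the D2 round-cap flag are small reductions over per-pack records. A minimal sketch, assuming each pack record has hypothetical `id`, `rounds` and `costUsd` fields; it uses the nearest-rank percentile definition (one of several common conventions, not necessarily the one polish-audit.mjs uses) and takes the round cap as a parameter because the actual POLISH_MAX_ROUNDS value isn't stated here.

```javascript
// Nearest-rank percentile: the smallest sample value with at least p% of the
// sorted sample at or below it.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// D5-style spend summary over per-pack polish cost.
function spendSummary(packs) {
  const costs = packs.map((pack) => pack.costUsd);
  return { p50: percentile(costs, 50), p95: percentile(costs, 95), max: percentile(costs, 100) };
}

// D2-style flag: packs that used every allowed round (likely hit the cap).
function packsAtRoundCap(packs, maxRounds) {
  return packs.filter((pack) => pack.rounds >= maxRounds).map((pack) => pack.id);
}

const packs = [
  { id: "a", rounds: 2, costUsd: 0.05 },
  { id: "b", rounds: 5, costUsd: 0.01 },
  { id: "c", rounds: 3, costUsd: 0.04 },
  { id: "d", rounds: 1, costUsd: 0.02 },
  { id: "e", rounds: 5, costUsd: 0.03 },
];
const spend = spendSummary(packs); // { p50: 0.03, p95: 0.05, max: 0.05 }
const capped = packsAtRoundCap(packs, 5); // ["b", "e"] with a hypothetical cap of 5
```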

.claude/skills/{pre-apply-audit,polish-audit}/SKILL.md
  Slash-command discovery surfaces. Each describes its trigger phrases,
  dimensions, env knobs, output contract, and what the skill does NOT do.

.gitignore
  Adds .claude/audit/*/pre-apply-audit-*.md,
  .claude/audit/*/polish-audit-*.md,
  data/pre-apply-audit-spend.jsonl and data/polish-audit-spend.jsonl so
  the audit outputs (which carry [PERSONAL — DO NOT PUBLISH] frontmatter)
  cannot leak via an accidental git add.

Cross-fork-leak hardening:
  - Frontmatter: classification: "[PERSONAL — DO NOT PUBLISH]"
  - assertNoInlineQuotesFromSensitivePaths() guards rendered reports
  - SENSITIVE_PATH_PATTERNS covers cv.md / applications.md / hm-intel /
    apply-pack / role-enrichment
  - Hash-only citations (sha256:hex12) for any sensitive-path reference
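A `sha256:hex12` citation can be read as the first 12 hex digits of a SHA-256 digest. A minimal sketch using Node's built-in crypto module (CommonJS form so it runs standalone); whether the real agents hash the path, the file content or a quoted span isn't stated, so `hashCitation` simply hashes whatever string it is given, and `cite` plus its pattern list are hypothetical.

```javascript
const { createHash } = require("node:crypto");

// Truncated SHA-256 citation: "sha256:" + first 12 hex chars of the digest.
function hashCitation(text) {
  const hex = createHash("sha256").update(text, "utf8").digest("hex");
  return `sha256:${hex.slice(0, 12)}`;
}

// Hypothetical helper: hash a reference only when it matches a sensitive pattern.
function cite(ref, sensitivePatterns) {
  return sensitivePatterns.some((re) => re.test(ref)) ? hashCitation(ref) : ref;
}

const patterns = [/(^|\/)cv\.md$/, /hm-intel/];
const plain = cite("lib/pre-apply-orchestrator.mjs", patterns); // returned unchanged
const hashed = cite("data/hm-intel/row-48.md", patterns); // "sha256:" + 12 hex chars
const known = hashCitation("abc"); // "sha256:ba7816bf8f01"
```

Twelve hex digits (48 bits) keep citations short while making accidental collisions across a report's handful of cited files very unlikely.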

Live validation:
  - node scripts/agents/pre-apply-audit.mjs --row 48 → $0.024 spend;
    Sonnet 4.6 produced a substantive synthesis surfacing a real
    miscalibration (HM sub-check 0% agreement with overall_status,
    expected ≥75% per the orchestrator hard rule).
  - node scripts/agents/polish-audit.mjs --row 48 → $0.024 spend;
    D1 correctly reported FAIL because row 48 has no
    polish-orchestrator-summary.json.
  - Both --dry-run smoke tests pass; node --check passes.

Decisions locked via interview 2026-05-29:
  Q1 scope    → build all 6 documented skills (this PR ships F125 first;
                next PR ships /corpus-audit /scan-audit /batch-audit
                /eval-ux-audit per Theme 3 split).
  Q2 depth    → full agent scripts.
  Q3 strategy → F125-first PR split; this is PR 1 of 2.
  Q4 validation → live invocation on one row.

Memory drift: project_eval_ux_audit_prompt.md claimed all 6 skills already
existed on disk; verification showed none did. Doc-but-unbuilt pattern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
