fix(launchd): triage 5 flapping plists — phase-B-prime CLI fallback + buttons-smoke regex assertion #340

Merged
mitwilli-create merged 1 commit into main from fix/flapping-plists-phase-b-prime-buttons-smoke-2026-05-29
May 29, 2026

Conversation

@mitwilli-create
Owner

Summary

Resolves 2 of 5 CRIT flapping plists surfaced by /system-maintainer + /career-ops-health Phase 9 (2026-05-29):

  • phase-B-prime-daily: 5+ consecutive daily failures (since 2026-05-25) with FATAL: No _CONTACTS_DATA found. Root cause: the dashboard build externalized contacts to dashboard/data/contacts-annotated.json, so the regex in loadAndRank() that looked for the inline var _CONTACTS_DATA = [...] no longer matched anything. Fix: a CLI fallback that reads the externalized JSON directly and unwraps parsed.contacts.

  • buttons-smoke: 13/14 pass, with 1 assertion failing on an exact-count match (Triage queue: ${snapshot_count} items). Root cause: daemons (community-scan, scan-only, triage) append to batch/triage-advance.tsv concurrently with the smoke test, so the count grew from 21 (at snapshot) to 99 (at batch-runner load). Fix: relaxed the assertion to a regex that validates that batch-runner reports a queue size from the expected file, without requiring an exact-count match.

Status of other 3 flapping plists

| Plist | Status | Action |
|---|---|---|
| health-column-liveness | Already fixed in main today (sibling instance: "Always exit 0, data/health-column-coverage.json IS the signal"). Kickstart cleared flap. | None (runs=3, last exit 0) |
| scan-email-poll | Booted out by this session to stop a 181-run OAuth retry storm (invalid_grant, Gmail token revoked). | Mitchell: re-auth Gmail OAuth, then bootstrap the plist |
| network-database-build | Tahoe TCC blocks launchd cwd access to ~/Documents. The wrapper works manually (exit 0), but launchd kickstart still fails with exit 126. | Mitchell: System Settings → Privacy & Security → Full Disk Access → add the plist; OR move cron-run.sh to ~/Library/Application Support/career-ops/ |

Code changes

lib/contact-priority-scorer.mjs — phase-B-prime CLI fallback

+ // 2026-05-29 — CLI fallback. When phase-B-prime-daily runs, the
+ // inline _CONTACTS_DATA window-global may be absent or unparseable.
+ // Fall back to dashboard/data/contacts-annotated.json directly.
+ if (contacts.length === 0) {
+   const contactsJsonPath = join(REPO_ROOT, 'dashboard/data/contacts-annotated.json');
+   if (existsSync(contactsJsonPath)) {
+     try {
+       const parsed = JSON.parse(readFileSync(contactsJsonPath, 'utf8'));
+       // build-contacts-page.mjs writes { contacts: [...], stats: ... }
+       // but bare arrays are also supported for future-proofing
+       const arr = Array.isArray(parsed)
+         ? parsed
+         : (parsed && Array.isArray(parsed.contacts) ? parsed.contacts : null);
+       if (arr && arr.length > 0) contacts = arr;
+     } catch (_) { /* fall through to error below */ }
+   }
+ }

Verified: loadAndRank({limit:5}) returns 5 ranked contacts via the fallback (top: Jake Standish / OpenAI, score 8.700).
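
The shape handling in the fallback above can be sketched in isolation. This is a minimal illustration, not code from the PR: `unwrapContacts` is a hypothetical helper name, and the sample objects are made up. It only shows that both the `{ contacts: [...] }` wrapper and a bare array resolve to the same array, while anything else yields `null` so the caller can fall through to the existing FATAL error.

```javascript
// Hypothetical helper (not in the PR): normalizes the two JSON shapes
// that the fallback accepts into a plain array, or null if neither fits.
function unwrapContacts(parsed) {
  if (Array.isArray(parsed)) return parsed;                      // bare-array shape
  if (parsed && Array.isArray(parsed.contacts)) return parsed.contacts; // wrapper shape
  return null;                                                   // unknown shape
}

// Made-up sample payloads for illustration only.
const wrappedSample = JSON.parse('{"contacts":[{"name":"A"},{"name":"B"}],"stats":{}}');
const bareSample = JSON.parse('[{"name":"A"}]');

console.log(unwrapContacts(wrappedSample).length); // 2
console.log(unwrapContacts(bareSample).length);    // 1
console.log(unwrapContacts({ stats: {} }));        // null
console.log(unwrapContacts(null));                 // null
```

Returning `null` rather than an empty array keeps "file present but wrong shape" distinguishable from "file present with zero contacts", although the fallback in the diff treats both the same way.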

scripts/agents/buttons-smoke-test.mjs — concurrent-mutation-aware assertion

- assert('batch-runner dry-run reports the expected queue size',
-   batchResult.stdout.includes(`Triage queue: ${beforeState.triage_advance_rows} items`) ||
-   batchResult.stdout.includes(`No items in batch/triage-advance.tsv`) ||
-   beforeState.triage_advance_rows === 0,
-   null);

+ const _queueSizeReported =
+   /Triage queue:\s+\d+\s+items\s+in\s+batch\/triage-advance\.tsv/.test(batchResult.stdout) ||
+   batchResult.stdout.includes('No items in batch/triage-advance.tsv') ||
+   beforeState.triage_advance_rows === 0;
+ assert('batch-runner dry-run reports queue size (count may drift due to concurrent daemon writes)',
+   _queueSizeReported,
+   null);

Verified: smoke now passes 14/14 (was 13/14 with the strict assertion).
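
To make the difference between the two assertions concrete, here is a small sketch that replays the 21 → 99 drift. The regex is copied from the diff above; the stdout strings and the `snapshotRows` variable are assumptions made up for the example, since the real batch-runner output is not shown in this PR.

```javascript
// Pattern copied from the new assertion in the diff.
const QUEUE_LINE = /Triage queue:\s+\d+\s+items\s+in\s+batch\/triage-advance\.tsv/;

// Assumed values: row count at snapshot time, and the (hypothetical)
// stdout that batch-runner printed after daemons appended more rows.
const snapshotRows = 21;
const driftedStdout = 'Triage queue: 99 items in batch/triage-advance.tsv\n';

// Old check: exact count must appear in the output.
const strictPasses = driftedStdout.includes(`Triage queue: ${snapshotRows} items`);
// New check: any count is fine as long as the line names the expected file.
const relaxedPasses = QUEUE_LINE.test(driftedStdout);

console.log(strictPasses, relaxedPasses); // false true

// The relaxed check still rejects a queue line that names a different file.
console.log(QUEUE_LINE.test('Triage queue: 99 items in batch/other.tsv')); // false
```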

Test plan

  • node --check clean on both files
  • loadAndRank returns contacts via JSON fallback when _CONTACTS_DATA not in HTML
  • node scripts/maintenance/phase-B-prime-mechanical-enrich.mjs --top 5 --dry-run runs end-to-end
  • node scripts/agents/buttons-smoke-test.mjs exits 0 with 14/14 PASS
  • (post-merge) verify next 03:30 PT scheduled phase-B-prime run lands clean
  • (post-merge) verify next 04:00 PT scheduled buttons-smoke run lands clean

Rollback

git revert <merge-sha>. Both changes are self-contained: one adds a fallback path, the other relaxes an assertion.

Out of scope (USER action required)

  • scan-email-poll OAuth re-auth: the Gmail token was revoked (invalid_grant). The plist is currently unloaded. Mitchell needs to run the OAuth setup script, then bootstrap the plist.
  • network-database-build Tahoe Full Disk Access: Requires manual System Settings action OR a structural fix to move the wrapper out of ~/Documents/. Documented options in .claude/audit/flapping-plist-triage-2026-05-29/notes.md.

🤖 Generated with Claude Code

… buttons-smoke regex assertion

Resolves 2 of 5 CRIT flapping plists surfaced by /system-maintainer +
/career-ops-health Phase 9 (2026-05-29). Other 3: 1 already fixed in
main today (kickstart cleared flap), 2 need USER action (OAuth re-auth
+ Tahoe Full Disk Access).

### Fix 1 — lib/contact-priority-scorer.mjs CLI fallback

`loadAndRank()` now falls back to reading `dashboard/data/contacts-annotated.json`
directly when the dashboard HTML's inline `_CONTACTS_DATA` regex returns
nothing. build-contacts-page.mjs writes the JSON as
`{generated_at, contacts: [...], stats: ...}` (not bare array) — fallback
unwraps via `parsed.contacts` and also handles bare-array shape for
future-proofing.

Closes 5+ consecutive daily failures of phase-B-prime-daily since
2026-05-25 with `FATAL: No _CONTACTS_DATA found`. Root cause: the
dashboard build externalized contacts to dashboard/data/contacts.json
+ contacts-annotated.json, so the inline `var _CONTACTS_DATA = [...]`
array that loadAndRank's regex was looking for no longer exists.

Verified: `loadAndRank({limit:5})` returns 5 ranked contacts via the
fallback (top: Jake Standish / OpenAI, score 8.700).
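
For readers who want to see why the inline lookup started returning nothing, here is a rough sketch of the failure mode. The regex, the `extractInlineContacts` helper, and both HTML snippets are hypothetical; the actual pattern inside `loadAndRank()` and the way the dashboard now loads the JSON are not shown in this commit.

```javascript
// Hypothetical pattern for the old inline global; the real one may differ.
const INLINE_CONTACTS = /var _CONTACTS_DATA\s*=\s*(\[[\s\S]*?\]);/;

// Returns the inline array if present, otherwise an empty array,
// which is the condition that triggers the JSON fallback.
function extractInlineContacts(html) {
  const match = INLINE_CONTACTS.exec(html);
  return match ? JSON.parse(match[1]) : [];
}

// Made-up pages: one with the inline array, one after externalization.
const inlineHtml = '<script>var _CONTACTS_DATA = [{"name":"A"}];</script>';
const externalizedHtml = '<script>fetch("data/contacts-annotated.json")</script>';

console.log(extractInlineContacts(inlineHtml).length);       // 1
console.log(extractInlineContacts(externalizedHtml).length); // 0
```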

### Fix 2 — scripts/agents/buttons-smoke-test.mjs concurrent-mutation-aware assertion

"Expected queue size" assertion changed from exact-count match
(`Triage queue: ${snapshot_count} items`) to regex pattern
(`/Triage queue:\s+\d+\s+items\s+in\s+batch\/triage-advance\.tsv/`).

Daemons (community-scan, scan-only, triage) append to
batch/triage-advance.tsv concurrently with the smoke test, so the count
can grow between `snapshotState()` and `batch-runner --dry-run` load.
Canonical incident: snapshot=21, batch-runner read=99 → strict assertion
failed.

The relaxed match still validates that batch-runner reports a queue
size from the expected file — it just doesn't require the count to
match the pre-snapshot exact value.

Verified: smoke now passes 14/14 (was 13/14 with the strict assertion).
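
A possible tightening, not part of this commit: capture the reported count and require it to be at least the snapshot value. This is only a sketch under the assumption that rows are appended but never removed between `snapshotState()` and the dry-run load; if any daemon can also consume rows, this check would flap again. `reportedQueueSize` and `queueSizeAtLeast` are hypothetical names.

```javascript
// Same pattern as the relaxed assertion, with a capture group for the count.
const QUEUE_COUNT = /Triage queue:\s+(\d+)\s+items\s+in\s+batch\/triage-advance\.tsv/;

// Hypothetical helper: returns the count batch-runner printed, or null.
function reportedQueueSize(stdout) {
  const match = QUEUE_COUNT.exec(stdout);
  return match ? Number(match[1]) : null;
}

// Hypothetical helper: passes when a count was reported and it did not
// shrink relative to the snapshot (assumes append-only writers).
function queueSizeAtLeast(stdout, snapshotRows) {
  const reported = reportedQueueSize(stdout);
  return reported !== null && reported >= snapshotRows;
}

// Made-up stdout mirroring the canonical incident (snapshot=21, read=99).
const incidentStdout = 'Triage queue: 99 items in batch/triage-advance.tsv';
console.log(reportedQueueSize(incidentStdout));     // 99
console.log(queueSizeAtLeast(incidentStdout, 21));  // true
console.log(queueSizeAtLeast(incidentStdout, 120)); // false
```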

### Status of other 3 flapping plists

- `health-column-liveness` — ALREADY FIXED IN MAIN today (sibling). Always
  exits 0 now (data/health-column-coverage.json is the signal, not the
  exit code). Kickstart cleared flap state (runs=3, last exit 0).
- `scan-email-poll` — BOOTED OUT this session to stop a 181-run OAuth
  retry storm (`invalid_grant`, Gmail token revoked). Mitchell action:
  re-auth Gmail OAuth + bootstrap plist.
- `network-database-build` — Tahoe TCC denies launchd cwd access to
  ~/Documents. Wrapper works manually (exit 0) but launchd kickstart
  still fails exit 126. Mitchell action: grant Full Disk Access to the
  plist OR move cron-run.sh to ~/Library/Application Support/career-ops/.

Audit: .claude/audit/flapping-plist-triage-2026-05-29/notes.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>