feat: add GitHub events watcher, durable wake-queue, and done-crewmate backstop#152

Open
e-jung wants to merge 24 commits into kunchenguid:main from e-jung:fm/fm-done-guard-q1
@e-jung e-jung commented Jun 30, 2026

What Changed

  • Add fm-github-watch.sh, a GitHub events watcher for PR comments, CI state, and reviews, hardened with concurrent PR polling within the 30s check budget, debounced single-event CI-state transitions, transient-API-error skipping, fork-PR CI roll-up, and atomic seen-state writes
  • Make the supervision loop crash-safe with a durable wake-queue and singleton watcher lock (fm-wake-lib.sh, fm-wake-drain.sh, fm-watch.sh), plus fm-guard.sh warnings that surface a stale/missing watcher beacon or pending queued wakes on any fleet-touching script
  • Add the done-crewmate check plugin with an fm-plugin.sh lifecycle manager as a recurring backstop for done/failed/blocked crewmates left in live tmux windows, and support batch task dispatch (multiple id=repo pairs) in fm-spawn.sh

Risk Assessment

✅ Low: The round 1 portability fix is correct and complete, and the surrounding plugin/lifecycle code is well-bounded and read-only, with no new material risks introduced.

Testing

Exercised the done-crewmate guard through the watcher's real invocation path (symlinked state/*.check.sh with a fake tmux) and the full fm-plugin.sh lifecycle; every intended behavior holds — recurring wake on done+alive, silence when healthy/progressed, needs-decision exclusion, multi-offender coalescing, no-clobber of per-task checks, and portable FM_ROOT resolution through the symlink. All 16 harness assertions pass and the three existing test suites are green; no failures.

Evidence: done-guard operator transcript

=== done-crewmate guard: operator transcript (fresh-clone fleet root) ===
fleet root: /tmp/done-guard-demo.DlIZCg/fleet

--- 1. bootstrap re-arms plugins: fm-plugin.sh sync recreates the watcher glob ---
symlink: /tmp/done-guard-demo.DlIZCg/fleet/bin/check-plugins/done-crewmate.check.sh
done-crewmate	live

--- 2. healthy fleet: crewmate still working, window alive -> check is silent ---
watcher runs: bash $STATE/done-crewmate.check.sh
output: <empty -> watcher keeps sleeping>

--- 3. crewmate reports done but firstmate has not torn it down -> RECURRING WAKE ---
watcher runs the same check on its next CHECK_INTERVAL tick:
output: done crewmate ship-1 still alive in tmux - progress or tear down
  -> watcher wraps as: check: state/done-crewmate.check.sh: done crewmate ship-1 still alive in tmux - progress or tear down

--- 4. second offender lands; both surface in ONE wake line (single backstop tick) ---
output: done crewmate scout-2 ship-1 still alive in tmux - progress or tear down

--- 5. firstmate progresses the work (tears down the window) -> check goes silent ---
after ship-1 window gone, output: done crewmate scout-2 still alive in tmux - progress or tear down
after scout-2 resumes working, output: <empty -> not idle-done>

=== portable root resolution (head-commit fix): watcher invokes the SYMLINK ===
BASH_SOURCE would be the state/ symlink path; script must resolve FM_ROOT without readlink -f (GNU-only).
bare-symlink-path invocation (no FM_ROOT_OVERRIDE) output: done crewmate port still alive in tmux - progress or tear down

=== done ===
Evidence: verify-done-guard harness (16/16 pass)

  • Stage 1 (fm-plugin lifecycle): sync creates symlink -> list reports live -> sync preserves real per-task state file -> rejects reserved 'fm-' name -> rejects invalid name.
  • Stage 2 (behavior): A working->silent, B done+alive->wakes, C done+window-gone->silent, D failed+blocked flagged, D needs-decision excluded, E resumed working->silent, F no-trailing-newline parsed, G meta without window= skipped, H empty state->silent.
  • Stage 3 (portable root): one-level readlink resolves FM_ROOT through symlink; end-to-end watcher-style invocation wakes correctly.
  • Summary: PASS=16 FAIL=0

#!/usr/bin/env bash
# End-to-end verification of the done-crewmate guard + fm-plugin.sh lifecycle.
# Mirrors the real watcher path: it globs $STATE/*.check.sh and runs
# `bash "$c"`, where state/done-crewmate.check.sh is a symlink to the tracked
# canonical copy under bin/check-plugins/. A fake tmux stands in for real tmux.
set -u

# Worktree root: passed in via DONE_GUARD_REPO, else the caller's CWD.
REPO="${DONE_GUARD_REPO:-$PWD}"
[ -f "$REPO/bin/check-plugins/done-crewmate.check.sh" ] || {
  echo "error: worktree root not found (DONE_GUARD_REPO=$REPO)"; exit 2; }

echo "### worktree root: $REPO"
echo

PASS=0; FAIL=0
assert_eq() { # <desc> <expected> <actual>
  if [ "$2" = "$3" ]; then
    echo "ok - $1"; PASS=$((PASS+1))
  else
    echo "not ok - $1"; echo "      expected: [$2]"; echo "      actual:   [$3]"; FAIL=$((FAIL+1))
  fi
}
assert_silent() { # <desc> <actual-output>
  if [ -z "$2" ]; then echo "ok - $1 (silent, as required)"; PASS=$((PASS+1))
  else echo "not ok - $1 (expected silence, got output)"; echo "      got: [$2]"; FAIL=$((FAIL+1)); fi
}

SCRATCH="$(mktemp -d /tmp/done-guard.XXXXXX)"
trap 'rm -rf "$SCRATCH"' EXIT
FLEET="$SCRATCH/fleet"
mkdir -p "$FLEET/bin/check-plugins" "$FLEET/state"

# Reuse the real plugin + lifecycle scripts by symlinking (read-only, no copy edits).
ln -s "$REPO/bin/check-plugins/done-crewmate.check.sh" "$FLEET/bin/check-plugins/done-crewmate.check.sh"
ln -s "$REPO/bin/fm-plugin.sh" "$FLEET/bin/fm-plugin.sh"

# Fake tmux: prints whatever windows we stage in $DONE_GUARD_WINDOWS (one per line).
mkdir -p "$SCRATCH/fakebin"
cat > "$SCRATCH/fakebin/tmux" <<'SH'
#!/usr/bin/env bash
set -u
if [ "${1:-}" = "list-windows" ]; then
  # Default to empty so `set -u` does not abort when no windows are staged.
  printf '%s\n' "${DONE_GUARD_WINDOWS:-}" 2>/dev/null | grep . || true
fi
SH
chmod +x "$SCRATCH/fakebin/tmux"
export PATH="$SCRATCH/fakebin:$PATH"

run_check_via_symlink() {
  # Same call shape as the watcher (`bash "$STATE/<name>.check.sh"` from FM_ROOT),
  # with FM_ROOT_OVERRIDE pinning the scratch fleet root for this call.
  ( cd "$FLEET" && FM_ROOT_OVERRIDE="$FLEET" bash "$FLEET/state/done-crewmate.check.sh" 2>/dev/null )
}

echo "### Stage 1: fm-plugin.sh lifecycle (add/list/sync/remove)"
echo
cd "$FLEET"

# sync should create the state symlink for the canonical plugin.
bin/fm-plugin.sh sync
assert_eq "sync creates state/done-crewmate.check.sh symlink pointing at canonical" \
  "$FLEET/bin/check-plugins/done-crewmate.check.sh" \
  "$(readlink "$FLEET/state/done-crewmate.check.sh")"

# list should report it live.
out="$(bin/fm-plugin.sh list)"
case "$out" in
  *done-crewmate*live*) echo "ok - list reports done-crewmate live"; PASS=$((PASS+1)) ;;
  *) echo "not ok - list did not report live: [$out]"; FAIL=$((FAIL+1)) ;;
esac

# sync must NOT clobber a real per-task state file that happens to share a name.
echo "echo real per-task check" > "$FLEET/state/keep-real.check.sh"
# Make a fake canonical plugin so sync would try to link it.
cat > "$FLEET/bin/check-plugins/keep-real.check.sh" <<'SH'
#!/usr/bin/env bash
echo "SHOULD NOT REPLACE"
SH
bin/fm-plugin.sh sync
if [ -L "$FLEET/state/keep-real.check.sh" ]; then
  echo "not ok - sync clobbered a real per-task state file"; FAIL=$((FAIL+1))
else
  echo "ok - sync preserved real per-task state file (not a symlink)"; PASS=$((PASS+1))
fi
# cleanup the synthetic plugin so it doesn't affect the done-crewmate behavior.
rm -f "$FLEET/state/keep-real.check.sh" "$FLEET/bin/check-plugins/keep-real.check.sh"

# name validation: reject names that would collide with task ids (fm-*) and bad chars.
if bin/fm-plugin.sh add 'fm-bad' "$REPO/bin/check-plugins/done-crewmate.check.sh" 2>/dev/null; then
  echo "not ok - add accepted reserved 'fm-' name"; FAIL=$((FAIL+1))
else echo "ok - add rejected reserved 'fm-' name"; PASS=$((PASS+1)); fi
if bin/fm-plugin.sh add 'bad name!' "$REPO/bin/check-plugins/done-crewmate.check.sh" 2>/dev/null; then
  echo "not ok - add accepted invalid name"; FAIL=$((FAIL+1))
else echo "ok - add rejected invalid name"; PASS=$((PASS+1)); fi
echo

echo "### Stage 2: done-crewmate check behavior (the watcher's recurring backstop)"
echo
# Clear any inherited override. run_check_via_symlink still pins FM_ROOT_OVERRIDE
# per call; Stage 3 covers the no-override path through the symlink.
unset FM_ROOT_OVERRIDE

mk_meta() { # <id> <window>
  printf 'window=%s\nproject=demo\nharness=claude\n' "$2" > "$FLEET/state/$1.meta"
}
mk_status() { # <id> <lines...>
  : > "$FLEET/state/$1.status"
  for line in "$@"; do printf '%s\n' "$line" >> "$FLEET/state/$1.status"; done
}

# --- Scenario A: healthy fleet, crewmate still working -> silent.
mk_meta taskA "mysess:fm-taskA"
mk_status taskA "working: implementing"
assert_silent "A: working crewmate -> silent" \
  "$(DONE_GUARD_WINDOWS="" run_check_via_symlink)"

# --- Scenario B: done + window alive -> wakes with offender id.
mk_meta taskB "mysess:fm-taskB"
mk_status taskB "done: PR https://example.com/pull/1 checks green"
out="$(DONE_GUARD_WINDOWS="mysess:fm-taskB" run_check_via_symlink)"
case "$out" in
  *taskB*alive*progress*or*tear*down*) echo "ok - B: done+alive -> wakes naming offender"; PASS=$((PASS+1)) ;;
  *) echo "not ok - B: unexpected output: [$out]"; FAIL=$((FAIL+1)) ;;
esac

# --- Scenario C: done but window already torn down -> silent (work progressed).
out="$(DONE_GUARD_WINDOWS="" run_check_via_symlink)"
assert_silent "C: done + window gone -> silent" "$out"

# --- Scenario D: failed/blocked are terminal too; needs-decision is NOT.
mk_meta taskD1 "mysess:fm-d1"; mk_status taskD1 "failed: tests red"
mk_meta taskD2 "mysess:fm-d2"; mk_status taskD2 "blocked: waiting on X"
mk_meta taskD3 "mysess:fm-d3"; mk_status taskD3 "needs-decision: pick A or B"
out="$(DONE_GUARD_WINDOWS="mysess:fm-d1
mysess:fm-d2
mysess:fm-d3" run_check_via_symlink)"
case "$out" in
  *taskD1*taskD2*) echo "ok - D: failed+blocked flagged"; PASS=$((PASS+1)) ;;
  *) echo "not ok - D: failed/blocked not flagged: [$out]"; FAIL=$((FAIL+1)) ;;
esac
case "$out" in
  *taskD3*) echo "not ok - D: needs-decision was flagged (should be excluded)"; FAIL=$((FAIL+1)) ;;
  *) echo "ok - D: needs-decision excluded"; PASS=$((PASS+1)) ;;
esac

# --- Scenario E: resumed working AFTER done -> silent (not idle-done).
mk_status taskD1 "done: ..." "working: follow-up"
out="$(DONE_GUARD_WINDOWS="mysess:fm-d1" run_check_via_symlink)"
assert_silent "E: resumed working after done -> silent" "$out"

# --- Scenario F: status file with no trailing newline still parsed (last line wins).
printf 'working: x\ndone: shipped' > "$FLEET/state/taskD2.status"  # no trailing newline
out="$(DONE_GUARD_WINDOWS="mysess:fm-d2" run_check_via_symlink)"
case "$out" in
  *taskD2*) echo "ok - F: missing-trailing-newline status still detected as done"; PASS=$((PASS+1)) ;;
  *) echo "not ok - F: expected taskD2 offender, got: [$out]"; FAIL=$((FAIL+1)) ;;
esac

# --- Scenario G: meta with no window= line is skipped (cannot cross-ref tmux).
printf 'project=demo\n' > "$FLEET/state/nowin.meta"
printf 'done: x\n' > "$FLEET/state/nowin.status"
out="$(DONE_GUARD_WINDOWS="mysess:fm-nowin" run_check_via_symlink)"
case "$out" in
  *nowin*) echo "not ok - G: meta without window= was flagged"; FAIL=$((FAIL+1)) ;;
  *) echo "ok - G: meta without window= skipped"; PASS=$((PASS+1)) ;;
esac

# --- Scenario H: no .meta / no .status at all -> silent.
rm -f "$FLEET/state/"*.meta "$FLEET/state/"*.status
out="$(DONE_GUARD_WINDOWS="mysess:fm-x" run_check_via_symlink)"
assert_silent "H: empty state dir -> silent" "$out"
echo

echo "### Stage 3: portable root resolution (the head-commit fix)"
echo
# The watcher invokes the SYMLINK. The script must resolve through it to find
# FM_ROOT without `readlink -f` (GNU-only). Prove it works without -f and even
# when BASH_SOURCE is the bare symlink path (relative, cwd = FM_ROOT).
cd "$FLEET"
# Sanity: confirm the system readlink here does NOT need -f for our purpose.
resolved_root="$(FM_ROOT_OVERRIDE= bash -c '
  src="state/done-crewmate.check.sh"
  real="$(readlink "$src")" && case "$real" in /*) src="$real" ;; *) src="$(cd -P "$(dirname "$src")" && pwd)/$real";; esac
  d="$(cd -P "$(dirname "$src")" && pwd)"
  for root in "$d/../.." "$d/.."; do [ -d "$root/bin" ] && [ -d "$root/state" ] && { (cd -P "$root" && pwd); exit; }; done
')"
if [ "$resolved_root" = "$FLEET" ]; then
  echo "ok - portable readlink (one-level) resolves FM_ROOT through the symlink"; PASS=$((PASS+1))
else
  echo "not ok - root resolution failed: got [$resolved_root]"; FAIL=$((FAIL+1))
fi

# Verify the actual script output, invoked exactly as the watcher does it (no override).
printf 'window=mysess:fm-portable\nproject=demo\n' > "$FLEET/state/portable.meta"
printf 'done: ship it\n' > "$FLEET/state/portable.status"
out="$(DONE_GUARD_WINDOWS="mysess:fm-portable" bash "$FLEET/state/done-crewmate.check.sh" 2>/dev/null)"
case "$out" in
  *portable*) echo "ok - end-to-end: watcher-style invocation through symlink wakes correctly"; PASS=$((PASS+1)) ;;
  *) echo "not ok - end-to-end invocation produced no wake (root resolution failed in practice): [$out]"; FAIL=$((FAIL+1)) ;;
esac

echo
echo "### Summary: PASS=$PASS FAIL=$FAIL"
[ "$FAIL" -eq 0 ]

Pipeline

Updates from git push no-mistakes

⏭️ **intent** - skipped

✅ No issues found.

✅ **Rebase** - passed

✅ No issues found.

🔧 **Review** - 1 issue found → auto-fixed ✅
  • ⚠️ bin/check-plugins/done-crewmate.check.sh:31 - fm_root() uses readlink -f (line 31), which stock macOS readlink does not support. This is inconsistent with the rest of the codebase, which explicitly handles Darwin for stat in fm-watch.sh, fm-wake-lib.sh, and fm-guard.sh. When readlink -f fails, src is left as the unresolved state/ symlink path, so d resolves to .../state; the first fallback branch (line 33, $d/../..) then climbs TWO levels (it assumes the resolved bin/check-plugins/ location) and returns the PARENT of FM_ROOT instead of FM_ROOT. STATE ends up pointing at a non-existent directory, and line 40 ([ -d "$STATE" ] || exit 0) silently no-ops the entire backstop check. The common case (watcher armed from FM_ROOT) is rescued by the cwd check on line 29, so this only bites on macOS when the watcher's cwd is not FM_ROOT, but there it defeats the whole purpose of the check (forgotten done/failed/blocked crewmates go uncaught). Fix: resolve the symlink portably without readlink -f (e.g. cd -P "$(dirname "$src")" and read the link target iteratively), or drop the two-level branch and rely on the cwd check plus a single-level resolve.

🔧 Fix: Fix done-crewmate check portable root resolution
✅ Re-checked - no issues remain.
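
The iterative resolution suggested in the review finding could look roughly like the sketch below. It is an illustration only, not the code that landed in done-crewmate.check.sh: `resolve_link` is a hypothetical helper name, and the 40-hop cap is an arbitrary guard against link loops. It uses only one-level `readlink` and `cd -P`, both of which behave the same on GNU and BSD/macOS.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): follow a chain of symlinks using only
# one-level `readlink`, so no GNU-only `readlink -f` is needed.
resolve_link() { # <path>
  local src="$1" target dir hops=0
  while [ -L "$src" ] && [ "$hops" -lt 40 ]; do
    target="$(readlink "$src")" || return 1
    dir="$(cd -P "$(dirname "$src")" && pwd)" || return 1
    case "$target" in
      /*) src="$target" ;;        # absolute link target
      *)  src="$dir/$target" ;;   # relative target: resolve against the link's directory
    esac
    hops=$((hops + 1))
  done
  # Canonicalize the directory part of the final, non-symlink path.
  dir="$(cd -P "$(dirname "$src")" && pwd)" || return 1
  printf '%s/%s\n' "$dir" "$(basename "$src")"
}
```

Called on state/done-crewmate.check.sh, a helper like this would print the canonical path under bin/check-plugins/, from which climbing two directory levels reaches FM_ROOT.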

✅ **Test** - passed

✅ No issues found.

  • bash /tmp/no-mistakes-evidence/01KWBAPZAPNPYV3Z1K1DB1ZGBY/verify-done-guard.sh: 16-assertion harness covering fm-plugin.sh add/list/sync/remove, name validation, no-clobber of real per-task state files, and 9 done-crewmate behavior assertions (scenarios A-H) invoked exactly as the watcher does (bash $STATE/done-crewmate.check.sh via its symlink, with a fake tmux), plus portable root resolution through the symlink without FM_ROOT_OVERRIDE
  • Operator transcript via watcher-style invocation: healthy fleet silent; done+alive wakes; failed+blocked flagged, needs-decision excluded; second offender coalesces into one line; tearing down the window or resuming work silences the backstop; bare-symlink-path invocation resolves FM_ROOT and wakes (head-commit fix)
  • tests/fm-github-watch.test.sh
  • tests/fm-spawn-batch.test.sh
  • tests/fm-wake-queue.test.sh
✅ **Document** - passed

✅ No issues found.

✅ **Lint** - passed

✅ No issues found.

✅ **Push** - passed

✅ No issues found.

@e-jung e-jung force-pushed the fm/fm-done-guard-q1 branch from a760c03 to 5c3bd56 Compare June 30, 2026 06:20
e-jung added 24 commits June 30, 2026 06:28
Address review findings from the no-mistakes pipeline:
- Emit events per-PR (print then advance that PR's seen) instead of
  buffering all-at-end: a watcher 30s timeout now surfaces partial progress
  rather than killing the poll with zero output every cycle.
- atomic_write (temp + rename) so a read-only/crashing state dir never leaves
  a partial seen file; bash builtin printf write()s immediately, keeping the
  per-PR ordering lossless even under SIGKILL.
- Fix open_basenames membership (space-padded) so the last open PR is skipped
  in detect_left_open instead of always re-checked.
- usage() stops before set -u so --help no longer leaks code.
- Add review-detection and ci-detection tests.
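
For readers unfamiliar with the temp + rename pattern mentioned above, a minimal sketch follows. It is not the PR's atomic_write; the signature and the `.tmp.XXXXXX` naming are assumptions. The point it illustrates is that the temp file lives next to the destination, so the final `mv` is a same-filesystem rename and readers see either the old content or the new, never a partial file.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a temp + rename write; the PR's atomic_write may differ.
atomic_write() { # <dest> <content>
  local dest="$1" content="$2" dir tmp
  dir="$(dirname "$dest")"
  # Stage next to the destination so the rename stays on one filesystem.
  tmp="$(mktemp "$dir/.tmp.XXXXXX")" || return 1
  if printf '%s\n' "$content" > "$tmp" && mv -f "$tmp" "$dest"; then
    return 0
  fi
  rm -f "$tmp"   # failed write or rename: leave no stray temp behind
  return 1
}
```

If the state directory is read-only, `mktemp` fails before anything is written, so the existing seen file is left untouched.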

Address the second review pass:
- detect_left_open now honors the merge filter (filter merge off suppresses
  merge/close events instead of always emitting them).
- build_seen carries the CI signature forward across a transiently-empty
  fetch (new commit whose check-runs have not populated), so a later status
  change still fires instead of being silently dropped.
- Header: note --daemon for large fleets (check-script timeout) and that ci
  reads the Checks API only.
- Tests: fix a vacuous config assertion; add regression tests for the
  merge-filter suppression and the CI carry-forward window.
…docs

Address the third review pass:
- Distinguish a configured-empty filters= (all filters off -> watcher muted)
  from a missing key (defaults to all on); previously 'all off' reset to
  defaults, so the captain could not actually mute the watcher.
- Derive the contributor from `gh auth` when unset instead of hardcoding a
  username in a shared public-repo script; FM_GH_CONTRIBUTOR and the
  configured value still take precedence.
- count_reviews now excludes the contributor's own reviews (keeps bot and
  maintainer reviews), matching count_comments.
- Document state/.github-watch-config and state/.github-watch-seen/ in the
  AGENTS.md state layout; add a README toolbelt row + env knobs.
- Fix a stale test comment (apply_pending -> atomic_write); add a regression
  test for the all-filters-off mute.
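
The configured-empty versus missing-key distinction can be sketched as below. The key=value file format, the `read_filters` name, and the default filter list are assumptions for illustration; the watcher's real parser is not shown in this PR text.

```bash
#!/usr/bin/env bash
# Hypothetical config reader: file format and default list are assumptions.
DEFAULT_FILTERS="comments reviews ci merge"

read_filters() { # <config-file>
  local cfg="$1" line
  if [ -f "$cfg" ] && line="$(grep -m1 '^filters=' "$cfg")"; then
    # Key present: use its value verbatim, even when empty (watcher muted).
    printf '%s\n' "${line#filters=}"
  else
    # Missing file or missing key: fall back to all filters on.
    printf '%s\n' "$DEFAULT_FILTERS"
  fi
}
```

The design choice is to branch on whether the key exists rather than on whether its value is non-empty; testing the value alone is what would make "all off" indistinguishable from "unset".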

Address the fourth review pass:
- Append ?per_page=100 to the comments, reviews, and check-runs API calls.
  GitHub list endpoints default to 30 items per page, so counts and the CI
  signature silently capped at 30 on active PRs; per_page=100 lifts that.
- The comment event said 'new maintainer comment(s)' but the count includes
  bot comments (coderabbit, greptile) by design, so relabel to 'comment(s)'.
- Update the fake gh in tests to strip the query string before matching.

Address the fifth review pass:
- discover_prs skips discovery when the contributor resolves empty (gh
  missing/unauthed), so an empty --author is never passed to the search
  (an empty author qualifier would match open PRs across every repo).
- atomic_write now stages its temp in a hidden SEEN_DIR/.tmp subdir (same
  filesystem, so the rename stays atomic) so a crash-leaked temp never
  matches detect_left_open's glob and causes a duplicate merge/close event.
- Tests pin the contributor via FM_GH_CONTRIBUTOR so they no longer depend
  on the fake gh implementing `api user`.

Address the sixth review pass:
- discover_prs passes --limit 1000 so open PRs beyond the gh search default
  of 30 are still discovered (the header advertises large-fleet use).
- Document that comment/review/check-run counts cap at 100 per type per PR
  (per_page=100, no pagination) alongside the existing Checks-API caveat.
…count

Address the seventh review pass:
- detect_left_open now treats only MERGED as terminal. CLOSED PRs are
  re-probed each cycle, so a close->reopen->merge between polls still emits
  MERGED instead of being swallowed. Repeat CLOSED events are suppressed by
  skipping when p_state equals seen_state.
- cmd_status excludes the .tmp staging subdir so leaked temps never inflate
  the 'seen PRs' count; detect_left_open also skips *.tmp.* defensively.
- Add a regression test for the close->merge path.

Address the eighth review pass:
- build_seen carries the prior state forward when pr_state returns empty
  (transient gh failure), matching the ci/comments/reviews carry-forward,
  so a single failed state fetch never erases the tracked OPEN/CLOSED/MERGED.
- detect_left_open skips the rewrite entirely when p_state is empty, so it
  cannot clobber a real state with an empty value and trigger a duplicate.

Address the ninth review pass:
- A closed_at timestamp (set when CLOSED is emitted) bounds how long a
  CLOSED PR is re-probed for a close->reopen->merge. Past FM_GH_CLOSE_REPROBE_SECS
  (default 7200s) it is treated as settled, so accumulated closed PRs cannot
  push the fleet past GitHub's rate limit. The default window is generous
  enough that a prompt close->merge still fires.
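
The re-probe window reduces to a small time comparison, sketched below. Only FM_GH_CLOSE_REPROBE_SECS and its 7200s default come from the commit message; the function name, the epoch-seconds arguments, and the choice to keep probing when no timestamp is recorded are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical predicate: succeeds while a CLOSED PR should still be re-probed.
should_reprobe_closed() { # <closed_at-epoch> <now-epoch>
  local closed_at="$1" now="$2"
  local window="${FM_GH_CLOSE_REPROBE_SECS:-7200}"
  # Assumption: with no recorded timestamp, keep probing rather than settle early.
  [ -n "$closed_at" ] || return 0
  [ $((now - closed_at)) -lt "$window" ]
}
```

With the default window, a PR closed at t=1000 is still probed at t=2000 and treated as settled at t=9000.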
…SC2015

CI's shellcheck (v0.10.0) flags SC2015 on `A && B || C` patterns that local
0.11.0 missed; the existing repo scripts avoid the pattern. Rewrite the five
guard/assert occurrences in the new files to explicit if-statements.
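
As a generic illustration of the rewrite (not one of the five actual occurrences), the flagged shape and its explicit replacement look like this; `report_match` is a hypothetical name.

```bash
#!/usr/bin/env bash
# SC2015 shape, shown as a comment only: C also runs when A succeeds but B fails.
#   [ "$1" = "$2" ] && echo "ok" || echo "not ok"

# Explicit form: exactly one branch runs, whatever the branch body returns.
report_match() { # <expected> <actual>
  if [ "$1" = "$2" ]; then
    echo "ok"
  else
    echo "not ok"
  fi
}
```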

A --once sweep polled each PR serially: up to 5 gh calls per PR, one PR at a
time. At the captain's ~22 open PRs that cost ~47s, over the watcher's 30s
check-script kill limit, so a daemon check-script plugin got killed every
cycle and delivered nothing.

Parallelize the per-PR loop in poll_once with a bounded counting semaphore
(FM_GH_CONCURRENCY, default 8; >=1, 0/non-numeric falls back to 8). Each
worker is a subshell running process_pr and owns its own per-PR seen file
(seen_file is keyed by owner/repo/pr), so concurrent seen writes never collide.
The losslessness invariant (print before seen) holds per-worker exactly, and
each worker emits its whole event block in a single printf (atomic under
PIPE_BUF), so stdout lines never interleave. A final wait settles all workers
before detect_left_open scans the seen dir.

Measured on the live fleet (~22 PRs): 47.6s -> ~8.6s (5.5x), comfortably under
both the 15s target and the 30s kill limit.

Adds a parallel-mode regression test asserting events still print before seen
advances (via the read-only-seen-dir trick) across 12 concurrently-polled PRs,
and that workers never cross-contaminate each other's seen files.
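
A bounded worker pool of the kind described can be sketched as follows. This is not poll_once itself: `process_item` stands in for process_pr, the item list stands in for the discovered PRs, and the sketch relies on `wait -n`, which needs bash 4.3 or newer (stock macOS bash 3.2 lacks it, so the real script may use a different mechanism). Only FM_GH_CONCURRENCY and its fallback-to-8 rule are taken from the commit message.

```bash
#!/usr/bin/env bash
# Sketch only. Requires bash >= 4.3 for `wait -n`.
process_item() { # hypothetical stand-in for process_pr
  # Build the whole block first, then emit it with a single printf call.
  local block="event for $1"
  printf '%s\n' "$block"
}

poll_all() { # <item...>
  local max="${FM_GH_CONCURRENCY:-8}" running=0 item
  # Empty, zero, or non-numeric values fall back to 8.
  [ "$max" -ge 1 ] 2>/dev/null || max=8
  for item in "$@"; do
    if [ "$running" -ge "$max" ]; then
      wait -n || true            # block until any one worker exits
      running=$((running - 1))
    fi
    ( process_item "$item" ) &   # each worker runs in its own subshell
    running=$((running + 1))
  done
  wait                           # settle every worker before the caller continues
}
```

Usage would be along the lines of `FM_GH_CONCURRENCY=3 poll_all 1 2 3 4 5`; output order across workers is not deterministic, so a caller that needs ordering has to sort afterwards.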
…ition

The ci filter built a sorted multiset of every check-run conclusion and fired
whenever that multiset changed, so a PR whose checks complete at staggered
times fired once per check landing (observed live: no-mistakes#312 fired
several times as its 7 check-runs trickled in). The captain wants "is the PR
green or not", not per-check noise.

Replace the multiset with a single rolled-up overall state per PR, computed in
one jq pass over the check-runs:
  success  every non-neutral check passed (success/skipped), none still running
  failure  at least one non-neutral check failed (failure/timed_out/cancelled/
           action_required/stale)
  pending  at least one non-neutral check is still queued/in_progress
  neutral  only neutral check-runs present
  (empty)  no check-runs reported yet
Fire one event only when this state flips vs the seen marker
(`CI: <repo>#<pr> -> green|failure|pending|neutral`); stay silent while it is
unchanged. Failure beats pending (a red check already settles the outcome),
matching GitHub's own combined-status precedence.
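
The precedence rules above can be illustrated with a pure-bash stand-in for the jq pass. The watcher's real roll-up is a jq program that is not reproduced here; the `rollup_ci` name and the "<status> <conclusion>" input format are assumptions made so the sketch runs without gh or jq.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the jq roll-up. Reads one "<status> <conclusion>"
# pair per line on stdin and prints success|failure|pending|neutral, or an
# empty line when no check-runs were given.
rollup_ci() {
  local status conclusion any=0 real=0 failed=0 pending=0
  while read -r status conclusion || [ -n "$status" ]; do
    [ -n "$status" ] || continue
    any=1
    if [ "$status" != "completed" ]; then
      real=1; pending=1            # queued / in_progress
      continue
    fi
    case "$conclusion" in
      neutral) ;;                  # neutral check-runs are ignored by the roll-up
      success|skipped) real=1 ;;
      failure|timed_out|cancelled|action_required|stale) real=1; failed=1 ;;
    esac
  done
  if   [ "$any"     -eq 0 ]; then echo ""
  elif [ "$failed"  -eq 1 ]; then echo "failure"   # failure beats pending
  elif [ "$pending" -eq 1 ]; then echo "pending"
  elif [ "$real"    -eq 1 ]; then echo "success"
  else echo "neutral"
  fi
}
```

An event would fire only when this value differs from the one stored in the seen marker, with success shown as green in the event text.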

Losslessness is preserved exactly: the seen marker is still written AFTER the
event prints, the empty-response carry-forward for a just-pushed SHA whose
checks have not populated yet still holds (so a later transition fires), and
per-PR seen files stay independently owned by each parallel worker. The bounded
concurrent poll is untouched.

The test fake gh now runs the watcher's real --jq roll-up over JSON check-run
fixtures (via jq), so the roll-up logic itself is exercised, not just the
comparison. New tests cover the staggered-checks debounce (7 checks trickling
pending->success fires exactly once, not once per check), the
pending->green / green->green / green->failure transitions, and roll-up
precedence (failure beats pending; neutral checks ignored). Existing CI tests
moved to the rolled-up state model; the parallel losslessness test is unchanged
and still passes.

Measured on the live fleet (23 open PRs): --once = 7.74s, well under the 15s
target and the 30s check-script kill limit.
… use

Two refinements that make ghwatch safe to leave wired in as a daemon
check-script plugin, plus strict-mode hardening.

1. Silent re-baseline on seen-schema migration. A SEEN_SCHEMA version is now
   stamped into each seen file. When a prior file's schema does not match the
   current version, the first poll rewrites it at the current schema with
   carried-forward values and emits NOTHING -- so deploying a schema change
   (e.g. the ci debounce) no longer floods once as every PR appears to
   "transition" off the old format. Applied in both process_pr (open PRs) and
   detect_left_open (PRs that left the open set). Only subsequent real
   transitions fire.

2. Exclude FM_GH_IGNORE_CHECKS names from the CI roll-up (default: the known
   fork-routing signature gap #293, "PR must be raised via no-mistakes"). A PR
   that fails only that check now rolls up to green when its real checks pass,
   instead of a false failure. Only the roll-up applies the filter; the raw
   check list and the other filters are unchanged. Set FM_GH_IGNORE_CHECKS to a
   custom regex, or empty to disable. The regex is embedded into the jq program
   (gh api has no --arg binding for --jq); a malformed regex fails open to empty
   (carried forward), never crashing the poll.
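
The silent re-baseline in item 1 can be sketched as below, assuming a seen file made of key=value lines with a `schema=` entry. That layout, the version number 2, and the `rebaseline_if_stale` name are all assumptions; the real migration may also convert values rather than carry lines over verbatim.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a silent schema re-baseline.
SEEN_SCHEMA=2   # assumed current version

# Prints "current", "migrated", or "absent". The caller would emit no events
# for a PR whose file was just migrated.
rebaseline_if_stale() { # <seen-file>
  local f="$1" have tmp
  [ -f "$f" ] || { echo "absent"; return 0; }
  have="$(sed -n 's/^schema=//p' "$f" | head -n 1)"
  if [ "$have" = "$SEEN_SCHEMA" ]; then
    echo "current"; return 0
  fi
  tmp="$f.tmp.$$"
  # Re-stamp the schema and carry every other line forward unchanged.
  { printf 'schema=%s\n' "$SEEN_SCHEMA"; grep -v '^schema=' "$f" || true; } > "$tmp"
  mv -f "$tmp" "$f"
  echo "migrated"
}
```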

Also adopt `set -euo pipefail`; the fail-open design is preserved via the
existing `|| true` / `|| return 0` guards (full suite green under strict mode).

Tests: silent-baseline-on-migration (no flood on schema change; a real
transition still fires afterward); gap-excluded PR rolls up green while a real
failure still surfaces. All 17 prior ghwatch tests stay green.
…sing them as data

ghc() did `command gh "$@" 2>/dev/null || true`: it swallowed stderr and the
exit code, but on an API error (e.g. a transient 401 "Bad credentials") gh api
writes the error JSON to stdout — bypassing --jq — which the script then parsed
as CI data and fired as an event (observed:
"CI: manaflow-ai/cmux#6570 -> { \"message\": \"Bad credentials\" ... }").

Detect the error and SKIP that PR for the cycle: never surface an API error as
an event. ghc() now captures gh's stdout and returns non-zero (suppressing the
body) when gh exits non-zero OR its output is a GitHub error body (a JSON object
with top-level "message" + "documentation_url"). process_pr treats any
non-zero probe return as "skip this PR this cycle": emit nothing, do not write
seen, so the next cycle re-evaluates from the same baseline (lossless).
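
The skip-on-error behavior can be sketched with a generic wrapper. `run_probe` is a hypothetical stand-in for ghc(), and the two grep checks are a cruder heuristic than a true top-level JSON test (they would also match the keys nested deeper in a body), so treat this as an illustration of the control flow only.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command, capture stdout, and return non-zero
# (printing nothing) when the command fails or its output looks like a GitHub
# error body.
run_probe() { # <command> [args...]
  local out
  out="$("$@" 2>/dev/null)" || return 1   # non-zero exit: skip this cycle
  if printf '%s' "$out" | grep -q '"message"' &&
     printf '%s' "$out" | grep -q '"documentation_url"'; then
    return 1                              # error body: suppress it entirely
  fi
  printf '%s\n' "$out"
}
```

A caller that sees a non-zero return emits nothing and leaves the seen file alone, so the next cycle compares against the same baseline.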

Also guards the discovery fetch (abort the cycle on failure instead of misreading
an empty list as "everyone merged") and get_contributor's best-effort user
lookup (no set -e trip on a blip).

Tests: new test injects a 401 error JSON via the fake gh (verified it fails on
the old ghc with the exact bogus CI event, passes with the fix) plus a recovery
cycle proving the real transition still fires. shellcheck clean; all 20 green.

Deterministic backstop for crewmates that report a terminal status
(done/failed/blocked) but whose tmux window is still alive. The signal
layer fires once on the status write; if firstmate drops the thread, the
stale-pane alarm is indistinguishable from a stuck crewmate and gets
dismissed. This check re-asserts every check interval until the crewmate
is progressed or torn down.

- bin/check-plugins/done-crewmate.check.sh: scans state/*.meta for a
  terminal current status whose window is still in tmux; prints one wake
  line listing offenders, silent when healthy. Fast (~7ms), same contract
  as fm-pr-check checks; excludes needs-decision (escalates via signals).
- bin/fm-plugin.sh: add/remove/list/sync for durable plugins. Source lives
  tracked under bin/check-plugins/; symlinked into state/*.check.sh so the
  watcher's existing glob discovers it (no watcher changes). sync never
  clobbers a real per-task state file.
- bin/fm-bootstrap.sh: call fm-plugin.sh sync so plugins survive a fresh
  clone (state/ is gitignored).
- AGENTS.md: document the plugin dir, the state symlink, and the section 8
  obligation to progress terminal-status crewmates immediately.
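
The core decision of the check (terminal current status plus a live window) can be sketched as follows. The status vocabulary and the last-line-wins rule come from the PR text and harness; the function names and the argument layout are hypothetical, and the real script reads windows from tmux rather than from an argument.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the done-crewmate decision.
current_status() { # <status-file> -> prints the word before ':' on the last line
  local line last=""
  # The `|| [ -n "$line" ]` clause accepts a final line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] && last="$line"
  done < "$1"
  printf '%s\n' "${last%%:*}"
}

is_idle_terminal() { # <status-file> <window> <live-windows, newline-separated>
  case "$(current_status "$1")" in
    done|failed|blocked) ;;   # terminal; needs-decision is deliberately absent
    *) return 1 ;;
  esac
  # Flag only while the crewmate's window is still alive.
  printf '%s\n' "$3" | grep -qxF -- "$2"
}
```

A status file ending in "working: follow-up" after an earlier "done:" line is not flagged, which matches scenario E in the harness.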
@e-jung e-jung force-pushed the fm/fm-done-guard-q1 branch from 5c3bd56 to acb852c Compare June 30, 2026 08:01