feat: add GitHub events watcher, durable wake-queue, and done-crewmate backstop#152
Open
e-jung wants to merge 24 commits into
Open
feat: add GitHub events watcher, durable wake-queue, and done-crewmate backstop#152e-jung wants to merge 24 commits into
e-jung wants to merge 24 commits into
Conversation
a760c03 to
5c3bd56
Compare
Address review findings from the no-mistakes pipeline: - Emit events per-PR (print then advance that PR's seen) instead of buffering all-at-end: a watcher 30s timeout now surfaces partial progress rather than killing the poll with zero output every cycle. - atomic_write (temp + rename) so a read-only/crashing state dir never leaves a partial seen file; bash builtin printf write()s immediately, keeping the per-PR ordering lossless even under SIGKILL. - Fix open_basenames membership (space-padded) so the last open PR is skipped in detect_left_open instead of always re-checked. - usage() stops before set -u so --help no longer leaks code. - Add review-detection and ci-detection tests.
Address the second review pass: - detect_left_open now honors the merge filter (filter merge off suppresses merge/close events instead of always emitting them). - build_seen carries the CI signature forward across a transiently-empty fetch (new commit whose check-runs have not populated), so a later status change still fires instead of being silently dropped. - Header: note --daemon for large fleets (check-script timeout) and that ci reads the Checks API only. - Tests: fix a vacuous config assertion; add regression tests for the merge-filter suppression and the CI carry-forward window.
…docs Address the third review pass: - Distinguish a configured-empty filters= (all filters off -> watcher muted) from a missing key (defaults to all on); previously 'all off' reset to defaults, so the captain could not actually mute the watcher. - Derive the contributor from `gh auth` when unset instead of hardcoding a username in a shared public-repo script; FM_GH_CONTRIBUTOR and the configured value still take precedence. - count_reviews now excludes the contributor's own reviews (keeps bot and maintainer reviews), matching count_comments. - Document state/.github-watch-config and state/.github-watch-seen/ in the AGENTS.md state layout; add a README toolbelt row + env knobs. - Fix a stale test comment (apply_pending -> atomic_write); add a regression test for the all-filters-off mute.
Address the fourth review pass: - Append ?per_page=100 to the comments, reviews, and check-runs API calls. GitHub list endpoints default to 30 items per page, so counts and the CI signature silently capped at 30 on active PRs; per_page=100 lifts that. - The comment event said 'new maintainer comment(s)' but the count includes bot comments (coderabbit, greptile) by design, so relabel to 'comment(s)'. - Update the fake gh in tests to strip the query string before matching.
Address the fifth review pass: - discover_prs skips discovery when the contributor resolves empty (gh missing/unauthed), so an empty --author is never passed to the search (an empty author qualifier would match open PRs across every repo). - atomic_write now stages its temp in a hidden SEEN_DIR/.tmp subdir (same filesystem, so the rename stays atomic) so a crash-leaked temp never matches detect_left_open's glob and cause a duplicate merge/close event. - Tests pin the contributor via FM_GH_CONTRIBUTOR so they no longer depend on the fake gh implementing `api user`.
Address the sixth review pass: - discover_prs passes --limit 1000 so open PRs beyond the gh search default of 30 are still discovered (the header advertises large-fleet use). - Document that comment/review/check-run counts cap at 100 per type per PR (per_page=100, no pagination) alongside the existing Checks-API caveat.
…count Address the seventh review pass: - detect_left_open now treats only MERGED as terminal. CLOSED PRs are re-probed each cycle, so a close->reopen->merge between polls still emits MERGED instead of being swallowed. Repeat CLOSED events are suppressed by skipping when p_state equals seen_state. - cmd_status excludes the .tmp staging subdir so leaked temps never inflate the 'seen PRs' count; detect_left_open also skips *.tmp.* defensively. - Add a regression test for the close->merge path.
Address the eighth review pass: - build_seen carries the prior state forward when pr_state returns empty (transient gh failure), matching the ci/comments/reviews carry-forward, so a single failed state fetch never erases the tracked OPEN/CLOSED/MERGED. - detect_left_open skips the rewrite entirely when p_state is empty, so it cannot clobber a real state with an empty value and trigger a duplicate.
Address the ninth review pass: - A closed_at timestamp (set when CLOSED is emitted) bounds how long a CLOSED PR is re-probed for a close->reopen->merge. Past FM_GH_CLOSE_REPROBE_SECS (default 7200s) it is treated as settled, so accumulated closed PRs cannot push the fleet past GitHub's rate limit. The default window is generous enough that a prompt close->merge still fires.
…SC2015 CI's shellcheck (v0.10.0) flags SC2015 on `A && B || C` patterns that local 0.11.0 missed; the existing repo scripts avoid the pattern. Rewrite the five guard/assert occurrences in the new files to explicit if-statements.
A --once sweep polled each PR serially: up to 5 gh calls per PR, one PR at a time. At the captain's ~22 open PRs that cost ~47s, over the watcher's 30s check-script kill limit, so a daemon check-script plugin got killed every cycle and delivered nothing. Parallelize the per-PR loop in poll_once with a bounded counting semaphore (FM_GH_CONCURRENCY, default 8; >=1, 0/non-numeric falls back to 8). Each worker is a subshell running process_pr and owns its own per-PR seen file (seen_file is keyed by owner/repo/pr), so concurrent seen writes never collide. The losslessness invariant (print before seen) holds per-worker exactly, and each worker emits its whole event block in a single printf (atomic under PIPE_BUF), so stdout lines never interleave. A final wait settles all workers before detect_left_open scans the seen dir. Measured on the live fleet (~22 PRs): 47.6s -> ~8.6s (5.5x), comfortably under both the 15s target and the 30s kill limit. Adds a parallel-mode regression test asserting events still print before seen advances (via the read-only-seen-dir trick) across 12 concurrently-polled PRs, and that workers never cross-contaminate each other's seen files.
…ition
The ci filter built a sorted multiset of every check-run conclusion and fired
whenever that multiset changed, so a PR whose checks complete at staggered
times fired once per check landing (observed live: no-mistakes#312 fired
several times as its 7 check-runs trickled in). The captain wants "is the PR
green or not", not per-check noise.
Replace the multiset with a single rolled-up overall state per PR, computed in
one jq pass over the check-runs:
success every non-neutral check passed (success/skipped), none still running
failure at least one non-neutral check failed (failure/timed_out/cancelled/
action_required/stale)
pending at least one non-neutral check is still queued/in_progress
neutral only neutral check-runs present
(empty) no check-runs reported yet
Fire one event only when this state flips vs the seen marker
(`CI: <repo>#<pr> -> green|failure|pending|neutral`); stay silent while it is
unchanged. Failure beats pending (a red check already settles the outcome),
matching GitHub's own combined-status precedence.
Losslessness is preserved exactly: the seen marker is still written AFTER the
event prints, the empty-response carry-forward for a just-pushed SHA whose
checks have not populated yet still holds (so a later transition fires), and
per-PR seen files stay independently owned by each parallel worker. The bounded
concurrent poll is untouched.
The test fake gh now runs the watcher's real --jq roll-up over JSON check-run
fixtures (via jq), so the roll-up logic itself is exercised, not just the
comparison. New tests cover the staggered-checks debounce (7 checks trickling
pending->success fires exactly once, not once per check), the
pending->green / green->green / green->failure transitions, and roll-up
precedence (failure beats pending; neutral checks ignored). Existing CI tests
moved to the rolled-up state model; the parallel losslessness test is unchanged
and still passes.
Measured on the live fleet (23 open PRs): --once = 7.74s, well under the 15s
target and the 30s check-script kill limit.
… use Two refinements that make ghwatch safe to leave wired in as a daemon check-script plugin, plus strict-mode hardening. 1. Silent re-baseline on seen-schema migration. A SEEN_SCHEMA version is now stamped into each seen file. When a prior file's schema does not match the current version, the first poll rewrites it at the current schema with carried-forward values and emits NOTHING -- so deploying a schema change (e.g. the ci debounce) no longer floods once as every PR appears to "transition" off the old format. Applied in both process_pr (open PRs) and detect_left_open (PRs that left the open set). Only subsequent real transitions fire. 2. Exclude FM_GH_IGNORE_CHECKS names from the CI roll-up (default: the known fork-routing signature gap #293, "PR must be raised via no-mistakes"). A PR that fails only that check now rolls up to green when its real checks pass, instead of a false failure. Only the roll-up applies the filter; the raw check list and the other filters are unchanged. Set FM_GH_IGNORE_CHECKS to a custom regex, or empty to disable. The regex is embedded into the jq program (gh api has no --arg binding for --jq); a malformed regex fails open to empty (carried forward), never crashing the poll. Also adopt `set -euo pipefail`; the fail-open design is preserved via the existing `|| true` / `|| return 0` guards (full suite green under strict mode). Tests: silent-baseline-on-migration (no flood on schema change; a real transition still fires afterward); gap-excluded PR rolls up green while a real failure still surfaces. All 17 prior ghwatch tests stay green.
…sing them as data ghc() did `command gh "$@" 2>/dev/null || true`: it swallowed stderr and the exit code, but on an API error (e.g. a transient 401 "Bad credentials") gh api writes the error JSON to stdout — bypassing --jq — which the script then parsed as CI data and fired as an event (observed: "CI: manaflow-ai/cmux#6570 -> { \"message\": \"Bad credentials\" ... }"). Detect the error and SKIP that PR for the cycle: never surface an API error as an event. ghc() now captures gh's stdout and returns non-zero (suppressing the body) when gh exits non-zero OR its output is a GitHub error body (a JSON object with top-level "message" + "documentation_url"). process_pr treats any non-zero probe return as "skip this PR this cycle": emit nothing, do not write seen, so the next cycle re-evaluates from the same baseline (lossless). Also guards the discovery fetch (abort the cycle on failure instead of misreading an empty list as "everyone merged") and get_contributor's best-effort user lookup (no set -e trip on a blip). Tests: new test injects a 401 error JSON via the fake gh (verified it fails on the old ghc with the exact bogus CI event, passes with the fix) plus a recovery cycle proving the real transition still fires. shellcheck clean; all 20 green.
…IGNORE_CHECKS env knobs
Deterministic backstop for crewmates that report a terminal status (done/failed/blocked) but whose tmux window is still alive. The signal layer fires once on the status write; if firstmate drops the thread, the stale-pane alarm is indistinguishable from a stuck crewmate and gets dismissed. This check re-asserts every check interval until the crewmate is progressed or torn down. - bin/check-plugins/done-crewmate.check.sh: scans state/*.meta for a terminal current status whose window is still in tmux; prints one wake line listing offenders, silent when healthy. Fast (~7ms), same contract as fm-pr-check checks; excludes needs-decision (escalates via signals). - bin/fm-plugin.sh: add/remove/list/sync for durable plugins. Source lives tracked under bin/check-plugins/; symlinked into state/*.check.sh so the watcher's existing glob discovers it (no watcher changes). sync never clobbers a real per-task state file. - bin/fm-bootstrap.sh: call fm-plugin.sh sync so plugins survive a fresh clone (state/ is gitignored). - AGENTS.md: document the plugin dir, the state symlink, and the section 8 obligation to progress terminal-status crewmates immediately.
…check on older git
…, and github watcher
5c3bd56 to
acb852c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What Changed
fm-github-watch.sh, a GitHub events watcher for PR comments, CI state, and reviews, hardened with concurrent PR polling within the 30s check budget, debounced single-event CI-state transitions, transient-API-error skipping, fork-PR CI roll-up, and atomic seen-state writesfm-wake-lib.sh,fm-wake-drain.sh,fm-watch.sh), plusfm-guard.shwarnings that surface a stale/missing watcher beacon or pending queued wakes on any fleet-touching scriptdone-crewmatecheck plugin with anfm-plugin.shlifecycle manager as a recurring backstop for done/failed/blocked crewmates left in live tmux windows, and support batch task dispatch (multipleid=repopairs) infm-spawn.shRisk Assessment
✅ Low: The round 1 portability fix is correct and complete, and the surrounding plugin/lifecycle code is well-bounded and read-only, with no new material risks introduced.
Testing
Exercised the done-crewmate guard through the watcher's real invocation path (symlinked state/*.check.sh with a fake tmux) and the full fm-plugin.sh lifecycle; every intended behavior holds — recurring wake on done+alive, silence when healthy/progressed, needs-decision exclusion, multi-offender coalescing, no-clobber of per-task checks, and portable FM_ROOT resolution through the symlink. All 16 harness assertions pass and the three existing test suites are green; no failures.
Evidence: done-guard operator transcript
=== done-crewmate guard: operator transcript (fresh-clone fleet root) === --- 1. bootstrap re-arms plugins: fm-plugin.sh sync recreates the watcher glob --- symlink: .../fleet/bin/check-plugins/done-crewmate.check.sh done-crewmate live --- 2. healthy fleet: crewmate still working, window alive -> check is silent --- output: <empty -> watcher keeps sleeping> --- 3. crewmate reports done but firstmate has not torn it down -> RECURRING WAKE --- output: done crewmate ship-1 still alive in tmux - progress or tear down -> watcher wraps as: check: state/done-crewmate.check.sh: ... --- 4. second offender lands; both surface in ONE wake line --- output: done crewmate scout-2 ship-1 still alive in tmux - progress or tear down --- 5. firstmate progresses the work -> check goes silent --- after ship-1 window gone, output: done crewmate scout-2 still alive in tmux ... after scout-2 resumes working, output: <empty -> not idle-done> === portable root resolution (head-commit fix): watcher invokes the SYMLINK === bare-symlink-path invocation (no FM_ROOT_OVERRIDE) output: done crewmate port still alive in tmux - progress or tear downEvidence: verify-done-guard harness (16/16 pass)
Stage 1 (fm-plugin lifecycle): sync creates symlink -> list reports live -> sync preserves real per-task state file -> rejects reserved 'fm-' name -> rejects invalid name. Stage 2 (behavior): A working->silent, B done+alive->wakes, C done+window-gone->silent, D failed+blocked flagged, D needs-decision excluded, E resumed working->silent, F no-trailing-newline parsed, G meta without window= skipped, H empty state->silent. Stage 3 (portable root): one-level readlink resolves FM_ROOT through symlink; end-to-end watcher-style invocation wakes correctly. Summary: PASS=16 FAIL=0Pipeline
Updates from git push no-mistakes
⏭️ **intent** - skipped
✅ No issues found.
✅ **Rebase** - passed
✅ No issues found.
🔧 **Review** - 1 issue found → auto-fixed ✅
bin/check-plugins/done-crewmate.check.sh:31- fm_root() usesreadlink -f(line 31), which stock macOS readlink does not support. This is inconsistent with the rest of the codebase, which explicitly handles Darwin forstatin fm-watch.sh, fm-wake-lib.sh, and fm-guard.sh. Whenreadlink -ffails,srcis left as the unresolved state/ symlink path, sodresolves to.../state; the first fallback branch (line 33,$d/../..) then climbs TWO levels (it assumes the resolvedbin/check-plugins/location) and returns the PARENT of FM_ROOT instead of FM_ROOT.STATEends up pointing at a non-existent directory, and line 40 ([ -d "$STATE" ] || exit 0) silently no-ops the entire backstop check. The common case (watcher armed from FM_ROOT) is rescued by the cwd check on line 29, so this only bites on macOS when the watcher's cwd is not FM_ROOT — but there it defeats the whole purpose of the check (forgotten done/failed/blocked crewmates go uncaught). Fix: resolve the symlink portably withoutreadlink -f(e.g.cd -P "$(dirname "$src")"and read the link target iteratively), or drop the two-level branch and rely on the cwd check plus a single-level resolve.🔧 Fix: Fix done-crewmate check portable root resolution
✅ Re-checked - no issues remain.
✅ **Test** - passed
✅ No issues found.
bash /tmp/no-mistakes-evidence/01KWBAPZAPNPYV3Z1K1DB1ZGBY/verify-done-guard.sh— 16 assertion harness: fm-plugin.sh add/list/sync/remove, name validation, no-clobber of real per-task state files, and 9 done-crewmate behavior scenarios invoked exactly as the watcher does (bash $STATE/done-crewmate.check.shvia its symlink, with a fake tmux), plus portable root resolution through the symlink without FM_ROOT_OVERRIDEOperator transcript via watcher-style invocation: healthy fleet silent; done+alive wakes; failed+blocked flagged, needs-decision excluded; second offender coalesces into one line; tearing down the window / resumingworking:silences the backstop; bare-symlink-path invocation resolves FM_ROOT and wakes (head-commit fix)tests/fm-github-watch.test.shtests/fm-spawn-batch.test.shtests/fm-wake-queue.test.sh✅ **Document** - passed
✅ No issues found.
✅ **Lint** - passed
✅ No issues found.
✅ **Push** - passed
✅ No issues found.