feat(supervision): GitHub PR events watcher (comments, CI, reviews, merges)#38
feat(supervision): GitHub PR events watcher (comments, CI, reviews, merges)#38e-jung wants to merge 18 commits into
Conversation
Parallelized the per-PR sweep
Measured on the live fleet (~22 open PRs): serial This now fits the watcher's 30s check-script kill limit with comfortable headroom, so the script is finally usable as a daemon Tests: extended No behavioral changes beyond speed — same filters, same seen-state shape, same CLI. $FM_GH_CONCURRENCY tunes the worker cap (0/non-numeric falls back to 8). |
|
Update: two refinements that make 1. Silent re-baseline on seen-schema migrationA Regression test: 2. Exclude
|
Address review findings from the no-mistakes pipeline: - Emit events per-PR (print then advance that PR's seen) instead of buffering all-at-end: a watcher 30s timeout now surfaces partial progress rather than killing the poll with zero output every cycle. - atomic_write (temp + rename) so a read-only/crashing state dir never leaves a partial seen file; bash builtin printf write()s immediately, keeping the per-PR ordering lossless even under SIGKILL. - Fix open_basenames membership (space-padded) so the last open PR is skipped in detect_left_open instead of always re-checked. - usage() stops before set -u so --help no longer leaks code. - Add review-detection and ci-detection tests.
Address the second review pass: - detect_left_open now honors the merge filter (filter merge off suppresses merge/close events instead of always emitting them). - build_seen carries the CI signature forward across a transiently-empty fetch (new commit whose check-runs have not populated), so a later status change still fires instead of being silently dropped. - Header: note --daemon for large fleets (check-script timeout) and that ci reads the Checks API only. - Tests: fix a vacuous config assertion; add regression tests for the merge-filter suppression and the CI carry-forward window.
…docs Address the third review pass: - Distinguish a configured-empty filters= (all filters off -> watcher muted) from a missing key (defaults to all on); previously 'all off' reset to defaults, so the captain could not actually mute the watcher. - Derive the contributor from `gh auth` when unset instead of hardcoding a username in a shared public-repo script; FM_GH_CONTRIBUTOR and the configured value still take precedence. - count_reviews now excludes the contributor's own reviews (keeps bot and maintainer reviews), matching count_comments. - Document state/.github-watch-config and state/.github-watch-seen/ in the AGENTS.md state layout; add a README toolbelt row + env knobs. - Fix a stale test comment (apply_pending -> atomic_write); add a regression test for the all-filters-off mute.
Address the fourth review pass: - Append ?per_page=100 to the comments, reviews, and check-runs API calls. GitHub list endpoints default to 30 items per page, so counts and the CI signature silently capped at 30 on active PRs; per_page=100 lifts that. - The comment event said 'new maintainer comment(s)' but the count includes bot comments (coderabbit, greptile) by design, so relabel to 'comment(s)'. - Update the fake gh in tests to strip the query string before matching.
Address the fifth review pass: - discover_prs skips discovery when the contributor resolves empty (gh missing/unauthed), so an empty --author is never passed to the search (an empty author qualifier would match open PRs across every repo). - atomic_write now stages its temp in a hidden SEEN_DIR/.tmp subdir (same filesystem, so the rename stays atomic) so a crash-leaked temp never matches detect_left_open's glob and cause a duplicate merge/close event. - Tests pin the contributor via FM_GH_CONTRIBUTOR so they no longer depend on the fake gh implementing `api user`.
Address the sixth review pass: - discover_prs passes --limit 1000 so open PRs beyond the gh search default of 30 are still discovered (the header advertises large-fleet use). - Document that comment/review/check-run counts cap at 100 per type per PR (per_page=100, no pagination) alongside the existing Checks-API caveat.
…count Address the seventh review pass: - detect_left_open now treats only MERGED as terminal. CLOSED PRs are re-probed each cycle, so a close->reopen->merge between polls still emits MERGED instead of being swallowed. Repeat CLOSED events are suppressed by skipping when p_state equals seen_state. - cmd_status excludes the .tmp staging subdir so leaked temps never inflate the 'seen PRs' count; detect_left_open also skips *.tmp.* defensively. - Add a regression test for the close->merge path.
Address the eighth review pass: - build_seen carries the prior state forward when pr_state returns empty (transient gh failure), matching the ci/comments/reviews carry-forward, so a single failed state fetch never erases the tracked OPEN/CLOSED/MERGED. - detect_left_open skips the rewrite entirely when p_state is empty, so it cannot clobber a real state with an empty value and trigger a duplicate.
Address the ninth review pass: - A closed_at timestamp (set when CLOSED is emitted) bounds how long a CLOSED PR is re-probed for a close->reopen->merge. Past FM_GH_CLOSE_REPROBE_SECS (default 7200s) it is treated as settled, so accumulated closed PRs cannot push the fleet past GitHub's rate limit. The default window is generous enough that a prompt close->merge still fires.
…SC2015 CI's shellcheck (v0.10.0) flags SC2015 on `A && B || C` patterns that local 0.11.0 missed; the existing repo scripts avoid the pattern. Rewrite the five guard/assert occurrences in the new files to explicit if-statements.
A --once sweep polled each PR serially: up to 5 gh calls per PR, one PR at a time. At the captain's ~22 open PRs that cost ~47s, over the watcher's 30s check-script kill limit, so a daemon check-script plugin got killed every cycle and delivered nothing. Parallelize the per-PR loop in poll_once with a bounded counting semaphore (FM_GH_CONCURRENCY, default 8; >=1, 0/non-numeric falls back to 8). Each worker is a subshell running process_pr and owns its own per-PR seen file (seen_file is keyed by owner/repo/pr), so concurrent seen writes never collide. The losslessness invariant (print before seen) holds per-worker exactly, and each worker emits its whole event block in a single printf (atomic under PIPE_BUF), so stdout lines never interleave. A final wait settles all workers before detect_left_open scans the seen dir. Measured on the live fleet (~22 PRs): 47.6s -> ~8.6s (5.5x), comfortably under both the 15s target and the 30s kill limit. Adds a parallel-mode regression test asserting events still print before seen advances (via the read-only-seen-dir trick) across 12 concurrently-polled PRs, and that workers never cross-contaminate each other's seen files.
…ition
The ci filter built a sorted multiset of every check-run conclusion and fired
whenever that multiset changed, so a PR whose checks complete at staggered
times fired once per check landing (observed live: no-mistakes#312 fired
several times as its 7 check-runs trickled in). The captain wants "is the PR
green or not", not per-check noise.
Replace the multiset with a single rolled-up overall state per PR, computed in
one jq pass over the check-runs:
success every non-neutral check passed (success/skipped), none still running
failure at least one non-neutral check failed (failure/timed_out/cancelled/
action_required/stale)
pending at least one non-neutral check is still queued/in_progress
neutral only neutral check-runs present
(empty) no check-runs reported yet
Fire one event only when this state flips vs the seen marker
(`CI: <repo>#<pr> -> green|failure|pending|neutral`); stay silent while it is
unchanged. Failure beats pending (a red check already settles the outcome),
matching GitHub's own combined-status precedence.
Losslessness is preserved exactly: the seen marker is still written AFTER the
event prints, the empty-response carry-forward for a just-pushed SHA whose
checks have not populated yet still holds (so a later transition fires), and
per-PR seen files stay independently owned by each parallel worker. The bounded
concurrent poll is untouched.
The test fake gh now runs the watcher's real --jq roll-up over JSON check-run
fixtures (via jq), so the roll-up logic itself is exercised, not just the
comparison. New tests cover the staggered-checks debounce (7 checks trickling
pending->success fires exactly once, not once per check), the
pending->green / green->green / green->failure transitions, and roll-up
precedence (failure beats pending; neutral checks ignored). Existing CI tests
moved to the rolled-up state model; the parallel losslessness test is unchanged
and still passes.
Measured on the live fleet (23 open PRs): --once = 7.74s, well under the 15s
target and the 30s check-script kill limit.
… use Two refinements that make ghwatch safe to leave wired in as a daemon check-script plugin, plus strict-mode hardening. 1. Silent re-baseline on seen-schema migration. A SEEN_SCHEMA version is now stamped into each seen file. When a prior file's schema does not match the current version, the first poll rewrites it at the current schema with carried-forward values and emits NOTHING -- so deploying a schema change (e.g. the ci debounce) no longer floods once as every PR appears to "transition" off the old format. Applied in both process_pr (open PRs) and detect_left_open (PRs that left the open set). Only subsequent real transitions fire. 2. Exclude FM_GH_IGNORE_CHECKS names from the CI roll-up (default: the known fork-routing signature gap #293, "PR must be raised via no-mistakes"). A PR that fails only that check now rolls up to green when its real checks pass, instead of a false failure. Only the roll-up applies the filter; the raw check list and the other filters are unchanged. Set FM_GH_IGNORE_CHECKS to a custom regex, or empty to disable. The regex is embedded into the jq program (gh api has no --arg binding for --jq); a malformed regex fails open to empty (carried forward), never crashing the poll. Also adopt `set -euo pipefail`; the fail-open design is preserved via the existing `|| true` / `|| return 0` guards (full suite green under strict mode). Tests: silent-baseline-on-migration (no flood on schema change; a real transition still fires afterward); gap-excluded PR rolls up green while a real failure still surfaces. All 17 prior ghwatch tests stay green.
…sing them as data ghc() did `command gh "$@" 2>/dev/null || true`: it swallowed stderr and the exit code, but on an API error (e.g. a transient 401 "Bad credentials") gh api writes the error JSON to stdout — bypassing --jq — which the script then parsed as CI data and fired as an event (observed: "CI: manaflow-ai/cmux#6570 -> { \"message\": \"Bad credentials\" ... }"). Detect the error and SKIP that PR for the cycle: never surface an API error as an event. ghc() now captures gh's stdout and returns non-zero (suppressing the body) when gh exits non-zero OR its output is a GitHub error body (a JSON object with top-level "message" + "documentation_url"). process_pr treats any non-zero probe return as "skip this PR this cycle": emit nothing, do not write seen, so the next cycle re-evaluates from the same baseline (lossless). Also guards the discovery fetch (abort the cycle on failure instead of misreading an empty list as "everyone merged") and get_contributor's best-effort user lookup (no set -e trip on a blip). Tests: new test injects a 401 error JSON via the fake gh (verified it fails on the old ghc with the exact bogus CI event, passes with the fix) plus a recovery cycle proving the real transition still fires. shellcheck clean; all 20 green.
…config, and architecture
Intent
The developer (captain) tasked a crewmate with building bin/fm-github-watch.sh, a standalone GitHub events poller that watches all the captain's open PRs (13 across 6 repos) and surfaces new maintainer comments, CI status changes, reviews, and merge/close transitions into the existing batched wake-queue escalation path, closing the gap that left firstmate blind to comments like the one on PR #30. They specified an auto-discovery design via gh search prs, per-PR high-water-mark seen tracking in state/.github-watch-seen/, and explicitly demanded a losslessness rule (seen markers written only after events print) learned from a prior #3095 event-swallow incident. They also required CLI filter knobs (comments/ci/reviews/merge on|off, contributor get/set, status) so different agents can toggle which signals fire, plus a TAP-style test suite with mocked gh and strict avoidance of webhooks, fm-watch.sh edits, or daemon changes. Acceptance criteria included discovering all open PRs, working filter toggles, clean shellcheck, passing existing tests, and a new test file, shipped as a cross-repo PR titled "feat(supervision): GitHub events watcher (comments/CI/reviews) with CLI filter knobs" pushed to the e-jung fork with AI disclosure set to Human-reviewed.
What Changed
bin/fm-github-watch.sh, a standalone GitHub events poller that auto-discovers the user's open PRs viagh search prs, tracks per-PR high-water marks understate/.github-watch-seen/, and emits maintainer comments, CI transitions, reviews, and merge/close changes into the batched wake-queue; seen markers are written only after events print, so a failed write re-fires events losslessly until it succeeds.comments|ci|reviews|merge on|off,contributor get|set,status) so different agents can toggle which signals fire, withper_page/--limittuning and concurrent per-PR polling to stay within the 30s check-script budget; transient GitHub API errors skip a PR rather than being misparsed as data.bin/fm-teardown.sh(pre-2.38 git support in the content-landed check) andbin/fm-guard.sh, and documents the watcher across the scripts/configuration/architecture docs plus a newtests/fm-github-watch.test.shTAP suite.Risk Assessment
✅ Low: Additive, opt-in standalone script activated only by an explicit symlink; it does not modify fm-watch.sh, daemons, or any existing supervision code path, and its core logic (lossless per-PR print-before-seen ordering, CI debounce, schema migration, bounded concurrency, API-error fail-open) is correct and comprehensively covered by 19 behavior tests.
Testing
The change's TAP suite and the full repo suite pass cleanly (the earlier "tests failed" was resolved by the chain of ghwatch fix commits already on the branch; my only timeout this round was the harness's 120s budget running all tests serially — each test, run individually, exits 0). Beyond unit assertions, I drove the real fm-github-watch.sh through a fake gh to capture a CLI transcript demonstrating the complete intent: auto-discovery of 3 open PRs across 2 repos, silent first-run baseline with per-PR seen files created, then all four event types firing on real transitions (COMMENT, REVIEW, CI green->failure, MERGED), the contributor/filter/status CLI knobs persisting to config, and the losslessness rule in action — with the seen dir made read-only, the comment+CI events re-fire on every poll and only stop once seen writes succeed (no permanent swallow). Worktree left clean.Evidence: End-to-end CLI transcript (real watcher, fake gh)
$ fm-github-watch.sh contributor the-captain -> contributor=the-captain $ filter reviews off / merge off / merge on / reviews on -> filters=comments,ci,merge,reviews (knobs persist) $ status -> contributor + per-filter on/off + poll interval + seen PRs: 0 $ --once (first poll) -> (silent baseline) $ status -> seen PRs: 3 $ --once (after transitions) -> COMMENT: e-jung/firstmate#152 has 2 new comment(s) REVIEW: e-jung/firstmate#152 has 1 new review(s) CI: e-jung/firstmate#152 -> failure MERGED: e-jung/firstmate#153 --- seen dir made read-only to force failed seen writes --- $ --once -> COMMENT: ...1 new comment(s) / CI: ... -> green (event delivered, seen can't advance) $ --once -> COMMENT: ...1 new comment(s) / CI: ... -> green (SAME event re-fires = lossless) $ --once -> COMMENT: ...1 new comment(s) / CI: ... -> green (still re-firing) --- seen dir writable again --- $ --once -> (silent: seen finally advanced) final seen: #153 state=MERGED (terminal), #152/#7 OPEN with correct high-water comment/review/ci marksEvidence: Reproducible end-to-end demo script
Pipeline
Updates from git push no-mistakes
✅ **intent** - passed
✅ No issues found.
🔧 **Rebase** - 2 issues found → auto-fixed ✅
AGENTS.md- merge conflict rebasing onto origin/mainREADME.md- merge conflict rebasing onto origin/main🔧 Fix applied.
✅ Re-checked - no issues remain.
⏭️ **Review** - skipped
bin/fm-github-watch.sh:81- The default IGNORE_CHECKS regex bakes in a project-specific check name ("PR must be raised via no-mistakes", the #293 fork-routing gap) for a tool that is otherwise positioned as shared/general (get_contributor's comment says "a shared tool should poll whoever is logged in"). It is harmless to other users (no check collides with that name), configurable via FM_GH_IGNORE_CHECKS, and explicitly required by the captain's kunchenguid workflow, but shipping it as the default rather than opt-in is a product call worth confirming for a committed shared script.bin/fm-github-watch.sh:460- get_contributor is resolved twice per poll cycle: once here in poll_once (to pass into process_pr) and again inside discover_prs (line 231). When the contributor is unconfigured this means twogh api usercalls per sweep. discover_prs has a single caller, so it could accept the contributor as an argument and reuse the value poll_once already resolved, removing the duplicate resolution.🔧 **Test** - 1 issue found → auto-fixed ✅
command -v tmux >/dev/null || { echo "tmux is required for e2e tests" >&2; exit 1; }; tmux -V; rc=0; for t in tests/*.test.sh; do echo "== $t =="; bash "$t" || rc=1; done; exit "$rc"🔧 Fix: fix(teardown): support pre-2.38 git in content-landed check
✅ Re-checked - no issues remain.
command -v tmux >/dev/null || { echo "tmux is required for e2e tests" >&2; exit 1; }; tmux -V; rc=0; for t in tests/*.test.sh; do echo "== $t =="; bash "$t" || rc=1; done; exit "$rc"bash tests/fm-github-watch.test.sh(20/20 TAP assertions, EXIT=0): filter toggles, baseline silence, comment/review/CI/merge events, lossless re-detect on failed seen write, CLOSED non-terminal, re-probe windowing, config round-trip, CI debounce + roll-up precedence, all-filters-off mute, parallel-poll seen isolation, schema re-baseline, fork-routing gap exclusion, transient API error skipbash tests/fm-watcher-lock.test.sh(21 ok, EXIT=0)bash tests/fm-x-mode.test.sh(25 ok, EXIT=0)bash tests/fm-watch-triage.test.sh(all ok, EXIT=0)End-to-end demo/tmp/no-mistakes-evidence/01KWCMH0E270PV2K0YASY7JW6M/ghwatch-e2e-demo.shdriving the real bin/fm-github-watch.sh through a fake gh: contributor get/set, filter list/on/off, status; baseline poll silent then seen PRs=3; COMMENT(2)+REVIEW(1)+CI(->failure)+MERGED(#153) fire on transitions; read-only seen dir forces failed seen writes and the events re-fire every cycle (lossless), then go silent once writable again; final seen files show #153 state=MERGED with correct high-water marksgit status --porcelainconfirming the worktree is clean after testing (no transient artifacts leaked)✅ **Document** - passed
✅ No issues found.
✅ **Lint** - passed
✅ No issues found.
✅ **Push** - passed
✅ No issues found.