Skip to content

feat(supervision): GitHub PR events watcher (comments, CI, reviews, merges)#38

Open
e-jung wants to merge 18 commits into
kunchenguid:mainfrom
e-jung:fm-ghwatch-k6
Open

feat(supervision): GitHub PR events watcher (comments, CI, reviews, merges)#38
e-jung wants to merge 18 commits into
kunchenguid:mainfrom
e-jung:fm-ghwatch-k6

Conversation

@e-jung

@e-jung e-jung commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Intent

The developer (captain) tasked a crewmate with building bin/fm-github-watch.sh, a standalone GitHub events poller that watches all the captain's open PRs (13 across 6 repos) and surfaces new maintainer comments, CI status changes, reviews, and merge/close transitions into the existing batched wake-queue escalation path, closing the gap that left firstmate blind to comments like the one on PR #30. They specified an auto-discovery design via gh search prs, per-PR high-water-mark seen tracking in state/.github-watch-seen/, and explicitly demanded a losslessness rule (seen markers written only after events print) learned from a prior #3095 event-swallow incident. They also required CLI filter knobs (comments/ci/reviews/merge on|off, contributor get/set, status) so different agents can toggle which signals fire, plus a TAP-style test suite with mocked gh and strict avoidance of webhooks, fm-watch.sh edits, or daemon changes. Acceptance criteria included discovering all open PRs, working filter toggles, clean shellcheck, passing existing tests, and a new test file, shipped as a cross-repo PR titled "feat(supervision): GitHub events watcher (comments/CI/reviews) with CLI filter knobs" pushed to the e-jung fork with AI disclosure set to Human-reviewed.

What Changed

  • Adds bin/fm-github-watch.sh, a standalone GitHub events poller that auto-discovers the user's open PRs via gh search prs, tracks per-PR high-water marks under state/.github-watch-seen/, and emits maintainer comments, CI transitions, reviews, and merge/close changes into the batched wake-queue; seen markers are written only after events print, so a failed write re-fires events losslessly until it succeeds.
  • Exposes CLI filter knobs (comments|ci|reviews|merge on|off, contributor get|set, status) so different agents can toggle which signals fire, with per_page/--limit tuning and concurrent per-PR polling to stay within the 30s check-script budget; transient GitHub API errors skip a PR rather than being misparsed as data.
  • Updates bin/fm-teardown.sh (pre-2.38 git support in the content-landed check) and bin/fm-guard.sh, and documents the watcher across the scripts/configuration/architecture docs plus a new tests/fm-github-watch.test.sh TAP suite.

Risk Assessment

✅ Low: Additive, opt-in standalone script activated only by an explicit symlink; it does not modify fm-watch.sh, daemons, or any existing supervision code path, and its core logic (lossless per-PR print-before-seen ordering, CI debounce, schema migration, bounded concurrency, API-error fail-open) is correct and comprehensively covered by 19 behavior tests.

Testing

The change's TAP suite and the full repo suite pass cleanly (the earlier "tests failed" was resolved by the chain of ghwatch fix commits already on the branch; my only timeout this round was the harness's 120s budget running all tests serially — each test, run individually, exits 0). Beyond unit assertions, I drove the real fm-github-watch.sh through a fake gh to capture a CLI transcript demonstrating the complete intent: auto-discovery of 3 open PRs across 2 repos, silent first-run baseline with per-PR seen files created, then all four event types firing on real transitions (COMMENT, REVIEW, CI green->failure, MERGED), the contributor/filter/status CLI knobs persisting to config, and the losslessness rule in action — with the seen dir made read-only, the comment+CI events re-fire on every poll and only stop once seen writes succeed (no permanent swallow). Worktree left clean.

Evidence: End-to-end CLI transcript (real watcher, fake gh)

$ fm-github-watch.sh contributor the-captain -> contributor=the-captain $ filter reviews off / merge off / merge on / reviews on -> filters=comments,ci,merge,reviews (knobs persist) $ status -> contributor + per-filter on/off + poll interval + seen PRs: 0 $ --once (first poll) -> (silent baseline) $ status -> seen PRs: 3 $ --once (after transitions) -> COMMENT: e-jung/firstmate#152 has 2 new comment(s) REVIEW: e-jung/firstmate#152 has 1 new review(s) CI: e-jung/firstmate#152 -> failure MERGED: e-jung/firstmate#153 --- seen dir made read-only to force failed seen writes --- $ --once -> COMMENT: ...1 new comment(s) / CI: ... -> green (event delivered, seen can't advance) $ --once -> COMMENT: ...1 new comment(s) / CI: ... -> green (SAME event re-fires = lossless) $ --once -> COMMENT: ...1 new comment(s) / CI: ... -> green (still re-firing) --- seen dir writable again --- $ --once -> (silent: seen finally advanced) final seen: #153 state=MERGED (terminal), #152/#7 OPEN with correct high-water comment/review/ci marks


$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh contributor the-captain
contributor=the-captain

$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh contributor
the-captain
$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh filter list
comments
ci
reviews
merge

$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh filter reviews off
filters=comments,ci,merge

$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh filter merge off
filters=comments,ci

$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh filter list
comments
ci

$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh filter merge on
filters=comments,ci,merge

$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh filter reviews on
filters=comments,ci,merge,reviews

$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh status
contributor: the-captain
filters:
  comments: on
  ci: on
  reviews: on
  merge: on
poll interval: 300s
seen PRs: 0

$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh --once

$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh status
contributor: the-captain
filters:
  comments: on
  ci: on
  reviews: on
  merge: on
poll interval: 300s
seen PRs: 3

$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh --once
COMMENT: e-jung/firstmate#152 has 2 new comment(s)
REVIEW: e-jung/firstmate#152 has 1 new review(s)
CI: e-jung/firstmate#152 -> failure
MERGED: e-jung/firstmate#153

--- making seen dir read-only to force the seen-write to fail ---

$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh --once
COMMENT: e-jung/firstmate#152 has 1 new comment(s)
CI: e-jung/firstmate#152 -> green

$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh --once
COMMENT: e-jung/firstmate#152 has 1 new comment(s)
CI: e-jung/firstmate#152 -> green

$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh --once
COMMENT: e-jung/firstmate#152 has 1 new comment(s)
CI: e-jung/firstmate#152 -> green

$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh --once

$ /home/ubuntu/.no-mistakes/worktrees/507cc8072d97/01KWCMH0E270PV2K0YASY7JW6M/bin/fm-github-watch.sh status
contributor: the-captain
filters:
  comments: on
  ci: on
  reviews: on
  merge: on
poll interval: 300s
seen PRs: 3

=== final seen-state files ===
--- /tmp/ghwatch-demo/state/.github-watch-seen/anhydrite-reconciler-7 ---
owner=anhydrite
repo=reconciler
pr=7
schema=2
initialized=1
comments=0
reviews=0
ci=success
sha=ccc333
state=OPEN

--- /tmp/ghwatch-demo/state/.github-watch-seen/e-jung-firstmate-152 ---
owner=e-jung
repo=firstmate
pr=152
schema=2
initialized=1
comments=6
reviews=3
ci=success
sha=aaa111
state=OPEN

--- /tmp/ghwatch-demo/state/.github-watch-seen/e-jung-firstmate-153 ---
owner=e-jung
repo=firstmate
pr=153
schema=2
initialized=1
comments=1
reviews=0
ci=success
sha=bbb222
state=MERGED
Evidence: Reproducible end-to-end demo script
#!/usr/bin/env bash
# End-to-end CLI transcript of bin/fm-github-watch.sh.
# Drives the REAL script through a fake `gh` (mirrors the test harness) so the
# transcript shows exactly what firstmate / the captain would experience.
set -u

CASE=/tmp/ghwatch-demo
rm -rf "$CASE"
mkdir -p "$CASE/state" "$CASE/fixture" "$CASE/fakebin"

cat > "$CASE/fakebin/gh" <<'GH'
#!/usr/bin/env bash
set -u
FX="${GH_FIXTURE:?}"
sub="${1:-}"; shift || true
case "$sub" in
  search)
    [ -f "$FX/prs" ] && cat "$FX/prs"; exit 0 ;;
  api)
    path=""
    for a in "$@"; do case "$a" in repos/*) path=$a ;; esac; done
    path="${path%%\?*}"
    case "$path" in
      */issues/*/comments)
        rest=${path#repos/}; owner=${rest%%/*}; rest=${rest#*/}; repo=${rest%%/*}; rest=${rest#*/}
        num=${rest#issues/}; num=${num%/comments}
        f="$FX/comments-$owner-$repo-$num"; [ -f "$f" ] && { cat "$f"; exit 0; }; echo 0; exit 0 ;;
      */pulls/*/reviews)
        rest=${path#repos/}; owner=${rest%%/*}; rest=${rest#*/}; repo=${rest%%/*}; rest=${rest#*/}
        num=${rest#pulls/}; num=${num%/reviews}
        f="$FX/reviews-$owner-$repo-$num"; [ -f "$f" ] && { cat "$f"; exit 0; }; echo 0; exit 0 ;;
      */commits/*/check-runs)
        sha=${path##*/commits/}; sha=${sha%/check-runs}
        f="$FX/ci-$sha"; [ -f "$f" ] || exit 0
        jq_expr=""; prev=""
        for a in "$@"; do [ "$prev" = "--jq" ] && jq_expr=$a; prev=$a; done
        if [ -n "$jq_expr" ]; then jq -r "$jq_expr" "$f"; else cat "$f"; fi
        exit 0 ;;
    esac
    exit 0 ;;
  pr)
    num=""; repo=""; field=""; prev=""
    for a in "$@"; do
      [ "$prev" = "-R" ] && repo=$a
      [ "$prev" = "--json" ] && field=$a
      case "$a" in [0-9]*) num=$a ;; esac
      prev=$a
    done
    owner=${repo%%/*}; rn=${repo#*/}
    case "$field" in
      state)      f="$FX/state-$owner-$rn-$num"; [ -f "$f" ] && { cat "$f"; exit 0; }; echo OPEN; exit 0 ;;
      headRefOid) f="$FX/sha-$owner-$rn-$num";   [ -f "$f" ] && { cat "$f"; exit 0; }; echo deadbeef; exit 0 ;;
    esac
    exit 0 ;;
esac
exit 0
GH
chmod +x "$CASE/fakebin/gh"

export PATH="$CASE/fakebin:$PATH"
export FM_STATE_OVERRIDE="$CASE/state"
export GH_FIXTURE="$CASE/fixture"
export FM_GH_CONTRIBUTOR=the-captain
WATCH="$PWD/bin/fm-github-watch.sh"

run() { printf '\n$ %s\n' "$*"; "$@"; }

# --- 1. CLI knobs: contributor + filter toggles + status ---
run "$WATCH" contributor the-captain
run "$WATCH" contributor
run "$WATCH" filter list
run "$WATCH" filter reviews off
run "$WATCH" filter merge off
run "$WATCH" filter list
run "$WATCH" filter merge on
run "$WATCH" filter reviews on
run "$WATCH" status

# --- 2. Fleet: 3 open PRs across 2 repos ---
printf 'e-jung/firstmate\t152\ne-jung/firstmate\t153\nanhydrite/reconciler\t7\n' > "$GH_FIXTURE/prs"
printf '3\n' > "$GH_FIXTURE/comments-e-jung-firstmate-152"
printf '1\n' > "$GH_FIXTURE/comments-e-jung-firstmate-153"
printf '2\n' > "$GH_FIXTURE/reviews-e-jung-firstmate-152"
printf 'OPEN\n' > "$GH_FIXTURE/state-e-jung-firstmate-152"
printf 'OPEN\n' > "$GH_FIXTURE/state-e-jung-firstmate-153"
printf 'OPEN\n' > "$GH_FIXTURE/state-anhydrite-reconciler-7"
printf 'aaa111\n' > "$GH_FIXTURE/sha-e-jung-firstmate-152"
printf 'bbb222\n' > "$GH_FIXTURE/sha-e-jung-firstmate-153"
printf 'ccc333\n' > "$GH_FIXTURE/sha-anhydrite-reconciler-7"
# CI: all green initially
printf '{"check_runs":[{"name":"ci","conclusion":"success"},{"name":"lint","conclusion":"success"}]}\n' > "$GH_FIXTURE/ci-aaa111"
printf '{"check_runs":[{"name":"ci","conclusion":"success"}]}\n' > "$GH_FIXTURE/ci-bbb222"
printf '{"check_runs":[{"name":"build","conclusion":"success"}]}\n' > "$GH_FIXTURE/ci-ccc333"

run "$WATCH" --once      # baseline: silent (first run)
run "$WATCH" status      # seen PRs now tracked

# --- 3. Real transitions: new comment + CI red + new review + a merge ---
printf '5\n' > "$GH_FIXTURE/comments-e-jung-firstmate-152"     # +2 maintainer comments
printf '{"check_runs":[{"name":"ci","conclusion":"failure"},{"name":"lint","conclusion":"success"}]}\n' > "$GH_FIXTURE/ci-aaa111"  # green->failure
printf '3\n' > "$GH_FIXTURE/reviews-e-jung-firstmate-152"      # +1 review
printf 'MERGED\n' > "$GH_FIXTURE/state-e-jung-firstmate-153"   # merged
printf 'e-jung/firstmate\t152\nanhydrite/reconciler\t7\n' > "$GH_FIXTURE/prs"  # 153 left open set

run "$WATCH" --once      # should emit COMMENT, CI, REVIEW, MERGED

# --- 4. Losslessness: simulate a crash between print and seen-write ---
# Roll CI back to green on #152 but corrupt the seen file's write path by
# making the seen dir read-only AFTER the comment count jumps again, so the
# seen marker can't advance. The event must re-fire next cycle (no swallow).
printf '6\n' > "$GH_FIXTURE/comments-e-jung-firstmate-152"     # +1 more comment
printf '{"check_runs":[{"name":"ci","conclusion":"success"},{"name":"lint","conclusion":"success"}]}\n' > "$GH_FIXTURE/ci-aaa111"
printf '\n--- making seen dir read-only to force the seen-write to fail ---\n'
chmod -R a-w "$CASE/state/.github-watch-seen"
run "$WATCH" --once       # event prints; seen write fails; marker unchanged
run "$WATCH" --once       # SAME event re-fires (lossless: no permanent swallow)
chmod -R u+w "$CASE/state/.github-watch-seen"
run "$WATCH" --once       # now seen advances; next poll is silent again
run "$WATCH" --once

run "$WATCH" status
echo
echo "=== final seen-state files ==="
for f in "$CASE/state/.github-watch-seen"/* ; do
  case "$f" in *.tmp.*) continue ;; esac
  echo "--- $f ---"; cat "$f"; echo
done
- Outcome: 🔧 1 issue found → auto-fixed ✅ across 2 runs (26m26s)

Pipeline

Updates from git push no-mistakes

✅ **intent** - passed

✅ No issues found.

🔧 **Rebase** - 2 issues found → auto-fixed ✅
  • ⚠️ AGENTS.md - merge conflict rebasing onto origin/main
  • ⚠️ README.md - merge conflict rebasing onto origin/main

🔧 Fix applied.
✅ Re-checked - no issues remain.

⏭️ **Review** - skipped
  • ℹ️ bin/fm-github-watch.sh:81 - The default IGNORE_CHECKS regex bakes in a project-specific check name ("PR must be raised via no-mistakes", the #293 fork-routing gap) for a tool that is otherwise positioned as shared/general (get_contributor's comment says "a shared tool should poll whoever is logged in"). It is harmless to other users (no check collides with that name), configurable via FM_GH_IGNORE_CHECKS, and explicitly required by the captain's kunchenguid workflow, but shipping it as the default rather than opt-in is a product call worth confirming for a committed shared script.
  • ℹ️ bin/fm-github-watch.sh:460 - get_contributor is resolved twice per poll cycle: once here in poll_once (to pass into process_pr) and again inside discover_prs (line 231). When the contributor is unconfigured this means two gh api user calls per sweep. discover_prs has a single caller, so it could accept the contributor as an argument and reuse the value poll_once already resolved, removing the duplicate resolution.
🔧 **Test** - 1 issue found → auto-fixed ✅
  • 🚨 tests failed with exit code 1
  • command -v tmux >/dev/null || { echo "tmux is required for e2e tests" >&2; exit 1; }; tmux -V; rc=0; for t in tests/*.test.sh; do echo "== $t =="; bash "$t" || rc=1; done; exit "$rc"

🔧 Fix: fix(teardown): support pre-2.38 git in content-landed check
✅ Re-checked - no issues remain.

  • command -v tmux >/dev/null || { echo "tmux is required for e2e tests" >&2; exit 1; }; tmux -V; rc=0; for t in tests/*.test.sh; do echo "== $t =="; bash "$t" || rc=1; done; exit "$rc"
  • bash tests/fm-github-watch.test.sh (20/20 TAP assertions, EXIT=0): filter toggles, baseline silence, comment/review/CI/merge events, lossless re-detect on failed seen write, CLOSED non-terminal, re-probe windowing, config round-trip, CI debounce + roll-up precedence, all-filters-off mute, parallel-poll seen isolation, schema re-baseline, fork-routing gap exclusion, transient API error skip
  • bash tests/fm-watcher-lock.test.sh (21 ok, EXIT=0)
  • bash tests/fm-x-mode.test.sh (25 ok, EXIT=0)
  • bash tests/fm-watch-triage.test.sh (all ok, EXIT=0)
  • End-to-end demo /tmp/no-mistakes-evidence/01KWCMH0E270PV2K0YASY7JW6M/ghwatch-e2e-demo.sh driving the real bin/fm-github-watch.sh through a fake gh: contributor get/set, filter list/on/off, status; baseline poll silent then seen PRs=3; COMMENT(2)+REVIEW(1)+CI(->failure)+MERGED(#153) fire on transitions; read-only seen dir forces failed seen writes and the events re-fire every cycle (lossless), then go silent once writable again; final seen files show #153 state=MERGED with correct high-water marks
  • git status --porcelain confirming the worktree is clean after testing (no transient artifacts leaked)
✅ **Document** - passed

✅ No issues found.

✅ **Lint** - passed

✅ No issues found.

✅ **Push** - passed

✅ No issues found.

@e-jung

e-jung commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Parallelized the per-PR sweep

--once now polls PRs concurrently with a bounded counting semaphore (FM_GH_CONCURRENCY, default 8) instead of serially. Each worker owns its own per-PR seen file, so the losslessness invariant (events print before the seen marker advances) holds per-worker exactly; a final wait settles all workers before detect_left_open scans the seen dir.

Measured on the live fleet (~22 open PRs): serial --once was ~47.6s → parallel is ~8.6s (5.5x faster). Stable across repeated runs (8.2–8.9s).

This now fits the watcher's 30s check-script kill limit with comfortable headroom, so the script is finally usable as a daemon state/*.check.sh plugin (previously it was timeout-killed every cycle and delivered nothing).

Tests: extended tests/fm-github-watch.test.sh with a parallel-mode losslessness case (12 concurrently-polled PRs, read-only-seen-dir trick asserts events still print before seen advances, plus no cross-contamination of seen files). shellcheck clean; all suites green.

No behavioral changes beyond speed — same filters, same seen-state shape, same CLI. $FM_GH_CONCURRENCY tunes the worker cap (0/non-numeric falls back to 8).

@e-jung

e-jung commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Update: two refinements that make fm-github-watch.sh viable to leave wired in as a daemon check-script plugin (pushed in 9fe2029).

1. Silent re-baseline on seen-schema migration

A SEEN_SCHEMA version is now stamped into each seen-state file. When a prior file's schema doesn't match the current version, the first poll rewrites it at the current schema (carrying forward prior values) and emits nothing — so deploying a schema change (e.g. the CI debounce) no longer floods once as every PR appears to "transition" off the old format. Applied in both process_pr (open PRs) and detect_left_open (PRs that left the open set). Only subsequent real transitions fire.

Regression test: test_silent_baseline_on_schema_migration — an old-format seen file (stale multiset ci=, no schema=) baselines silently on first poll, then a real success -> failure transition still fires.

2. Exclude FM_GH_IGNORE_CHECKS from the CI roll-up

New config (default: PR must be raised via no-mistakes) — a regex of check-run names dropped from the roll-up before it's computed. A PR that fails only that known fork-routing signature gap (#293) now rolls up to green when its real checks pass, instead of a false failure. Only the roll-up applies the filter; the raw check list and the other filters (comments/reviews/merge) are unchanged. Set it to a custom regex, or empty to disable.

The regex is embedded into the jq program (escaped for a JSON string literal) because gh api has no --arg binding for its --jq filter; a malformed regex fails open to empty (carried forward), never crashing the poll.

Regression test: test_ci_ignore_excludes_known_gap_check — 3 real checks pass + the gap check fails → rolls up to success; flipping a real check to failure still surfaces failure.

Other

  • Adopted set -euo pipefail; fail-open design preserved via existing || true / || return 0 guards.
  • shellcheck-clean; all 19 ghwatch tests green (17 prior + 2 new).
  • Parallel poll (<15s) and per-PR losslessness unchanged.

AI disclosure: Human-reviewed. The PR must be raised via no-mistakes CI check fails on the fork-routing gap #293 (known); the 3 real checks should pass.

@e-jung e-jung changed the title feat(supervision): GitHub events watcher (comments/CI/reviews) with CLI filter knobs feat(ghwatch): GitHub events watcher for PR comments, CI, and reviews Jun 29, 2026
e-jung added 18 commits June 30, 2026 16:06
Address review findings from the no-mistakes pipeline:
- Emit events per-PR (print then advance that PR's seen) instead of
  buffering all-at-end: a watcher 30s timeout now surfaces partial progress
  rather than killing the poll with zero output every cycle.
- atomic_write (temp + rename) so a read-only/crashing state dir never leaves
  a partial seen file; bash builtin printf write()s immediately, keeping the
  per-PR ordering lossless even under SIGKILL.
- Fix open_basenames membership (space-padded) so the last open PR is skipped
  in detect_left_open instead of always re-checked.
- usage() stops before set -u so --help no longer leaks code.
- Add review-detection and ci-detection tests.
Address the second review pass:
- detect_left_open now honors the merge filter (filter merge off suppresses
  merge/close events instead of always emitting them).
- build_seen carries the CI signature forward across a transiently-empty
  fetch (new commit whose check-runs have not populated), so a later status
  change still fires instead of being silently dropped.
- Header: note --daemon for large fleets (check-script timeout) and that ci
  reads the Checks API only.
- Tests: fix a vacuous config assertion; add regression tests for the
  merge-filter suppression and the CI carry-forward window.
…docs

Address the third review pass:
- Distinguish a configured-empty filters= (all filters off -> watcher muted)
  from a missing key (defaults to all on); previously 'all off' reset to
  defaults, so the captain could not actually mute the watcher.
- Derive the contributor from `gh auth` when unset instead of hardcoding a
  username in a shared public-repo script; FM_GH_CONTRIBUTOR and the
  configured value still take precedence.
- count_reviews now excludes the contributor's own reviews (keeps bot and
  maintainer reviews), matching count_comments.
- Document state/.github-watch-config and state/.github-watch-seen/ in the
  AGENTS.md state layout; add a README toolbelt row + env knobs.
- Fix a stale test comment (apply_pending -> atomic_write); add a regression
  test for the all-filters-off mute.
Address the fourth review pass:
- Append ?per_page=100 to the comments, reviews, and check-runs API calls.
  GitHub list endpoints default to 30 items per page, so counts and the CI
  signature silently capped at 30 on active PRs; per_page=100 lifts that.
- The comment event said 'new maintainer comment(s)' but the count includes
  bot comments (coderabbit, greptile) by design, so relabel to 'comment(s)'.
- Update the fake gh in tests to strip the query string before matching.
Address the fifth review pass:
- discover_prs skips discovery when the contributor resolves empty (gh
  missing/unauthed), so an empty --author is never passed to the search
  (an empty author qualifier would match open PRs across every repo).
- atomic_write now stages its temp in a hidden SEEN_DIR/.tmp subdir (same
  filesystem, so the rename stays atomic) so a crash-leaked temp never
  matches detect_left_open's glob and cause a duplicate merge/close event.
- Tests pin the contributor via FM_GH_CONTRIBUTOR so they no longer depend
  on the fake gh implementing `api user`.
Address the sixth review pass:
- discover_prs passes --limit 1000 so open PRs beyond the gh search default
  of 30 are still discovered (the header advertises large-fleet use).
- Document that comment/review/check-run counts cap at 100 per type per PR
  (per_page=100, no pagination) alongside the existing Checks-API caveat.
…count

Address the seventh review pass:
- detect_left_open now treats only MERGED as terminal. CLOSED PRs are
  re-probed each cycle, so a close->reopen->merge between polls still emits
  MERGED instead of being swallowed. Repeat CLOSED events are suppressed by
  skipping when p_state equals seen_state.
- cmd_status excludes the .tmp staging subdir so leaked temps never inflate
  the 'seen PRs' count; detect_left_open also skips *.tmp.* defensively.
- Add a regression test for the close->merge path.
Address the eighth review pass:
- build_seen carries the prior state forward when pr_state returns empty
  (transient gh failure), matching the ci/comments/reviews carry-forward,
  so a single failed state fetch never erases the tracked OPEN/CLOSED/MERGED.
- detect_left_open skips the rewrite entirely when p_state is empty, so it
  cannot clobber a real state with an empty value and trigger a duplicate.
Address the ninth review pass:
- A closed_at timestamp (set when CLOSED is emitted) bounds how long a
  CLOSED PR is re-probed for a close->reopen->merge. Past FM_GH_CLOSE_REPROBE_SECS
  (default 7200s) it is treated as settled, so accumulated closed PRs cannot
  push the fleet past GitHub's rate limit. The default window is generous
  enough that a prompt close->merge still fires.
…SC2015

CI's shellcheck (v0.10.0) flags SC2015 on `A && B || C` patterns that local
0.11.0 missed; the existing repo scripts avoid the pattern. Rewrite the five
guard/assert occurrences in the new files to explicit if-statements.
A --once sweep polled each PR serially: up to 5 gh calls per PR, one PR at a
time. At the captain's ~22 open PRs that cost ~47s, over the watcher's 30s
check-script kill limit, so a daemon check-script plugin got killed every
cycle and delivered nothing.

Parallelize the per-PR loop in poll_once with a bounded counting semaphore
(FM_GH_CONCURRENCY, default 8; >=1, 0/non-numeric falls back to 8). Each
worker is a subshell running process_pr and owns its own per-PR seen file
(seen_file is keyed by owner/repo/pr), so concurrent seen writes never collide.
The losslessness invariant (print before seen) holds per-worker exactly, and
each worker emits its whole event block in a single printf (atomic under
PIPE_BUF), so stdout lines never interleave. A final wait settles all workers
before detect_left_open scans the seen dir.

Measured on the live fleet (~22 PRs): 47.6s -> ~8.6s (5.5x), comfortably under
both the 15s target and the 30s kill limit.

Adds a parallel-mode regression test asserting events still print before seen
advances (via the read-only-seen-dir trick) across 12 concurrently-polled PRs,
and that workers never cross-contaminate each other's seen files.
…ition

The ci filter built a sorted multiset of every check-run conclusion and fired
whenever that multiset changed, so a PR whose checks complete at staggered
times fired once per check landing (observed live: no-mistakes#312 fired
several times as its 7 check-runs trickled in). The captain wants "is the PR
green or not", not per-check noise.

Replace the multiset with a single rolled-up overall state per PR, computed in
one jq pass over the check-runs:
  success  every non-neutral check passed (success/skipped), none still running
  failure  at least one non-neutral check failed (failure/timed_out/cancelled/
           action_required/stale)
  pending  at least one non-neutral check is still queued/in_progress
  neutral  only neutral check-runs present
  (empty)  no check-runs reported yet
Fire one event only when this state flips vs the seen marker
(`CI: <repo>#<pr> -> green|failure|pending|neutral`); stay silent while it is
unchanged. Failure beats pending (a red check already settles the outcome),
matching GitHub's own combined-status precedence.

Losslessness is preserved exactly: the seen marker is still written AFTER the
event prints, the empty-response carry-forward for a just-pushed SHA whose
checks have not populated yet still holds (so a later transition fires), and
per-PR seen files stay independently owned by each parallel worker. The bounded
concurrent poll is untouched.

The test fake gh now runs the watcher's real --jq roll-up over JSON check-run
fixtures (via jq), so the roll-up logic itself is exercised, not just the
comparison. New tests cover the staggered-checks debounce (7 checks trickling
pending->success fires exactly once, not once per check), the
pending->green / green->green / green->failure transitions, and roll-up
precedence (failure beats pending; neutral checks ignored). Existing CI tests
moved to the rolled-up state model; the parallel losslessness test is unchanged
and still passes.

Measured on the live fleet (23 open PRs): --once = 7.74s, well under the 15s
target and the 30s check-script kill limit.
… use

Two refinements that make ghwatch safe to leave wired in as a daemon
check-script plugin, plus strict-mode hardening.

1. Silent re-baseline on seen-schema migration. A SEEN_SCHEMA version is now
   stamped into each seen file. When a prior file's schema does not match the
   current version, the first poll rewrites it at the current schema with
   carried-forward values and emits NOTHING -- so deploying a schema change
   (e.g. the ci debounce) no longer floods once as every PR appears to
   "transition" off the old format. Applied in both process_pr (open PRs) and
   detect_left_open (PRs that left the open set). Only subsequent real
   transitions fire.

2. Exclude FM_GH_IGNORE_CHECKS names from the CI roll-up (default: the known
   fork-routing signature gap #293, "PR must be raised via no-mistakes"). A PR
   that fails only that check now rolls up to green when its real checks pass,
   instead of a false failure. Only the roll-up applies the filter; the raw
   check list and the other filters are unchanged. Set FM_GH_IGNORE_CHECKS to a
   custom regex, or empty to disable. The regex is embedded into the jq program
   (gh api has no --arg binding for --jq); a malformed regex fails open to empty
   (carried forward), never crashing the poll.

Also adopt `set -euo pipefail`; the fail-open design is preserved via the
existing `|| true` / `|| return 0` guards (full suite green under strict mode).

Tests: silent-baseline-on-migration (no flood on schema change; a real
transition still fires afterward); gap-excluded PR rolls up green while a real
failure still surfaces. All 17 prior ghwatch tests stay green.
…sing them as data

ghc() did `command gh "$@" 2>/dev/null || true`: it swallowed stderr and the
exit code, but on an API error (e.g. a transient 401 "Bad credentials") gh api
writes the error JSON to stdout — bypassing --jq — which the script then parsed
as CI data and fired as an event (observed:
"CI: manaflow-ai/cmux#6570 -> { \"message\": \"Bad credentials\" ... }").

Detect the error and SKIP that PR for the cycle: never surface an API error as
an event. ghc() now captures gh's stdout and returns non-zero (suppressing the
body) when gh exits non-zero OR its output is a GitHub error body (a JSON object
with top-level "message" + "documentation_url"). process_pr treats any
non-zero probe return as "skip this PR this cycle": emit nothing, do not write
seen, so the next cycle re-evaluates from the same baseline (lossless).

Also guards the discovery fetch (abort the cycle on failure instead of misreading
an empty list as "everyone merged") and get_contributor's best-effort user
lookup (no set -e trip on a blip).

Tests: new test injects a 401 error JSON via the fake gh (verified it fails on
the old ghc with the exact bogus CI event, passes with the fix) plus a recovery
cycle proving the real transition still fires. shellcheck clean; all 20 green.
@e-jung e-jung changed the title feat(ghwatch): GitHub events watcher for PR comments, CI, and reviews feat(supervision): GitHub PR events watcher (comments, CI, reviews, merges) Jun 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant