Summary
Phase 6 (#144) ships runtime-check-private-freshness.yml — a Mondays-08:00-UTC alarm that opens an issue when path-scoped drift exists between the manifest's pinned private.ref and claude-configs/main. The alarm only emits a signal when there's drift; "no issue" can mean genuinely fresh, OR auth-broke, OR cron-missed-to-fire — and from the outside all three look identical.
This issue tracks building a heartbeat trip-wire so that "the alarm is silent" becomes a detectable state distinct from "everything is fresh."
Hard SLA — blocks Phase 7
This issue must land before Phase 7 begins. The freshness alarm is informational on day 1 (it ships in Phase 6 as a baseline) but cannot be treated as load-bearing for any operational decision until heartbeat detection exists. Without that, operators have no way to distinguish a quiet alarm from a broken alarm. Phase 7 work that depends on claude-configs freshness assumptions (image rebuilds triggered by config drift, auto-bumps of pinned private.ref, etc.) cannot proceed safely until heartbeat is in place.
Motivation
Inquisitor Pass 2 on Phase 6's plan (2026-05-05) flagged Charge 4: the heartbeat step in Step 6.5.1 is a stub — it gh api-reads issues with a fixed label and pipes to head -1 || true, producing zero observable output that lives past the workflow run. The accompanying comment in the plan explicitly says "Heartbeat detection is a follow-up; this stub captures the intent." The stub was retained in Phase 6 to acknowledge the silent-failure surface; this issue commits to making it real.
Three resolutions were considered (Pass 2 triage):
B. Remove stub, track as follow-up with hard SLA — chosen; this issue is the tracker
C. Don't ship freshness alarm at all in Phase 6 — rejected; baseline visibility is still valuable
Acceptance Criteria
runtime-check-private-freshness.yml writes a deterministic timestamp on every run (success OR failure) — either:
(a) updates a fixed "heartbeat anchor" issue with a label freshness-alarm-heartbeat, OR
(b) updates a status check on a known commit (runtime-build/freshness-heartbeat), OR
(c) writes a /heartbeats/freshness.json file to a dedicated repo / a release asset / a GHCR latest-tagged image label
An external monitor (a separate workflow scheduled at a different cron offset, OR a GitHub status check freshness rule, OR a manual operator runbook) detects when the heartbeat hasn't updated in the past 8 days (the cron is weekly, so 8 days = one missed cycle plus a day of slack)
The detection step opens a NEW issue or fires a NEW alarm — distinct from the drift-detected alarm — when stale
CLAUDE.md "CI Runtime" section documents the heartbeat anchor, where to look, and the diagnostic runbook for "alarm appears silent"
At least one deliberate-failure rehearsal: disable the workflow's auth (e.g., temporarily revoke its App installation), run a cron cycle, confirm the heartbeat-detection step fires within 8-16 days
The runtime-check-private-freshness.yml workflow's Self-test — assert workflow ran (heartbeat) stub step is removed as part of this issue's PR
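The staleness rule in the criteria above can be sketched as a small pure function. This is a minimal illustration, not the planned implementation: the names `STALE_AFTER`, `heartbeat_is_stale`, and `parse_heartbeat` are hypothetical, and it assumes the heartbeat is stored as a timezone-aware ISO 8601 timestamp.

```python
from datetime import datetime, timedelta, timezone

# Threshold from the acceptance criteria: the alarm runs weekly, so a
# heartbeat older than 8 days means a scheduled run did not record itself.
STALE_AFTER = timedelta(days=8)


def parse_heartbeat(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2026-05-04T08:00:00+00:00'."""
    return datetime.fromisoformat(value)


def heartbeat_is_stale(last_heartbeat: datetime, now: datetime) -> bool:
    """Return True when the last recorded heartbeat is older than STALE_AFTER."""
    if last_heartbeat.tzinfo is None or now.tzinfo is None:
        # Naive timestamps would make the comparison ambiguous, so reject them.
        raise ValueError("timestamps must be timezone-aware (UTC expected)")
    return now - last_heartbeat > STALE_AFTER
```

Keeping the check free of I/O means the same function could back any of the three anchor options; only the code that fetches `last_heartbeat` would differ.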
Technical Notes
Anchor selection. Option (a) is simplest but couples to GitHub Issues (which can be wiped, archived, or have search behavior change). Option (b) is the most native — status checks have first-class observability in GitHub's UI. Option (c) requires more infra. Recommend (b) if the deployment target is GitHub-only; (a) if you want zero new dependencies.
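If option (b) is chosen, the heartbeat could be written through GitHub's commit status endpoint (`POST /repos/{owner}/{repo}/statuses/{sha}`). The sketch below only builds the request and does not send it. The function name, the description wording, and the owner/repo/SHA values are assumptions for illustration; the context string is the one named in option (b).

```python
import json
import urllib.request
from datetime import datetime, timezone

HEARTBEAT_CONTEXT = "runtime-build/freshness-heartbeat"  # context named in option (b)


def build_heartbeat_status_request(owner, repo, sha, token, now, run_ok):
    """Build (but do not send) a commit-status request that records a heartbeat."""
    stamp = now.astimezone(timezone.utc).isoformat()
    payload = {
        # A failed run still records a heartbeat; only the state differs.
        "state": "success" if run_ok else "failure",
        "context": HEARTBEAT_CONTEXT,
        "description": f"freshness alarm ran at {stamp}",
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/statuses/{sha}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

A monitor could then read the most recent status for that context and compare its creation time against the staleness threshold. Which commit serves as the "known commit" is left open here, as it is in the criteria.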
Detection cadence. The external monitor needs to run more frequently than the alarm itself (otherwise it can't catch "alarm missed its slot"). A separate GHA cron offset from the alarm's Monday slot (e.g., Tuesdays + Fridays) works, provided the gap between monitor runs stays shorter than the 8-day staleness threshold.
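To see how the monitor's schedule affects detection latency, the following sketch finds the first monitor run that would see a stale heartbeat. The function name and the example schedules are assumptions; the 8-day threshold and the Monday 08:00 UTC alarm slot come from this issue.

```python
from datetime import datetime, timedelta, timezone


def first_detection(last_heartbeat, monitor_weekdays, monitor_hour,
                    stale_after=timedelta(days=8)):
    """Return the first monitor run at which the heartbeat is seen as stale.

    monitor_weekdays uses datetime.weekday() numbering (Monday == 0).
    """
    # Walk forward one day at a time through the monitor's daily slot.
    candidate = last_heartbeat.replace(
        hour=monitor_hour, minute=0, second=0, microsecond=0
    )
    while True:
        on_schedule = candidate.weekday() in monitor_weekdays
        if on_schedule and candidate - last_heartbeat > stale_after:
            return candidate
        candidate += timedelta(days=1)


# Last good heartbeat: Monday 2026-05-04 08:00 UTC (illustrative date).
last = datetime(2026, 5, 4, 8, 0, tzinfo=timezone.utc)
tue_fri = first_detection(last, {1, 4}, 9)   # Tuesdays + Fridays at 09:00
weekly = first_detection(last, {0}, 9)       # Mondays only at 09:00
```

Under these assumptions the Tuesday + Friday monitor fires on 2026-05-12 (about 8 days after the last heartbeat), while a Monday-only monitor would not fire until 2026-05-18 (about 14 days), which is why a monitor running at the alarm's own weekly cadence is a poor fit.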
Distinguish from drift signal. The drift-detected alarm and the heartbeat-stale alarm should be visibly different — different issue labels, different titles ("Phase 6: private-config drift" vs "Phase 6: freshness alarm itself appears stale"). An operator skimming the inbox should not confuse "alarm fired" with "alarm broken."
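One way to keep the two signals apart is to define each alarm's title and label in one place. In this sketch the titles are the ones quoted above, while the label strings, the `Alarm` type, and `alarms_to_open` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Alarm:
    title: str
    label: str


# Titles come from this issue; the label names are placeholders.
DRIFT_ALARM = Alarm(
    title="Phase 6: private-config drift",
    label="freshness-drift",
)
STALE_ALARM = Alarm(
    title="Phase 6: freshness alarm itself appears stale",
    label="freshness-heartbeat-stale",
)


def alarms_to_open(drift_detected: bool, heartbeat_stale: bool) -> List[Alarm]:
    """Return the alarms to raise; the two conditions are reported separately."""
    alarms = []
    if heartbeat_stale:
        alarms.append(STALE_ALARM)
    if drift_detected:
        alarms.append(DRIFT_ALARM)
    return alarms
```

Because each condition maps to its own issue, an operator filtering by label sees "alarm fired" and "alarm broken" as separate entries.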
Auth-failure self-disclosure. If the workflow can't even start (App permissions revoked, secret missing), GHA's normal failure-notification path catches it via run-fail emails. The heartbeat catches the harder case: workflow runs successfully BUT silently doesn't do its job (e.g., the path-scope filter accidentally skips everything, or git log returns 0 results due to a bad ref).
Out of Scope
General observability for OTHER Phase 6 workflows (runtime-prune-pending.yml, runtime-rollback.yml, STAGE 5) — those have their own observability surfaces and aren't blocked on this work
Cron-firing infrastructure validation (whether GHA itself reliably fires scheduled workflows at the requested cadence) — pre-existing concern outside this issue's scope
References
runtime/scripts/... (none yet; the new heartbeat-detection script lives here when implemented)
🤖 Generated by Claude Code on behalf of @cbeaulieu-gt