Bound PR-poller GitHub GraphQL budget: dormancy-tiered cadence + kill-switch #808
Merged
Drop ~/.argus/pr-poller.disabled to pause the PR-status poller with zero gh queries; remove it to resume. Toggling needs no daemon restart. The check gates pollPRStatesOnce before any DB read or GraphQL call, and each paused cycle is logged. Interim relief for the GitHub GraphQL budget burn while the dormancy-tiered cadence fix lands.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
… budget

GitHub's GraphQL rate limit is cost-based (~1 unit per branch resolved), not request-based, so #782's per-repo batching cut HTTP round-trips but not the budget: ~82 non-terminal branches re-polled every 60s ≈ 4,900 points/hr ≈ the whole 5,000/hr ceiling, ~95% of it spent re-confirming that dormant PR-less branches still have no PR.

Tier per-task cadence by the most recent lifecycle ts (max of ended/started/created): <1h every cycle, 1h-24h every 5th, 24h-7d every 15th, >7d every 30th. An open PR (draft/awaiting-review/changes-requested/approved) floors the stride to 1, so external review/merge still surfaces in ~60s. Selection is phased by fnv(id) so a tier spreads across its stride window.

Expected steady state is ~10 lookups/cycle (~600/hr) vs ~82, improving as tasks idle. Cadence-deferred tasks count toward skipped (keeps eligible == written + errored). pollCycle is in-memory (re-phases on a bounce).

Bundles the pr-poller.disabled kill-switch from 25827d8b under one OpenSpec change.

Specs: openspec/changes/throttle-dormant-pr-poll (pr-status delta)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…spec

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Force-pushed from 5c61110 to 08312ec.
Merging this branch will increase overall coverage
Why
The daemon PR-status poller re-queries every eligible branch every 60s. On a real working set (208 tasks, 82 non-terminal branches), that is ~82 GraphQL branch lookups per minute, or ~4,900 GitHub GraphQL points per hour: essentially the entire 5,000/hr budget, leaving the GraphQL bucket exhausted for everything else.
The earlier fixes (#773 terminal-state skip, #782 per-repo batching) helped but did not solve it: GitHub's GraphQL budget is cost-based (~1 unit per branch resolved), not request-based, so collapsing 82 lookups into one HTTP request still costs ~82 units. Measured live with both fixes deployed: ~55–97 graphql/min, on pace to drain the budget every hour. 78 of the 82 eligible branches had no PR at all (`none`, which is non-terminal and so never skipped), and 62 had no activity in over a week: ~95% of the budget was spent re-confirming that dormant branches still have no PR.

What changes
- In `pollPRStatesOnce`, `prPollCadenceStride` derives a per-task stride from the most recent lifecycle timestamp (`max(ended_at, started_at, created_at)`): within 1h → every cycle, 1h–24h → every 5th, 24h–7d → every 15th, >7d → every 30th.
- A task with an open PR (`draft`/`awaiting-review`/`changes-requested`/`approved`) is polled every cycle regardless of age, so an externally merged/approved PR still surfaces within ~60s.
- Selection is phased by `fnv(task-id)` (`prCadenceSelects`) so each tier's tasks distribute across the stride window instead of forming a thundering herd on one cycle.
- A `pr-poller.disabled` sentinel file under the data dir pauses the poller (zero GitHub queries) with no daemon restart; remove it to resume.

Expected steady-state cost on the current working set: ~10 branch lookups/cycle (≈600 pts/hr) instead of ~82 (≈4,900/hr), and it improves as tasks idle. Cadence-deferred tasks count toward `skipped`, preserving the `len(eligible) == written + errored` cycle invariant. `pollCycle` is in-memory (it re-phases harmlessly on a daemon bounce). No schema change: cadence is derived from existing `tasks` timestamps and the existing `task_meta` `pr` cache.

Testing
- Tests for `prPollCadenceStride` (every tier + open-PR floor + most-recent-ts wins), `prCadenceSelects` (stride-1 always, stride-30 once-per-window), and spread across cycles.
- `make pre-pr` green: build, vet, fmt-check, lint-pr, test-cover-gate (89.6% ≥ 88% floor). (`vuln` reports only pre-existing stdlib advisories, a step CI runs with continue-on-error.)

Specs: OpenSpec change `throttle-dormant-pr-poll` (delta on the `pr-status` capability).

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>