
Bound PR-poller GitHub GraphQL budget: dormancy-tiered cadence + kill-switch #808

Merged
anutron merged 3 commits into master from argus/fix-polling-issue-eating on Jun 25, 2026

Conversation

@anutron anutron commented Jun 25, 2026

Collaborator

Why

The daemon PR-status poller re-queries every eligible branch every 60s. On a real working set (208 tasks, 82 non-terminal branches) that is ~82 GraphQL branch-lookups/minute ≈ ~4,900 GitHub GraphQL points/hour — essentially the entire 5,000/hr budget, leaving the GraphQL bucket exhausted for everything else.

The earlier fixes (#773 terminal-state skip, #782 per-repo batching) helped but did not solve it: GitHub's GraphQL budget is cost-based (~1 unit per branch resolved), not request-based, so collapsing 82 lookups into one HTTP request still costs ~82 units. Measured live with both fixes deployed: ~55–97 graphql/min, on pace to drain the budget every hour. 78 of the 82 eligible branches had no PR at all (cached state none, which is non-terminal and so never hits the terminal-state skip), and 62 had seen no activity in over a week. Roughly 95% of the budget (78 of 82 lookups) was spent re-confirming that dormant branches still have no PR.
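The budget arithmetic above can be checked with a few lines of Go. This is only a back-of-envelope sketch: `pointsPerHour` and `percentOf` are hypothetical helper names, and the one-point-per-branch cost is the approximation quoted in this description, not a figure taken from GitHub's documentation.

```go
package main

import "fmt"

// pointsPerHour estimates GraphQL points consumed per hour, assuming roughly
// one point per branch resolved (the approximation used in this PR).
func pointsPerHour(lookupsPerCycle, cyclesPerHour int) int {
	return lookupsPerCycle * cyclesPerHour
}

// percentOf returns part/whole as a whole-number percentage, rounded down.
func percentOf(part, whole int) int {
	return part * 100 / whole
}

func main() {
	const cyclesPerHour = 60 // one poll cycle every 60s

	fmt.Println("points/hr at 82 lookups per cycle:", pointsPerHour(82, cyclesPerHour))
	fmt.Println("share of lookups on PR-less branches (%):", percentOf(78, 82))
}
```

Batching does not change the first number, because the cost scales with branches resolved rather than with HTTP requests.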

What changes

  • Dormancy-tiered cadence in pollPRStatesOnce. prPollCadenceStride derives a per-task stride from the most recent lifecycle timestamp (max(ended_at, started_at, created_at)): within 1h → every cycle, 1h–24h → every 5th, 24h–7d → every 15th, >7d → every 30th.
  • Open-PR hot floor: a branch whose cached state is an open PR (draft/awaiting-review/changes-requested/approved) is polled every cycle regardless of age, so an externally-merged/approved PR still surfaces within ~60s.
  • Spread by fnv(task-id) (prCadenceSelects) so each tier's tasks distribute across the stride window instead of a thundering herd on one cycle.
  • Operator kill-switch: a pr-poller.disabled sentinel file under the data dir pauses the poller (zero GitHub queries) with no daemon restart; remove it to resume.
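The tiering, the open-PR floor, and the hash-phased spread described in the bullets above might look roughly like the following. This is a minimal sketch, not the code in this PR: `strideForAge`, `selects`, and `mostRecent` are hypothetical stand-ins for prPollCadenceStride and prCadenceSelects, the handling of the exact tier boundaries is assumed, and FNV-1a 32-bit is an assumed hash variant.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"time"
)

// Tier boundaries taken from the PR description.
const (
	tierHot  = time.Hour
	tierWarm = 24 * time.Hour
	tierCool = 7 * 24 * time.Hour
)

// mostRecent returns the latest of the given timestamps, standing in for
// max(ended_at, started_at, created_at). Zero (unset) times never win.
func mostRecent(ts ...time.Time) time.Time {
	var latest time.Time
	for _, t := range ts {
		if t.After(latest) {
			latest = t
		}
	}
	return latest
}

// strideForAge maps time since last activity to a poll stride in cycles.
// An open PR floors the stride to 1 regardless of age.
// Which tier an age exactly on a boundary falls into is an assumption.
func strideForAge(age time.Duration, openPR bool) int {
	switch {
	case openPR, age < tierHot:
		return 1
	case age < tierWarm:
		return 5
	case age < tierCool:
		return 15
	default:
		return 30
	}
}

// selects reports whether a task is polled on this cycle. The FNV hash of
// the task ID offsets the phase so a tier spreads across its stride window.
func selects(taskID string, cycle uint64, stride int) bool {
	if stride <= 1 {
		return true
	}
	h := fnv.New32a()
	_, _ = h.Write([]byte(taskID)) // hash.Hash writes never return an error
	return (cycle+uint64(h.Sum32()))%uint64(stride) == 0
}

func main() {
	now := time.Now()
	created := now.Add(-10 * 24 * time.Hour)
	started := now.Add(-3 * time.Hour)
	// ended_at is unset here, so the zero time is passed and ignored.
	age := now.Sub(mostRecent(time.Time{}, started, created))
	stride := strideForAge(age, false)
	fmt.Println("stride:", stride) // 3h since last activity: the 1h-24h tier
	for cycle := uint64(0); cycle < 10; cycle++ {
		fmt.Println(cycle, selects("task-123", cycle, stride))
	}
}
```

Because the phase offset depends only on the task ID, each task is selected exactly once in any window of `stride` consecutive cycles, and different tasks in the same tier tend to land on different cycles.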

Expected steady-state cost on the current working set: ~10 branch-lookups/cycle (≈600 pts/hr) instead of ~82 (≈4,900/hr) — and it improves as tasks idle. Cadence-deferred tasks count toward skipped, preserving the len(eligible) == written + errored cycle invariant. pollCycle is in-memory (re-phases harmlessly on a daemon bounce). No schema change — cadence is derived from existing tasks timestamps and the existing task_meta pr cache.
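To see where the savings come from, the average per-cycle cost of a tier is its task count divided by its stride. The sketch below uses hypothetical names (`tier`, `expectedLookupsPerCycle`) and only one figure from this description: the 62 branches idle for over a week, assumed here to have no open PR so that all of them fall in the stride-30 tier. The mix of the remaining branches is not stated, so it is left out rather than guessed.

```go
package main

import "fmt"

// tier pairs a task count with the stride those tasks are polled at.
type tier struct {
	tasks  int
	stride int
}

// expectedLookupsPerCycle returns the average number of lookups per cycle,
// assuming each tier's tasks are spread evenly across its stride window.
func expectedLookupsPerCycle(tiers []tier) float64 {
	total := 0.0
	for _, t := range tiers {
		total += float64(t.tasks) / float64(t.stride)
	}
	return total
}

func main() {
	// 62 week-dormant branches (figure from the PR), assumed PR-less.
	dormant := []tier{{tasks: 62, stride: 30}}
	fmt.Printf("dormant tier: %.2f lookups/cycle (was 62)\n",
		expectedLookupsPerCycle(dormant))
}
```

Under that assumption the dormant tier drops from 62 lookups per cycle to about 2, which accounts for most of the gap between ~82 and ~10.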

Testing

  • New table tests for prPollCadenceStride (every tier + open-PR floor + most-recent-ts wins), prCadenceSelects (stride-1 always, stride-30 once-per-window), and spread-across-cycles.
  • Integration tests: a dormant unselected task issues no GraphQL query and keeps its cache; an open-PR/old task is still polled; kill-switch pause + resume.
  • make pre-pr green: build, vet, fmt-check, lint-pr, test-cover-gate (89.6% ≥ 88 floor). (vuln reports only pre-existing stdlib advisories; CI runs that step with continue-on-error.)

Specs: OpenSpec change throttle-dormant-pr-poll (delta on the pr-status capability).

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

anutron and others added 3 commits June 24, 2026 23:39
Drop ~/.argus/pr-poller.disabled to pause the PR-status poller with zero
gh queries; remove it to resume. Toggling needs no daemon restart. The check
gates pollPRStatesOnce before any DB read or GraphQL call and logs each
paused cycle. Interim relief for the GitHub GraphQL budget burn while the
dormancy-tiered cadence fix lands.
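
A sentinel-file gate of this kind could be sketched as below. This is an illustration under stated assumptions, not the daemon's code: `pollerPaused` and `pollOnce` are hypothetical names, the `poll` callback stands in for the real DB read and GraphQL work, and treating any stat failure as "not paused" is a choice made for the sketch.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// sentinelName is the kill-switch file name from the commit message.
const sentinelName = "pr-poller.disabled"

// pollerPaused reports whether the sentinel file exists under dataDir.
// Any stat error (including "not exist") is treated as not paused.
func pollerPaused(dataDir string) bool {
	_, err := os.Stat(filepath.Join(dataDir, sentinelName))
	return err == nil
}

// pollOnce checks the sentinel before doing any work, so a paused cycle
// issues no queries at all. It is re-checked on every cycle, which is why
// toggling the file needs no restart.
func pollOnce(dataDir string, poll func() error) (ran bool, err error) {
	if pollerPaused(dataDir) {
		log.Printf("pr-poller paused: %s present in %s", sentinelName, dataDir)
		return false, nil
	}
	return true, poll()
}

func main() {
	dir, err := os.MkdirTemp("", "argus-sketch")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)

	poll := func() error { fmt.Println("polling GitHub"); return nil }
	sentinel := filepath.Join(dir, sentinelName)

	// Pause: create the sentinel, then run a cycle.
	if err := os.WriteFile(sentinel, nil, 0o644); err != nil {
		log.Print(err)
		return
	}
	ran, _ := pollOnce(dir, poll)
	fmt.Println("ran while paused:", ran)

	// Resume: remove the sentinel, then run another cycle.
	if err := os.Remove(sentinel); err != nil {
		log.Print(err)
		return
	}
	ran, _ = pollOnce(dir, poll)
	fmt.Println("ran after resume:", ran)
}
```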

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
… budget

GitHub's GraphQL rate limit is cost-based (~1 unit per branch resolved),
not request-based, so #782's per-repo batching cut HTTP round-trips but
not the budget: ~82 non-terminal branches re-polled every 60s ≈ 4,900
points/hr ≈ the whole 5,000/hr ceiling, ~95% of it spent re-confirming
that dormant PR-less branches still have no PR.

Tier per-task cadence by most-recent lifecycle ts (max ended/started/
created): <1h every cycle, 1h-24h every 5th, 24h-7d every 15th, >7d every
30th. An open PR (draft/awaiting-review/changes-requested/approved) floors
to stride 1 so external review/merge still surfaces in ~60s. Selection is
phased by fnv(id) so a tier spreads across its stride window. Expected
steady state ~10 lookups/cycle (~600/hr) vs ~82, improving as tasks idle.

Cadence-deferred tasks count toward skipped (keeps eligible==written+
errored). pollCycle is in-memory (re-phases on bounce). Bundles the
pr-poller.disabled kill-switch from 25827d8b under one OpenSpec change.

Specs: openspec/changes/throttle-dormant-pr-poll (pr-status delta)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…spec

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@anutron anutron force-pushed the argus/fix-polling-issue-eating branch from 5c61110 to 08312ec Compare June 25, 2026 06:42
@github-actions


Merging this branch will increase overall coverage

| Impacted Packages | Coverage Δ | 🤖 |
|---|---|---|
| github.com/drn/argus/internal/daemon | 85.52% (+0.37%) | 👍 |

Coverage by file

Changed files (no unit tests)

| Changed File | Coverage Δ | Total | Covered | Missed | 🤖 |
|---|---|---|---|---|---|
| github.com/drn/argus/internal/daemon/daemon.go | 84.81% (+0.85%) | 428 (+29) | 363 (+28) | 65 (+1) | 👍 |

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/drn/argus/internal/daemon/daemon_test.go
  • github.com/drn/argus/internal/daemon/pr_poll_test.go

@anutron anutron merged commit fe8856e into master Jun 25, 2026
1 check passed
@anutron anutron deleted the argus/fix-polling-issue-eating branch June 25, 2026 07:02