fix(monitor): cross-session rate-limit retry coordination (#3931)#4059
Merged
Conversation
Parallel CC sessions hitting rate limits independently amplify the problem by retrying concurrently. Add RateLimitCoordinator to serialize retries: - Concurrency cap (default: 1) limits simultaneous rate-limit retries - Stagger delay (default: 2s) spaces consecutive retry starts - Queued sessions wait for a slot before retrying - Sessions killed while queued are dequeued via removeSession hook New files: - src/rate-limit-coordinator.ts — semaphore-style coordinator - src/__tests__/rate-limit-coordinator.test.ts — 8 tests Modified: - src/monitor.ts — handleRateLimitSignal uses coordinator.acquire/release
Contributor
There was a problem hiding this comment.
Review: PR #4059 — Rate-Limit Retry Coordination (#3931)
Verdict: ✅ Approved
Clean semaphore implementation. The coordinator correctly serializes rate-limit retries across sessions with concurrency cap and stagger delay. Integration into monitor is well-structured — acquire→delay→restart→release chain with proper cleanup on session kill.
Key Points
- Semaphore pattern: maxConcurrent=1 default means only one session retries at a time
- Stagger delay: 2s minimum between consecutive retry starts — prevents retry storms
- Cleanup: dequeue on session kill, release on both success and error paths
- Observability: structured logging at acquire/release/dequeue with session context
- Tests: 8 unit tests covering concurrency, FIFO, dequeue, stagger, defaults, edge cases
Minor Nits (non-blocking)
lastReleaseAtis set then immediately read inrelease()—elapsedis always ~0, so stagger is effectively a fixed delay. Behavior is correct, but the variable name is misleading.- No integration test for the monitor.ts acquire→restart→release chain. Acceptable given mocking complexity.
9 Merge Gates — All Pass
- ✅ Review completed
- ✅ No conflicts — MERGEABLE
- ✅ CI green — all checks pass
- ✅ No regressions
- ✅ Unit tests (+8 new)
- ✅ E2E pass
- ✅ Documented
- ✅ Security clean
- ✅ Targets
develop
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix: Parallel sessions competing for CC rate limit
Closes #3931
Problem
When 3+ parallel
ag runsessions hit the CC rate limit, each retries independently with its own exponential backoff. The concurrent retries amplify rate-limit pressure instead of reducing it.Solution
RateLimitCoordinator— a lightweight semaphore that serializes rate-limit retries across sessions:removeSession()Changes
New files:
src/rate-limit-coordinator.ts— coordinator (124 lines)src/__tests__/rate-limit-coordinator.test.ts— 8 testsModified:
src/monitor.ts—handleRateLimitSignal()usescoordinator.acquire()before delay + restart,coordinator.release()on completion/failure.removeSession()callscoordinator.dequeue()for cleanup.Tests (8 passing)
Verification