-
-
Notifications
You must be signed in to change notification settings - Fork 362
feat: add rate limiting and circuit breaker for /v1/messages endpoint #473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add rate limiting and circuit breaker for /v1/messages endpoint #473
Conversation
Implements rate limiting middleware and circuit breaker pattern for hard 403 errors. Rate Limiting: - Add token bucket rate limiter with per-IP, per-auth, and per-model dimensions - Configurable via config.yaml with hot-reload support - RequestBodyCaptureMiddleware extracts model from request body Circuit Breaker: - Detect hard 403 errors (CONSUMER_INVALID, SERVICE_DISABLED, PERMISSION_DENIED) - Open circuit breaker with configurable cooldown (default 10 min) - Auto-close on successful requests or cooldown expiry - Thread-safe global config with RWMutex protection Bug Fixes: - Fix formatRetryAfter producing corrupted headers (use strconv.Itoa) - Fix race condition in pickNext calling IsCircuitBreakerOpen under read lock - Fix circuit breaker never tripping for model-bearing requests in MarkResult - Add distinct error (503) when all auths have open circuit breakers Closes router-for-me#443 Amp-Thread-ID: https://ampcode.com/threads/T-019b033d-1f0d-7383-8ab2-db06e29af641
Summary of ChangesHello @edlsh, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the stability and resilience of the API by introducing robust rate limiting and a circuit breaker pattern. The rate limiter prevents abuse and overload by controlling request rates based on IP, authentication key, and model, while the circuit breaker intelligently handles persistent upstream errors, particularly hard 403s, to avoid hammering failing services. These features are fully configurable and hot-reloadable, allowing for flexible and dynamic operational control. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces significant resilience features: rate limiting and a circuit breaker for hard 403 errors. The implementation is robust, thread-safe, and configurable with hot-reload support, which is excellent. The rate limiter correctly handles per-IP, per-auth, and per-model dimensions, and the new RequestBodyCaptureMiddleware is a good way to enable per-model limiting. The circuit breaker logic correctly identifies hard 403s and provides a distinct 503 error when all authentications are unavailable, improving client-side error handling.
I've identified a few areas for improvement:
- There is a potential memory leak in the
InMemoryLimiterStoreas it doesn't evict old token buckets. - A minor performance issue in
RequestBodyCaptureMiddlewaredue to an unnecessary string conversion. - The use of a
gotostatement in the auth manager could be refactored for better readability.
Overall, this is a high-quality contribution that significantly improves the service's stability and protection against client floods and upstream failures. My comments are focused on refining the implementation for long-term performance and maintainability.
- Add stale bucket cleanup to InMemoryLimiterStore (memory leak fix) - Use io.NopCloser(bytes.NewReader) instead of string copy - Replace goto with boolean flag for better readability
Summary
Implements rate limiting middleware and circuit breaker pattern for hard 403 errors.
Closes #443
Changes
Rate Limiting
config.yamlwith hot-reload supportRequestBodyCaptureMiddlewareextracts model from request body for per-model limitingCircuit Breaker
CONSUMER_INVALID,SERVICE_DISABLED,PERMISSION_DENIEDsync.RWMutexprotectionBug Fixes
formatRetryAfterproducing corrupted Retry-After headers (now usesstrconv.Itoa)pickNextcallingIsCircuitBreakerOpenunder read lockMarkResultauth_unavailable, HTTP 503) when all auths have open circuit breakersFiles Changed
internal/api/middleware/rate_limit.gointernal/api/middleware/rate_limit_test.gosdk/cliproxy/auth/circuit_breaker.gosdk/cliproxy/auth/circuit_breaker_test.gosdk/cliproxy/auth/manager.gosdk/cliproxy/auth/types.gointernal/api/server.gointernal/config/config.goconfig.example.yamlTesting
go test ./sdk/cliproxy/auth/... ./internal/api/middleware/... -vAll tests pass.