@edlsh commented Dec 9, 2025

Summary

Implements rate-limiting middleware and a circuit breaker pattern for hard 403 errors.

Closes #443

Changes

Rate Limiting

  • Add token bucket rate limiter with per-IP, per-auth, and per-model dimensions
  • Configurable via config.yaml with hot-reload support
  • RequestBodyCaptureMiddleware extracts model from request body for per-model limiting

Circuit Breaker

  • Detect hard 403 errors: CONSUMER_INVALID, SERVICE_DISABLED, PERMISSION_DENIED
  • Open circuit breaker with configurable cooldown (default 10 min for hard 403, 30 min for soft 403)
  • Auto-close on successful requests or cooldown expiry
  • Thread-safe global config with sync.RWMutex protection

Bug Fixes

  • Fix formatRetryAfter producing corrupted Retry-After headers (now uses strconv.Itoa)
  • Fix race condition in pickNext calling IsCircuitBreakerOpen under read lock
  • Fix circuit breaker never tripping for model-bearing requests in MarkResult
  • Add distinct error code (auth_unavailable, HTTP 503) when all auths have open circuit breakers

Files Changed

| File | Description |
| --- | --- |
| internal/api/middleware/rate_limit.go | New rate limiting middleware |
| internal/api/middleware/rate_limit_test.go | Tests for rate limiter |
| sdk/cliproxy/auth/circuit_breaker.go | Circuit breaker logic |
| sdk/cliproxy/auth/circuit_breaker_test.go | Tests for circuit breaker |
| sdk/cliproxy/auth/manager.go | Integration with auth manager |
| sdk/cliproxy/auth/types.go | CircuitBreakerState type |
| internal/api/server.go | Middleware wiring |
| internal/config/config.go | Rate limit config structs |
| config.example.yaml | Example configuration |

Testing

go test ./sdk/cliproxy/auth/... ./internal/api/middleware/... -v

All tests pass.

Closes router-for-me#443

Amp-Thread-ID: https://ampcode.com/threads/T-019b033d-1f0d-7383-8ab2-db06e29af641

@gemini-code-assist (Contributor)

Summary of Changes

Hello @edlsh, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the stability and resilience of the API by introducing robust rate limiting and a circuit breaker pattern. The rate limiter prevents abuse and overload by controlling request rates based on IP, authentication key, and model, while the circuit breaker intelligently handles persistent upstream errors, particularly hard 403s, to avoid hammering failing services. These features are fully configurable and hot-reloadable, allowing for flexible and dynamic operational control.

Highlights

  • Rate Limiting Implementation: Introduced a token bucket rate limiter for the /v1/messages endpoint, supporting per-IP, per-authentication key, and per-model dimensions to prevent client floods.
  • Configurable Rate Limits: Rate limiting parameters are now configurable via config.yaml with hot-reload capabilities, allowing dynamic adjustment of capacities and refill rates.
  • Circuit Breaker for Hard 403s: Implemented a circuit breaker pattern to prevent repeated retries on persistent 403 errors (e.g., CONSUMER_INVALID, SERVICE_DISABLED, PERMISSION_DENIED), improving system resilience.
  • Configurable Circuit Breaker Cooldowns: Circuit breaker cooldown periods for hard and soft 403 errors are configurable, with defaults of 10 minutes for hard 403s and 30 minutes for soft 403s.
  • Bug Fixes & Enhancements: Addressed several issues including a bug in formatRetryAfter, a race condition in pickNext, and ensured the circuit breaker correctly trips for model-bearing requests. A new auth_unavailable error (HTTP 503) is introduced when all available authentications have open circuit breakers.
@gemini-code-assist bot left a comment

Code Review

This pull request introduces significant resilience features: rate limiting and a circuit breaker for hard 403 errors. The implementation is robust, thread-safe, and configurable with hot-reload support, which is excellent. The rate limiter correctly handles per-IP, per-auth, and per-model dimensions, and the new RequestBodyCaptureMiddleware is a good way to enable per-model limiting. The circuit breaker logic correctly identifies hard 403s and provides a distinct 503 error when all authentications are unavailable, improving client-side error handling.

I've identified a few areas for improvement:

  • There is a potential memory leak in the InMemoryLimiterStore as it doesn't evict old token buckets.
  • A minor performance issue in RequestBodyCaptureMiddleware due to an unnecessary string conversion.
  • The use of a goto statement in the auth manager could be refactored for better readability.

Overall, this is a high-quality contribution that significantly improves the service's stability and protection against client floods and upstream failures. My comments are focused on refining the implementation for long-term performance and maintainability.

Follow-up changes addressing the review:
- Add stale bucket cleanup to InMemoryLimiterStore (memory leak fix)
- Use io.NopCloser(bytes.NewReader) instead of a string copy
- Replace goto with a boolean flag for better readability

Development

Successfully merging this pull request may close these issues.

[Suggestion] Add ingress rate limiting and 403 circuit breaker for /v1/messages
