feat: Add rate limiting support (global and per-route)
Add rate limiting to protect backend services from excessive requests. Rate limits should be configurable at both global and per-route levels, with support for total requests-per-second and per-IP limits.
Motivation
Gatekeeper currently validates webhooks via signature verification and IP allowlists, but has no protection against high-volume traffic that could overwhelm backends. Rate limiting would:
- Protect internal services from DoS conditions
- Allow different limits for different webhook providers (some send more traffic than others)
- Provide per-IP limits to prevent a single source from monopolizing capacity
Proposed Configuration
Rate limiters are defined globally and referenced by routes, following the existing pattern for verifiers/validators:
```yaml
rate_limiters:
  default:
    total_rps: 100   # Total requests per second across all IPs
    per_ip_rps: 10   # Per-IP requests per second
    burst: 20        # Allow short bursts above the limit
  high_volume:
    total_rps: 500
    per_ip_rps: 50
    burst: 100
```
```yaml
routes:
  - hostname: webhooks.example.com
    path: /slack
    verifier: slack
    rate_limiter: default   # Optional: uses no limiting if omitted
    destination: http://backend:8080
  - hostname: webhooks.example.com
    path: /github
    verifier: github
    rate_limiter: high_volume
    destination: http://backend:8080
```
Optional global default:
```yaml
global:
  default_rate_limiter: default  # Applied to all routes without an explicit rate_limiter
```
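For illustration, the config could map onto Go structs along these lines; the type names and yaml tags are hypothetical, not existing Gatekeeper types. Startup validation would then check that every `rate_limiter` name referenced by a route exists in `rate_limiters`:
```go
// Hypothetical config structs for the proposed YAML (names are illustrative).
type RateLimiterConfig struct {
	TotalRPS float64 `yaml:"total_rps"`  // total requests/second across all IPs
	PerIPRPS float64 `yaml:"per_ip_rps"` // requests/second per client IP
	Burst    int     `yaml:"burst"`      // short-term burst allowance
}

type RouteConfig struct {
	Hostname    string `yaml:"hostname"`
	Path        string `yaml:"path"`
	Verifier    string `yaml:"verifier"`
	RateLimiter string `yaml:"rate_limiter,omitempty"` // name into rate_limiters; empty = no limiting
	Destination string `yaml:"destination"`
}

type Config struct {
	RateLimiters map[string]RateLimiterConfig `yaml:"rate_limiters"`
	Routes       []RouteConfig                `yaml:"routes"`
	Global       struct {
		DefaultRateLimiter string `yaml:"default_rate_limiter,omitempty"`
	} `yaml:"global"`
}
```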
Request Flow Position
Rate limiting should occur after route lookup and before signature verification:
1. Route lookup
2. Rate limiting ← new step
3. IP allowlist check
4. Read body
5. Signature verification
6. Validation
7. Forward/relay
This ordering:
- Avoids wasting CPU on signature verification for rate-limited requests
- Allows per-route rate limits to apply
- Can use `getClientIP()` for per-IP limiting (see the sketch below)
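As an illustration of that ordering, the check might sit in the handler roughly like this; `Server`, `lookupRoute`, and the limiter's `Allow` signature are hypothetical, while `getClientIP()` is the existing helper:
```go
// Sketch only: types and helpers other than getClientIP() are placeholders.
func (s *Server) handle(w http.ResponseWriter, r *http.Request) {
	route, ok := s.lookupRoute(r) // 1. route lookup
	if !ok {
		http.NotFound(w, r)
		return
	}

	// 2. rate limiting: before the allowlist check, body read, and
	// signature verification, so rejected requests stay cheap.
	if lim := route.rateLimiter; lim != nil {
		if allowed, _ := lim.Allow(getClientIP(r)); !allowed {
			w.Header().Set("Retry-After", "1")
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
	}

	// 3-7. IP allowlist, read body, verify signature, validate, forward.
}
```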
Algorithm
Token bucket is recommended:
- Simple to implement and understand
- Handles bursts gracefully
- Well-supported by Go libraries (e.g., `golang.org/x/time/rate`)
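A minimal in-memory sketch using `golang.org/x/time/rate`; the `Limiter` type and `Allow(ip)` signature are illustrative (matching the handler sketch above), not an existing API:
```go
package ratelimit

import (
	"sync"

	"golang.org/x/time/rate"
)

// Limiter combines a total token bucket with lazily created per-IP buckets.
type Limiter struct {
	total *rate.Limiter // enforces total_rps across all clients

	mu       sync.Mutex
	perIP    map[string]*rate.Limiter
	perIPRPS rate.Limit
	burst    int
}

func New(totalRPS, perIPRPS float64, burst int) *Limiter {
	return &Limiter{
		total:    rate.NewLimiter(rate.Limit(totalRPS), burst),
		perIP:    make(map[string]*rate.Limiter),
		perIPRPS: rate.Limit(perIPRPS),
		burst:    burst,
	}
}

// Allow reports whether a request from ip may proceed, and which limit
// tripped ("total" or "per_ip") when it may not.
func (l *Limiter) Allow(ip string) (ok bool, reason string) {
	l.mu.Lock()
	ipLim, found := l.perIP[ip]
	if !found {
		ipLim = rate.NewLimiter(l.perIPRPS, l.burst)
		l.perIP[ip] = ipLim
	}
	l.mu.Unlock()

	if !ipLim.Allow() {
		return false, "per_ip"
	}
	if !l.total.Allow() {
		return false, "total"
	}
	return true, ""
}
```
Note that the per-IP map in this sketch grows without bound; a real implementation would evict idle entries.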
Multi-Replica Considerations
For single-replica deployments, in-memory rate limiting is sufficient.
For multi-replica deployments (which already use Redis for relay coordination), rate limits could optionally use Redis for coordination. This could be a future enhancement:
```yaml
rate_limiters:
  default:
    total_rps: 100
    per_ip_rps: 10
    distributed: true  # Use Redis for coordination (future)
```
The initial implementation should focus on in-memory limiting per replica.
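For reference, the distributed variant might look like a fixed-window counter in Redis; this is only one possible shape (shown with the `github.com/redis/go-redis/v9` client), not a committed design:
```go
import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// allowDistributed sketches a fixed-window counter shared by all replicas:
// one INCR per request on a per-second key. Coarser than a token bucket,
// but simple; the key scheme and signature are illustrative only.
func allowDistributed(ctx context.Context, rdb *redis.Client, route string, limit int64) (bool, error) {
	key := fmt.Sprintf("ratelimit:%s:%d", route, time.Now().Unix())
	n, err := rdb.Incr(ctx, key).Result()
	if err != nil {
		return false, err
	}
	if n == 1 {
		// First hit in this window: set a TTL so stale counters disappear.
		rdb.Expire(ctx, key, 2*time.Second)
	}
	return n <= limit, nil
}
```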
Response Behavior
When rate limited:
- Return `429 Too Many Requests`
- Include a `Retry-After` header with seconds until the client can retry
- Log at WARN level with route, client IP, and the limit that was exceeded
- Record metrics: `gatekeeper_rate_limited_total{route, limiter, reason}`, where `reason` is `total` or `per_ip`
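A sketch of the rejection path, assuming a Prometheus counter registered as proposed; the helper and variable names are illustrative:
```go
import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
)

// Counter matching the proposed metric; registered once at startup
// (e.g., via prometheus.MustRegister).
var rateLimitedTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "gatekeeper_rate_limited_total",
		Help: "Requests rejected by rate limiting.",
	},
	[]string{"route", "limiter", "reason"},
)

// rejectRateLimited writes the 429 response and records the event.
func rejectRateLimited(w http.ResponseWriter, route, limiter, reason string) {
	rateLimitedTotal.WithLabelValues(route, limiter, reason).Inc()
	// WARN-level logging of route, client IP, and exceeded limit goes here.
	w.Header().Set("Retry-After", "1") // seconds; a real value would come from the limiter
	http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
}
```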
Acceptance Criteria
- Rate limiters defined in config, validated at startup
- Routes can reference rate limiters by name
- Optional global default rate limiter
- `total_rps` limits total throughput for a route
- `per_ip_rps` limits per-IP throughput for a route
- `burst` allows short-term spikes above the steady-state limit
- Returns 429 with `Retry-After` header when limited
- Prometheus metrics for rate limiting events
- Documentation in AGENTS.md and example config
- 100% test coverage