This document details the scaling characteristics, resource planning, and bottleneck analysis for deploying VelocityGate in a high-throughput production environment.
- CPU-Bound Operations: SSL Termination, JWT Verification (RS256), JSON Parsing, Request Routing.
- Scaling Strategy: Horizontal auto-scaling (HPA). Add more Gateway pods as CPU usage rises.
- IO-Bound Operations: Redis Rate Limit Checks, Database Lookups (API Keys), Proxying to Backend.
- Scaling Strategy: Optimize connection pools and use non-blocking I/O (Reactor Netty).
| Component | Default | Recommended (Prod) | Rationale |
|---|---|---|---|
| Netty Worker Threads | CPU Cores | CPU Cores * 2 | Handle many concurrent connections without blocking. |
| Redis Connections (Lettuce) | Shared | `max-active: 50` | Lettuce is thread-safe; large pools are rarely needed unless heavily pipelining. |
| DB Connections (HikariCP) | 10 | 20-50 | Critical for API Key validation if not caching. Keep small to avoid DB saturation. |
| JVM Heap | 25% RAM | 2GB-4GB | Enough for caching keys/configs. The Gateway is stateless, so massive heaps aren't needed. |
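The recommendations above could map to configuration roughly as follows. This is a sketch only: the property names assume Spring Boot 2.x with Lettuce and HikariCP (Boot 3.x moved Redis properties under `spring.data.redis`), so verify keys against your version.

```yaml
# Sketch only - verify property names against your Spring Boot version.
spring:
  redis:
    lettuce:
      pool:
        max-active: 50        # "Recommended (Prod)" value from the table above
  datasource:
    hikari:
      maximum-pool-size: 30   # within the 20-50 band; keep small to avoid DB saturation
```

Netty worker threads are typically set outside `application.yml`, via the `-Dreactor.netty.ioWorkerCount=<n>` system property, and the heap via JVM flags such as `-Xms2g -Xmx2g`.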
Estimates based on us-east-1 pricing (On-Demand).
- Compute: 3x `t3.medium` (2 vCPU, 4GB RAM) - $0.125/hr total
- State: AWS ElastiCache (Redis) `cache.t3.micro` (Primary + Replica) - $0.034/hr total
- Database: RDS `db.t3.micro` - $0.017/hr
- Est. Monthly Cost: ~$130
- Compute: 10x `c6g.large` (2 vCPU, 4GB RAM, ARM-based) - $0.68/hr total
- Why Graviton? Java runs efficiently on ARM, and Graviton instances are ~20% cheaper.
- State: ElastiCache `cache.m6g.large` (Cluster Mode: 3 shards) - $0.48/hr total
- Est. Monthly Cost: ~$900
- Compute: 50x `c6g.2xlarge` (8 vCPU, 16GB RAM) - Auto-scaling group.
- State: Redis Cluster with 10 shards (`cache.r6g.xlarge`) to distribute key space/IOPS.
- Network: AWS PrivateLink to minimize NAT Gateway costs.
- Est. Monthly Cost: ~$8,000+
Symptom: High latency on rate limit checks; Redis `engine_cpu_utilization` > 80%.
Cause: Complex Lua scripts (e.g., Sliding Window with huge ZSETS) blocking the single Redis thread.
Solutions:
- Sharding: Enable Redis Cluster. Distribute keys (`{tenant}:rate_limit`) across slots.
- Algorithm: Switch from Sliding Window to Token Bucket (O(1) complexity).
- Local Cache: Enable an in-memory `Caffeine` cache in the Gateway for "hot" keys (sacrifices strict consistency).
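The Token Bucket switch can be sketched as below. This is an illustrative single-JVM version (class and field names are ours, not VelocityGate's); in production the equivalent logic would typically run atomically inside Redis (e.g. as a Lua script) so all Gateway pods share one bucket per key.

```java
/**
 * Minimal token-bucket rate limiter: O(1) per check, unlike a
 * sliding-window ZSET whose cost grows with request volume.
 * Illustrative sketch only - not the VelocityGate implementation.
 */
public class TokenBucket {
    private final long capacity;        // max burst size
    private final double refillPerNano; // tokens added per elapsed nanosecond
    private double tokens;
    private long lastRefill;

    public TokenBucket(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefill = System.nanoTime();
    }

    /** Returns true if the request is allowed, false if rate-limited. */
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Refill lazily from elapsed time - no background thread needed.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

Because the state is just two numbers per key (token count and last-refill timestamp), the Redis-side version is a constant-time `GET`/`SET` pair inside one script, which is what relieves the single Redis thread.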
Symptom: `dropped_packets`, high p99 latency but low CPU.
Solutions:
- Compression: Enable GZIP/Brotli in `application.yml`.
- HTTP/2: Enable end-to-end HTTP/2 to multiplex connections.
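Both solutions can be expressed with standard Spring Boot properties, sketched below. Note that Brotli on Reactor Netty usually needs an additional dependency, so only GZIP settings are shown; exact keys should be checked against your Boot version.

```yaml
# Sketch - standard Spring Boot server properties; verify against your version.
server:
  compression:
    enabled: true
    mime-types: application/json,text/plain
    min-response-size: 1024   # skip compressing tiny payloads
  http2:
    enabled: true             # requires TLS (h2) unless cleartext h2c is configured
```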
Symptom: Periodic latency spikes (stop-the-world pauses).
Solutions:
- G1GC / ZGC: Use modern collectors, e.g. `java -XX:+UseZGC -jar app.jar`.
- Object Allocation: Reduce allocation in hot paths (reuse buffers).
We use a standard High Availability pattern:
- Deployment: Stateless Gateway pods.
- HPA: Scales on CPU (target 70%) or Custom Metric (RPS).
- PodDisruptionBudget: Ensures > 60% availability during node upgrades.
- Affinity: Anti-affinity to spread pods across Availability Zones.
(See k8s/ directory for full manifests)
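The HPA and PodDisruptionBudget described above could look roughly like this; names such as `velocity-gate` are placeholders, and the authoritative manifests are the ones in `k8s/`.

```yaml
# Illustrative sketch only - see k8s/ for the real manifests.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: velocity-gate
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: velocity-gate
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # matches the 70% CPU target above
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: velocity-gate
spec:
  minAvailable: 60%              # keeps > 60% of pods up during node upgrades
  selector:
    matchLabels:
      app: velocity-gate
```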
We validated the architecture using k6 on a 3-node cluster.
| Pods | Input RPS | Success Rate | P95 Latency | Verdict |
|---|---|---|---|---|
| 1 | 2,000 | 100% | 15ms | Baseline |
| 1 | 4,000 | 85% | 1500ms | Saturation (CPU 98%) |
| 3 | 6,000 | 100% | 18ms | Linear Scaling Confirmed |
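A k6 script for this kind of test could be sketched as follows (run with `k6 run script.js`, not Node). The URL, rates, and thresholds are placeholders, not the actual test harness used above.

```javascript
// Sketch of a k6 load test at a fixed arrival rate; values are placeholders.
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  scenarios: {
    steady: {
      executor: 'constant-arrival-rate',
      rate: 2000,              // input RPS, as in the baseline row above
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 500,
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<50'],  // fail the run well above the 15ms baseline
  },
};

export default function () {
  const res = http.get('https://gateway.example.com/api/ping'); // placeholder URL
  check(res, { 'status 200': (r) => r.status === 200 });
}
```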
- Single Node: Capped at ~25k ops/sec due to Lua script overhead.
- 3-Node Cluster: Achieved ~70k ops/sec.
- Conclusion: Rate limiting logic scales linearly with Redis shards.