Stop panicking about Claude Max rate limits. Smart proxy that delegates work to Z.ai GLM models, eliminating expensive pay-as-you-go overages while preserving your Claude subscription for reasoning tasks.
You're paying $200/month for Claude Max x20 - that's a lot of volume, but it's not unlimited. When you hit the limit mid-week, you face two bad options:
- Pay expensive overages - $3/$15 per million input/output tokens adds $200-400/month in unexpected costs
- Stop working - Wait for quota refresh and lose days of productivity
Neither is acceptable when you're trying to ship.
This proxy routes strategically based on model selection:
```
Claude Code ──→ localhost:3001 ──→ claude-proxy
                                        │
                      ┌─────────────────┼─────────────────┐
                      │                 │                 │
                   Sonnet             Haiku              Opus
                      │                 │                 │
                      ▼                 ▼                 ▼
               ┌──────────┐    ┌──────────────┐    ┌──────────┐
               │   Z.ai   │    │     Z.ai     │    │Anthropic │
               │ GLM-4.7  │    │ GLM-4.5-Air  │    │Claude API│
               └──────────┘    └──────────────┘    └──────────┘
                  $30/mo          (included)         $200/mo
```
Result: Continuous productivity without expensive overages. Opus stays on Claude for reasoning. Everything else routes to Z.ai.
Without proxy:
- Hit limits mid-week
- Pay $200-400 in monthly overages OR stop working
- Stressful quota management
- Total cost: $400-600/month
With proxy:
- Opus for critical thinking (Claude API)
- Sonnet and Haiku for execution (Z.ai)
- Never hit expensive pay-as-you-go
- Total cost: $230/month ($200 Max + $30 Z.ai)
Savings: $170-370/month by avoiding overages
The routing strategy is based on real benchmark data, not vibes:
| Benchmark | Claude Sonnet 4.5 | GLM-4.7 | Source |
|---|---|---|---|
| SWE-bench Verified | 77.2% | 73.8% | Z.ai docs, Caylent |
| LiveCodeBench v6 | 64.0% | 84.9% | Z.ai docs |
| Terminal Bench 2.0 | 42.8% | 41.0% | Z.ai docs |
| τ²-Bench | 87.2% | 87.4% | HuggingFace |
GLM-4.7 wins on live coding and agent benchmarks. Sonnet leads on SWE-bench Verified by 3.4 points. For execution tasks (implementing endpoints, writing tests, refactoring), that gap is negligible. Complex reasoning stays on Opus via Anthropic.
| Task Type | Model | Routed To | Why |
|---|---|---|---|
| System architecture | Opus 4.6 | Anthropic | Deep reasoning required |
| Security review | Opus 4.6 | Anthropic | Critical decisions need premium quality |
| Algorithm optimization | Opus 4.6 | Anthropic | Complex analysis worth the quota |
| API endpoint implementation | Sonnet 4.5 | Z.ai GLM-4.7 | 84.9% LiveCodeBench, saves quota |
| Refactoring functions | Sonnet 4.5 | Z.ai GLM-4.7 | 73.8% SWE-bench, 3.4pt gap acceptable |
| Writing unit tests | Sonnet 4.5 | Z.ai GLM-4.7 | Formulaic, preserves limits |
| Quick file operations | Haiku 4.5 | Z.ai GLM-4.5-Air | Fast, offloads from Claude |
| Code analysis | Haiku 4.5 | Z.ai GLM-4.5-Air | Efficient scanning |
The proxy handles routing automatically - you just pick the model that matches your task.
Prerequisites: Node.js 20+, pnpm, Claude Max subscription, Z.ai subscription
1. **Clone and install:**

   ```bash
   git clone https://github.com/pbuchman/claude-proxy.git
   cd claude-proxy
   pnpm install
   ```

2. **Configure:**

   ```bash
   cp .env.example .env
   # Edit .env and add your Z.ai API key:
   # ZAI_API_KEY=your-zai-key-here
   ```

3. **Start the proxy:**

   ```bash
   # Development (hot reload)
   pnpm dev

   # Production
   pnpm build && pnpm start
   ```

   The proxy starts on `http://localhost:3001`. Open the dashboard at `http://localhost:3001/ui` to monitor routing and usage.

4. **Point Claude Code to the proxy:**

   ```bash
   # Add to ~/.zshrc or ~/.bashrc:
   export ANTHROPIC_BASE_URL=http://localhost:3001
   ```

5. **Monitor routing:**

   Open the dashboard to track requests, token usage, and routing decisions in real time:

   ```bash
   open http://localhost:3001/ui
   ```
The router inspects the `model` field of every request:

- `model.includes('sonnet')` → Z.ai with `ZAI_API_KEY`
- `model.includes('haiku')` → Z.ai with `ZAI_API_KEY`, model remapped to `GLM-4.5-Air`
- Everything else (Opus, unknown) → Anthropic with original `x-api-key` passthrough
Haiku requires explicit model remapping because Z.ai doesn't recognize Claude model names for that tier. Sonnet is mapped to GLM-4.7 server-side by Z.ai.
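A minimal sketch of that rule in TypeScript (illustrative shape and names, not the actual `router.ts`):

```typescript
// Sketch of the routing rule described above; names are illustrative.
interface Route {
  backend: 'zai' | 'anthropic';
  model: string; // model name sent upstream
}

function route(model: string): Route {
  if (model.includes('sonnet')) {
    // Z.ai maps Sonnet names to GLM-4.7 server-side, so pass through
    return { backend: 'zai', model };
  }
  if (model.includes('haiku')) {
    // Z.ai doesn't recognize Claude Haiku names, so remap explicitly
    return { backend: 'zai', model: 'GLM-4.5-Air' };
  }
  // Opus and anything unknown stay on Anthropic with the original key
  return { backend: 'anthropic', model };
}
```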
Responses are Server-Sent Events (SSE). The proxy forwards chunks to the client in real-time while parsing token usage in parallel.
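A minimal sketch of that forward-while-parsing pattern (assumed structure, not the actual `proxy.ts`):

```typescript
// Forward upstream SSE bytes to the client as they arrive, while feeding
// complete "data:" lines to a parser in parallel. Illustrative only.
async function pipeSse(
  upstream: AsyncIterable<Uint8Array>,
  write: (chunk: Uint8Array) => void,
  onData: (payload: string) => void
): Promise<void> {
  const decoder = new TextDecoder();
  let buffer = '';
  for await (const chunk of upstream) {
    write(chunk); // the client sees every chunk immediately
    buffer += decoder.decode(chunk, { stream: true });
    let newline: number;
    while ((newline = buffer.indexOf('\n')) >= 0) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line.startsWith('data:')) onData(line.slice(5).trim());
    }
  }
}
```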
Token usage location differs by backend:
- Anthropic puts `input_tokens` + cache tokens in `message_start`, and `output_tokens` in `message_delta`
- Z.ai puts all tokens in `message_delta` (`message_start` contains zeros)
The parser merges both sources from every event, taking the latest non-zero value. Non-streaming JSON responses are also parsed for usage.
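A sketch of that merge, assuming Anthropic's streaming event shapes (`message_start` nests usage under `message`, `message_delta` carries it at the top level); names are illustrative:

```typescript
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

// Pull usage out of either event type
function usageFromEvent(event: {
  type: string;
  usage?: Partial<Usage>;
  message?: { usage?: Partial<Usage> };
}): Partial<Usage> {
  if (event.type === 'message_start') return event.message?.usage ?? {};
  if (event.type === 'message_delta') return event.usage ?? {};
  return {};
}

// Latest non-zero value wins, which works for both backends
function mergeUsage(current: Usage, incoming: Partial<Usage>): Usage {
  const merged = { ...current };
  for (const key of Object.keys(merged) as (keyof Usage)[]) {
    const value = incoming[key];
    if (typeof value === 'number' && value > 0) merged[key] = value;
  }
  return merged;
}
```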
When you switch between backends mid-conversation (e.g. Sonnet on Z.ai, then Opus on Anthropic), thinking blocks in the conversation history carry cryptographic signatures that are only valid for the originating backend. The proxy strips all thinking blocks from assistant messages before forwarding, preventing `400 Invalid signature` errors.
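A sketch of the stripping step (content block shapes follow the Anthropic Messages API; the function name is illustrative):

```typescript
type ContentBlock = { type: string; [key: string]: unknown };
type Message = {
  role: 'user' | 'assistant';
  content: string | ContentBlock[];
};

// Drop thinking blocks from assistant turns; their signatures are only
// valid for the backend that produced them
function stripThinking(messages: Message[]): Message[] {
  return messages.map((msg) => {
    if (msg.role !== 'assistant' || typeof msg.content === 'string') {
      return msg;
    }
    const content = msg.content.filter(
      (block) => block.type !== 'thinking' && block.type !== 'redacted_thinking'
    );
    return { ...msg, content };
  });
}
```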
Logging is split between file and console:
**JSONL file** (`logs/requests.jsonl`) - usage metrics only:

```json
{
  "ts": "2026-02-07T10:30:00.000Z",
  "model": "claude-sonnet-4-5",
  "backend": "zai",
  "input_tokens": 1250,
  "output_tokens": 890,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 0,
  "status": "success"
}
```

**Console (pino)** - full operational context:

```
[sonnet|zai] 1.3k→890 3421ms "Write hello world in Python"
```
Console output includes latency, prompt snippets, and masked API keys (Anthropic only). None of that goes to the file.
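A sketch of that split, with a record shape mirroring the JSONL example above (illustrative, not the actual `logger.ts`):

```typescript
import { appendFileSync } from 'node:fs';

// Mirrors the JSONL example above
interface UsageRecord {
  ts: string;
  model: string;
  backend: 'zai' | 'anthropic';
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
  status: 'success' | 'error';
}

// The file gets metrics only; latency, prompt snippets, and masked keys
// go to the console logger and never touch the file
function logUsage(record: UsageRecord): void {
  appendFileSync('logs/requests.jsonl', JSON.stringify(record) + '\n');
}
```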
Access at `http://localhost:3001/ui`:
- Current week requests: Z.ai vs Claude breakdown
- 3-hour view: 3-minute buckets for recent activity monitoring
- 24-hour chart: Hourly request distribution by model
- Weekly trends: Custom week boundary (e.g., Thursday 22:00 start)
- Provider split: Delegation strategy visualization
- Interactive filters: Toggle models/providers to focus on specific metrics
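As an illustration of how views like the 3-hour chart can be computed (hypothetical, not the actual `ui.ts` code), the bucketing reduces to:

```typescript
// Map a timestamp to the start of its bucket (epoch ms); the 3-hour view
// uses 3-minute buckets, the 24-hour chart hourly ones
function bucketStart(ts: Date, bucketMinutes: number): number {
  const ms = bucketMinutes * 60_000;
  return Math.floor(ts.getTime() / ms) * ms;
}

// Example: group JSONL timestamps into 3-minute buckets
const counts = new Map<number, number>();
for (const ts of ['2026-02-07T10:30:00Z', '2026-02-07T10:31:30Z']) {
  const key = bucketStart(new Date(ts), 3);
  counts.set(key, (counts.get(key) ?? 0) + 1);
}
```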
```bash
# Server
PORT=3001                  # Proxy listens here
HOST=127.0.0.1             # Bind address

# Z.ai (for Sonnet + Haiku routing)
ZAI_API_KEY=your-key-here  # Required
ZAI_BASE_URL=https://api.z.ai/api/anthropic

# Anthropic (passthrough - uses original key from request)
ANTHROPIC_BASE_URL=https://api.anthropic.com

# Logging
LOG_DIR=./logs
LOG_LEVEL=info             # debug | info | warn | error

# Week boundary (customize for your quota cycle)
WEEK_START_DAY=4           # 0=Sunday, 4=Thursday
WEEK_START_HOUR=22         # 0-23
```

Only `ZAI_API_KEY` is required. Everything else has sensible defaults.
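For instance, the week boundary above (Thursday 22:00) can be resolved like this (a sketch under assumed semantics, not the actual `metrics.ts`):

```typescript
// Find the most recent occurrence of WEEK_START_DAY at WEEK_START_HOUR
// (local time); 0=Sunday ... 6=Saturday, matching Date.getDay()
function weekStart(now: Date, startDay = 4, startHour = 22): Date {
  const candidate = new Date(now);
  candidate.setHours(startHour, 0, 0, 0);
  // Step back a day at a time until we land on the configured weekday
  // at a moment that is not in the future
  while (candidate.getDay() !== startDay || candidate > now) {
    candidate.setDate(candidate.getDate() - 1);
    candidate.setHours(startHour, 0, 0, 0);
  }
  return candidate;
}
```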
| Command | Description |
|---|---|
| `pnpm dev` | Start with hot reload |
| `pnpm build` | Build TypeScript to `dist/` |
| `pnpm start` | Run production build |
| `pnpm check` | Full pipeline: typecheck → lint → format → build → test |
| `pnpm test` | Run test suite |
| `pnpm test:watch` | Run tests in watch mode |
| `pnpm typecheck` | Type checking without build |
| `pnpm lint` | ESLint check (`pnpm lint:fix` to auto-fix) |
| `pnpm format` | Prettier format (`pnpm format:check` to verify) |
| `pnpm stats` | View CLI statistics |
| Subscription | Monthly Cost | Purpose |
|---|---|---|
| Claude Max x20 | $200 | Reasoning (Opus) |
| Z.ai GLM Coding Plan | $30 ($90/quarter) | Execution (Sonnet → GLM-4.7, Haiku → GLM-4.5-Air) |
| Total | $230 | Predictable cost, no overages |
```bash
# Run test suite
pnpm test

# Test Sonnet routing (→ Z.ai GLM-4.7)
curl -X POST http://localhost:3001/v1/messages \
  -H "x-api-key: sk-ant-xxx" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-5","max_tokens":100,"messages":[{"role":"user","content":"Hello"}]}'

# Test Haiku routing (→ Z.ai GLM-4.5-Air)
curl -X POST http://localhost:3001/v1/messages \
  -H "x-api-key: sk-ant-xxx" \
  -H "content-type: application/json" \
  -d '{"model":"claude-haiku-4-5","max_tokens":100,"messages":[{"role":"user","content":"Hello"}]}'

# Test Opus routing (→ Anthropic)
curl -X POST http://localhost:3001/v1/messages \
  -H "x-api-key: sk-ant-xxx" \
  -H "content-type: application/json" \
  -d '{"model":"claude-opus-4-6","max_tokens":100,"messages":[{"role":"user","content":"Hello"}]}'
```

| Module | Purpose |
|---|---|
| `index.ts` | Fastify server setup, route registration, request lifecycle |
| `config.ts` | Environment variable loading with validation |
| `router.ts` | Model → backend routing + model remapping |
| `proxy.ts` | HTTP forwarding, SSE streaming, token extraction, thinking block stripping |
| `logger.ts` | Console logging (pino) + JSONL usage file |
| `metrics.ts` | Log aggregation, week boundary calculations |
| `ui.ts` | Dashboard routes, static file serving, time-series aggregation |
| `stats.ts` | CLI for viewing usage statistics |
- API keys never written to log files
- Z.ai requests: client's API key not logged to console either (it's swapped, not used)
- Anthropic requests: key shown in console only (masked: `sk-an...wigAA`)
- Z.ai key stored in `.env` (gitignored)
- Localhost binding by default (`127.0.0.1`)
- No message content in logs - only prompt snippets in console, not in files
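The masking shown above (`sk-an...wigAA`) amounts to keeping a short prefix and suffix; a sketch (illustrative, not the actual implementation):

```typescript
// Keep the first and last five characters, hide everything between
function maskKey(key: string): string {
  if (key.length <= 10) return '***';
  return `${key.slice(0, 5)}...${key.slice(-5)}`;
}

maskKey('sk-ant-api03-example-key-wigAA'); // → "sk-an...wigAA"
```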
**Proxy won't start:**

```bash
# Check Z.ai API key is set
grep ZAI_API_KEY .env

# Check port availability
lsof -i :3001
```

**Claude Code not routing through proxy:**

```bash
# Verify env var is set
echo $ANTHROPIC_BASE_URL
# Should show: http://localhost:3001

# Restart terminal after setting
source ~/.zshrc
```

**Routing going to wrong backend:**

```bash
# Check live logs
tail -f logs/requests.jsonl

# Routing rules:
# "sonnet" in model name → Z.ai GLM-4.7
# "haiku" in model name → Z.ai GLM-4.5-Air
# everything else → Anthropic
```

**"Invalid signature in thinking block" error:** This happens when switching backends mid-conversation. The proxy strips thinking blocks automatically - make sure you're running the latest version.
This proxy is for:

- **Claude Max users hitting rate limits** - avoid expensive overages
- **High-volume developers** - strategic delegation for continuous productivity
- **Teams managing subscription costs** - predictable monthly budget
It's probably not for you if:

- You rarely hit Claude subscription limits
- You're on the Free or Pro tier (no overage risk)
- You always need premium Claude quality for everything
- You don't use Claude Code or API clients
Contributions welcome! Fork, branch, PR.
MIT License - see LICENSE file.
Stop worrying about rate limits. Start shipping continuously.
Made by Piotr Buchman
