
claude-proxy

License: MIT | Node: >=20

Stop panicking about Claude Max rate limits. Smart proxy that delegates work to Z.ai GLM models, eliminating expensive pay-as-you-go overages while preserving your Claude subscription for reasoning tasks.

The Problem

You're paying $200/month for Claude Max x20 - that's a lot of capacity, but it's not unlimited. When you hit the limit mid-week, you face two bad options:

  1. Pay expensive overages - $3/$15 per million tokens adds $200-400/month in unexpected costs
  2. Stop working - Wait for quota refresh and lose days of productivity

Neither is acceptable when you're trying to ship.

The Solution

This proxy routes strategically based on model selection:

Claude Code ──→ localhost:3001 ──→ claude-proxy
                                      │
                    ┌─────────────────┼─────────────────┐
                    │                 │                  │
                 Sonnet            Haiku              Opus
                    │                 │                  │
                    ▼                 ▼                  ▼
              ┌──────────┐    ┌──────────────┐    ┌──────────┐
              │   Z.ai   │    │     Z.ai     │    │Anthropic │
              │  GLM-4.7 │    │ GLM-4.5-Air  │    │Claude API│
              └──────────┘    └──────────────┘    └──────────┘
                  $30/mo         (included)          $200/mo

Result: Continuous productivity without expensive overages. Opus stays on Claude for reasoning. Everything else routes to Z.ai.


Why This Matters

Without proxy:

  • Hit limits mid-week
  • Pay $200-400 in monthly overages OR stop working
  • Stressful quota management
  • Total cost: $400-600/month

With proxy:

  • Opus for critical thinking (Claude API)
  • Sonnet and Haiku for execution (Z.ai)
  • Never hit expensive pay-as-you-go
  • Total cost: $230/month ($200 Max + $30 Z.ai)

Savings: $170-370/month by avoiding overages

GLM-4.7 vs Claude Sonnet 4.5: The Numbers

The routing strategy is based on real benchmark data, not vibes:

Benchmark            Claude Sonnet 4.5   GLM-4.7   Source
SWE-bench Verified   77.2%               73.8%     Z.ai docs, Caylent
LiveCodeBench v6     64.0%               84.9%     Z.ai docs
Terminal Bench 2.0   42.8%               41.0%     Z.ai docs
τ²-Bench             87.2%               87.4%     HuggingFace

GLM-4.7 wins on live coding and agent benchmarks. Sonnet leads on SWE-bench Verified by 3.4 points. For execution tasks (implementing endpoints, writing tests, refactoring), that gap is negligible. Complex reasoning stays on Opus via Anthropic.

When to Use Which Model

Task Type                     Model       Routed To         Why
System architecture           Opus 4.6    Anthropic         Deep reasoning required
Security review               Opus 4.6    Anthropic         Critical decisions need premium quality
Algorithm optimization        Opus 4.6    Anthropic         Complex analysis worth the quota
API endpoint implementation   Sonnet 4.5  Z.ai GLM-4.7      84.9% LiveCodeBench, saves quota
Refactoring functions         Sonnet 4.5  Z.ai GLM-4.7      73.8% SWE-bench, 3.4pt gap acceptable
Writing unit tests            Sonnet 4.5  Z.ai GLM-4.7      Formulaic, preserves limits
Quick file operations         Haiku 4.5   Z.ai GLM-4.5-Air  Fast, offloads from Claude
Code analysis                 Haiku 4.5   Z.ai GLM-4.5-Air  Efficient scanning

The proxy handles routing automatically - you just pick the model that matches your task.

Quickstart

Prerequisites: Node.js 20+, pnpm, Claude Max subscription, Z.ai subscription

5-Minute Setup

  1. Clone and install:

    git clone https://github.com/pbuchman/claude-proxy.git
    cd claude-proxy
    pnpm install
  2. Configure:

    cp .env.example .env
    # Edit .env and add your Z.ai API key:
    # ZAI_API_KEY=your-zai-key-here
  3. Start the proxy:

    # Development (hot reload)
    pnpm dev
    
    # Production
    pnpm build && pnpm start

    The proxy starts on http://localhost:3001. Open the dashboard at http://localhost:3001/ui to monitor routing and usage.

  4. Point Claude Code to proxy:

    # Add to ~/.zshrc or ~/.bashrc:
    export ANTHROPIC_BASE_URL=http://localhost:3001
  5. Monitor routing:

    Open the dashboard to track requests, token usage, and routing decisions in real-time:

    open http://localhost:3001/ui

How It Works

Routing

The router inspects the model field in every request:

  • model.includes('sonnet') → Z.ai with ZAI_API_KEY
  • model.includes('haiku') → Z.ai with ZAI_API_KEY, model remapped to GLM-4.5-Air
  • Everything else (Opus, unknown) → Anthropic with original x-api-key passthrough

Haiku requires explicit model remapping because Z.ai doesn't recognize Claude model names for that tier. Sonnet is mapped to GLM-4.7 server-side by Z.ai.
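
In TypeScript terms, the rule boils down to something like this (a minimal sketch - the names and the GLM model id are illustrative assumptions, not the actual router.ts exports):

type Backend =
  | { name: 'zai'; baseUrl: string; apiKey: string; remapModel?: string }
  | { name: 'anthropic'; baseUrl: string }; // original x-api-key passes through

function routeModel(model: string, zaiApiKey: string): Backend {
  const m = model.toLowerCase();
  if (m.includes('sonnet')) {
    // Z.ai maps Sonnet model names to GLM-4.7 server-side, so no remap needed.
    return { name: 'zai', baseUrl: 'https://api.z.ai/api/anthropic', apiKey: zaiApiKey };
  }
  if (m.includes('haiku')) {
    // Z.ai doesn't recognize Claude Haiku names, so remap explicitly.
    return {
      name: 'zai',
      baseUrl: 'https://api.z.ai/api/anthropic',
      apiKey: zaiApiKey,
      remapModel: 'glm-4.5-air', // assumed model id for GLM-4.5-Air
    };
  }
  // Opus and anything unrecognized falls through to Anthropic untouched.
  return { name: 'anthropic', baseUrl: 'https://api.anthropic.com' };
}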

Streaming & Token Extraction

Responses are Server-Sent Events (SSE). The proxy forwards chunks to the client in real-time while parsing token usage in parallel.

Token usage location differs by backend:

  • Anthropic puts input_tokens + cache tokens in message_start, output_tokens in message_delta
  • Z.ai puts all tokens in message_delta (message_start contains zeros)

The parser reads usage from every event and keeps the latest non-zero value for each field, which covers both layouts. Non-streaming JSON responses are parsed for usage as well.
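
A sketch of that merge rule, with the usage object simplified to the four fields the proxy logs (names and shapes are assumptions, not the actual proxy.ts code):

interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

// Apply one SSE event's (possibly partial) usage to the running total.
// Anthropic reports input tokens in message_start and output tokens in
// message_delta; Z.ai reports everything in message_delta. Taking the
// latest non-zero value per field covers both.
function mergeUsage(acc: Usage, eventUsage: Partial<Usage> | undefined): Usage {
  if (!eventUsage) return acc;
  const next = { ...acc };
  for (const key of Object.keys(next) as (keyof Usage)[]) {
    const value = eventUsage[key];
    if (typeof value === 'number' && value !== 0) next[key] = value;
  }
  return next;
}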

Thinking Block Stripping

When you switch between backends mid-conversation (e.g. Sonnet on Z.ai, then Opus on Anthropic), thinking blocks in the conversation history carry cryptographic signatures that are only valid for the originating backend. The proxy strips all thinking blocks from assistant messages before forwarding, preventing 400 Invalid signature errors.
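
A minimal sketch of the idea, assuming simplified Anthropic-style message shapes (stripThinkingBlocks is a hypothetical name, not the actual proxy.ts export):

type ContentBlock = { type: string } & Record<string, unknown>;
type Message = { role: 'user' | 'assistant'; content: string | ContentBlock[] };

function stripThinkingBlocks(messages: Message[]): Message[] {
  return messages.map((msg) => {
    // Only assistant messages with structured content can carry thinking blocks.
    if (msg.role !== 'assistant' || typeof msg.content === 'string') return msg;
    // Drop blocks whose signatures are only valid for the backend that produced them.
    const content = msg.content.filter(
      (block) => block.type !== 'thinking' && block.type !== 'redacted_thinking'
    );
    return { ...msg, content };
  });
}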

Request Logging

Logging is split between file and console:

JSONL file (logs/requests.jsonl) - usage metrics only:

{
  "ts": "2026-02-07T10:30:00.000Z",
  "model": "claude-sonnet-4-5",
  "backend": "zai",
  "input_tokens": 1250,
  "output_tokens": 890,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 0,
  "status": "success"
}

Console (pino) - full operational context:

[sonnet|zai] 1.3k→890     3421ms  "Write hello world in Python"

Console output includes latency, prompt snippets, and masked API keys (Anthropic only). None of that goes to the file.
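
For illustration, a console line like the one above could be assembled roughly like this (a hypothetical formatter, not the actual logger.ts code):

// Abbreviate token counts, e.g. 1250 -> "1.3k".
function formatTokens(n: number): string {
  return n >= 1000 ? `${(n / 1000).toFixed(1)}k` : String(n);
}

function consoleLine(opts: {
  model: string;         // e.g. "sonnet"
  backend: string;       // e.g. "zai"
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  promptSnippet: string;
}): string {
  const tokens = `${formatTokens(opts.inputTokens)}→${formatTokens(opts.outputTokens)}`;
  return `[${opts.model}|${opts.backend}] ${tokens.padEnd(10)}${opts.latencyMs}ms  "${opts.promptSnippet}"`;
}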

Dashboard

Access at http://localhost:3001/ui

  • Current week requests: Z.ai vs Claude breakdown
  • 3-hour view: 3-minute buckets for recent activity monitoring
  • 24-hour chart: Hourly request distribution by model
  • Weekly trends: Custom week boundary (e.g., Thursday 22:00 start)
  • Provider split: Delegation strategy visualization
  • Interactive filters: Toggle models/providers to focus on specific metrics

Configuration

# Server
PORT=3001                  # Proxy listens here
HOST=127.0.0.1             # Bind address

# Z.ai (for Sonnet + Haiku routing)
ZAI_API_KEY=your-key-here  # Required
ZAI_BASE_URL=https://api.z.ai/api/anthropic

# Anthropic (passthrough - uses original key from request)
ANTHROPIC_BASE_URL=https://api.anthropic.com

# Logging
LOG_DIR=./logs
LOG_LEVEL=info             # debug | info | warn | error

# Week boundary (customize for your quota cycle)
WEEK_START_DAY=4           # 0=Sunday, 4=Thursday
WEEK_START_HOUR=22         # 0-23

Only ZAI_API_KEY is required. Everything else has sensible defaults.
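
The week boundary is the only non-obvious setting. Here is a sketch of how the most recent week start can be computed from WEEK_START_DAY and WEEK_START_HOUR (a hypothetical helper, not the actual metrics.ts code):

// Find the most recent week boundary at or before `now`.
function weekStart(now: Date, startDay: number, startHour: number): Date {
  const d = new Date(now);
  d.setMinutes(0, 0, 0);
  d.setHours(startHour);
  // Walk back to the configured weekday (0=Sunday … 6=Saturday).
  while (d.getDay() !== startDay) d.setDate(d.getDate() - 1);
  // If we landed in the future (e.g. it's Thursday 09:00 and the boundary
  // is Thursday 22:00), step back one more week.
  if (d > now) d.setDate(d.getDate() - 7);
  return d;
}

With WEEK_START_DAY=4 and WEEK_START_HOUR=22, a request on Friday morning falls into the week that began the previous Thursday at 22:00.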

Commands

Command          Description
pnpm dev         Start with hot reload
pnpm build       Build TypeScript to dist/
pnpm start       Run production build
pnpm check       Full pipeline: typecheck → lint → format → build → test
pnpm test        Run test suite
pnpm test:watch  Run tests in watch mode
pnpm typecheck   Type checking without build
pnpm lint        ESLint check (pnpm lint:fix to auto-fix)
pnpm format      Prettier format (pnpm format:check to verify)
pnpm stats View CLI statistics

Cost Breakdown

Subscription          Monthly Cost       Purpose
Claude Max x20        $200               Reasoning (Opus)
Z.ai GLM Coding Plan  $30 ($90/quarter)  Execution (Sonnet → GLM-4.7, Haiku → GLM-4.5-Air)
Total                 $230               Predictable cost, no overages

Testing

# Run test suite
pnpm test

# Test Sonnet routing (→ Z.ai GLM-4.7)
curl -X POST http://localhost:3001/v1/messages \
  -H "x-api-key: sk-ant-xxx" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-5","max_tokens":100,"messages":[{"role":"user","content":"Hello"}]}'

# Test Haiku routing (→ Z.ai GLM-4.5-Air)
curl -X POST http://localhost:3001/v1/messages \
  -H "x-api-key: sk-ant-xxx" \
  -H "content-type: application/json" \
  -d '{"model":"claude-haiku-4-5","max_tokens":100,"messages":[{"role":"user","content":"Hello"}]}'

# Test Opus routing (→ Anthropic)
curl -X POST http://localhost:3001/v1/messages \
  -H "x-api-key: sk-ant-xxx" \
  -H "content-type: application/json" \
  -d '{"model":"claude-opus-4-6","max_tokens":100,"messages":[{"role":"user","content":"Hello"}]}'

Module Structure

Module      Purpose
index.ts    Fastify server setup, route registration, request lifecycle
config.ts   Environment variable loading with validation
router.ts   Model → backend routing + model remapping
proxy.ts    HTTP forwarding, SSE streaming, token extraction, thinking block stripping
logger.ts   Console logging (pino) + JSONL usage file
metrics.ts  Log aggregation, week boundary calculations
ui.ts       Dashboard routes, static file serving, time-series aggregation
stats.ts    CLI for viewing usage statistics

Security

  • API keys never written to log files
  • Z.ai requests: the client's API key is never logged (it's swapped for the Z.ai key, not used)
  • Anthropic requests: key shown in console only, masked (e.g. sk-an...wigAA - see the sketch after this list)
  • Z.ai key stored in .env (gitignored)
  • Localhost binding by default (127.0.0.1)
  • No message content in logs - only prompt snippets in console, not in files
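
A sketch of the masking step referenced above (a hypothetical helper, not the actual logger.ts code):

// Keep just enough of the key to recognize it, e.g. "sk-an...wigAA".
function maskApiKey(key: string): string {
  if (key.length <= 10) return '***'; // too short to mask meaningfully
  return `${key.slice(0, 5)}...${key.slice(-5)}`;
}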

Troubleshooting

Proxy won't start:

# Check Z.ai API key is set
grep ZAI_API_KEY .env

# Check port availability
lsof -i :3001

Claude Code not routing through proxy:

# Verify env var is set
echo $ANTHROPIC_BASE_URL
# Should show: http://localhost:3001

# Restart terminal after setting
source ~/.zshrc

Routing going to wrong backend:

# Check live logs
tail -f logs/requests.jsonl

# Routing rules:
# "sonnet" in model name → Z.ai GLM-4.7
# "haiku" in model name  → Z.ai GLM-4.5-Air
# everything else        → Anthropic

"Invalid signature in thinking block" error: This happens when switching backends mid-conversation. The proxy strips thinking blocks automatically - make sure you're running the latest version.

Perfect For

  • Claude Max users hitting rate limits - Avoid expensive overages
  • High-volume developers - Strategic delegation for continuous productivity
  • Teams managing subscription costs - Predictable monthly budget

Not For You If

  • You rarely hit Claude subscription limits
  • You only have Free or Pro tier (no overage risk)
  • You always need premium Claude quality for everything
  • You don't use Claude Code or API clients

Contributing

Contributions welcome! Fork, branch, PR.

License

MIT License - see LICENSE file.


Stop worrying about rate limits. Start shipping continuously.

Made by Piotr Buchman
