
claude-proxy

License: MIT | Node: >=20

Stop panicking about Claude Max rate limits. Smart proxy that delegates work to Z.ai GLM models, eliminating expensive pay-as-you-go overages while preserving your Claude subscription for reasoning tasks.

The Problem

You're paying $200/month for Claude Max x20 - that's a lot of capacity, but it's not unlimited. When you hit the limit mid-week, you face two bad options:

  1. Pay expensive overages - $3/$15 per million tokens adds $200-400/month in unexpected costs
  2. Stop working - Wait for quota refresh and lose days of productivity

Neither is acceptable when you're trying to ship.

The Solution

This proxy routes strategically based on model selection:

Claude Code ──→ localhost:3001 ──→ claude-proxy
                                      │
                    ┌─────────────────┼─────────────────┐
                    │                 │                  │
                 Sonnet            Haiku              Opus
                    │                 │                  │
                    ▼                 ▼                  ▼
              ┌──────────┐    ┌──────────────┐    ┌──────────┐
              │   Z.ai   │    │     Z.ai     │    │Anthropic │
              │  GLM-4.7 │    │ GLM-4.5-Air  │    │Claude API│
              └──────────┘    └──────────────┘    └──────────┘
                  $30/mo         (included)          $200/mo

Result: Continuous productivity without expensive overages. Opus stays on Claude for reasoning. Everything else routes to Z.ai.


Why This Matters

Without proxy:

  • Hit limits mid-week
  • Pay $200-400 in monthly overages OR stop working
  • Stressful quota management
  • Total cost: $400-600/month

With proxy:

  • Opus for critical thinking (Claude API)
  • Sonnet and Haiku for execution (Z.ai)
  • Never hit expensive pay-as-you-go
  • Total cost: $230/month ($200 Max + $30 Z.ai)

Savings: $170-370/month by avoiding overages

GLM-4.7 vs Claude Sonnet 4.5: The Numbers

The routing strategy is based on real benchmark data, not vibes:

Benchmark            Claude Sonnet 4.5   GLM-4.7   Source
SWE-bench Verified   77.2%               73.8%     Z.ai docs, Caylent
LiveCodeBench v6     64.0%               84.9%     Z.ai docs
Terminal Bench 2.0   42.8%               41.0%     Z.ai docs
τ²-Bench             87.2%               87.4%     HuggingFace

GLM-4.7 wins on live coding and agent benchmarks. Sonnet leads on SWE-bench Verified by 3.4 points. For execution tasks (implementing endpoints, writing tests, refactoring), that gap is negligible. Complex reasoning stays on Opus via Anthropic.

When to Use Which Model

Task Type                     Model       Routed To         Why
System architecture           Opus 4.6    Anthropic         Deep reasoning required
Security review               Opus 4.6    Anthropic         Critical decisions need premium quality
Algorithm optimization        Opus 4.6    Anthropic         Complex analysis worth the quota
API endpoint implementation   Sonnet 4.5  Z.ai GLM-4.7      84.9% LiveCodeBench, saves quota
Refactoring functions         Sonnet 4.5  Z.ai GLM-4.7      73.8% SWE-bench, 3.4pt gap acceptable
Writing unit tests            Sonnet 4.5  Z.ai GLM-4.7      Formulaic, preserves limits
Quick file operations         Haiku 4.5   Z.ai GLM-4.5-Air  Fast, offloads from Claude
Code analysis                 Haiku 4.5   Z.ai GLM-4.5-Air  Efficient scanning

The proxy handles routing automatically - you just pick the model that matches your task.

Quickstart

Prerequisites: Node.js 20+, pnpm, Claude Max subscription, Z.ai subscription

5-Minute Setup

  1. Clone and install:

    git clone https://github.com/pbuchman/claude-proxy.git
    cd claude-proxy
    pnpm install
  2. Configure:

    cp .env.example .env
    # Edit .env and add your Z.ai API key:
    # ZAI_API_KEY=your-zai-key-here
  3. Start the proxy:

    # Development (hot reload)
    pnpm dev
    
    # Production
    pnpm build && pnpm start

    The proxy starts on http://localhost:3001. Open the dashboard at http://localhost:3001/ui to monitor routing and usage.

  4. Point Claude Code to proxy:

    # Add to ~/.zshrc or ~/.bashrc:
    export ANTHROPIC_BASE_URL=http://localhost:3001
  5. Monitor routing:

    Open the dashboard to track requests, token usage, and routing decisions in real-time:

    open http://localhost:3001/ui

How It Works

Routing

The router inspects the model field in every request:

  • model.includes('sonnet') → Z.ai with ZAI_API_KEY
  • model.includes('haiku') → Z.ai with ZAI_API_KEY, model remapped to GLM-4.5-Air
  • Everything else (Opus, unknown) → Anthropic with original x-api-key passthrough

Haiku requires explicit model remapping because Z.ai doesn't recognize Claude model names for that tier. Sonnet is mapped to GLM-4.7 server-side by Z.ai.
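
In TypeScript terms, the rule boils down to something like this (a minimal sketch - the names and the GLM model id are illustrative assumptions, not the actual router.ts exports):

type Backend =
  | { name: 'zai'; baseUrl: string; apiKey: string; remapModel?: string }
  | { name: 'anthropic'; baseUrl: string }; // original x-api-key passes through

function routeModel(model: string, zaiApiKey: string): Backend {
  const m = model.toLowerCase();
  if (m.includes('sonnet')) {
    // Z.ai maps Sonnet model names to GLM-4.7 server-side, so no remap needed.
    return { name: 'zai', baseUrl: 'https://api.z.ai/api/anthropic', apiKey: zaiApiKey };
  }
  if (m.includes('haiku')) {
    // Z.ai doesn't recognize Claude Haiku names, so remap explicitly.
    return {
      name: 'zai',
      baseUrl: 'https://api.z.ai/api/anthropic',
      apiKey: zaiApiKey,
      remapModel: 'glm-4.5-air', // assumed model id for GLM-4.5-Air
    };
  }
  // Opus and anything unrecognized falls through to Anthropic untouched.
  return { name: 'anthropic', baseUrl: 'https://api.anthropic.com' };
}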

Streaming & Token Extraction

Responses are Server-Sent Events (SSE). The proxy forwards chunks to the client in real-time while parsing token usage in parallel.

Token usage location differs by backend:

  • Anthropic puts input_tokens + cache tokens in message_start, output_tokens in message_delta
  • Z.ai puts all tokens in message_delta (message_start contains zeros)

The parser reads usage from every event and keeps the latest non-zero value for each field, which covers both layouts. Non-streaming JSON responses are parsed for usage as well.
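
A sketch of that merge rule, with the usage object simplified to the four fields the proxy logs (names and shapes are assumptions, not the actual proxy.ts code):

interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

// Apply one SSE event's (possibly partial) usage to the running total.
// Anthropic reports input tokens in message_start and output tokens in
// message_delta; Z.ai reports everything in message_delta. Taking the
// latest non-zero value per field covers both.
function mergeUsage(acc: Usage, eventUsage: Partial<Usage> | undefined): Usage {
  if (!eventUsage) return acc;
  const next = { ...acc };
  for (const key of Object.keys(next) as (keyof Usage)[]) {
    const value = eventUsage[key];
    if (typeof value === 'number' && value !== 0) next[key] = value;
  }
  return next;
}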

Thinking Block Stripping

When you switch between backends mid-conversation (e.g. Sonnet on Z.ai, then Opus on Anthropic), thinking blocks in the conversation history carry cryptographic signatures that are only valid for the originating backend. The proxy strips all thinking blocks from assistant messages before forwarding, preventing 400 Invalid signature errors.
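
A minimal sketch of the idea, assuming simplified Anthropic-style message shapes (stripThinkingBlocks is a hypothetical name, not the actual proxy.ts export):

type ContentBlock = { type: string } & Record<string, unknown>;
type Message = { role: 'user' | 'assistant'; content: string | ContentBlock[] };

function stripThinkingBlocks(messages: Message[]): Message[] {
  return messages.map((msg) => {
    // Only assistant messages with structured content can carry thinking blocks.
    if (msg.role !== 'assistant' || typeof msg.content === 'string') return msg;
    // Drop blocks whose signatures are only valid for the backend that produced them.
    const content = msg.content.filter(
      (block) => block.type !== 'thinking' && block.type !== 'redacted_thinking'
    );
    return { ...msg, content };
  });
}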

Request Logging

Logging is split between file and console:

JSONL file (logs/requests.jsonl) - usage metrics only:

{
  "ts": "2026-02-07T10:30:00.000Z",
  "model": "claude-sonnet-4-5",
  "backend": "zai",
  "input_tokens": 1250,
  "output_tokens": 890,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 0,
  "status": "success"
}

Console (pino) - full operational context:

[sonnet|zai] 1.3k→890     3421ms  "Write hello world in Python"

Console output includes latency, prompt snippets, and masked API keys (Anthropic only). None of that goes to the file.
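
For illustration, a console line like the one above could be assembled roughly like this (a hypothetical formatter, not the actual logger.ts code):

// Abbreviate token counts, e.g. 1250 -> "1.3k".
function formatTokens(n: number): string {
  return n >= 1000 ? `${(n / 1000).toFixed(1)}k` : String(n);
}

function consoleLine(opts: {
  model: string;         // e.g. "sonnet"
  backend: string;       // e.g. "zai"
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  promptSnippet: string;
}): string {
  const tokens = `${formatTokens(opts.inputTokens)}→${formatTokens(opts.outputTokens)}`;
  return `[${opts.model}|${opts.backend}] ${tokens.padEnd(10)}${opts.latencyMs}ms  "${opts.promptSnippet}"`;
}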

Dashboard

Access at http://localhost:3001/ui

  • Current week requests: Z.ai vs Claude breakdown
  • 3-hour view: 3-minute buckets for recent activity monitoring
  • 24-hour chart: Hourly request distribution by model
  • Weekly trends: Custom week boundary (e.g., Thursday 22:00 start)
  • Provider split: Delegation strategy visualization
  • Interactive filters: Toggle models/providers to focus on specific metrics

Configuration

# Server
PORT=3001                  # Proxy listens here
HOST=127.0.0.1             # Bind address

# Z.ai (for Sonnet + Haiku routing)
ZAI_API_KEY=your-key-here  # Required
ZAI_BASE_URL=https://api.z.ai/api/anthropic

# Anthropic (passthrough - uses original key from request)
ANTHROPIC_BASE_URL=https://api.anthropic.com

# Logging
LOG_DIR=./logs
LOG_LEVEL=info             # debug | info | warn | error

# Week boundary (customize for your quota cycle)
WEEK_START_DAY=4           # 0=Sunday, 4=Thursday
WEEK_START_HOUR=22         # 0-23

Only ZAI_API_KEY is required. Everything else has sensible defaults.
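
The week boundary is the only non-obvious setting. Here is a sketch of how the most recent week start can be computed from WEEK_START_DAY and WEEK_START_HOUR (a hypothetical helper, not the actual metrics.ts code):

// Find the most recent week boundary at or before `now`.
function weekStart(now: Date, startDay: number, startHour: number): Date {
  const d = new Date(now);
  d.setMinutes(0, 0, 0);
  d.setHours(startHour);
  // Walk back to the configured weekday (0=Sunday … 6=Saturday).
  while (d.getDay() !== startDay) d.setDate(d.getDate() - 1);
  // If we landed in the future (e.g. it's Thursday 09:00 and the boundary
  // is Thursday 22:00), step back one more week.
  if (d > now) d.setDate(d.getDate() - 7);
  return d;
}

With WEEK_START_DAY=4 and WEEK_START_HOUR=22, a request on Friday morning falls into the week that began the previous Thursday at 22:00.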

Commands

Command          Description
pnpm dev         Start with hot reload
pnpm build       Build TypeScript to dist/
pnpm start       Run production build
pnpm check       Full pipeline: typecheck → lint → format → build → test
pnpm test        Run test suite
pnpm test:watch  Run tests in watch mode
pnpm typecheck   Type checking without build
pnpm lint        ESLint check (pnpm lint:fix to auto-fix)
pnpm format      Prettier format (pnpm format:check to verify)
pnpm stats View CLI statistics

Cost Breakdown

Subscription          Monthly Cost       Purpose
Claude Max x20        $200               Reasoning (Opus)
Z.ai GLM Coding Plan  $30 ($90/quarter)  Execution (Sonnet → GLM-4.7, Haiku → GLM-4.5-Air)
Total                 $230               Predictable cost, no overages

Testing

# Run test suite
pnpm test

# Test Sonnet routing (→ Z.ai GLM-4.7)
curl -X POST http://localhost:3001/v1/messages \
  -H "x-api-key: sk-ant-xxx" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-5","max_tokens":100,"messages":[{"role":"user","content":"Hello"}]}'

# Test Haiku routing (→ Z.ai GLM-4.5-Air)
curl -X POST http://localhost:3001/v1/messages \
  -H "x-api-key: sk-ant-xxx" \
  -H "content-type: application/json" \
  -d '{"model":"claude-haiku-4-5","max_tokens":100,"messages":[{"role":"user","content":"Hello"}]}'

# Test Opus routing (→ Anthropic)
curl -X POST http://localhost:3001/v1/messages \
  -H "x-api-key: sk-ant-xxx" \
  -H "content-type: application/json" \
  -d '{"model":"claude-opus-4-6","max_tokens":100,"messages":[{"role":"user","content":"Hello"}]}'

Module Structure

Module      Purpose
index.ts    Fastify server setup, route registration, request lifecycle
config.ts   Environment variable loading with validation
router.ts   Model → backend routing + model remapping
proxy.ts    HTTP forwarding, SSE streaming, token extraction, thinking block stripping
logger.ts   Console logging (pino) + JSONL usage file
metrics.ts  Log aggregation, week boundary calculations
ui.ts       Dashboard routes, static file serving, time-series aggregation
stats.ts    CLI for viewing usage statistics

Security

  • API keys never written to log files
  • Z.ai requests: the client's API key is never logged (it's swapped for the Z.ai key, not used)
  • Anthropic requests: key shown in console only, masked (e.g. sk-an...wigAA - see the sketch after this list)
  • Z.ai key stored in .env (gitignored)
  • Localhost binding by default (127.0.0.1)
  • No message content in logs - only prompt snippets in console, not in files
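
A sketch of the masking step referenced above (a hypothetical helper, not the actual logger.ts code):

// Keep just enough of the key to recognize it, e.g. "sk-an...wigAA".
function maskApiKey(key: string): string {
  if (key.length <= 10) return '***'; // too short to mask meaningfully
  return `${key.slice(0, 5)}...${key.slice(-5)}`;
}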

Troubleshooting

Proxy won't start:

# Check Z.ai API key is set
grep ZAI_API_KEY .env

# Check port availability
lsof -i :3001

Claude Code not routing through proxy:

# Verify env var is set
echo $ANTHROPIC_BASE_URL
# Should show: http://localhost:3001

# Restart terminal after setting
source ~/.zshrc

Routing going to wrong backend:

# Check live logs
tail -f logs/requests.jsonl

# Routing rules:
# "sonnet" in model name → Z.ai GLM-4.7
# "haiku" in model name  → Z.ai GLM-4.5-Air
# everything else        → Anthropic

"Invalid signature in thinking block" error: This happens when switching backends mid-conversation. The proxy strips thinking blocks automatically - make sure you're running the latest version.

Perfect For

  • Claude Max users hitting rate limits - Avoid expensive overages
  • High-volume developers - Strategic delegation for continuous productivity
  • Teams managing subscription costs - Predictable monthly budget

Not For You If

  • You rarely hit Claude subscription limits
  • You only have Free or Pro tier (no overage risk)
  • You always need premium Claude quality for everything
  • You don't use Claude Code or API clients

Contributing

Contributions welcome! Fork, branch, PR.

License

MIT License - see LICENSE file.


Stop worrying about rate limits. Start shipping continuously.

Made by Piotr Buchman
