Skip to content

borhen68/TokenTamer

Repository files navigation

🚀 TokenTamer

CI Python 3.9+ License: MIT

A drop-in proxy that compresses bloated code context in real-time, cutting LLM API costs by 50–80% on plain-chat coding agents.

TokenTamer is a middleware proxy that sits between an AI coding agent and the LLM API. It intercepts raw payloads, parses code with AST, and replaces "background" files with structural skeletons. The agent still sees signatures, classes, and imports — it just stops paying for function bodies it isn't editing.

⚠️ Alpha software. This is a real project in active development, not a polished SaaS. Please read the support matrix below before installing.

🧪 Support Status

Client HTTPS interception Compression active Notes
Aider (--openai-api-base) ✅ Not needed ✅ Full Best supported. Use the proxy URL directly.
Cursor (custom base URL) ✅ Not needed ✅ Full Best supported.
Plain curl / SDK calls ✅ Not needed ✅ Full Great for testing.
Claude Code (hardcoded endpoint) ✅ Works ✅ Tool-aware Stale file reads in tool_result get skeletonized; latest read stays intact.
Codex CLI (hardcoded endpoint) ✅ Works ✅ Tool-aware Same engine via /v1/responses.

How tool-aware compression works. Agents like Claude Code call Read(file) repeatedly. The conversation accumulates the same file dumped multiple times. TokenTamer tracks every tool_use → file mapping, then skeletonizes the older tool_result reads while keeping the most recent read of each file 100% intact. tool_use blocks and tool definitions are never touched.

If something ever breaks, hit the kill switch:

token-tamer --ssl --port 443 --passthrough            # disable all compression
# or
token-tamer --ssl --port 443 --no-tool-compression    # disable only tool-aware path

🚨 Known Limitations

  • Compression depends on re-reads. Single-read sessions get no tool savings (just text compression). Long sessions where the agent re-reads files benefit the most.
  • Heuristic file detection. We look for file_path / path / filename keys in tool inputs. Exotic agents with unusual schemas may be missed.
  • Multi-turn cross-request caching is not yet implemented.
  • macOS only for the one-line cert setup. Linux/Windows users need to trust the CA manually.
  • No production benchmarks yet. Savings numbers come from unit tests with synthetic payloads, not real long Claude Code sessions.

🗺️ Roadmap

  • v0.2 — Tool-aware compression (✅ shipped)
  • v0.3 — Anthropic prompt caching / long-lived session hijacking (✅ shipped)
  • v0.4 — Tree-sitter for proper multi-language AST (current C-style support is a brace-balance heuristic)
  • v0.5 — Web dashboard with per-file compression heatmap + live cache hit metrics

✨ Features

  • 🔌 Drop-in proxy — No changes needed to your coding agent. Just change the API base URL.
  • 🔁 Long-lived session hijacking — Injects Anthropic cache_control breakpoints into outbound requests. Long Claude Code sessions see up to 90% off input tokens (cached input is $0.30/Mtoken vs $3.00/Mtoken regular).
  • 🧠 Smart active file detection — Automatically identifies which files you're working on and leaves them 100% intact.
  • 🌳 AST-based compression — Strips function bodies while preserving signatures, imports, and class structures.
  • 🔧 Tool-aware compression — Skeletonizes stale tool_result reads while preserving the latest read of each file. Safe with agents that use function calling.
  • 💰 Real-time cost tracking — Beautiful terminal dashboard showing tokens saved and money saved.
  • 🔄 Full streaming support — Transparent SSE streaming for both OpenAI and Anthropic APIs.
  • ⚡ Zero latency overhead — Compression happens locally in milliseconds.

🚀 Quick Start (5 Minutes)

Prerequisites

  • Python 3.9 or newer (python3 --version)
  • macOS, Linux, or Windows (Windows = manual cert trust step)
  • openssl (pre-installed on macOS & most Linux)

1. Install

git clone https://github.com/borhen68/TokenTamer.git
cd TokenTamer

# Recommended: use a virtual environment to avoid messing with system Python
python3 -m venv venv
source venv/bin/activate            # Windows: venv\Scripts\activate

pip install -e .

Verify it installed:

token-tamer --version
# → TokenTamer 0.2.0

2. Choose Your Path

👉 Path A — Aider, Cursor, or your own SDK code (no SSL setup needed):

token-tamer --port 8000 --no-dashboard

Then point your tool's API base URL at http://127.0.0.1:8000/v1:

aider --openai-api-base http://127.0.0.1:8000/v1

For Cursor: Settings → Models → OpenAI API Base → http://127.0.0.1:8000/v1. Done.

👉 Path B — Claude Code or Codex CLI (SSL setup, one-time):

These tools hardcode the API URL. We use HTTPS interception:

# Step 1 — Generate the local certificate (just runs and exits)
token-tamer --ssl --port 8443 --no-dashboard &
sleep 2 && kill %1

# Step 2 — Trust the certificate (macOS)
sudo security add-trusted-cert -d -r trustRoot \
  -k /Library/Keychains/System.keychain \
  ~/.config/token-tamer/certs/ca-cert.pem

# Step 3 — Redirect API domains to localhost
echo "127.0.0.1 api.openai.com"     | sudo tee -a /etc/hosts
echo "127.0.0.1 api.anthropic.com"  | sudo tee -a /etc/hosts

# Step 4 — Run TokenTamer on port 443 (sudo required for low ports)
sudo $(which token-tamer) --ssl --port 443 --no-dashboard

Leave that terminal open, then in a new terminal:

claude "create a snake game"     # or
codex "refactor this module"

You're now intercepting + compressing. 🎉

3. Verify It's Working

# Path A check:
curl http://127.0.0.1:8000/health

# Path B check:
curl https://api.openai.com/health    # Should return TokenTamer's JSON, not OpenAI's

Both should return:

{"status":"ok","version":"0.2.0","requests_processed":0,"tokens_saved":0}

4. Cleanup (Uninstall)

# Remove /etc/hosts entries
sudo sed -i.bak '/api.openai.com/d;/api.anthropic.com/d' /etc/hosts

# Untrust the cert
sudo security remove-trusted-cert -d ~/.config/token-tamer/certs/ca-cert.pem

# Uninstall the package
pip uninstall token-tamer

🆘 Troubleshooting

Symptom Fix
command not found: token-tamer Activate your venv: source venv/bin/activate
ModuleNotFoundError: No module named 'uvicorn' Same — venv not active
address already in use on port 8000 lsof -ti :8000 | xargs kill -9
Permission denied on port 443 Use sudo for ports <1024, or pick a higher port
SSL certificate problem from curl Re-run the security add-trusted-cert step, then open a NEW terminal
Claude Code hangs / errors Hit the kill switch: restart with --passthrough
Compression broke something Restart with --no-tool-compression and file an issue

API Keys

TokenTamer resolves API keys in this priority order:

  1. Request headers — Keys sent by your agent (default behavior, zero config needed)
  2. Environment variablesOPENAI_API_KEY / ANTHROPIC_API_KEY
  3. Config fileconfig.yaml

📊 How It Works

Your Agent                    TokenTamer                      LLM API
    │                              │                              │
    │── 100k token payload ──────▶│                              │
    │                              │── Identify active files      │
    │                              │── Skeletonize background     │
    │                              │── 15k token payload ────────▶│
    │                              │                              │
    │                              │◀──── Streaming response ─────│
    │◀── Streaming response ──────│                              │
    │                              │                              │
    │                              │── Dashboard: saved $2.45! 💰 │

Before (Heavy Token Cost):

def calculate_tax(amount: float, region: str) -> float:
    """Calculates regional tax rates based on complex logic."""
    rate = get_base_rate(region)
    adjustments = fetch_adjustments(region, amount)
    if amount > THRESHOLD:
        rate *= 1.05
    # ... 50 more lines of complex math ...
    return final_tax

After (Lightweight Skeleton):

# [TOKEN-GUARD: Compressed — structural skeleton only]
def calculate_tax(amount: float, region: str) -> float: ...

The LLM still knows calculate_tax exists and how to call it, but doesn't waste tokens reading the implementation.

🔁 Long-Lived Session Hijacking (Anthropic Prompt Caching)

This is TokenTamer's most powerful cost-cutting feature. Anthropic offers a 90% discount on cached input tokens:

Token type Price per 1M tokens
Regular input $3.00
Cached input $0.30

The catch: Claude Code (and most agents) don't use it well. They mutate the conversation prefix every turn, so the cache never hits. TokenTamer fixes this.

How it works

Without TokenTamer — each turn re-sends the entire conversation:

Turn 5: [sys, tools, msg1, msg2, msg3, msg4, msg5] → $$$$
Turn 6: [sys, tools, msg1, msg2, msg3, msg4, msg5, msg6] → $$$$$
Turn 7: [sys, tools, msg1...msg7] → $$$$$$$

With TokenTamer — stable prefix cached, only new messages billed:

Turn 5: [sys, tools, msg1, msg2, msg3] [msg4, msg5] → cache | $$
Turn 6: [sys, tools, msg1, msg2, msg3] [msg4, msg5, msg6] → cache | $$
Turn 7: [sys, tools, msg1, msg2, msg3] [msg4, msg5, msg6, msg7] → cache | $$

TokenTamer injects cache_control breakpoints at three stable positions:

  1. After tools array — rarely changes between turns
  2. After system prompt — fixed for the whole session
  3. After conversation prefix — everything except the last 2 turns

Result: a 50-turn Claude Code session drops from ~$5.00 to ~$0.50. Same model. Same reasoning. Same output. Just smarter billing.

Verification

When active, every Anthropic response includes cache headers:

curl -I http://127.0.0.1:8000/v1/messages ...
# X-TokenTamer-Cache-Breakpoints: 3
# X-TokenTamer-Cache-Tokens: 12400

Opt-out

If you ever need to disable it:

token-tamer --no-session-cache

⚙️ Configuration

Create a config.yaml in your working directory:

proxy:
  host: "127.0.0.1"
  port: 8000

upstream:
  openai_url: "https://api.openai.com"
  anthropic_url: "https://api.anthropic.com"

context:
  repo_path: "/path/to/your/codebase"  # Enables semantic active-file detection

skeletonizer:
  keep_docstrings: false      # Preserve function docstrings?
  keep_class_attrs: true      # Keep class-level attributes?

pricing:                       # Per 1M tokens for cost estimation
  gpt-4o:
    input: 2.50
    output: 10.00
  claude-sonnet-4-20250514:
    input: 3.00
    output: 15.00

🌐 Multi-Language Support

TokenTamer skeletonizes more than just Python:

Language Method Status
Python Native AST
JavaScript Brace-balance heuristic
TypeScript Brace-balance heuristic
Go Brace-balance heuristic
Rust Brace-balance heuristic
Java / C# / C / C++ Brace-balance heuristic

🧠 Semantic Active-File Detection

If you provide a repo_path in config.yaml and install sentence-transformers, TokenTamer uses embeddings to detect which files are semantically relevant to your query — even if you don't mention them by name.

pip install sentence-transformers scikit-learn

🛠 Supported APIs

Provider Endpoint Status
OpenAI /v1/chat/completions
OpenAI /v1/completions ✅ (pass-through)
OpenAI /v1/models ✅ (pass-through)
Anthropic /v1/messages

🧪 Testing

pip install -e ".[dev]"
pytest tests/ -v

📋 Requirements

  • Python 3.10+
  • Dependencies: FastAPI, uvicorn, httpx, tiktoken, rich, pyyaml

📜 License

MIT

About

A drop-in proxy that compresses bloated code context in real-time, cutting LLM API costs by 50–80% without losing what the model actually needs to know.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages