Your LLM already knows most of its tools. Stop re-teaching it every request.
Every API request to Claude resends all tool definitions -- hundreds of tools, full JSON schemas, detailed descriptions. For a setup with 170+ MCP tools, that's ~26K tokens of redundant schema per request.
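To make the redundancy concrete, here is a hypothetical sketch of what one entry in the `tools` array looks like on the wire. The tool name, description text, and schema fields below are illustrative, not taken from a real workload:

```typescript
// One tool definition as sent in the Messages API `tools` array.
// Every request resends the whole array -- for 170+ tools, this
// payload repeats on every single turn.
const tools = [
  {
    name: "jira_search_issues", // hypothetical MCP tool
    description:
      "Search Jira issues using JQL. Supports pagination, field " +
      "selection, and ordering. Returns issue keys, summaries, and status.",
    input_schema: {
      type: "object",
      properties: {
        jql: { type: "string", description: "JQL query string" },
        max_results: { type: "integer", description: "Page size (1-100)" },
      },
      required: ["jql"],
    },
  },
  // ...plus every other tool definition, resent verbatim each request
];

const bytes = JSON.stringify(tools[0]).length;
console.log(`~${bytes} bytes of schema for a single tool, every request`);
```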
Text compression proxies (tamp, wet, squeezr) focus on compressing tool results. The real waste is tool definitions.
In our production workload with 170+ tools:
| Approach | Tokens saved |
|---|---|
| Text compression (results) | 8M |
| Schema compression (definitions) | 279M |
You're optimizing the wrong thing.
Methodology: Data collected over 8 days of real Claude Code usage (10,893 requests, 173 MCP tools). Token counting via `@anthropic-ai/tokenizer`. The 8M text-compression comparison comes from running the same traffic through tamp's text-only stages. Tool call counts, error rates, and state transitions come from the self-learning state file (`~/.latent-tools/state.json`).
```shell
npx latent-tools                                  # Start proxy
export ANTHROPIC_BASE_URL=http://localhost:7778   # Point Claude Code
claude                                            # Done.
```

latent-tools uses an adaptive state machine that progressively strips tool descriptions based on observed success rates. Each tool transitions independently:
```
NEW ──→ LIGHT ──→ CANARY ──→ NUKED
          ↑          │
          └── PROTECTED ←─┘
              (error spike)
```
- **NEW** -- Full schema sent as-is. No compression. The tool is being observed.
- **LIGHT** -- Description truncated to its first sentence. Parameters and types remain intact.
- **CANARY** -- All descriptions removed. The tool is in an observation period, tracking whether the model still uses it correctly.
- **NUKED** -- All descriptions removed, stable. The model has proven it can use this tool from its training data alone. This is a terminal state; the only exit is a model version change (which demotes NUKED → LIGHT for re-validation).
- **PROTECTED** -- Rolled back from CANARY after an error spike. Enters a 30-day cooldown before re-attempting compression (returns to LIGHT).
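The transitions above can be sketched as a pure function over the state and the latest observed outcome. This is a simplified model: the real success thresholds and the 30-day cooldown bookkeeping are internal to latent-tools, so `ok` and `cooldownOver` here stand in for them:

```typescript
type ToolState = "NEW" | "LIGHT" | "CANARY" | "NUKED" | "PROTECTED";

// Simplified transition logic; promotion thresholds and cooldown
// tracking are illustrative, not latent-tools' actual internals.
function nextState(state: ToolState, ok: boolean, cooldownOver = false): ToolState {
  if (!ok) {
    // An error spike during CANARY rolls the tool back to PROTECTED;
    // errors in other states leave the tool where it is (sketch).
    return state === "CANARY" ? "PROTECTED" : state;
  }
  if (state === "NEW") return "LIGHT";           // truncate to first sentence
  if (state === "LIGHT") return "CANARY";        // strip descriptions, observe
  if (state === "CANARY") return "NUKED";        // proven stable: terminal
  if (state === "PROTECTED" && cooldownOver) return "LIGHT"; // retry after cooldown
  return state; // NUKED is terminal; PROTECTED waits out its cooldown
}

console.log(nextState("CANARY", true));  // "NUKED"
console.log(nextState("CANARY", false)); // "PROTECTED"
```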
Claude Code's built-in tools (Read, Write, Bash, Grep, etc.) and well-known MCP tools (GitHub, Gmail, Google Calendar) start at NUKED. Claude has strong training data for these -- there's no reason to send their descriptions.
The proxy intercepts tool_result blocks in follow-up requests to track success and failure per tool. Successful calls advance the tool through the state machine. Errors trigger rollbacks. No external feedback loop required.
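The per-tool tracking can be sketched as follows. The block shape mirrors the Messages API `tool_result` content block (`type`, `tool_use_id`, `is_error`, `content`), simplified to a string `content`; the error-pattern regex is an illustrative heuristic, not latent-tools' actual pattern set:

```typescript
// Sketch of success/failure tracking from tool_result blocks in
// follow-up requests. Field names follow the Anthropic Messages API;
// real `content` can also be an array of blocks (simplified here).
interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  is_error?: boolean;
  content?: string;
}

const ERROR_PATTERN = /\b(error|exception|failed)\b/i; // illustrative heuristic

function isFailure(block: ToolResultBlock): boolean {
  if (block.is_error) return true;              // explicit error flag
  return ERROR_PATTERN.test(block.content ?? ""); // error-pattern match
}

// Per-tool tallies; successes advance the state machine, errors roll back.
const tally = new Map<string, { ok: number; err: number }>();

function record(toolName: string, block: ToolResultBlock): void {
  const t = tally.get(toolName) ?? { ok: 0, err: 0 };
  if (isFailure(block)) t.err++; else t.ok++;
  tally.set(toolName, t);
}
```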
| Metric | Value |
|---|---|
| Requests processed | 10,893 |
| Avg tokens saved/request | 25,662 |
| Total tokens saved | 279.5M |
| Tools tracked | 173 |
| Tools at max compression | 115 (66%) |
| Tool calls after compression | 1,529,820 |
| Success rate | 98.15%* |
| Rollbacks | 0 |
* Success rate measures explicit error responses (`is_error` flags and error-pattern matching in tool results). It does not capture tool selection correctness -- if the model silently chooses a different approach instead of using a compressed tool, that miss is invisible to the proxy. This is the primary silent failure risk.
```
Usage: latent-tools [options]

Options:
  --port, -p <port>       Port (default: 7778)
  --upstream, -u <url>    Upstream API (default: https://api.anthropic.com)
  --verbose, -v           Enable logging
  --help, -h              Help
  --version               Version
```
| Method | Path | Description |
|---|---|---|
| POST | /v1/messages | Proxy with schema compression |
| GET | /health | Health check |
| GET | /stats | Token savings statistics |
latent-tools compresses schemas; text compression proxies compress results. They stack:

```
Claude Code → latent-tools (:7778) → tamp (:7779) → api.anthropic.com
```

Set `--upstream http://localhost:7779` to chain through tamp or any other proxy.
- Targets the Anthropic Messages API (`POST /v1/messages`)
- Handles gzip, deflate, br, and zstd request encoding
- Bodies over 50MB are passed through uncompressed
- Non-Messages API paths (except `/health` and `/stats`, which are handled locally) are proxied unchanged to the upstream
- Learning state and stats are persisted to `~/.latent-tools/`
- No data is sent anywhere except your configured upstream API
- The learning file contains tool names and call counts, no conversation content
Silent failure risk. When a tool schema is heavily compressed, the model may not recognize when to use that tool. Unlike parameter errors (which produce error responses the self-learning system can detect), tool selection misses are invisible -- the model simply uses a different approach. This is an inherent trade-off of latent knowledge exploitation.
Model version sensitivity. Different model versions have different latent knowledge. When latent-tools detects a model version change, it automatically demotes tools for re-validation: learned NUKED tools are demoted to LIGHT, PROTECTED tools are demoted to NEW, and LIGHT/CANARY tools are left unchanged. Seeded (well-known) tools remain at NUKED regardless of model changes.
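The demotion rules described above can be sketched as a small function. The state names come from the state machine; the `seeded` flag marks the well-known tools that keep their NUKED status across model changes:

```typescript
type ToolState = "NEW" | "LIGHT" | "CANARY" | "NUKED" | "PROTECTED";

// Sketch of re-validation demotion on a model version change.
function demoteOnModelChange(state: ToolState, seeded: boolean): ToolState {
  if (seeded) return state;           // seeded tools stay NUKED regardless
  switch (state) {
    case "NUKED":     return "LIGHT"; // learned compression must re-validate
    case "PROTECTED": return "NEW";   // restart observation from scratch
    default:          return state;   // LIGHT and CANARY are unchanged
  }
}
```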
Seed list assumptions. The built-in seed list assumes Claude has strong training data for common tools (Read, Write, Bash, GitHub MCP, etc.). If Anthropic changes how these tools are handled in a future model, the seed list may need updating.
Seed list bypass. Seeded tools (Claude Code built-ins, common MCP tools) start at NUKED and bypass the self-learning state machine entirely. If a future model loses familiarity with a seeded tool, the proxy won't auto-detect the problem. The denylist prevents destructive-sounding tools (names matching delete, remove, destroy, etc.) from ever being fully compressed, capping them at LIGHT.
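The denylist cap can be sketched as a name check; the pattern below covers only the names the text mentions (delete, remove, destroy), and the real list in latent-tools may be broader:

```typescript
// Illustrative denylist: destructive-sounding tools are capped at LIGHT
// (description truncated, parameters intact) and never fully compressed.
const DENYLIST = /(delete|remove|destroy)/i; // "etc." in the real list

function maxCompression(toolName: string): "LIGHT" | "NUKED" {
  return DENYLIST.test(toolName) ? "LIGHT" : "NUKED";
}

console.log(maxCompression("gh_delete_repo")); // "LIGHT"
console.log(maxCompression("search_issues"));  // "NUKED"
```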
latent-tools was born as a patch inside tamp. Thanks to @sliday for building the proxy foundation that inspired this project.
MIT