Know exactly what you spend on AI β and automatically spend less.
A local proxy that sits between your tools and the LLM APIs. It tracks every token, calculates real costs, and optimizes requests on the fly. Works with Claude Code, VS Code, Claude Desktop, OpenClaw, and 15+ API providers (OpenAI, Groq, Mistral, DeepSeek, xAI, Perplexity, Cerebras, Together, Fireworks, Cohere, OpenRouter, Ollama, Amazon Bedrock, Google Gemini, and more).
If you use Claude Code or VS Code AI tools heavily, you're probably spending $2β10/day without knowing exactly where it goes. This tool:
- Shows you a real-time breakdown by tool, model, session, and task type
- Automatically reduces your bill by 50β80% via prompt caching and smart model routing
- Works locally β your prompts never leave your machine
Full spend analytics at http://localhost:8082/dashboard:
- Cost by day / week / month β trend chart with daily breakdown
- By task type β Coding vs Shell vs Agent vs Planning vs Web
- By model β Claude Haiku / Sonnet / Opus / GPT-4o / etc.
- By source β Claude Code, Claude Desktop, VS Code Extensions, OpenClaw, GitHub Copilot (usage only), API providers
- Cache analytics β hit rate, money saved vs. what you'd pay without caching
- Session view β every conversation: tokens in/out, cost, tools used
- RAW Logs β last 500 requests with full prompt preview
- Health grade AβF β scores your efficiency and tells you what to fix
Always-visible cost tracker in your menu bar. Click to see:
- Today's spend + health grade
- 7-day trend chart with hover tooltips
- Breakdown by task, model, cache, tools
- Smart Routing optimizer stats
- One-click sync from local logs
The proxy applies these silently on every request:
| Optimization | What it does | Typical savings |
|---|---|---|
| Prompt Cache | Auto-tags large system prompts and user messages for caching | 60β90% on repeat reads |
| Smart Routing | Routes simple requests to Haiku instead of Sonnet/Opus | 5β25Γ cheaper per request |
| Thinking Budget | Caps budget_tokens for extended thinking based on complexity |
80β90% on thinking tokens |
| Message Trim | Removes old messages when context exceeds 50k tokens | Prevents runaway costs |
| Session Cap | Limits history to 40 messages per session | Prevents VS Code "prompt too long" errors |
| Deduplication | Returns cached response for identical requests within 5s | 100% on accidental double-sends |
Smart Routing scores each prompt 0β10 (length, keywords, code presence) and silently downgrades model when complexity is low. You get the same response, cheaper.
β οΈ macOS only β This tool requires macOS (Monterey 12+) and the launchd system. Linux and Windows are not currently supported.
Requirements: Python 3.9+
Step 1 β first time only:
cd ~ && git clone https://github.com/mr-beaver/tokencost && cd tokencost && bash onbording.shThe setup script installs everything and adds a tokencost command to your terminal.
Step 2 β every time after (restart / update):
Open Terminal and type:
tokencost
First install adds this command automatically. Open a new terminal tab after step 1 for it to appear.
The setup script:
- Creates a Python virtualenv and installs dependencies
- Imports your full Claude usage history from local logs
- Sets
ANTHROPIC_BASE_URL=http://localhost:8082in~/.zshrcand macOS launchd - Adds
tokencostalias for quick restarts - Starts the proxy and opens the dashboard
After install, all your Claude Code and VS Code requests flow through the proxy automatically.
The pre-built app is included in the repo β onbording.sh installs and launches it automatically.
To install manually:
cp -R menubar/TokenCostBar.app ~/Applications/
open ~/Applications/TokenCostBar.appBuild from source (requires Xcode Command Line Tools):
cd menubar && bash build.sh
mv TokenCostBar.app ~/Applications/
open ~/Applications/TokenCostBar.app- Claude Code / Claude CLI β from
~/.claude/projects/**/*.jsonl - Claude Desktop β from
~/Library/.../local-agent-mode-sessions/**/*.jsonl - OpenClaw β from
~/.openclaw/agents/**/*.jsonl - VS Code Extensions (Cline, Roo Code, Kilo Code, IBM Bob) β from
VSCode/workspaceStorage/*/tasks/ui_messages.json - GitHub Copilot β usage tracking from VS Code logs (model + requests, pricing unavailable)
Set the base URL for each provider you want to track:
export ANTHROPIC_BASE_URL=http://localhost:8082 # Claude (auto-set by onbording.sh)
export OPENAI_BASE_URL=http://localhost:8082/openai # OpenAI
export GROQ_API_BASE=http://localhost:8082/groq # Groq
export MISTRAL_API_BASE=http://localhost:8082/mistral # Mistral
export DEEPSEEK_API_BASE=http://localhost:8082/deepseek # DeepSeekSupported: Anthropic Β· OpenAI Β· Groq Β· Mistral Β· DeepSeek Β· xAI Β· Perplexity Β· Cerebras Β· Together Β· Fireworks Β· Cohere Β· OpenRouter Β· Ollama Β· Amazon Bedrock Β· Google Gemini
218 models in the pricing database.
Your tool (Claude Code / VS Code / etc.)
β
localhost:8082 β proxy runs here
β scores complexity, applies optimizations, logs tokens+cost
api.anthropic.com / api.openai.com / etc.
β
Response back through proxy β your tool
The proxy is transparent β your tools see it as the real API. Zero changes to your workflow.
When enabled (onbording.sh β option 1), the proxy scores the last user message:
| Score | Original model | Routes to | Savings |
|---|---|---|---|
| 0β2 | Sonnet | Haiku | ~5Γ cheaper |
| 0β2 | Opus | Haiku | ~25Γ cheaper |
| 3β5 | Opus | Sonnet | ~5Γ cheaper |
| 6β10 | any | unchanged | β |
Simple questions (what is X, explain Y), short messages, and tool-chain intermediates score 0β2. Long coding tasks with keywords like implement, refactor, debug score 6+.
curl http://localhost:8082/stats?period=today # today's stats (JSON)
curl http://localhost:8082/stats?period=7d # last 7 days
curl http://localhost:8082/stats?period=30d # last 30 days
curl http://localhost:8082/raw-logs?limit=100 # recent requests- Everything stored locally in
tracker.db(SQLite) - Nothing is sent to any external service
- The proxy only reads your token usage metadata β not the content of your conversations (prompt previews are stored locally, never transmitted)
- Stop the proxy anytime:
bash onbording.sh β option 2
bash onbording.sh # choose option 2 β stops proxy, removes env vars
β οΈ Don't kill the proxy process manually if you're using Claude Code β Claude routes through it and will crash with exit 143. Always useonbording.sh.
tokencost/
βββ proxy.py β FastAPI proxy, port 8082
βββ optimizer.py β Request optimizations (cache, routing, trim)
βββ db.py β SQLite schema, cost calculations, analytics
βββ import_history.py β Import historical logs from Claude CLI, Desktop, OpenClaw
βββ projects.py β Project/session tracking
βββ dashboard.html β Web dashboard UI
βββ onbording.sh β Setup / start / stop script
βββ menubar/ β macOS SwiftUI menu bar app
βββ build.sh
βββ Sources/TokenCostBar/
βββ MenuBarView.swift
βββ StatsModel.swift
MIT