Transparent HTTP proxy that captures a Claude Code first-turn request, tokenizes each
component (system prompt blocks, tools, message blocks, claudeMd sub-sections) with the
Qwen3 BPE, calibrates against the server's actual token usage, and writes a breakdown
as CSV (default) and/or markdown. Two extra drill-down tables attribute the Agent
tool description and the skills-catalog message block to their originating plugins.

    git clone <this-repo>
    cd claude_boot_stats
    pip install -e .

The Qwen3 tokenizer (~20 MB of config + vocab) is downloaded from Hugging Face on first run; subsequent runs are offline.

    # default: proxy on 127.0.0.1:8788, CSV output trio (main + agents + skills + mcp + ...)
    python -m claude_boot_stats --csv ./cbs.csv
    # markdown only
    python -m claude_boot_stats --md ./cbs.md
    # both
    python -m claude_boot_stats --csv ./cbs.csv --md ./cbs.md

Then launch Claude Code pointing at the proxy:

    ANTHROPIC_BASE_URL=http://127.0.0.1:8788 claude

Reports are overwritten on each captured request (use `--once` to write only the first report; the proxy keeps forwarding requests afterwards).

- `--csv <path>` writes three files: `<path>` (main breakdown), `<path-stem>.agents.csv`, `<path-stem>.skills.csv`. Each agent / skill row carries its originating plugin (prefix before the first `:` in the identifier; `core` when there is no prefix).
- `--md <path>` writes one markdown file containing all three tables + subtotals.
- If neither is given, the default is `./claude_boot_stats.csv`.
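
The plugin attribution rule for agent and skill rows can be sketched in a few lines. This is an illustration of the rule as stated, not the tool's actual code: `plugin_of` is a hypothetical helper name, and the sample identifiers are made up.

```python
def plugin_of(identifier: str) -> str:
    """Return the originating plugin for an agent or skill identifier.

    Rule as described in this README: the plugin is the prefix before the
    first ':'; identifiers without a prefix are attributed to 'core'.
    (Hypothetical helper; the real implementation may differ.)
    """
    prefix, sep, _rest = identifier.partition(":")
    return prefix if sep else "core"


# Made-up identifiers, for illustration only.
for name in ("my-plugin:code-reviewer", "general-purpose", "a:b:c"):
    print(name, "->", plugin_of(name))
```

Only the first `:` matters, so an identifier such as `a:b:c` is attributed to `a`.
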

| flag | default | meaning |
|---|---|---|
| `--port` | `8788` | local proxy port |
| `--upstream` | `https://api.anthropic.com` | where to forward requests |
| `--csv` | (default on if `--md` absent → `./claude_boot_stats.csv`) | CSV output stem |
| `--md` | unset | markdown output path |
| `--min-body-bytes` | `10000` | ignore small bodies (quota pings etc.) |
| `--once` | false | write one report then keep proxying |

Every row in the report shows a qwen_tokens column: the raw Qwen3 BPE count for that
component. The proxy also reads the server's input total, claude_total =
usage.input_tokens + cache_creation_input_tokens + cache_read_input_tokens,
from the first message_start SSE event (or the non-stream JSON .usage field). When
usage is available (cal_source = auto), each row also gets a claude_tokens column
equal to qwen_tokens × (claude_total / qwen_total), where qwen_total is the sum of
qwen_tokens over all rows, so the claude_tokens column sums to the
ground truth and gives a proportional per-row allocation.
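
As a rough sketch of that proportional allocation: the function name `calibrate`, the row names, and the numbers below are illustrative assumptions, not the tool's internals or real measurements.

```python
def calibrate(qwen_tokens: dict[str, int], usage: dict[str, int]) -> dict[str, float]:
    """Scale per-row Qwen3 counts so they sum to the server-reported total.

    Hypothetical helper: claude_total is the sum of the three usage fields,
    and every row is multiplied by the same ratio claude_total / qwen_total.
    """
    claude_total = (
        usage.get("input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
    )
    qwen_total = sum(qwen_tokens.values())
    if qwen_total == 0:
        # Nothing to allocate against; avoid dividing by zero.
        return {name: 0.0 for name in qwen_tokens}
    ratio = claude_total / qwen_total
    return {name: count * ratio for name, count in qwen_tokens.items()}


# Made-up numbers for illustration.
rows = {"system": 1000, "tools": 3000, "messages": 1000}
usage = {
    "input_tokens": 10,
    "cache_creation_input_tokens": 5490,
    "cache_read_input_tokens": 0,
}
claude_tokens = calibrate(rows, usage)
print({name: round(value) for name, value in claude_tokens.items()})
```

Because a single ratio is applied to every row, the calibrated column preserves each row's share of the total; it does not correct for rows where the two tokenizers diverge more than average.
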
When usage is not captured (cal_source = none), the report shows the qwen_tokens
column only — no multiplier is fabricated. Qwen3 and Claude's tokenizer are close but
not identical, and the ratio depends heavily on content mix (JSON tool schemas vs.
prose), so there is no single "safe" default to substitute.
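
The two calibration states can be illustrated with a minimal sketch of pulling the usage total out of a streamed response. It assumes the stream arrives as text lines with `data:` payloads and that the `message_start` event carries `message.usage`; `claude_total_from_sse` is a hypothetical name, and returning `None` stands in for cal_source = none.

```python
import json
from typing import Iterable, Optional

USAGE_KEYS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def claude_total_from_sse(lines: Iterable[str]) -> Optional[int]:
    """Return the summed input usage from the first message_start event.

    Returns None when no usable usage block is found, in which case no
    multiplier should be applied (only qwen_tokens would be reported).
    """
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip 'event:' lines, comments, and blank separators
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore payloads that are not valid JSON
        if isinstance(event, dict) and event.get("type") == "message_start":
            usage = event.get("message", {}).get("usage")
            if not usage:
                return None
            # Treat missing or null fields as zero.
            return sum(int(usage.get(key) or 0) for key in USAGE_KEYS)
    return None


# Made-up stream fragment for illustration.
sample = [
    "event: message_start",
    'data: {"type": "message_start", "message": {"usage": '
    '{"input_tokens": 4, "cache_creation_input_tokens": 100, '
    '"cache_read_input_tokens": 20}}}',
    "",
]
print(claude_total_from_sse(sample))
```
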