The "120x fewer tokens" claim comes from a controlled benchmark. Here's the methodology so you can verify it yourself.
Setup: 5 structural questions about a real codebase (function lookup, call tracing, dead code, route listing, architecture overview). Each question was asked twice — once via codebase-memory-mcp graph queries, once via a Claude Code Explorer agent that uses the Grep/Glob/Read tools.
Measurement: Total input + output tokens consumed by all tool calls to answer each question.
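The measurement above can be sketched in a few lines. This is an illustrative sketch, not the actual benchmark harness — the field names and per-call numbers are made up; only the totals match the table below.

```python
# Sum input + output tokens across every tool call made while answering
# one question. Field names are illustrative, not a real API.
def total_tokens(tool_calls):
    """tool_calls: list of dicts with 'input_tokens' and 'output_tokens'."""
    return sum(c["input_tokens"] + c["output_tokens"] for c in tool_calls)

# One graph query vs. a multi-step grep/read session (hypothetical values):
graph_session = [{"input_tokens": 50, "output_tokens": 150}]
explorer_session = [
    {"input_tokens": 200, "output_tokens": 12_000},  # read file listing
    {"input_tokens": 300, "output_tokens": 18_000},  # grep + read matches
    {"input_tokens": 250, "output_tokens": 14_250},  # follow-up reads
]
print(total_tokens(graph_session))     # 200
print(total_tokens(explorer_session))  # 45000
```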
Results:
| Question Type | Graph (tokens) | Explorer (tokens) | Ratio |
| --- | --- | --- | --- |
| Find function by pattern | ~200 | ~45,000 | 225x |
| Trace call chain (depth 3) | ~800 | ~120,000 | 150x |
| Dead code detection | ~500 | ~85,000 | 170x |
| List all routes | ~400 | ~62,000 | 155x |
| Architecture overview | ~1,500 | ~100,000 | 67x |
| **Total** | **~3,400** | **~412,000** | **121x** |
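You can recompute the per-question and overall ratios from the table yourself:

```python
# (graph_tokens, explorer_tokens) per question, from the table above.
results = {
    "Find function by pattern": (200, 45_000),
    "Trace call chain (depth 3)": (800, 120_000),
    "Dead code detection": (500, 85_000),
    "List all routes": (400, 62_000),
    "Architecture overview": (1_500, 100_000),
}
for name, (graph, explorer) in results.items():
    print(f"{name}: {explorer / graph:.0f}x")

total_graph = sum(g for g, _ in results.values())     # 3400
total_explorer = sum(e for _, e in results.values())  # 412000
print(f"Total: {total_explorer / total_graph:.0f}x")  # 121x
```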
The Explorer agent has to: read file listings → grep for patterns → read matching files → parse the output → grep again for related files → read those. Each step is a tool call with full file contents in the response.
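One round of that loop looks roughly like this — a hypothetical sketch, not the agent's actual implementation. Note that every matching file's full text ends up in the result, which is what the model must then carry in context:

```python
import re
from pathlib import Path

def grep_and_read(root, pattern):
    """One grep-then-read round: returns full contents of matching files."""
    pat = re.compile(pattern)
    hits = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        if pat.search(text):
            hits.append(text)  # the whole file lands in the context window
    return hits
```

Tracing a call chain means repeating this round: grep for the caller, read the matches, grep those for the callee, read again — and each round adds whole files, not just the lines that matched.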
The graph query returns exactly the structural information in one call. No file contents, no noise, no irrelevant matches.
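To see why that's cheap, here is a minimal sketch of what a call-chain query over a prebuilt graph looks like. The function and node names are hypothetical, and the real codebase-memory-mcp API may differ — the point is that the answer is a handful of edges, not file bodies:

```python
def trace_calls(graph, start, depth):
    """graph: {caller: [callees]}. Returns call edges reachable within `depth`."""
    edges, frontier = [], [start]
    for _ in range(depth):
        next_frontier = []
        for fn in frontier:
            for callee in graph.get(fn, []):
                edges.append((fn, callee))
                next_frontier.append(callee)
        frontier = next_frontier
    return edges

# Toy graph with made-up function names:
graph = {"handle_request": ["authorize", "load_user"],
         "load_user": ["query_db"]}
print(trace_calls(graph, "handle_request", 2))
# [('handle_request', 'authorize'), ('handle_request', 'load_user'),
#  ('load_user', 'query_db')]
```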
Why it matters beyond fitting in the context window: cost ($3-15 per million tokens adds up), latency (seconds of file reading vs. a sub-millisecond graph query), and accuracy (LLMs lose track of details in large contexts).
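The cost point is easy to put numbers on. Back-of-envelope, using the benchmark totals above and the quoted $3-15/M input-token rates:

```python
# Cost of the full 5-question set at two illustrative per-million-token rates.
for rate in (3, 15):
    explorer_cost = 412_000 / 1_000_000 * rate
    graph_cost = 3_400 / 1_000_000 * rate
    print(f"${rate}/M tokens: explorer ${explorer_cost:.2f}, "
          f"graph ${graph_cost:.4f}")
```

At $15/M that's roughly $6.18 per question set for the Explorer agent versus about half a cent for the graph queries.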
Full benchmark data: See BENCHMARK_REPORT.md and the Performance section in the README.