A typical one-hour development session using the context7 MCP server (documentation lookup) without and with mcpkill.
You are building a Next.js 16 app. You query context7 repeatedly throughout the session to look up docs on Server Actions, caching, and routing. Each context7 response is a large Markdown dump — typically 7 000–9 000 tokens.
# Install mcpkill and register context7 through it
mcpkill install context7 -- npx -y @upstash/context7-mcp

[mcpkill] Step 1/2 — warming up embedding model …
[mcpkill] ✓ Model ready (~/.fastembed_cache)
[mcpkill] Step 2/2 — registering 'context7' in Claude Code (scope: user) …
[mcpkill] ✓ Done! Restart Claude Code (or open a new chat) to activate.
Verify with: claude mcp list
That's it. Claude now sees context7 exactly as before — mcpkill is
invisible in the middle.
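Conceptually, mcpkill is a transparent stdio proxy: it spawns the real server and relays traffic in both directions, intercepting responses on the way back. The real implementation parses MCP JSON-RPC frames; the sketch below only forwards one line and reads one line back, with a trivial echo process standing in for context7:

```python
import subprocess
import sys

def spawn(server_cmd):
    """Spawn the wrapped MCP server with stdio pipes."""
    return subprocess.Popen(server_cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)

def relay(proc, request_line):
    """Forward one request line to the server, return its response line.
    Real mcpkill would inspect the response here (cache hit/miss logic)."""
    proc.stdin.write(request_line)
    proc.stdin.flush()
    return proc.stdout.readline()

# Demo: a trivial echo "server" stands in for context7.
echo = spawn([sys.executable, "-u", "-c",
              "import sys\nfor line in sys.stdin: sys.stdout.write(line)"])
reply = relay(echo, '{"jsonrpc":"2.0","id":1}\n')
echo.stdin.close()
echo.wait()
```

Because the wrapped server only ever sees ordinary stdio, the client needs no changes at all.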
Query 1. Prompt: "How do Server Actions work in Next.js 16?"
context7 returns a 47-section Markdown file covering the entire Server Actions API. mcpkill intercepts the response:
[mcpkill] CACHE MISS [8 340t] — chunking 42 108 bytes
[mcpkill] → returning 4/4 chunks (~610t)
Claude receives 610 tokens of the most relevant sections instead of 8 340.
The full response is chunked, embedded, and stored in ~/.mcpkill.db.
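The MISS path can be sketched roughly in Python. mcpkill's actual chunker and storage layout are not documented here, so `chunk_markdown`, the bag-of-words `embed`, and the in-memory `store` below are illustrative stand-ins (the real tool embeds with all-MiniLM-L6-v2 and persists to ~/.mcpkill.db):

```python
import re
from collections import Counter

def chunk_markdown(text, max_chars=800):
    """Split a Markdown dump at headings, then cap chunk length.
    Stand-in for mcpkill's real chunking strategy."""
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for sec in sections:
        sec = sec.strip()
        while sec:
            chunks.append(sec[:max_chars])
            sec = sec[max_chars:]
    return chunks

def embed(text):
    """Toy bag-of-words vector; the real tool uses all-MiniLM-L6-v2."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

store = {}  # stands in for the on-disk cache (~/.mcpkill.db)

def handle_miss(query, response, k=4):
    """Chunk, embed, and store the full response; return only top-k chunks."""
    chunks = chunk_markdown(response)
    store[query] = [(c, embed(c)) for c in chunks]
    return chunks[:k]  # real mcpkill ranks by similarity first

doc = "# Server Actions\nRun on the server.\n\n# Caching\nrevalidatePath clears it."
top = handle_miss("how do server actions work", doc)
```

The key property is that the full dump is stored, but only a few chunks ever reach the model's context.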
Query 2. Prompt: "Show me how to revalidate data after a Server Action."
Same tool call to context7, same large Markdown response — but this time mcpkill recognises that the query is semantically similar to Query 1 (cosine ≥ 0.85). The fresh dump is discarded instead of being chunked and embedded again:
[mcpkill] CACHE HIT [47 chunks] original=8340t
[mcpkill] → returning 4/4 chunks (~580t)
context7 is called, but 0 tokens reach Claude's context from the raw dump.
mcpkill re-ranks the stored chunks against the new query and returns the four
most relevant sections — including revalidatePath and revalidateTag.
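The HIT path boils down to re-ranking: score every stored chunk against the new query embedding and return the best few. A rough sketch, again with a toy bag-of-words embedding standing in for MiniLM (the 0.85 threshold and top-4 cut mirror the logs above):

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words vector standing in for all-MiniLM-L6-v2.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(n * b.get(w, 0) for w, n in a.items())
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0

THRESHOLD = 0.85  # query-to-query similarity needed for a cache HIT
TOP_K = 4         # chunks returned per query

def rerank(query, stored_chunks, k=TOP_K):
    """Score every stored chunk against the new query; return the best k."""
    q = embed(query)
    ranked = sorted(stored_chunks, key=lambda c: cosine(q, embed(c)),
                    reverse=True)
    return ranked[:k]

chunks = [
    "revalidatePath clears the cache after a Server Action",
    "revalidateTag invalidates tagged fetches",
    "Nested layouts share UI across route segments",
]
best = rerank("revalidate data after a Server Action", chunks, k=1)
```

The same stored chunks can answer many differently-worded questions, because each new query simply reshuffles the ranking.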
Query 3. Prompt: "How does the Next.js App Router handle nested layouts?"
New topic. Cache miss. context7 returns another large dump (~6 200 tokens):
[mcpkill] CACHE MISS [6 180t] — chunking 31 450 bytes
[mcpkill] → returning 4/4 chunks (~540t)
Query 4. Prompt: "Can I have a layout per route group in Next.js?"
Semantically close to Query 3:
[mcpkill] CACHE HIT [38 chunks] original=6180t
[mcpkill] → returning 4/4 chunks (~490t)
Query 5. Prompt: "How do I call a Server Action from a client component?"
Overlaps with Query 1 (Server Actions) but focuses on a different angle. Cosine similarity is 0.87 — above the 0.85 threshold:
[mcpkill] CACHE HIT [47 chunks] original=8340t
[mcpkill] → returning 4/4 chunks (~620t)
The 'use client' + startTransition chunks surface to the top because
they score highest against the new query embedding.
mcpkill stats

┌─────────────────────────────────────────┐
│ mcpkill cache stats                     │
├─────────────────────────────────────────┤
│ Cache entries       2                   │
│ Stored chunks       85                  │
│ Cache hits          3 (60%)             │
├─────────────────────────────────────────┤
│ Tokens (original)   22 860              │
│ Tokens (returned)   2 850               │
│ Tokens saved        ~20 010 (88%)       │
├─────────────────────────────────────────┤
│ DB size             0.14 MB             │
└─────────────────────────────────────────┘
5 queries, 88% token reduction. On a paid plan where context tokens cost money, that's roughly 8× cheaper for documentation lookups.
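The totals in the table can be checked with quick arithmetic:

```python
# Sanity-check the stats table: tokens saved, percentage, and cost ratio.
original = 22_860   # tokens context7 actually produced across 5 queries
returned = 2_850    # tokens that reached Claude's context
saved = original - returned
pct_saved = round(100 * saved / original)
cost_ratio = original / returned

print(saved)                  # 20010
print(pct_saved)              # 88
print(round(cost_ratio, 1))   # 8.0
```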
Query 1 (cold)
Claude ──► mcpkill ──► context7
               │
               └─ MISS: chunk(8340t) → embed 47 chunks → store → return top-4 (610t)

Query 2 (warm, same topic)
Claude ──► mcpkill ──► context7
               │
               └─ HIT (cosine 0.91): re-rank 47 stored chunks → return top-4 (580t)
                  context7 response discarded before token counting

Query 5 (warm, related topic)
Claude ──► mcpkill ──► context7
               │
               └─ HIT (cosine 0.87): same 47 chunks, different top-4 (620t)
                  chunk about startTransition now ranks #1
The embedding model runs locally (~5 ms per query, all-MiniLM-L6-v2). No data leaves your machine beyond the normal MCP call to context7.
| Situation | Flag to adjust |
|---|---|
| Getting chunks that are too broad | --max-chunks 2 |
| Cache hits on unrelated queries | --threshold 0.92 |
| Stale docs (fast-moving library) | --ttl-days 1 |
| Large codebase, many MCP servers | --max-db-mb 500 |
| Debugging what gets filtered | --dry-run --verbose |
# Example: stricter cache, more chunks, weekly TTL
mcpkill --threshold 0.90 --max-chunks 6 --ttl-days 7 -- npx -y @upstash/context7-mcp

Or persist in ~/.mcpkill.toml so every session picks it up automatically:
threshold = 0.90
max_chunks = 6
ttl_days = 7

mcpkill wraps any stdio MCP server — not just context7:
// .claude/settings.json
{
  "mcpServers": {
    "context7": {
      "command": "mcpkill",
      "args": ["--", "npx", "-y", "@upstash/context7-mcp"]
    },
    "filesystem": {
      "command": "mcpkill",
      "args": ["--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/home/user/project"]
    },
    "github": {
      "command": "mcpkill",
      "args": ["--max-chunks", "6", "--", "npx", "-y", "@modelcontextprotocol/server-github"]
    }
  }
}

Servers with small, targeted responses (e.g. a single-row database lookup) will rarely benefit from caching. Servers that return large blobs of text — documentation, file trees, search results — see the biggest savings.
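That rule of thumb can be made concrete. The helper below is hypothetical, not an mcpkill feature; it just assumes roughly 600 tokens of returned chunks per query, in line with the logs above:

```python
def worth_wrapping(avg_response_tokens, returned_tokens=600):
    """Back-of-the-envelope check: wrap a server with mcpkill when the
    typical response dwarfs the ~600 tokens of top-k chunks returned.
    (Hypothetical helper, not part of mcpkill.)"""
    if avg_response_tokens <= returned_tokens:
        return False  # single-row lookups: nothing to trim
    savings = 1 - returned_tokens / avg_response_tokens
    return savings > 0.5  # wrap when more than half the tokens are cut

print(worth_wrapping(8_340))  # True: documentation dump
print(worth_wrapping(120))    # False: tiny database lookup
```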