Use Case — Building a Next.js App with context7

A walkthrough of a typical one-hour development session using the context7 MCP server (documentation lookup), with and without mcpkill in the loop.


The scenario

You are building a Next.js 16 app. You query context7 repeatedly throughout the session to look up docs on Server Actions, caching, and routing. Each context7 response is a large Markdown dump — typically 7 000–9 000 tokens.


Setup (one-time, ~30 s)

# Install mcpkill and register context7 through it
mcpkill install context7 -- npx -y @upstash/context7-mcp
[mcpkill] Step 1/2 — warming up embedding model …
[mcpkill] ✓ Model ready (~/.fastembed_cache)
[mcpkill] Step 2/2 — registering 'context7' in Claude Code (scope: user) …
[mcpkill] ✓ Done! Restart Claude Code (or open a new chat) to activate.
  Verify with: claude mcp list

That's it. Claude now sees context7 exactly as before — mcpkill is invisible in the middle.


Session walkthrough

Query 1 — cold start

Prompt: "How do Server Actions work in Next.js 16?"

context7 returns a 47-section Markdown file covering the entire Server Actions API. mcpkill intercepts the response:

[mcpkill] CACHE MISS [8 340t] — chunking 42 108 bytes
[mcpkill] → returning 4/4 chunks (~610t)

Claude receives 610 tokens of the most relevant sections instead of 8 340. The full response is chunked, embedded, and stored in ~/.mcpkill.db.
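The miss path can be sketched in a few lines. This is a hypothetical simplification, not mcpkill's actual code: a toy bag-of-words embedding stands in for the real all-MiniLM-L6-v2 model, and a plain dict stands in for the SQLite store.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: lower-cased term frequencies. The real tool uses a
    # local sentence-transformer model (all-MiniLM-L6-v2) instead.
    return Counter(text.lower().replace(".", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def handle_miss(query: str, response: str, cache: dict,
                max_chunks: int = 4) -> list[str]:
    # Chunk the big response, embed every chunk, and store the whole set
    # so a later, semantically similar query can be served from the cache.
    chunks = [c.strip() for c in response.split("\n\n") if c.strip()]
    entry = {"query_vec": embed(query),
             "chunks": [(c, embed(c)) for c in chunks]}
    cache[query] = entry
    # Return only the chunks most relevant to this query.
    ranked = sorted(entry["chunks"],
                    key=lambda cv: cosine(entry["query_vec"], cv[1]),
                    reverse=True)
    return [text for text, _ in ranked[:max_chunks]]
```

The essential point is that the full response is paid for once (chunking and embedding), after which only the top-k chunks ever reach the model.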


Query 2 — semantic cache hit

Prompt: "Show me how to revalidate data after a Server Action."

Same tool call to context7, same large Markdown response — but this time mcpkill recognises that the query is semantically similar to Query 1 (cosine ≥ 0.85), so the fresh response is never re-chunked or re-embedded:

[mcpkill] CACHE HIT  [47 chunks] original=8340t
[mcpkill] → returning 4/4 chunks (~580t)

context7 is called, but 0 tokens reach Claude's context from the raw dump. mcpkill re-ranks the stored chunks against the new query and returns the four most relevant sections — including revalidatePath and revalidateTag.
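The hit path is the interesting part: the new query is compared against previously cached queries, and above the threshold the stored chunks are simply re-ranked. Again a hypothetical sketch, with a toy word-count embedding in place of the real model:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().replace(".", " ").split())  # toy stand-in

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

THRESHOLD = 0.85  # the similarity cutoff used in the walkthrough

def lookup(query: str, cache: dict, max_chunks: int = 4):
    qv = embed(query)
    # Find the most similar previously cached query.
    best = max(cache.values(), default=None,
               key=lambda e: cosine(qv, e["query_vec"]))
    if best is None or cosine(qv, best["query_vec"]) < THRESHOLD:
        return None  # miss: caller falls through to the cold-start path
    # HIT: re-rank the *stored* chunks against the *new* query; the fresh
    # tool response can be discarded unread.
    ranked = sorted(best["chunks"],
                    key=lambda cv: cosine(qv, cv[1]), reverse=True)
    return [text for text, _ in ranked[:max_chunks]]
```

Note that the decision is made on query embeddings alone, which is why the raw response never needs to be counted against the context.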


Query 3 — different topic, new miss

Prompt: "How does the Next.js App Router handle nested layouts?"

New topic. Cache miss. context7 returns another large dump (~6 200 tokens):

[mcpkill] CACHE MISS [6 180t] — chunking 31 450 bytes
[mcpkill] → returning 4/4 chunks (~540t)

Query 4 — hit on the new topic

Prompt: "Can I have a layout per route group in Next.js?"

Semantically close to Query 3:

[mcpkill] CACHE HIT  [38 chunks] original=6180t
[mcpkill] → returning 4/4 chunks (~490t)

Query 5 — partial overlap

Prompt: "How do I call a Server Action from a client component?"

Overlaps with Query 1 (Server Actions) but focuses on a different angle. Cosine similarity is 0.87 — above the 0.85 threshold:

[mcpkill] CACHE HIT  [47 chunks] original=8340t
[mcpkill] → returning 4/4 chunks (~620t)

The 'use client' + startTransition chunks surface to the top because they score highest against the new query embedding.
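This is the key property of the cache: a hit does not replay Query 1's answer, it re-scores the same stored chunk vectors against the new query, so a different top-4 surfaces. A toy illustration, with hand-picked 2-d vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# One cache entry, two chunks (the vectors are made up for illustration).
chunks = {
    "defining a Server Action":       (1.0, 0.1),
    "'use client' + startTransition": (0.2, 1.0),
}

def top_chunk(query_vec):
    return max(chunks, key=lambda name: cosine(query_vec, chunks[name]))

print(top_chunk((1.0, 0.0)))  # Query 1-ish  -> defining a Server Action
print(top_chunk((0.1, 1.0)))  # Query 5-ish  -> 'use client' + startTransition
```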


End-of-session stats

mcpkill stats
┌─────────────────────────────────────────┐
│           mcpkill cache stats            │
├─────────────────────────────────────────┤
│  Cache entries    2                      │
│  Stored chunks    85                     │
│  Cache hits       3   (60%)              │
├─────────────────────────────────────────┤
│  Tokens (original)  22 860               │
│  Tokens (returned)   2 850               │
│  Tokens saved      ~20 010  (88%)        │
├─────────────────────────────────────────┤
│  DB size          0.14 MB                │
└─────────────────────────────────────────┘

5 queries, 88% token reduction. On a paid plan where context tokens cost money, that's roughly 8× cheaper for documentation lookups.
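The savings figures are plain arithmetic over the numbers in the table:

```python
original = 22_860  # tokens context7 produced across the session
returned = 2_850   # tokens that actually reached Claude's context

saved = original - returned
print(saved)                          # 20010 tokens saved
print(round(100 * saved / original))  # 88   (% saved)
print(round(original / returned, 1))  # 8.0  (reduction factor)
```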


What happened under the hood

Query 1 (cold)
  Claude ──► mcpkill ──► context7
                │
                └─ MISS: chunk(8340t) → embed 47 chunks → store → return top-4 (610t)

Query 2 (warm, same topic)
  Claude ──► mcpkill ──► context7
                │
                └─ HIT (cosine 0.91): re-rank 47 stored chunks → return top-4 (580t)
                   context7 response discarded before token counting

Query 5 (warm, related topic)
  Claude ──► mcpkill ──► context7
                │
                └─ HIT (cosine 0.87): same 47 chunks, different top-4 (620t)
                   chunk about startTransition now ranks #1

The embedding model runs locally (~5 ms per query, all-MiniLM-L6-v2). No data leaves your machine beyond the normal MCP call to context7.


Tuning for your workflow

Situation                               Flag to adjust
Getting chunks that are too broad       --max-chunks 2
Cache hits on unrelated queries         --threshold 0.92
Stale docs (fast-moving library)        --ttl-days 1
Large codebase, many MCP servers        --max-db-mb 500
Debugging what gets filtered            --dry-run --verbose

# Example: stricter cache, more chunks, weekly TTL
mcpkill --threshold 0.90 --max-chunks 6 --ttl-days 7 -- npx -y @upstash/context7-mcp

Or persist the settings in ~/.mcpkill.toml so every session picks them up automatically:

threshold  = 0.90
max_chunks = 6
ttl_days   = 7

Works with any MCP server

mcpkill wraps any stdio MCP server — not just context7:

// .claude/settings.json
{
  "mcpServers": {
    "context7": {
      "command": "mcpkill",
      "args": ["--", "npx", "-y", "@upstash/context7-mcp"]
    },
    "filesystem": {
      "command": "mcpkill",
      "args": ["--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/home/user/project"]
    },
    "github": {
      "command": "mcpkill",
      "args": ["--max-chunks", "6", "--", "npx", "-y", "@modelcontextprotocol/server-github"]
    }
  }
}

Servers with small, targeted responses (e.g. a single-row database lookup) will rarely benefit from caching. Servers that return large blobs of text — documentation, file trees, search results — see the biggest savings.
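The "invisible in the middle" wrapping is possible because stdio MCP servers exchange newline-delimited JSON-RPC messages over stdin/stdout. A minimal pass-through proxy, sketched without any caching and under the simplifying assumption of exactly one reply per request, looks like this:

```python
import subprocess
import sys

def run_proxy(server_cmd: list[str], inp=sys.stdin, out=sys.stdout) -> None:
    # Spawn the real MCP server and relay both directions. A cache layer
    # like mcpkill's would inspect and shrink `reply` before passing it on.
    child = subprocess.Popen(server_cmd, stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE, text=True)
    for line in inp:                     # client -> server, unchanged
        child.stdin.write(line)
        child.stdin.flush()
        reply = child.stdout.readline()  # server -> client: intercept here
        out.write(reply)
        out.flush()
    child.stdin.close()
    child.wait()
```

Real MCP traffic also includes notifications that get no reply, so a production proxy relays the two directions concurrently rather than lock-stepping request and response as this sketch does.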