mcp-w2vgrep

MCP server for semantic search using w2vgrep and Word2Vec embeddings.

Unlike regular grep, finds semantically similar words. Searching for "fear" also finds "anxiety", "terror", "dread".

Docker (Recommended)

Build

docker compose build

Configuration

Add to your project's .mcp.json:

{
  "mcpServers": {
    "w2vgrep": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "-v", "/path/to/.config/semantic-grep:/app/models:ro",
        "-v", "/path/to/search/directory:/search:ro",
        "-e", "DOWNLOAD_MODELS=none",
        "mcp-w2vgrep-mcp-w2vgrep:latest"
      ]
    }
  }
}

Replace:

/path/to/.config/semantic-grep — directory with Word2Vec models
/path/to/search/directory — directory to search in

First Run (Download Models)

If you don't have models, the container will download them on first run:

# Download English model (~2.3GB)
DOWNLOAD_MODELS=english docker compose up

# Download Russian model (~2.3GB)
DOWNLOAD_MODELS=russian docker compose up

# Download both
DOWNLOAD_MODELS=english,russian docker compose up

Models are saved to the models Docker volume.

Native Installation

Requirements

w2vgrep binary
ripgrep (rg)
Word2Vec model files (.bin)
Node.js 18+

Getting Word2Vec Models

mkdir -p ~/.config/semantic-grep

# English (2.3GB)
curl -L https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz | \
  gunzip > ~/.config/semantic-grep/english.bin

# Russian (2.3GB)
curl -L https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.ru.300.bin.gz | \
  gunzip > ~/.config/semantic-grep/russian.bin

Installation

git clone <repo-url> mcp-w2vgrep
cd mcp-w2vgrep
npm install
npm run build

Configuration

Add to ~/.claude/settings.json:

{
  "mcpServers": {
    "w2vgrep": {
      "command": "node",
      "args": ["/path/to/mcp-w2vgrep/dist/index.js"],
      "env": {
        "W2VGREP_PATH": "/path/to/w2vgrep"
      }
    }
  }
}

Tool: `semantic_search`

Semantic search in text files using Word2Vec embeddings.

Parameters

Parameter	Type	Required	Description
`query`	string	yes	Search query (single word recommended, phrases may crash)
`model_path`	string	yes	Path to Word2Vec model (.bin)
`threshold`	number	no	Similarity threshold (default: 0.7)
`glob`	string	no	File pattern (default: `*.md`)
`context`	integer	no	Lines of context (default: 2, set to 0 to reduce output)
`ignore_case`	boolean	no	Case-insensitive search

Docker note: Search path is always /search (mounted volume), recursive search is always enabled.

Threshold Guide

Value	Result
0.7	Strict — only very close matches (default)
0.5-0.6	Balanced — good for most use cases
0.4	Broad — more results, some noise
< 0.5	WARNING: Can return MASSIVE amounts of data (millions of characters)!

Usage Examples

Basic search (Russian)

{
  "query": "тревога",
  "model_path": "~/.config/semantic-grep/russian.bin"
}

Search with lower threshold

{
  "query": "happiness",
  "model_path": "~/.config/semantic-grep/english.bin",
  "threshold": 0.5
}

Reduce output size

{
  "query": "error",
  "model_path": "~/.config/semantic-grep/english.bin",
  "threshold": 0.6,
  "context": 0
}

Response Format

{
  "query": "fear",
  "total": 2,
  "matches": [
    {
      "similarity": 1.0,
      "match": "The fear of failure...",
      "locations": [
        {
          "file": "notes/psychology.md",
          "line": 42,
          "context": "Context before\nThe fear of failure...\nContext after"
        }
      ]
    },
    {
      "similarity": 0.72,
      "match": "Anxiety about the future",
      "locations": [
        {
          "file": "diary/2024-01.md",
          "line": 15,
          "context": "..."
        }
      ]
    }
  ]
}

Matches sorted by similarity (highest first). similarity: 1.0 = exact match.

Development

npm test           # Run tests
npm run test:watch # Watch mode
npm run build      # Build TypeScript

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
src		src
tests		tests
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
entrypoint.sh		entrypoint.sh
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

mcp-w2vgrep

Docker (Recommended)

Build

Configuration

First Run (Download Models)

Native Installation

Requirements

Getting Word2Vec Models

Installation

Configuration

Tool: `semantic_search`

Parameters

Threshold Guide

Usage Examples

Basic search (Russian)

Search with lower threshold

Reduce output size

Response Format

Development

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

mcp-w2vgrep

Docker (Recommended)

Build

Configuration

First Run (Download Models)

Native Installation

Requirements

Getting Word2Vec Models

Installation

Configuration

Tool: semantic_search

Parameters

Threshold Guide

Usage Examples

Basic search (Russian)

Search with lower threshold

Reduce output size

Response Format

Development

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Tool: `semantic_search`

Packages