Complete guide to configuring embeddings for Cursor @Codebase semantic search and code understanding.
Embeddings enable semantic code search in Cursor IDE's @Codebase feature. Instead of keyword matching, embeddings understand the meaning of your code, allowing you to search for functionality, concepts, or patterns.
Embeddings convert text (code, comments, documentation) into high-dimensional vectors that capture semantic meaning. Similar code gets similar vectors, enabling:
- @Codebase Search - Find relevant code by describing what you need
- Automatic Context - Cursor automatically includes relevant files in conversations
- Find Similar Code - Discover code patterns and examples in your codebase
Without embeddings:
- ❌ Keyword-only search (grep, exact string matching)
- ❌ No semantic understanding
- ❌ Can't find code by describing its purpose
With embeddings:
- ✅ Semantic search ("find authentication logic")
- ✅ Concept-based discovery ("show me error handling patterns")
- ✅ Similar code detection ("code like this function")
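To make "similar vectors" concrete, here is a minimal sketch (my addition, not part of the original setup): it embeds a natural-language query and a code-like description in one request through the local Lynkr endpoint and computes their dot product with jq. It assumes Lynkr is running on its default port 8081 (used in the verification steps later in this guide), that your provider accepts the OpenAI-style array form of `input`, and that `jq` is installed. These models typically return unit-normalized vectors, so the dot product approximates cosine similarity: closer to 1.0 means more semantically similar.

```bash
# Embed two texts in one request, then compute their dot product.
# A semantically related pair should score noticeably higher than
# an unrelated one.
curl -s http://localhost:8081/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": ["find authentication logic",
                 "function that verifies a user password"],
       "model": "text-embedding-ada-002"}' \
  | jq '[.data[0].embedding, .data[1].embedding]
        | transpose | map(.[0] * .[1]) | add'
```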
Lynkr supports 4 embedding providers with different tradeoffs:
| Provider | Cost | Privacy | Setup | Quality | Best For |
|---|---|---|---|---|---|
| Ollama | FREE | 🔒 100% Local | Easy | Good | Privacy, offline, no costs |
| llama.cpp | FREE | 🔒 100% Local | Medium | Good | Performance, GPU, GGUF models |
| OpenRouter | $0.01-0.10/mo | ☁️ Cloud | Easy | Excellent | Simplicity, quality, one key |
| OpenAI | $0.01-0.10/mo | ☁️ Cloud | Easy | Excellent | Best quality, direct access |
Option 1: Ollama
- Cost: 100% FREE 🔒
- Privacy: All data stays on your machine
- Setup: Easy (5 minutes)
- Quality: Good (768-1024 dimensions)
- Best for: Privacy-focused teams, offline work, zero cloud dependencies
```bash
# 1. Install Ollama (if not already installed)
brew install ollama  # macOS
# Or download from: https://ollama.ai/download

# 2. Start Ollama service
ollama serve

# 3. Pull embedding model (in separate terminal)
ollama pull nomic-embed-text

# 4. Verify model is available
ollama list
# Should show: nomic-embed-text ...
```

Add to .env:
```bash
# Ollama embeddings configuration
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
OLLAMA_EMBEDDINGS_ENDPOINT=http://localhost:11434/api/embeddings
```

nomic-embed-text (Recommended) ⭐
```bash
ollama pull nomic-embed-text
```
- Dimensions: 768
- Parameters: 137M
- Quality: Excellent for code search
- Speed: Fast (~50ms per query)
- Best for: General purpose, best all-around choice
mxbai-embed-large (Higher Quality)
```bash
ollama pull mxbai-embed-large
```
- Dimensions: 1024
- Parameters: 335M
- Quality: Higher quality than nomic-embed-text
- Speed: Slower (~100ms per query)
- Best for: Large codebases where quality matters most
all-minilm (Fastest)
```bash
ollama pull all-minilm
```
- Dimensions: 384
- Parameters: 23M
- Quality: Good for simple searches
- Speed: Very fast (~20ms per query)
- Best for: Small codebases, speed-critical applications
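If you switch models later, keep in mind that different models produce vectors of different dimensions, so previously indexed embeddings won't match and the codebase will typically need re-indexing. A minimal sketch of the switch (my assumption: Lynkr reads .env at startup, and reopening the workspace triggers Cursor's re-index):

```bash
# Illustrative: move from nomic-embed-text (768d) to mxbai-embed-large (1024d)
ollama pull mxbai-embed-large
# Then in .env:
#   OLLAMA_EMBEDDINGS_MODEL=mxbai-embed-large
# Restart Lynkr so the new model takes effect, then let Cursor re-index
```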
```bash
# Test embedding generation
curl http://localhost:11434/api/embeddings \
  -d '{"model":"nomic-embed-text","prompt":"function to sort array"}'
# Should return JSON with embedding vector
```

Why choose Ollama:
- ✅ 100% FREE - No API costs ever
- ✅ 100% Private - All data stays on your machine
- ✅ Offline - Works without internet
- ✅ Easy Setup - Install → Pull model → Configure
- ✅ Good Quality - Excellent for code search
- ✅ Multiple Models - Choose speed vs quality tradeoff
Option 2: llama.cpp
- Cost: 100% FREE 🔒
- Privacy: All data stays on your machine
- Setup: Medium (15 minutes, requires compilation)
- Quality: Good (same as Ollama models, GGUF format)
- Best for: Performance optimization, GPU acceleration, GGUF models
```bash
# 1. Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Build with GPU support (optional):
# For CUDA (NVIDIA): make LLAMA_CUDA=1
# For Metal (Apple Silicon): make LLAMA_METAL=1
# For CPU only: make
make

# 2. Download embedding model (GGUF format)
# Example: nomic-embed-text GGUF
wget https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/resolve/main/nomic-embed-text-v1.5.Q4_K_M.gguf

# 3. Start llama-server with embedding model
./llama-server \
  -m nomic-embed-text-v1.5.Q4_K_M.gguf \
  --port 8080 \
  --embedding

# 4. Verify server is running
curl http://localhost:8080/health
# Should return: {"status":"ok"}
```

Add to .env:
```bash
# llama.cpp embeddings configuration
LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8080/embeddings
```

nomic-embed-text-v1.5 (Recommended) ⭐
- File: nomic-embed-text-v1.5.Q4_K_M.gguf
- Download: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF
- Dimensions: 768
- Size: ~80MB
- Quality: Excellent for code
- Best for: Best all-around choice
all-MiniLM-L6-v2 (Fastest)
- File: all-MiniLM-L6-v2.Q4_K_M.gguf
- Download: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2-GGUF
- Dimensions: 384
- Size: ~25MB
- Quality: Good for simple searches
- Best for: Speed-critical applications
bge-large-en-v1.5 (Highest Quality)
- File: bge-large-en-v1.5.Q4_K_M.gguf
- Download: https://huggingface.co/BAAI/bge-large-en-v1.5-GGUF
- Dimensions: 1024
- Size: ~350MB
- Quality: Best quality for embeddings
- Best for: Large codebases, quality-critical applications
llama.cpp supports multiple GPU backends for faster embedding generation:
NVIDIA CUDA:
```bash
make LLAMA_CUDA=1
./llama-server -m model.gguf --embedding --n-gpu-layers 32
```

Apple Silicon Metal:
```bash
make LLAMA_METAL=1
./llama-server -m model.gguf --embedding --n-gpu-layers 32
```

AMD ROCm:
```bash
make LLAMA_ROCM=1
./llama-server -m model.gguf --embedding --n-gpu-layers 32
```

Vulkan (Universal):
```bash
make LLAMA_VULKAN=1
./llama-server -m model.gguf --embedding --n-gpu-layers 32
```

```bash
# Test embedding generation
curl http://localhost:8080/embeddings \
-H "Content-Type: application/json" \
-d '{"content":"function to sort array"}'
# Should return JSON with embedding vector
```

Why choose llama.cpp:
- ✅ 100% FREE - No API costs
- ✅ 100% Private - All data stays local
- ✅ Faster than Ollama - Optimized C++ implementation
- ✅ GPU Acceleration - CUDA, Metal, ROCm, Vulkan
- ✅ Lower Memory - Quantization options (Q4, Q5, Q8)
- ✅ Any GGUF Model - Use any embedding model from HuggingFace
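To quantify what GPU acceleration buys you, here is a rough before/after latency check using curl's built-in timer (an illustrative sketch of my own; assumes the server from the steps above is listening on :8080):

```bash
# Time a single embedding request; run once with a CPU build and once
# with a GPU build (--n-gpu-layers 32) to compare
curl -s -o /dev/null \
  -w 'embedding request took %{time_total}s\n' \
  http://localhost:8080/embeddings \
  -H "Content-Type: application/json" \
  -d '{"content":"function to sort array"}'
```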
| Feature | Ollama | llama.cpp |
|---|---|---|
| Setup | Easy (app) | Manual (compile) |
| Model Format | Ollama-specific | Any GGUF model |
| Performance | Good | Better (optimized C++) |
| GPU Support | Yes | Yes (more options) |
| Memory Usage | Higher | Lower (more quantization options) |
| Flexibility | Limited models | Any GGUF from HuggingFace |
Option 3: OpenRouter
- Cost: ~$0.01-0.10/month (typical usage)
- Privacy: Cloud-based
- Setup: Very easy (2 minutes)
- Quality: Excellent (best-in-class models)
- Best for: Simplicity, quality, one key for chat + embeddings
Add to .env:
```bash
# OpenRouter configuration (if not already set)
OPENROUTER_API_KEY=sk-or-v1-your-key-here

# Embeddings model (optional, defaults to text-embedding-ada-002)
OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small
```

Note: If you're already using MODEL_PROVIDER=openrouter, embeddings work automatically with the same key. No additional configuration needed.
- Visit openrouter.ai
- Sign in with GitHub, Google, or email
- Go to openrouter.ai/keys
- Create a new API key
- Add credits (pay-as-you-go, no subscription)
openai/text-embedding-3-small (Recommended) ⭐
```bash
OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small
```
- Dimensions: 1536
- Cost: $0.02 per 1M tokens (80% cheaper than ada-002!)
- Quality: Excellent
- Best for: Best balance of quality and cost
openai/text-embedding-ada-002 (Standard)
```bash
OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-ada-002
```
- Dimensions: 1536
- Cost: $0.10 per 1M tokens
- Quality: Excellent (widely supported standard)
- Best for: Compatibility
openai/text-embedding-3-large (Best Quality)
```bash
OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-large
```
- Dimensions: 3072
- Cost: $0.13 per 1M tokens
- Quality: Best quality available
- Best for: Large codebases where quality matters most
voyage/voyage-code-2 (Code-Specialized)
```bash
OPENROUTER_EMBEDDINGS_MODEL=voyage/voyage-code-2
```
- Dimensions: 1024
- Cost: $0.12 per 1M tokens
- Quality: Optimized specifically for code
- Best for: Code search (better than general models)
voyage/voyage-2 (General Purpose)
```bash
OPENROUTER_EMBEDDINGS_MODEL=voyage/voyage-2
```
- Dimensions: 1024
- Cost: $0.12 per 1M tokens
- Quality: Best for general text
- Best for: Mixed code + documentation
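Once the key is in .env and Lynkr has been restarted, you can exercise the OpenRouter embeddings path end to end through the local proxy (a sketch; assumes Lynkr's default port 8081, as used in the verification section below):

```bash
# End-to-end check through Lynkr; use whichever model you configured
curl http://localhost:8081/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input":"test","model":"openai/text-embedding-3-small"}'
```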
Why choose OpenRouter:
- ✅ ONE Key - Same key for chat + embeddings
- ✅ No Setup - Works immediately after adding key
- ✅ Best Quality - State-of-the-art embedding models
- ✅ Automatic Fallbacks - Switches providers if one is down
- ✅ Competitive Pricing - Often cheaper than direct providers
Option 4: OpenAI
- Cost: ~$0.01-0.10/month (typical usage)
- Privacy: Cloud-based
- Setup: Easy (5 minutes)
- Quality: Excellent (best-in-class, direct from OpenAI)
- Best for: Best quality, direct OpenAI access
Add to .env:
```bash
# OpenAI configuration (if not already set)
OPENAI_API_KEY=sk-your-openai-api-key

# Embeddings model (optional, defaults to text-embedding-ada-002)
# Recommended: Use text-embedding-3-small for 80% cost savings
# OPENAI_EMBEDDINGS_MODEL=text-embedding-3-small
```

- Visit platform.openai.com
- Sign up or log in
- Go to API Keys
- Create a new API key
- Add credits to your account (pay-as-you-go)
text-embedding-3-small (Recommended) ⭐
```bash
OPENAI_EMBEDDINGS_MODEL=text-embedding-3-small
```
- Dimensions: 1536
- Cost: $0.02 per 1M tokens (80% cheaper!)
- Quality: Excellent
- Best for: Best balance of quality and cost
text-embedding-ada-002 (Standard)
```bash
OPENAI_EMBEDDINGS_MODEL=text-embedding-ada-002
```
- Dimensions: 1536
- Cost: $0.10 per 1M tokens
- Quality: Excellent (standard, widely used)
- Best for: Compatibility
text-embedding-3-large (Best Quality)
```bash
OPENAI_EMBEDDINGS_MODEL=text-embedding-3-large
```
- Dimensions: 3072
- Cost: $0.13 per 1M tokens
- Quality: Best quality available
- Best for: Maximum quality for large codebases
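Before wiring the key into Lynkr, you can confirm it works against OpenAI's standard embeddings endpoint directly:

```bash
# Direct OpenAI API check (assumes OPENAI_API_KEY is exported in your shell)
curl https://api.openai.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input":"function to sort array","model":"text-embedding-3-small"}'
```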
Why choose OpenAI:
- ✅ Best Quality - Direct from OpenAI, best-in-class
- ✅ Lowest Latency - No intermediaries
- ✅ Simple Setup - Just one API key
- ✅ Organization Support - Use org-level API keys for teams
| Feature | Ollama | llama.cpp | OpenRouter | OpenAI |
|---|---|---|---|---|
| Cost | FREE | FREE | $0.01-0.10/mo | $0.01-0.10/mo |
| Privacy | 🔒 Local | 🔒 Local | ☁️ Cloud | ☁️ Cloud |
| Setup | Easy | Medium | Easy | Easy |
| Quality | Good | Good | Excellent | Excellent |
| Speed | Fast | Faster | Fast | Fast |
| Offline | ✅ Yes | ✅ Yes | ❌ No | ❌ No |
| GPU Support | Yes | Yes (more options) | N/A | N/A |
| Model Choice | Limited | Any GGUF | Many | Few |
| Dimensions | 384-1024 | 384-1024 | 1024-3072 | 1536-3072 |
| Provider | Model | Monthly Cost |
|---|---|---|
| Ollama | Any | $0 (100% FREE) 🔒 |
| llama.cpp | Any | $0 (100% FREE) 🔒 |
| OpenRouter | text-embedding-3-small | $0.02 |
| OpenRouter | text-embedding-ada-002 | $0.10 |
| OpenRouter | voyage-code-2 | $0.12 |
| OpenAI | text-embedding-3-small | $0.02 |
| OpenAI | text-embedding-ada-002 | $0.10 |
| OpenAI | text-embedding-3-large | $0.13 |
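The cloud figures above assume roughly 1M embedding tokens per month; whether that matches your usage depends on codebase size and how often it is re-indexed. A back-of-the-envelope check (illustrative numbers, not from the original guide):

```bash
# 500k-token codebase indexed twice a month with text-embedding-3-small
# at $0.02 per 1M tokens
echo "scale=4; (500000 * 2 / 1000000) * 0.02" | bc
# => .0200  (about $0.02/month)
```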
By default, Lynkr uses the same provider as MODEL_PROVIDER for embeddings (if supported). To use a different provider for embeddings:
```bash
# Use Databricks for chat, but Ollama for embeddings (privacy + cost savings)
MODEL_PROVIDER=databricks
DATABRICKS_API_BASE=https://your-workspace.databricks.com
DATABRICKS_API_KEY=your-key

# Override embeddings provider
EMBEDDINGS_PROVIDER=ollama
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
```

Smart provider detection:
- Uses same provider as chat (if embeddings supported)
- Or automatically selects first available embeddings provider
- Or use `EMBEDDINGS_PROVIDER` to force a specific provider
Scenario 1: 100% local (maximum privacy)
Best for: Sensitive codebases, offline work, zero cloud dependencies
```bash
# Chat: Ollama (local)
MODEL_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b

# Embeddings: Ollama (local)
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text

# Everything 100% local, 100% private, 100% FREE!
```

Benefits:
- ✅ Zero cloud dependencies
- ✅ All data stays on your machine
- ✅ Works offline
- ✅ 100% FREE
Scenario 2: OpenRouter for everything
Best for: Easy setup, flexibility, quality
```bash
# Chat + Embeddings: OpenRouter with ONE key
MODEL_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-your-key
OPENROUTER_MODEL=anthropic/claude-3.5-sonnet

# Embeddings work automatically with same key!
# Optional: Specify model for cost savings
OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small
```

Benefits:
- ✅ ONE key for everything
- ✅ Best quality embeddings
- ✅ 100+ chat models available
- ✅ ~$5-10/month total cost
Scenario 3: Hybrid (local embeddings + tiered chat)
Best for: Privacy + quality + cost optimization
```bash
# Chat: Tier-based routing (set all 4 to enable)
TIER_SIMPLE=ollama:llama3.2
TIER_MEDIUM=openrouter:openai/gpt-4o-mini
TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
TIER_REASONING=databricks:databricks-claude-sonnet-4-5
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=databricks
DATABRICKS_API_BASE=https://your-workspace.databricks.com
DATABRICKS_API_KEY=your-key

# Embeddings: Ollama (local, private)
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text

# Result: Free + private embeddings, mostly free chat, cloud for complex tasks
```

Benefits:
- ✅ 70-80% of chat requests FREE (Ollama via TIER_SIMPLE)
- ✅ 100% private embeddings (local)
- ✅ Cloud quality for complex tasks
- ✅ Intelligent automatic tier-based routing
Scenario 4: Enterprise (Databricks chat + cloud embeddings)
Best for: Large teams, quality-critical applications
```bash
# Chat: Databricks (enterprise SLA)
MODEL_PROVIDER=databricks
DATABRICKS_API_BASE=https://your-workspace.databricks.com
DATABRICKS_API_KEY=your-key

# Embeddings: OpenRouter (best quality)
OPENROUTER_API_KEY=sk-or-v1-your-key
OPENROUTER_EMBEDDINGS_MODEL=voyage/voyage-code-2  # Code-specialized
```

Benefits:
- ✅ Enterprise chat (Claude 4.5)
- ✅ Best embedding quality (code-specialized)
- ✅ Separate billing/limits for chat vs embeddings
- ✅ Production-ready reliability
```bash
# Test embedding generation
curl http://localhost:8081/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "function to sort an array",
    "model": "text-embedding-ada-002"
  }'
# Should return JSON with embedding vector
# Example response:
# {
# "object": "list",
# "data": [{
# "object": "embedding",
# "embedding": [0.123, -0.456, 0.789, ...], # 768-3072 dimensions
# "index": 0
# }],
# "model": "text-embedding-ada-002",
# "usage": {"prompt_tokens": 7, "total_tokens": 7}
# }
```

Then test from Cursor:
- Open Cursor IDE
- Open a project
- Press Cmd+L (or Ctrl+L)
- Type: `@Codebase find authentication logic`
- Expected: Cursor returns relevant files
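As an optional extra check (my addition; assumes jq is installed), confirm that the vector dimensionality the proxy reports matches your configured model:

```bash
# 768 for nomic-embed-text, 1536 for text-embedding-3-small/ada-002,
# 3072 for text-embedding-3-large
curl -s http://localhost:8081/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input":"test","model":"text-embedding-ada-002"}' \
  | jq '.data[0].embedding | length'
```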
If @Codebase doesn't work:
- Check the embeddings endpoint: `curl http://localhost:8081/v1/embeddings` (should not return 501)
- Restart Lynkr after adding embeddings config
- Restart Cursor to re-index codebase
@Codebase not working
Symptoms: @Codebase doesn't return results or shows an error
Solutions:
- Verify embeddings are configured:
```bash
curl http://localhost:8081/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input":"test","model":"text-embedding-ada-002"}'
# Should return embeddings, not a 501 error
```
- Check the embeddings provider in .env:
```bash
# Verify ONE of these is set:
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
# OR
LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8080/embeddings
# OR
OPENROUTER_API_KEY=sk-or-v1-your-key
# OR
OPENAI_API_KEY=sk-your-key
```
- Restart Lynkr after adding embeddings config
- Restart Cursor to re-index the codebase
Poor search quality
Symptoms: @Codebase returns irrelevant files
Solutions:
- Upgrade to a better embedding model:
```bash
# Ollama: use a larger model
ollama pull mxbai-embed-large
OLLAMA_EMBEDDINGS_MODEL=mxbai-embed-large

# OpenRouter: use a code-specialized model
OPENROUTER_EMBEDDINGS_MODEL=voyage/voyage-code-2
```
- Switch to cloud embeddings:
  - Local models (Ollama/llama.cpp): Good quality
  - Cloud models (OpenRouter/OpenAI): Excellent quality
- This may be a Cursor indexing issue:
  - Close and reopen the workspace in Cursor
  - Wait for Cursor to re-index
Ollama model not found
Symptoms: Error: model "nomic-embed-text" not found
Solutions:
```bash
# List available models
ollama list

# Pull the model
ollama pull nomic-embed-text

# Verify it's available
ollama list
# Should show: nomic-embed-text ...
```

llama.cpp connection refused
Symptoms: ECONNREFUSED when accessing the llama.cpp endpoint
Solutions:
- Verify llama-server is running:
```bash
lsof -i :8080
# Should show llama-server process
```
- Start llama-server with the embedding model:
```bash
./llama-server -m nomic-embed-text-v1.5.Q4_K_M.gguf --port 8080 --embedding
```
- Test the endpoint:
```bash
curl http://localhost:8080/health
# Should return: {"status":"ok"}
```
Cloud provider rate limits
Symptoms: Too many requests error (429)
Solutions:
- Switch to local embeddings:
```bash
# Ollama (no rate limits, 100% FREE)
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
```
- Use OpenRouter (pooled rate limits):
```bash
OPENROUTER_API_KEY=sk-or-v1-your-key
```
- Cursor Integration - Full Cursor IDE setup guide
- Provider Configuration - Configure all providers
- Installation Guide - Install Lynkr
- Troubleshooting - More troubleshooting tips
- GitHub Discussions - Community Q&A
- GitHub Issues - Report bugs
- FAQ - Frequently asked questions