Embeddings Configuration Guide

Complete guide to configuring embeddings for Cursor @Codebase semantic search and code understanding.


Overview

Embeddings enable semantic code search in Cursor IDE's @Codebase feature. Instead of matching keywords, embeddings capture the meaning of your code, so you can search by functionality, concept, or pattern.

What Are Embeddings?

Embeddings convert text (code, comments, documentation) into high-dimensional vectors that capture semantic meaning. Similar code gets similar vectors, enabling:

  • @Codebase Search - Find relevant code by describing what you need
  • Automatic Context - Cursor automatically includes relevant files in conversations
  • Find Similar Code - Discover code patterns and examples in your codebase
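The idea that "similar code gets similar vectors" can be sketched with cosine similarity, the standard metric for ranking embedding matches. The toy 3-dimensional vectors below are illustrative stand-ins; real models emit 384-3072 dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of three code snippets.
sort_fn  = [0.9, 0.1, 0.0]  # "function to sort an array"
order_fn = [0.8, 0.2, 0.1]  # "helper that orders a list"
auth_fn  = [0.0, 0.1, 0.9]  # "validate a login token"

# Semantically related snippets score higher than unrelated ones.
assert cosine_similarity(sort_fn, order_fn) > cosine_similarity(sort_fn, auth_fn)
```

A @Codebase query works the same way at scale: the query is embedded, then ranked against pre-computed embeddings of every indexed chunk.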

Why Use Embeddings?

Without embeddings:

  • ❌ Keyword-only search (grep, exact string matching)
  • ❌ No semantic understanding
  • ❌ Can't find code by describing its purpose

With embeddings:

  • ✅ Semantic search ("find authentication logic")
  • ✅ Concept-based discovery ("show me error handling patterns")
  • ✅ Similar code detection ("code like this function")

Supported Embedding Providers

Lynkr supports 4 embedding providers with different tradeoffs:

| Provider | Cost | Privacy | Setup | Quality | Best For |
|---|---|---|---|---|---|
| Ollama | FREE | 🔒 100% Local | Easy | Good | Privacy, offline, no costs |
| llama.cpp | FREE | 🔒 100% Local | Medium | Good | Performance, GPU, GGUF models |
| OpenRouter | $0.01-0.10/mo | ☁️ Cloud | Easy | Excellent | Simplicity, quality, one key |
| OpenAI | $0.01-0.10/mo | ☁️ Cloud | Easy | Excellent | Best quality, direct access |

Option 1: Ollama (Recommended for Privacy)

Overview

  • Cost: 100% FREE 🔒
  • Privacy: All data stays on your machine
  • Setup: Easy (5 minutes)
  • Quality: Good (768-1024 dimensions)
  • Best for: Privacy-focused teams, offline work, zero cloud dependencies

Installation & Setup

# 1. Install Ollama (if not already installed)
brew install ollama  # macOS
# Or download from: https://ollama.ai/download

# 2. Start Ollama service
ollama serve

# 3. Pull embedding model (in separate terminal)
ollama pull nomic-embed-text

# 4. Verify model is available
ollama list
# Should show: nomic-embed-text  ...

Configuration

Add to .env:

# Ollama embeddings configuration
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
OLLAMA_EMBEDDINGS_ENDPOINT=http://localhost:11434/api/embeddings

Available Models

nomic-embed-text (Recommended) ⭐

ollama pull nomic-embed-text
  • Dimensions: 768
  • Parameters: 137M
  • Quality: Excellent for code search
  • Speed: Fast (~50ms per query)
  • Best for: General purpose, best all-around choice

mxbai-embed-large (Higher Quality)

ollama pull mxbai-embed-large
  • Dimensions: 1024
  • Parameters: 335M
  • Quality: Higher quality than nomic-embed-text
  • Speed: Slower (~100ms per query)
  • Best for: Large codebases where quality matters most

all-minilm (Fastest)

ollama pull all-minilm
  • Dimensions: 384
  • Parameters: 23M
  • Quality: Good for simple searches
  • Speed: Very fast (~20ms per query)
  • Best for: Small codebases, speed-critical applications
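The speed/quality tradeoff above can be encoded as a small lookup, which is handy if you script your .env generation. The model names and figures come from this guide; the selection helper itself is illustrative:

```python
# Ollama embedding models from this guide: name -> (dimensions, approx ms/query).
OLLAMA_EMBEDDING_MODELS = {
    "nomic-embed-text": (768, 50),     # balanced default
    "mxbai-embed-large": (1024, 100),  # highest quality
    "all-minilm": (384, 20),           # fastest
}

def pick_model(priority: str) -> str:
    """'speed' picks the lowest-latency model, anything else the most dimensions."""
    if priority == "speed":
        return min(OLLAMA_EMBEDDING_MODELS,
                   key=lambda m: OLLAMA_EMBEDDING_MODELS[m][1])
    return max(OLLAMA_EMBEDDING_MODELS,
               key=lambda m: OLLAMA_EMBEDDING_MODELS[m][0])

assert pick_model("speed") == "all-minilm"
assert pick_model("quality") == "mxbai-embed-large"
```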

Testing

# Test embedding generation
curl http://localhost:11434/api/embeddings \
  -d '{"model":"nomic-embed-text","prompt":"function to sort array"}'

# Should return JSON with embedding vector

Benefits

  • 100% FREE - No API costs ever
  • 100% Private - All data stays on your machine
  • Offline - Works without internet
  • Easy Setup - Install → Pull model → Configure
  • Good Quality - Excellent for code search
  • Multiple Models - Choose speed vs quality tradeoff

Option 2: llama.cpp (Maximum Performance)

Overview

  • Cost: 100% FREE 🔒
  • Privacy: All data stays on your machine
  • Setup: Medium (15 minutes, requires compilation)
  • Quality: Good (same as Ollama models, GGUF format)
  • Best for: Performance optimization, GPU acceleration, GGUF models

Installation & Setup

# 1. Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Build with GPU support (optional):
# For CUDA (NVIDIA): make LLAMA_CUDA=1
# For Metal (Apple Silicon): make LLAMA_METAL=1
# For CPU only: make
make

# 2. Download embedding model (GGUF format)
# Example: nomic-embed-text GGUF
wget https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/resolve/main/nomic-embed-text-v1.5.Q4_K_M.gguf

# 3. Start llama-server with embedding model
./llama-server \
  -m nomic-embed-text-v1.5.Q4_K_M.gguf \
  --port 8080 \
  --embedding

# 4. Verify server is running
curl http://localhost:8080/health
# Should return: {"status":"ok"}

Configuration

Add to .env:

# llama.cpp embeddings configuration
LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8080/embeddings

Available Models (GGUF)

nomic-embed-text-v1.5 (Recommended) ⭐

all-MiniLM-L6-v2 (Fastest)

bge-large-en-v1.5 (Highest Quality)

GPU Support

llama.cpp supports multiple GPU backends for faster embedding generation:

NVIDIA CUDA:

make LLAMA_CUDA=1
./llama-server -m model.gguf --embedding --n-gpu-layers 32

Apple Silicon Metal:

make LLAMA_METAL=1
./llama-server -m model.gguf --embedding --n-gpu-layers 32

AMD ROCm:

make LLAMA_HIPBLAS=1
./llama-server -m model.gguf --embedding --n-gpu-layers 32

Vulkan (Universal):

make LLAMA_VULKAN=1
./llama-server -m model.gguf --embedding --n-gpu-layers 32

Testing

# Test embedding generation
curl http://localhost:8080/embeddings \
  -H "Content-Type: application/json" \
  -d '{"content":"function to sort array"}'

# Should return JSON with embedding vector

Benefits

  • 100% FREE - No API costs
  • 100% Private - All data stays local
  • Faster than Ollama - Optimized C++ implementation
  • GPU Acceleration - CUDA, Metal, ROCm, Vulkan
  • Lower Memory - Quantization options (Q4, Q5, Q8)
  • Any GGUF Model - Use any embedding model from HuggingFace
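The "Lower Memory" point follows from quantization: a rough estimate of weight size is parameters × bits-per-weight. The bits-per-weight figures below are approximate effective values for llama.cpp quant formats (they include scale overhead), not exact:

```python
def model_size_mb(params_millions: float, bits_per_weight: float) -> float:
    """Approximate size of a model's weights in megabytes."""
    return params_millions * 1e6 * bits_per_weight / 8 / 1e6

# nomic-embed-text has ~137M parameters (per the Ollama section above).
fp16 = model_size_mb(137, 16.0)  # unquantized half precision -> ~274 MB
q8   = model_size_mb(137, 8.5)   # Q8_0, approx effective bits
q4   = model_size_mb(137, 4.5)   # Q4_K_M, approx effective bits

print(f"FP16 ~{fp16:.0f} MB, Q8_0 ~{q8:.0f} MB, Q4_K_M ~{q4:.0f} MB")
```

So a Q4_K_M embedding model fits in roughly a quarter of the memory of its FP16 original, at a small quality cost.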

llama.cpp vs Ollama

| Feature | Ollama | llama.cpp |
|---|---|---|
| Setup | Easy (app) | Manual (compile) |
| Model Format | Ollama-specific | Any GGUF model |
| Performance | Good | Better (optimized C++) |
| GPU Support | Yes | Yes (more options) |
| Memory Usage | Higher | Lower (more quantization options) |
| Flexibility | Limited models | Any GGUF from HuggingFace |

Option 3: OpenRouter (Simplest Cloud)

Overview

  • Cost: ~$0.01-0.10/month (typical usage)
  • Privacy: Cloud-based
  • Setup: Very easy (2 minutes)
  • Quality: Excellent (best-in-class models)
  • Best for: Simplicity, quality, one key for chat + embeddings

Configuration

Add to .env:

# OpenRouter configuration (if not already set)
OPENROUTER_API_KEY=sk-or-v1-your-key-here

# Embeddings model (optional, defaults to text-embedding-ada-002)
OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small

Note: If you're already using MODEL_PROVIDER=openrouter, embeddings work automatically with the same key! No additional configuration needed.

Getting OpenRouter API Key

  1. Visit openrouter.ai
  2. Sign in with GitHub, Google, or email
  3. Go to openrouter.ai/keys
  4. Create a new API key
  5. Add credits (pay-as-you-go, no subscription)

Available Models

openai/text-embedding-3-small (Recommended) ⭐

OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small
  • Dimensions: 1536
  • Cost: $0.02 per 1M tokens (80% cheaper than ada-002!)
  • Quality: Excellent
  • Best for: Best balance of quality and cost

openai/text-embedding-ada-002 (Standard)

OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-ada-002
  • Dimensions: 1536
  • Cost: $0.10 per 1M tokens
  • Quality: Excellent (widely supported standard)
  • Best for: Compatibility

openai/text-embedding-3-large (Best Quality)

OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-large
  • Dimensions: 3072
  • Cost: $0.13 per 1M tokens
  • Quality: Best quality available
  • Best for: Large codebases where quality matters most

voyage/voyage-code-2 (Code-Specialized)

OPENROUTER_EMBEDDINGS_MODEL=voyage/voyage-code-2
  • Dimensions: 1024
  • Cost: $0.12 per 1M tokens
  • Quality: Optimized specifically for code
  • Best for: Code search (better than general models)

voyage/voyage-2 (General Purpose)

OPENROUTER_EMBEDDINGS_MODEL=voyage/voyage-2
  • Dimensions: 1024
  • Cost: $0.12 per 1M tokens
  • Quality: Best for general text
  • Best for: Mixed code + documentation

Benefits

  • ONE Key - Same key for chat + embeddings
  • No Setup - Works immediately after adding key
  • Best Quality - State-of-the-art embedding models
  • Automatic Fallbacks - Switches providers if one is down
  • Competitive Pricing - Often cheaper than direct providers

Option 4: OpenAI (Direct)

Overview

  • Cost: ~$0.01-0.10/month (typical usage)
  • Privacy: Cloud-based
  • Setup: Easy (5 minutes)
  • Quality: Excellent (best-in-class, direct from OpenAI)
  • Best for: Best quality, direct OpenAI access

Configuration

Add to .env:

# OpenAI configuration (if not already set)
OPENAI_API_KEY=sk-your-openai-api-key

# Embeddings model (optional, defaults to text-embedding-ada-002)
# Recommended: Use text-embedding-3-small for 80% cost savings
# OPENAI_EMBEDDINGS_MODEL=text-embedding-3-small

Getting OpenAI API Key

  1. Visit platform.openai.com
  2. Sign up or log in
  3. Go to API Keys
  4. Create a new API key
  5. Add credits to your account (pay-as-you-go)

Available Models

text-embedding-3-small (Recommended) ⭐

OPENAI_EMBEDDINGS_MODEL=text-embedding-3-small
  • Dimensions: 1536
  • Cost: $0.02 per 1M tokens (80% cheaper!)
  • Quality: Excellent
  • Best for: Best balance of quality and cost

text-embedding-ada-002 (Standard)

OPENAI_EMBEDDINGS_MODEL=text-embedding-ada-002
  • Dimensions: 1536
  • Cost: $0.10 per 1M tokens
  • Quality: Excellent (standard, widely used)
  • Best for: Compatibility

text-embedding-3-large (Best Quality)

OPENAI_EMBEDDINGS_MODEL=text-embedding-3-large
  • Dimensions: 3072
  • Cost: $0.13 per 1M tokens
  • Quality: Best quality available
  • Best for: Maximum quality for large codebases

Benefits

  • Best Quality - Direct from OpenAI, best-in-class
  • Lowest Latency - No intermediaries
  • Simple Setup - Just one API key
  • Organization Support - Use org-level API keys for teams

Provider Comparison

Feature Comparison

| Feature | Ollama | llama.cpp | OpenRouter | OpenAI |
|---|---|---|---|---|
| Cost | FREE | FREE | $0.01-0.10/mo | $0.01-0.10/mo |
| Privacy | 🔒 Local | 🔒 Local | ☁️ Cloud | ☁️ Cloud |
| Setup | Easy | Medium | Easy | Easy |
| Quality | Good | Good | Excellent | Excellent |
| Speed | Fast | Faster | Fast | Fast |
| Offline | ✅ Yes | ✅ Yes | ❌ No | ❌ No |
| GPU Support | Yes | Yes (more options) | N/A | N/A |
| Model Choice | Limited | Any GGUF | Many | Few |
| Dimensions | 384-1024 | 384-1024 | 1024-3072 | 1536-3072 |

Cost Comparison (100K embeddings/month)

| Provider | Model | Monthly Cost |
|---|---|---|
| Ollama | Any | $0 (100% FREE) 🔒 |
| llama.cpp | Any | $0 (100% FREE) 🔒 |
| OpenRouter | text-embedding-3-small | $0.02 |
| OpenRouter | text-embedding-ada-002 | $0.10 |
| OpenRouter | voyage-code-2 | $0.12 |
| OpenAI | text-embedding-3-small | $0.02 |
| OpenAI | text-embedding-ada-002 | $0.10 |
| OpenAI | text-embedding-3-large | $0.13 |
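These monthly figures come from a simple formula: cost = (embeddings × avg tokens per chunk ÷ 1M) × price per 1M tokens. The table's column effectively assumes ~1M tokens total for 100K embeddings; the 10- and 500-token chunk sizes below are illustrative assumptions, and cost scales linearly with chunk size:

```python
def monthly_cost(embeddings: int, avg_tokens: float, price_per_million: float) -> float:
    """Estimated monthly embedding spend in dollars."""
    total_tokens = embeddings * avg_tokens
    return total_tokens / 1_000_000 * price_per_million

# 100K embeddings at ~10 tokens each = 1M tokens/month (matches the table).
assert round(monthly_cost(100_000, 10, 0.02), 2) == 0.02  # text-embedding-3-small
assert round(monthly_cost(100_000, 10, 0.13), 2) == 0.13  # text-embedding-3-large

# Larger chunks scale linearly: ~500-token chunks -> 50M tokens/month.
print(f"${monthly_cost(100_000, 500, 0.02):.2f}/mo at 500 tokens/chunk")
```

Even at realistic chunk sizes, cloud embedding costs stay in the dollars-per-month range for most codebases.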

Embeddings Provider Override

By default, Lynkr uses the same provider as MODEL_PROVIDER for embeddings (if supported). To use a different provider for embeddings:

# Use Databricks for chat, but Ollama for embeddings (privacy + cost savings)
MODEL_PROVIDER=databricks
DATABRICKS_API_BASE=https://your-workspace.databricks.com
DATABRICKS_API_KEY=your-key

# Override embeddings provider
EMBEDDINGS_PROVIDER=ollama
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text

Smart provider detection:

  • Uses same provider as chat (if embeddings supported)
  • Or automatically selects first available embeddings provider
  • Or use EMBEDDINGS_PROVIDER to force a specific provider

Recommended Configurations

1. Privacy-First (100% Local, FREE)

Best for: Sensitive codebases, offline work, zero cloud dependencies

# Chat: Ollama (local)
MODEL_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b

# Embeddings: Ollama (local)
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text

# Everything 100% local, 100% private, 100% FREE!

Benefits:

  • ✅ Zero cloud dependencies
  • ✅ All data stays on your machine
  • ✅ Works offline
  • ✅ 100% FREE

2. Simplest (One Key for Everything)

Best for: Easy setup, flexibility, quality

# Chat + Embeddings: OpenRouter with ONE key
MODEL_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-your-key
OPENROUTER_MODEL=anthropic/claude-3.5-sonnet

# Embeddings work automatically with same key!
# Optional: Specify model for cost savings
OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small

Benefits:

  • ✅ ONE key for everything
  • ✅ Best quality embeddings
  • ✅ 100+ chat models available
  • ✅ ~$5-10/month total cost

3. Hybrid (Best of Both Worlds)

Best for: Privacy + Quality + Cost Optimization

# Chat: Tier-based routing (set all 4 to enable)
TIER_SIMPLE=ollama:llama3.2
TIER_MEDIUM=openrouter:openai/gpt-4o-mini
TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
TIER_REASONING=databricks:databricks-claude-sonnet-4-5
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=databricks
DATABRICKS_API_BASE=https://your-workspace.databricks.com
DATABRICKS_API_KEY=your-key

# Embeddings: Ollama (local, private)
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text

# Result: Free + private embeddings, mostly free chat, cloud for complex tasks

Benefits:

  • ✅ 70-80% of chat requests FREE (Ollama via TIER_SIMPLE)
  • ✅ 100% private embeddings (local)
  • ✅ Cloud quality for complex tasks
  • ✅ Intelligent automatic tier-based routing
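The TIER_* values above use a provider:model format, where the model part may itself contain a slash (e.g. openai/gpt-4o-mini). A hypothetical parsing helper, shown only to clarify the syntax:

```python
def parse_tier(value: str) -> tuple[str, str]:
    """Split 'provider:model' on the first colon only."""
    provider, _, model = value.partition(":")
    return provider, model

assert parse_tier("ollama:llama3.2") == ("ollama", "llama3.2")
assert parse_tier("openrouter:openai/gpt-4o-mini") == ("openrouter", "openai/gpt-4o-mini")
```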

4. Enterprise (Best Quality)

Best for: Large teams, quality-critical applications

# Chat: Databricks (enterprise SLA)
MODEL_PROVIDER=databricks
DATABRICKS_API_BASE=https://your-workspace.databricks.com
DATABRICKS_API_KEY=your-key

# Embeddings: OpenRouter (best quality)
OPENROUTER_API_KEY=sk-or-v1-your-key
OPENROUTER_EMBEDDINGS_MODEL=voyage/voyage-code-2  # Code-specialized

Benefits:

  • ✅ Enterprise chat (Claude 4.5)
  • ✅ Best embedding quality (code-specialized)
  • ✅ Separate billing/limits for chat vs embeddings
  • ✅ Production-ready reliability

Testing & Verification

Test Embeddings Endpoint

# Test embedding generation
curl http://localhost:8081/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "function to sort an array",
    "model": "text-embedding-ada-002"
  }'

# Should return JSON with embedding vector
# Example response:
# {
#   "object": "list",
#   "data": [{
#     "object": "embedding",
#     "embedding": [0.123, -0.456, 0.789, ...],  # 768-3072 dimensions
#     "index": 0
#   }],
#   "model": "text-embedding-ada-002",
#   "usage": {"prompt_tokens": 7, "total_tokens": 7}
# }
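A quick sanity check on the response shape can catch misconfiguration early. This is a standalone sketch using a hard-coded example payload mirroring the response above; feed it real JSON from the curl command instead:

```python
def extract_embedding(response: dict) -> list[float]:
    """Pull the first embedding vector out of an OpenAI-style response."""
    data = response["data"]
    assert data and data[0]["object"] == "embedding"
    vector = data[0]["embedding"]
    # Sanity checks: numeric values and a plausible dimensionality.
    assert 256 <= len(vector) <= 4096, f"unexpected dimension {len(vector)}"
    assert all(isinstance(x, float) for x in vector)
    return vector

# Minimal stand-in for the JSON the endpoint returns.
sample = {
    "object": "list",
    "data": [{"object": "embedding", "embedding": [0.1] * 1536, "index": 0}],
    "model": "text-embedding-ada-002",
    "usage": {"prompt_tokens": 7, "total_tokens": 7},
}
assert len(extract_embedding(sample)) == 1536
```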

Test in Cursor

  1. Open Cursor IDE
  2. Open a project
  3. Press Cmd+L (or Ctrl+L)
  4. Type: @Codebase find authentication logic
  5. Expected: Cursor returns relevant files

If @Codebase doesn't work:

  • Check embeddings endpoint: curl http://localhost:8081/v1/embeddings (should not return 501)
  • Restart Lynkr after adding embeddings config
  • Restart Cursor to re-index codebase

Troubleshooting

@Codebase Doesn't Work

Symptoms: @Codebase doesn't return results or shows error

Solutions:

  1. Verify embeddings are configured:

    curl http://localhost:8081/v1/embeddings \
      -H "Content-Type: application/json" \
      -d '{"input":"test","model":"text-embedding-ada-002"}'
    
    # Should return embeddings, not 501 error
  2. Check embeddings provider in .env:

    # Verify ONE of these is set:
    OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
    # OR
    LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8080/embeddings
    # OR
    OPENROUTER_API_KEY=sk-or-v1-your-key
    # OR
    OPENAI_API_KEY=sk-your-key
  3. Restart Lynkr after adding embeddings config

  4. Restart Cursor to re-index codebase


Poor Search Results

Symptoms: @Codebase returns irrelevant files

Solutions:

  1. Upgrade to better embedding model:

    # Ollama: Use larger model
    ollama pull mxbai-embed-large
    OLLAMA_EMBEDDINGS_MODEL=mxbai-embed-large
    
    # OpenRouter: Use code-specialized model
    OPENROUTER_EMBEDDINGS_MODEL=voyage/voyage-code-2
  2. Switch to cloud embeddings:

    • Local models (Ollama/llama.cpp): Good quality
    • Cloud models (OpenRouter/OpenAI): Excellent quality
  3. This may be a Cursor indexing issue:

    • Close and reopen workspace in Cursor
    • Wait for Cursor to re-index

Ollama Model Not Found

Symptoms: Error: model "nomic-embed-text" not found

Solutions:

# List available models
ollama list

# Pull the model
ollama pull nomic-embed-text

# Verify it's available
ollama list
# Should show: nomic-embed-text  ...

llama.cpp Connection Refused

Symptoms: ECONNREFUSED when accessing llama.cpp endpoint

Solutions:

  1. Verify llama-server is running:

    lsof -i :8080
    # Should show llama-server process
  2. Start llama-server with embedding model:

    ./llama-server -m nomic-embed-text-v1.5.Q4_K_M.gguf --port 8080 --embedding
  3. Test endpoint:

    curl http://localhost:8080/health
    # Should return: {"status":"ok"}

Rate Limiting (Cloud Providers)

Symptoms: Too many requests error (429)

Solutions:

  1. Switch to local embeddings:

    # Ollama (no rate limits, 100% FREE)
    OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
  2. Use OpenRouter (pooled rate limits):

    OPENROUTER_API_KEY=sk-or-v1-your-key

Next Steps


Getting Help