Version: 1.0 | Status: Production Ready | Last Updated: 2026-02-05
This guide covers the vector search capabilities in skill-split, including semantic similarity search, hybrid ranking, and cost management.
Vector search finds relevant sections by semantic meaning rather than exact keyword matching. Combined with traditional text search through hybrid ranking, it typically returns more relevant results than either approach alone.
Set the following environment variables:

```bash
export ENABLE_EMBEDDINGS=true
export OPENAI_API_KEY=sk-...     # Your OpenAI API key
export SUPABASE_URL=https://...  # Your Supabase project URL
export SUPABASE_KEY=...          # Your Supabase anon key
```

Then run the batch migration script to generate embeddings for all existing sections:
```bash
python3 scripts/generate_embeddings.py
```

This will:
- Count total sections requiring embeddings
- Generate embeddings in batches of 100
- Track progress and estimated cost
- Update embedding metadata in the database
- Resume gracefully if interrupted
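The batching and resume logic above can be sketched as follows (the function and names are illustrative, not the actual script internals):

```python
def plan_batches(total_sections: int, batch_size: int = 100, offset: int = 0):
    """Yield (start, end) offsets covering all sections, resumable from an offset."""
    start = offset
    while start < total_sections:
        end = min(start + batch_size, total_sections)
        yield start, end
        start = end

# 19,207 sections in batches of 100 -> 193 batches; the last one is partial.
batches = list(plan_batches(19_207))
```

Resuming after an interruption just means restarting with `offset` set to the last stored checkpoint.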
Example output:

```text
============================================================
EMBEDDING MIGRATION - Generating embeddings for all sections
============================================================
📊 Total sections to process: 19207
📊 Starting from offset: 0
📦 Processing batch: 0-100 / 19207
✓ Embedded section 1 (156 tokens)
✓ Embedded section 2 (234 tokens)
...
💾 Stored 100 embeddings in this batch
============================================================
MIGRATION SUMMARY
============================================================
✓ Total sections processed: 19207
✓ Newly embedded: 19207
✓ Already embedded: 0
✗ Failed: 0
📊 Total tokens used: 2,040,821
💰 Estimated cost: $0.0408
⏱ Duration: 0:45:32
============================================================
```
Use the CLI to perform semantic searches:

```bash
# Basic semantic search
./skill_split.py search-semantic "authentication patterns"

# Adjust vector vs. text weight (0.0 = pure text, 1.0 = pure vector)
./skill_split.py search-semantic "browser automation" --vector-weight 0.8 --limit 15

# Pure vector search (ignores text matching)
./skill_split.py search-semantic "react hooks" --vector-weight 1.0
```

How it works:

- Embedding Generation: Text content → 1536-dimensional vector via OpenAI API
- Storage: Vectors stored in Supabase with pgvector extension
- Query Processing: Query text → embedding → similarity search
- Ranking: Vector similarity + text relevance → hybrid score
- Model: OpenAI text-embedding-3-small
- Dimensions: 1536
- Embedded Content: Natural text from Claude Code skills and tools
- Cost: $0.02 per 1M tokens
Vector similarity uses cosine distance, which measures the angle between embedding vectors:
```text
similarity = 1 - cosine_distance
```

Scores fall in [0, 1] for typical text embeddings: 1 = identical meaning, 0 = unrelated. (Cosine similarity can in principle be negative, but text embeddings rarely point in opposing directions.)
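As a quick sanity check, cosine similarity can be computed directly (a minimal sketch, independent of pgvector):

```python
import math

def cosine_similarity(a, b):
    """cos(angle) between two vectors; equals 1 - cosine_distance."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cosine_similarity([1.0, 2.0], [1.0, 2.0])  # same direction -> 1.0 (up to rounding)
cosine_similarity([1.0, 0.0], [0.0, 1.0])  # orthogonal -> 0.0
```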
Hybrid search combines vector similarity with keyword-based text relevance:
```text
hybrid_score = (vector_weight × vector_similarity)
             + ((1 - vector_weight) × text_relevance)
```

Default: vector_weight = 0.7 (70% semantic, 30% keyword)
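The formula translates directly into code (a sketch; the actual implementation lives in core.hybrid_search):

```python
def hybrid_score(vector_similarity: float, text_relevance: float,
                 vector_weight: float = 0.7) -> float:
    """Blend semantic and keyword relevance per the formula above."""
    return vector_weight * vector_similarity + (1 - vector_weight) * text_relevance

# With defaults: 0.7 * 0.9 + 0.3 * 0.5 = 0.78
score = hybrid_score(0.9, 0.5)
```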
Search sections using semantic similarity and hybrid ranking.
```bash
./skill_split.py search-semantic [OPTIONS] QUERY
```

Options:

- --limit INT: Maximum results to return (default: 10)
- --vector-weight FLOAT: Weight for vector similarity (0.0-1.0, default: 0.7)
- --db STR: Database path or "supabase" (default: supabase)
Examples:
```bash
# Find authentication-related sections
./skill_split.py search-semantic "how to implement JWT tokens" --limit 20

# Pure semantic search (ignores keywords)
./skill_split.py search-semantic "callbacks and promises" --vector-weight 1.0

# Favor text matching for specific terms
./skill_split.py search-semantic "Docker container configuration" --vector-weight 0.2
```

Output:

```text
[0.95] Authentication with JWT - OAuth patterns (ID: 1234)
[0.91] Token-based auth middleware (ID: 1235)
[0.87] Session management strategies (ID: 1236)
```
Best for:
- Semantic/conceptual searches ("coding best practices")
- Vague or natural language queries ("how to debug errors")
- Cross-domain searches ("apply auth patterns to form validation")
Example:

```bash
./skill_split.py search-semantic "what's a better way to handle state?" --vector-weight 1.0
```

Best for:
- Mixed intent searches
- When you want both semantics AND keywords
Example:

```bash
./skill_split.py search-semantic "React hooks and custom implementations" --vector-weight 0.5
```

Best for:
- Precise terminology ("useCallback hook")
- Acronyms ("JWT", "CORS", "REST")
- Product/tool names ("Next.js", "TypeScript")
Example:

```bash
./skill_split.py search-semantic "next.js middleware" --vector-weight 0.0
```

Scenario: 19,207 sections at ~100 tokens average
Tokens: 19,207 × 100 = 1,920,700 tokens
Cost: (1,920,700 / 1,000,000) × $0.02 = $0.0384
Estimate: ~$0.04 for initial embedding
Scenario: 50 new sections per month at ~100 tokens average
Tokens: 50 × 100 = 5,000 tokens
Cost: (5,000 / 1,000,000) × $0.02 = $0.0001
Estimate: ~$0.0001 per month for new content
Cost per search: ~$0.0001 per query (embedding the search query)
With 100 searches per month:
100 × $0.0001 = $0.01 per month
After initial setup:
New sections: $0.0001
Queries: $0.01
Total: ~$0.011 per month ($0.13 per year)
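The estimates above follow from simple arithmetic; a sketch using the $0.02-per-1M-token price quoted earlier (function names are illustrative):

```python
PRICE_PER_TOKEN = 0.02 / 1_000_000  # text-embedding-3-small: $0.02 per 1M tokens

def embedding_cost(num_sections: int, avg_tokens: int = 100) -> float:
    """Dollar cost of embedding num_sections sections of avg_tokens tokens each."""
    return num_sections * avg_tokens * PRICE_PER_TOKEN

initial = embedding_cost(19_207)              # one-time backfill, ~$0.038
monthly = embedding_cost(50) + 100 * 0.0001   # 50 new sections + 100 queries, ~$0.01
```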
Typical latencies on the production database (19,207 sections):

- Vector search only: 25-50 ms
- Text search only: 10-30 ms
- Hybrid search: 40-80 ms
- Cache hit (embedding): 20-40 ms faster
Expected queries per second (single connection):
- Pure vector: 20-40 qps
- Pure text: 33-100 qps
- Hybrid: 12-25 qps
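These throughput figures are just the reciprocals of the latency ranges, assuming one query at a time per connection:

```python
def qps_range(min_latency_ms: float, max_latency_ms: float) -> tuple:
    """Queries/second implied by a latency range on a single connection."""
    return 1000 / max_latency_ms, 1000 / min_latency_ms

hybrid = qps_range(40, 80)  # hybrid search -> (12.5, 25.0) qps
vector = qps_range(25, 50)  # pure vector  -> (20.0, 40.0) qps
```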
Vector storage requirements:
- Per section: 1536 floats × 4 bytes = 6,144 bytes (~6 KB)
- Total (19,207 sections): ~118 MB
- Supabase overhead: ~50% = ~177 MB
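The storage math, for reference (raw float32 vector bytes only; the ~50% overhead figure is an estimate):

```python
def vector_storage_bytes(num_sections: int, dims: int = 1536) -> int:
    """Raw storage for one float32 embedding per section."""
    return num_sections * dims * 4  # 4 bytes per float32

raw = vector_storage_bytes(19_207)  # 118,007,808 bytes, ~118 MB
with_overhead = int(raw * 1.5)      # ~177 MB including estimated overhead
```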
Vector embeddings are cached after generation to avoid redundant API calls:
```python
# First query - generates embedding (costs money)
results = hybrid_search.hybrid_search("authentication")  # ~$0.0001

# Second query with same text - uses cache (free)
results = hybrid_search.hybrid_search("authentication")  # $0.0000
```

Cache stats (from metrics):

```python
metrics = hybrid_search.get_metrics()
print(f"Cache hit rate: {metrics['cache_hit_rate']:.1%}")
print(f"Cache hits: {metrics['embedding_cache_hits']}")
print(f"Cache misses: {metrics['embedding_cache_misses']}")
```

Database prerequisites:

- Supabase project with PostgreSQL
- pgvector extension enabled
- Embeddings table created
```sql
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Embeddings table
CREATE TABLE section_embeddings (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    section_id INTEGER NOT NULL REFERENCES sections(id) ON DELETE CASCADE,
    embedding VECTOR(1536),
    model_name TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(section_id, model_name)
);

-- Index for fast search
CREATE INDEX section_embeddings_vector_idx
ON section_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```

Run the PostgreSQL migrations to enable vector search:
```bash
# Enable pgvector extension
psql -d your_database -f migrations/enable_pgvector.sql

# Create embeddings table
psql -d your_database -f migrations/create_embeddings_table.sql

# Add embedding metadata tracking
psql -d your_database -f migrations/add_embedding_metadata.sql

# Optimize vector search performance
psql -d your_database -f migrations/optimize_vector_search.sql
```

Or apply them via the Supabase SQL Editor:

1. Navigate to SQL Editor in the Supabase dashboard
2. Create a new query
3. Copy the content from migrations/optimize_vector_search.sql
4. Run the query
```python
from core.embedding_service import EmbeddingService
from core.hybrid_search import HybridSearch
from core.supabase_store import SupabaseStore
from core.query import QueryAPI

# Initialize services
embedding_service = EmbeddingService("sk-...")
supabase_store = SupabaseStore(url, key)
query_api = QueryAPI(supabase_store)
hybrid_search = HybridSearch(embedding_service, supabase_store, query_api)

# Perform hybrid search
results = hybrid_search.hybrid_search(
    "authentication patterns",
    limit=10,
    vector_weight=0.7,
)

# results: List[Tuple[section_id, score]]
for section_id, score in results:
    section = query_api.get_section(section_id)
    print(f"[{score:.2f}] {section.title}")
```

Pure vector search with a precomputed query embedding:

```python
embedding = embedding_service.generate_embedding("your query")
results = hybrid_search.vector_search(embedding, limit=10)
```

Inspect search metrics:

```python
metrics = hybrid_search.get_metrics()
print(f"Average latency: {metrics['average_latency_ms']:.1f}ms")
print(f"Total searches: {metrics['total_searches']}")
print(f"Cache hit rate: {metrics['cache_hit_rate']:.1%}")
print(f"Avg embedding time: {metrics['average_embedding_time_ms']:.1f}ms")
```

Enable the extension:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

Apply the embeddings table migration:

```bash
python3 scripts/generate_embeddings.py
```

Reduce batch size or add delays between API calls:

```python
# In scripts/generate_embeddings.py
time.sleep(2)  # 2-second delay between batches
```

This is expected: 19K sections at ~0.5 s per batch takes ~45 minutes.
Run in background:
```bash
nohup python3 scripts/generate_embeddings.py > embeddings.log 2>&1 &
```

Resume from a checkpoint if interrupted:

```python
migration = EmbeddingMigration(supabase_store, embedding_service)
stats = migration.run(resume_from=5000)  # Start from section 5000
```

Try adjusting the vector weight or query formulation:
```bash
# More semantic
./skill_split.py search-semantic "related concept" --vector-weight 1.0

# More keyword-focused
./skill_split.py search-semantic "specific terms" --vector-weight 0.0
```

Adjust the ivfflat.probes parameter for the accuracy/speed tradeoff:
```sql
-- Faster but less accurate (fewer candidates examined)
SET ivfflat.probes = 5;

-- Slower but more accurate (more candidates examined)
SET ivfflat.probes = 20;

-- Default (good balance)
SET ivfflat.probes = 10;
```

For generating embeddings, tune the batch size in scripts/generate_embeddings.py:

```python
batch_size = 100  # Smaller = more API calls, faster failure recovery
                  # Larger = fewer API calls, but longer to retry on failure
```

Monitor index bloat and rebuild if needed:
```sql
-- Check index size
SELECT pg_size_pretty(pg_relation_size('section_embeddings_vector_idx'));

-- Rebuild index (optional, can be slow)
REINDEX INDEX CONCURRENTLY section_embeddings_vector_idx;
```

Deployment checklist:

- OpenAI API key configured
- Supabase pgvector extension enabled
- section_embeddings table created
- Vector indexes applied
- embedding_metadata table created
- Batch embedding script runs successfully
- Vector search queries working
- Monitoring/cost tracking in place
- Team educated on vector weight tuning
For issues or questions:
- Check the Troubleshooting section
- Review performance benchmarks: python3 benchmarks/vector_search_benchmark.py
- Check metrics: hybrid_search.get_metrics()
- Monitor logs: tail -f embeddings.log
Cost Tracking: Recommended (see the cost analysis section)