| title | PostgreSQL Setup |
|---|---|
| layout | default |
| parent | Configuration |
| nav_order | 1 |
This guide covers setting up PostgreSQL with pgvector for scalable vector similarity search in codetect.
- Why PostgreSQL + pgvector?
- Performance Comparison
- Quick Start with Docker
- Manual Installation
- Configuration
- Migration from SQLite
- Troubleshooting
By default, codetect uses SQLite with brute-force vector search. This works well for small to medium codebases (< 10,000 files), but has performance limitations at scale.
PostgreSQL + pgvector provides:
- Scalability: Efficient HNSW indexing for 100K+ vectors
- Performance: 60x faster search on large datasets (see benchmarks below)
- Persistence: Separate database server with better reliability
- Flexibility: Advanced querying and management capabilities
When to use PostgreSQL:
- Large codebases (> 10,000 files)
- Multiple projects sharing embeddings
- Production deployments requiring high performance
- Teams needing centralized search infrastructure
When SQLite is sufficient:
- Small to medium projects (< 10,000 files)
- Personal use / single developer
- Quick prototyping
- No embedding server available
Real-world benchmarks on Apple M3 Pro (2024):
| Dataset Size | SQLite (brute-force) | PostgreSQL (pgvector) | Speedup |
|---|---|---|---|
| 100 vectors | 77 μs | 603 μs | 0.13x (slower) |
| 1,000 vectors | 1.19 ms | 745 μs | 1.6x faster |
| 10,000 vectors | 58.1 ms | 963 μs | 60x faster |
Key Insights:
- For small datasets (< 1,000 vectors), brute-force SQLite is faster due to lower overhead
- pgvector with HNSW indexing scales logarithmically: O(log n) vs O(n) for brute-force
- At 10,000+ vectors, PostgreSQL provides massive performance improvements
- pgvector enables sub-millisecond search even on 100K+ vector datasets
The easiest way to get PostgreSQL + pgvector running:
cd /path/to/codetect
docker-compose up -dThis starts PostgreSQL 16 with pgvector extension pre-installed.
Container details:
- Image:
pgvector/pgvector:pg16 - Port:
5432(configurable viaPOSTGRES_PORTenv var) - User:
codetect - Password:
codetect - Database:
codetect - Volume:
codetect-pgdata(persistent storage)
Set environment variables:
export CODETECT_DB_TYPE=postgres
export CODETECT_DB_DSN="postgres://codetect:codetect@localhost:5432/codetect?sslmode=disable"Add to your ~/.bashrc, ~/.zshrc, or shell profile for persistence.
cd /path/to/your/project
codetect index # Index symbols
codetect embed # Generate embeddings (stored in PostgreSQL)# Check PostgreSQL connection
docker-compose ps
# View embeddings
docker-compose exec postgres psql -U codetect -c "SELECT COUNT(*) FROM embeddings;"That's it! codetect will now use PostgreSQL for all vector operations.
If you prefer to install PostgreSQL manually without Docker:
# Install PostgreSQL
brew install postgresql@16
# Start PostgreSQL
brew services start postgresql@16
# Install pgvector
brew install pgvector
# Create database and enable extension
createdb codetect
psql codetect -c "CREATE EXTENSION vector;"# Add PostgreSQL repository
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
# Install PostgreSQL 16
sudo apt-get install -y postgresql-16 postgresql-client-16
# Install pgvector
sudo apt-get install -y postgresql-16-pgvector
# Start PostgreSQL
sudo systemctl start postgresql
sudo systemctl enable postgresql
# Create database and user
sudo -u postgres psql -c "CREATE USER codetect WITH PASSWORD 'codetect';"
sudo -u postgres psql -c "CREATE DATABASE codetect OWNER codetect;"
sudo -u postgres psql codetect -c "CREATE EXTENSION vector;"# Check pgvector version
psql codetect -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"
# Expected output:
# extname | extversion
# ---------+------------
# vector | 0.7.0codetect supports flexible PostgreSQL configuration via environment variables.
| Variable | Description | Default | Example |
|---|---|---|---|
CODETECT_DB_TYPE |
Database type (sqlite or postgres) |
sqlite |
postgres |
CODETECT_DB_DSN |
PostgreSQL connection string | - | postgres://user:pass@host:5432/dbname |
CODETECT_DB_PATH |
SQLite database path (if using SQLite) | .codetect/symbols.db |
/custom/path.db |
CODETECT_VECTOR_DIMENSIONS |
Embedding vector size | 768 |
1536 |
PostgreSQL connection string format:
postgres://[user]:[password]@[host]:[port]/[database]?[options]
Common examples:
# Local PostgreSQL
export CODETECT_DB_DSN="postgres://codetect:codetect@localhost:5432/codetect?sslmode=disable"
# Docker (custom port)
export CODETECT_DB_DSN="postgres://codetect:codetect@localhost:5465/codetect?sslmode=disable"
# Remote PostgreSQL with SSL
export CODETECT_DB_DSN="postgres://user:pass@db.example.com:5432/codetect?sslmode=require"
# Connection pooling
export CODETECT_DB_DSN="postgres://user:pass@localhost:5432/codetect?pool_max_conns=20"codetect automatically detects PostgreSQL from the DSN:
# Explicitly set type
export CODETECT_DB_TYPE=postgres
export CODETECT_DB_DSN="postgres://..."
# Auto-detect from DSN (type not required)
export CODETECT_DB_DSN="postgres://..." # Type inferred automaticallyDifferent embedding models use different dimensions:
| Model | Dimensions | Setting | Notes |
|---|---|---|---|
bge-m3 (recommended) |
1024 | CODETECT_VECTOR_DIMENSIONS=1024 |
+47% retrieval vs default ✅ |
snowflake-arctic-embed-l-v2.0 |
1024 | CODETECT_VECTOR_DIMENSIONS=1024 |
+57% retrieval vs default ✅ |
jina-embeddings-v3 |
1024 | CODETECT_VECTOR_DIMENSIONS=1024 |
+50% retrieval vs default ✅ |
nomic-embed-text (default) |
768 | CODETECT_VECTOR_DIMENSIONS=768 |
Smaller, lower quality |
OpenAI text-embedding-3-small |
1536 | CODETECT_VECTOR_DIMENSIONS=1536 |
API-only |
OpenAI text-embedding-3-large |
3072 | CODETECT_VECTOR_DIMENSIONS=3072 |
API-only |
Recommendation: Use bge-m3 (1024 dimensions) for best code search quality. See Embedding Model Comparison for detailed analysis.
Important: Set dimensions before running codetect embed. Changing dimensions requires re-embedding.
Add to your shell profile for persistence:
Bash (~/.bashrc):
export CODETECT_DB_TYPE=postgres
export CODETECT_DB_DSN="postgres://codetect:codetect@localhost:5432/codetect?sslmode=disable"Zsh (~/.zshrc):
export CODETECT_DB_TYPE=postgres
export CODETECT_DB_DSN="postgres://codetect:codetect@localhost:5432/codetect?sslmode=disable"Fish (~/.config/fish/config.fish):
set -x CODETECT_DB_TYPE postgres
set -x CODETECT_DB_DSN "postgres://codetect:codetect@localhost:5432/codetect?sslmode=disable"To migrate existing embeddings from SQLite to PostgreSQL, use the migration script:
# Start PostgreSQL
docker-compose up -d
# Set PostgreSQL configuration
export CODETECT_DB_TYPE=postgres
export CODETECT_DB_DSN="postgres://codetect:codetect@localhost:5432/codetect?sslmode=disable"
# Run migration
codetect migrate-to-postgresThe migration script will:
- Read embeddings from SQLite (
.codetect/symbols.db) - Create embeddings table in PostgreSQL
- Copy all vectors with metadata
- Create pgvector index for fast search
- Verify data integrity
Migration Performance:
- ~1,000 vectors/sec on typical hardware
- 10,000 vectors: ~10 seconds
- 100,000 vectors: ~2 minutes
After migration:
- SQLite database is preserved (not deleted)
- You can switch back by removing PostgreSQL env vars
- No need to re-run
codetect embed
See the Migration Script section below for advanced options.
Error: connection refused
# Check if PostgreSQL is running
docker-compose ps
# If not running, start it
docker-compose up -d
# Check logs
docker-compose logs postgresError: password authentication failed
# Verify credentials match docker-compose.yml
docker-compose exec postgres psql -U codetect -c "SELECT version();"
# Reset password if needed
docker-compose exec postgres psql -U postgres -c "ALTER USER codetect PASSWORD 'codetect';"Error: extension "vector" does not exist
# Verify pgvector is installed
docker-compose exec postgres psql -U codetect -c "SELECT * FROM pg_available_extensions WHERE name = 'vector';"
# Create extension
docker-compose exec postgres psql -U codetect -c "CREATE EXTENSION vector;"Error: could not open extension control file
Your PostgreSQL image doesn't include pgvector. Use the official pgvector image:
# docker-compose.yml
services:
postgres:
image: pgvector/pgvector:pg16 # Use this imageSlow searches after migration:
# Verify index exists
docker-compose exec postgres psql -U codetect -c "\d embeddings"
# Look for: "idx_embeddings_embedding" hnsw (embedding vector_cosine_ops)
# If missing, create index
docker-compose exec postgres psql -U codetect -c "
CREATE INDEX idx_embeddings_embedding
ON embeddings
USING hnsw (embedding vector_cosine_ops);
"Index creation taking too long:
For large datasets (100K+ vectors), index creation can take 10-30 minutes. Monitor progress:
# Check PostgreSQL activity
docker-compose exec postgres psql -U codetect -c "
SELECT pid, state, query
FROM pg_stat_activity
WHERE state != 'idle';
"Empty search results:
# Check embedding count
docker-compose exec postgres psql -U codetect -c "SELECT COUNT(*) FROM embeddings;"
# If 0, re-run embedding
codetect embedDimension mismatch error:
# Check actual dimensions in database
docker-compose exec postgres psql -U codetect -c "
SELECT vector_dims(embedding)
FROM embeddings
LIMIT 1;
"
# Must match CODETECT_VECTOR_DIMENSIONS
# If mismatch, drop table and re-embed
docker-compose exec postgres psql -U codetect -c "DROP TABLE IF EXISTS embeddings;"
codetect embedPort conflict:
# Change port in docker-compose
export POSTGRES_PORT=5465
docker-compose up -d
# Update DSN
export CODETECT_DB_DSN="postgres://codetect:codetect@localhost:5465/codetect?sslmode=disable"Volume persistence:
# List volumes
docker volume ls | grep codetect
# Backup volume
docker run --rm -v codetect-pgdata:/data -v $(pwd):/backup ubuntu tar czf /backup/pgdata-backup.tar.gz /data
# Restore volume
docker run --rm -v codetect-pgdata:/data -v $(pwd):/backup ubuntu tar xzf /backup/pgdata-backup.tar.gz -C /Clean restart:
# Stop and remove container
docker-compose down
# Remove volume (WARNING: deletes all data)
docker volume rm codetect-pgdata
# Fresh start
docker-compose up -d
codetect embed # Re-embedIf issues persist:
- Check PostgreSQL logs:
docker-compose logs postgres - Verify configuration:
echo $CODETECT_DB_DSN - Test connection:
docker-compose exec postgres psql -U codetect - Open an issue: https://github.com/brian-lai/codetect/issues
Include:
- PostgreSQL version:
docker-compose exec postgres psql -U postgres -c "SELECT version();" - pgvector version:
docker-compose exec postgres psql -U codetect -c "SELECT extversion FROM pg_extension WHERE extname = 'vector';" - Error messages from logs
- Operating system and architecture
Mount custom postgresql.conf:
# docker-compose.yml
services:
postgres:
volumes:
- ./postgresql.conf:/etc/postgresql/postgresql.conf
command: postgres -c config_file=/etc/postgresql/postgresql.confFor high-concurrency scenarios, use PgBouncer:
# docker-compose.yml
services:
pgbouncer:
image: pgbouncer/pgbouncer
environment:
DATABASES_HOST: postgres
DATABASES_PORT: 5432
DATABASES_DBNAME: codetect
PGBOUNCER_AUTH_TYPE: plain
ports:
- "6432:5432"# Connect through PgBouncer
export CODETECT_DB_DSN="postgres://codetect:codetect@localhost:6432/codetect"Share PostgreSQL across multiple projects using different schemas:
# Project 1
export CODETECT_DB_DSN="postgres://codetect:codetect@localhost:5432/codetect?search_path=project1"
# Project 2
export CODETECT_DB_DSN="postgres://codetect:codetect@localhost:5432/codetect?search_path=project2"Or use separate databases:
# Create additional databases
docker-compose exec postgres psql -U postgres -c "CREATE DATABASE project2 OWNER codetect;"
docker-compose exec postgres psql -U codetect -d project2 -c "CREATE EXTENSION vector;"- Migration Script Documentation - Detailed migration guide
- Configuration Reference - All environment variables
- Performance Tuning - Optimization tips
- Backup and Recovery - Data protection strategies