MeetMemo uses PostgreSQL 16 for data persistence.
- Database: PostgreSQL 16
- Driver: asyncpg (async PostgreSQL driver for Python)
- Connection Pool: 5-20 connections
- Schema Management: SQL migration files in `backend/migrations/`
The `jobs` table stores metadata for transcription jobs.
| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | UUID | PRIMARY KEY | Unique job identifier |
| `file_name` | VARCHAR(255) | NOT NULL | Original filename |
| `file_path` | TEXT | NOT NULL | Path to the audio file in the Docker volume |
| `file_hash` | VARCHAR(64) | UNIQUE | SHA-256 hash for deduplication |
| `status` | VARCHAR(50) | NOT NULL, DEFAULT 'pending' | Job status (pending, processing, completed, failed) |
| `error_message` | TEXT | NULL | Error details if the job failed |
| `workflow_state` | VARCHAR(50) | NULL | Current workflow step |
| `created_at` | TIMESTAMP | NOT NULL, DEFAULT NOW() | Creation timestamp |
| `updated_at` | TIMESTAMP | NOT NULL, DEFAULT NOW() | Last update timestamp |
Indexes:

- Primary key on `id`
- Unique index on `file_hash`
- Index on `created_at` for cleanup queries
- Index on `status` for filtering
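The unique index on `file_hash` is what drives deduplication. A minimal sketch of how an upload could be registered, assuming asyncpg; `register_job` is a hypothetical helper, not the actual backend code:

```python
import hashlib
import uuid

import asyncpg


async def register_job(pool: asyncpg.Pool, file_name: str, file_path: str) -> uuid.UUID | None:
    """Insert a job row; the UNIQUE file_hash index rejects duplicate uploads."""
    with open(file_path, "rb") as f:
        file_hash = hashlib.sha256(f.read()).hexdigest()  # 64 hex chars, fits VARCHAR(64)

    async with pool.acquire() as conn:
        row = await conn.fetchrow(
            """
            INSERT INTO jobs (id, file_name, file_path, file_hash)
            VALUES ($1, $2, $3, $4)
            ON CONFLICT (file_hash) DO NOTHING
            RETURNING id
            """,
            uuid.uuid4(), file_name, file_path, file_hash,
        )
    return row["id"] if row else None  # None means this file was already uploaded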
Example:

```sql
SELECT id, file_name, status, workflow_state
FROM jobs
WHERE status = 'completed'
ORDER BY created_at DESC
LIMIT 10;
```

The `export_jobs` table tracks asynchronous export generation jobs.
| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | UUID | PRIMARY KEY | Unique export job identifier |
| `job_id` | UUID | NOT NULL, FOREIGN KEY → jobs(id) | Reference to the parent job |
| `export_type` | VARCHAR(50) | NOT NULL | Type of export (pdf_summary, markdown_transcript, etc.) |
| `status` | VARCHAR(50) | NOT NULL, DEFAULT 'pending' | Export status (pending, processing, completed, failed) |
| `progress` | INTEGER | DEFAULT 0 | Progress percentage (0-100) |
| `error_message` | TEXT | NULL | Error details if the export failed |
| `file_path` | TEXT | NULL | Path to the generated export file |
| `created_at` | TIMESTAMP | NOT NULL, DEFAULT NOW() | Creation timestamp |
| `updated_at` | TIMESTAMP | NOT NULL, DEFAULT NOW() | Last update timestamp |
Foreign Key: `job_id` references `jobs(id)` ON DELETE CASCADE
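Because of `ON DELETE CASCADE`, removing a job also removes its export rows. A quick sanity check, sketched with asyncpg (hypothetical helper, not the backend's code):

```python
import asyncpg


async def delete_job(pool: asyncpg.Pool, job_id) -> int:
    """Delete a job and confirm its exports were cascaded away."""
    async with pool.acquire() as conn:
        await conn.execute("DELETE FROM jobs WHERE id = $1", job_id)
        # Expect 0: the cascade removed the child export_jobs rows.
        return await conn.fetchval(
            "SELECT COUNT(*) FROM export_jobs WHERE job_id = $1", job_id
        )
```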
Indexes:

- Primary key on `id`
- Index on `job_id` for lookups
- Index on `created_at` for cleanup queries
Export Types:

- `pdf_summary` - PDF version of the summary
- `markdown_summary` - Markdown version of the summary
- `pdf_transcript` - PDF version of the transcript
- `markdown_transcript` - Markdown version of the transcript
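A hypothetical sketch of how an export worker might report progress into the `progress` column while it renders a file (the function name and flow are assumptions, not the actual backend code):

```python
import asyncpg


async def set_export_progress(pool: asyncpg.Pool, export_id, progress: int) -> None:
    """Clamp progress into 0-100 and keep updated_at in sync."""
    async with pool.acquire() as conn:
        await conn.execute(
            "UPDATE export_jobs SET progress = $2, updated_at = NOW() WHERE id = $1",
            export_id,
            max(0, min(100, progress)),
        )
```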
Example:

```sql
SELECT e.id, e.export_type, e.status, e.progress, j.file_name
FROM export_jobs e
JOIN jobs j ON e.job_id = j.id
WHERE e.status = 'completed'
ORDER BY e.created_at DESC;
```

Connection settings are configured in `backend/config.py`:
```python
from pydantic_settings import BaseSettings  # pydantic v2; with v1 it is `from pydantic import BaseSettings`


class Settings(BaseSettings):
    database_url: str  # postgresql://user:pass@host:port/dbname
    db_pool_min_size: int = 5
    db_pool_max_size: int = 20
```

Default connection string (Docker):

```text
postgresql://meetmemo:changeme@postgres:5432/meetmemo
```
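The troubleshooting section below imports `get_pool` from `database`. A minimal sketch of what that module might look like, assuming asyncpg and the settings above (the actual `backend/database.py` may differ):

```python
import asyncpg

from config import Settings  # assumes the Settings class shown above

settings = Settings()
_pool: asyncpg.Pool | None = None


async def get_pool() -> asyncpg.Pool:
    """Create the shared asyncpg pool on first use, then reuse it."""
    global _pool
    if _pool is None:
        _pool = await asyncpg.create_pool(
            dsn=settings.database_url,
            min_size=settings.db_pool_min_size,  # 5 by default
            max_size=settings.db_pool_max_size,  # 20 by default
        )
    return _pool
```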
Migration files are located in `backend/migrations/`:

- `001_init_schema.sql` - Initial schema creation
Migrations run automatically when the database is first initialized, via PostgreSQL's `docker-entrypoint-initdb.d/` mechanism (scripts there execute only against an empty data directory):

```yaml
# docker-compose.yml
postgres:
  volumes:
    - ./backend/migrations:/docker-entrypoint-initdb.d
```

To add a new migration:

- Create a new SQL file: `backend/migrations/002_description.sql`
- Write the migration SQL
- Restart the PostgreSQL container (on an existing data volume the initdb scripts are skipped, so apply the change manually with `psql`)
Example migration:

```sql
-- 002_add_metadata_column.sql
ALTER TABLE jobs ADD COLUMN metadata JSONB DEFAULT '{}'::jsonb;
CREATE INDEX idx_jobs_metadata ON jobs USING GIN (metadata);
```

List recent jobs with pagination:

```sql
SELECT
id,
file_name,
status,
workflow_state,
created_at,
updated_at
FROM jobs
ORDER BY created_at DESC
LIMIT 50 OFFSET 0;
```

Get a single job together with its export jobs:

```sql
SELECT
j.id,
j.file_name,
j.status as job_status,
j.workflow_state,
e.id as export_id,
e.export_type,
e.status as export_status,
e.progress
FROM jobs j
LEFT JOIN export_jobs e ON j.id = e.job_id
WHERE j.id = 'your-uuid-here';
```

Delete old completed or failed jobs (what the cleanup service runs):

```sql
DELETE FROM jobs
WHERE created_at < NOW() - INTERVAL '12 hours'
AND status IN ('completed', 'failed');
```

Find duplicate uploads by file hash:

```sql
SELECT file_hash, COUNT(*) AS count
FROM jobs
WHERE file_hash IS NOT NULL
GROUP BY file_hash
HAVING COUNT(*) > 1;
```

Backup:

```bash
# Backup to SQL file
docker exec meetmemo-postgres pg_dump -U meetmemo meetmemo > backup.sql
# Backup with compression
docker exec meetmemo-postgres pg_dump -U meetmemo meetmemo | gzip > backup.sql.gz
# Backup specific table
docker exec meetmemo-postgres pg_dump -U meetmemo meetmemo -t jobs > jobs_backup.sql
```

Restore:

```bash
# Restore from SQL file
docker exec -i meetmemo-postgres psql -U meetmemo meetmemo < backup.sql
# Restore from compressed file
gunzip < backup.sql.gz | docker exec -i meetmemo-postgres psql -U meetmemo meetmemo
# Restore specific table
docker exec -i meetmemo-postgres psql -U meetmemo meetmemo < jobs_backup.sql
```

Maintenance:

```sql
-- Analyze tables for query optimization
ANALYZE jobs;
ANALYZE export_jobs;
-- Full vacuum (reclaims storage; takes an exclusive lock, so run during downtime)
VACUUM FULL jobs;
VACUUM FULL export_jobs;
```

Check table sizes:

```sql
SELECT
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
```

Check index usage:

```sql
SELECT
schemaname,
tablename,
indexname,
idx_scan as index_scans,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
```

Check if PostgreSQL is running:

```bash
docker compose ps postgres
docker exec meetmemo-postgres pg_isready -U meetmemo
```

Test the connection from the backend:

```bash
docker exec meetmemo-backend python -c "
import asyncio
from database import get_pool
async def test():
    pool = await get_pool()
    async with pool.acquire() as conn:
        result = await conn.fetchval('SELECT 1')
        print(f'Connection successful: {result}')

asyncio.run(test())
"Enable query logging in PostgreSQL:
# Inside the postgres container
echo "log_statement = 'all'" >> /var/lib/postgresql/data/postgresql.conf
echo "log_duration = on" >> /var/lib/postgresql/data/postgresql.conf
echo "logging_collector = on" >> /var/lib/postgresql/data/postgresql.conf  # required for file-based logs under data/log/
# Restart PostgreSQL
docker compose restart postgres
```

View slow queries:

```bash
docker exec meetmemo-postgres tail -f /var/lib/postgresql/data/log/postgresql-*.log
```

Check for blocking queries:

```sql
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
```

Kill the blocking query:

```sql
SELECT pg_terminate_backend(pid);  -- replace pid with the blocking_pid from the query above
```

The cleanup service automatically removes old data:
- Jobs: Deleted after `job_retention_hours` (default: 12 hours)
- Export Jobs: Deleted after `export_retention_hours` (default: 24 hours)
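A minimal sketch of what the periodic cleanup could look like with asyncpg; the loop shape and names are assumptions, while the retention settings mirror `backend/config.py`:

```python
import asyncio
from datetime import timedelta

import asyncpg


async def cleanup_loop(pool: asyncpg.Pool, interval_hours: int = 1,
                       job_retention_hours: int = 12) -> None:
    """Periodically delete finished jobs past the retention window."""
    while True:
        async with pool.acquire() as conn:
            # asyncpg sends the timedelta as a PostgreSQL interval;
            # export_jobs rows follow their parent via ON DELETE CASCADE.
            await conn.execute(
                """
                DELETE FROM jobs
                WHERE created_at < NOW() - $1
                  AND status IN ('completed', 'failed')
                """,
                timedelta(hours=job_retention_hours),
            )
        await asyncio.sleep(interval_hours * 3600)
```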
Configure in `backend/config.py`:

```python
cleanup_interval_hours: int = 1
job_retention_hours: int = 12
export_retention_hours: int = 24
```

Security considerations:

- Password: Change the default `POSTGRES_PASSWORD` in production
- Network: The database is only accessible within the Docker network
- Parameterized Queries: All queries use parameterized statements, which prevents SQL injection (see the sketch after this list)
- Backups: Regular backups are recommended for production
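For illustration, a hedged sketch of the parameterized style asyncpg uses; values travel as bound parameters (`$1`), never as SQL text (`get_job` is a hypothetical helper):

```python
import asyncpg


async def get_job(pool: asyncpg.Pool, job_id) -> asyncpg.Record | None:
    async with pool.acquire() as conn:
        # Unsafe: f"... WHERE id = '{job_id}'" builds SQL from strings and invites injection.
        # Safe: the driver binds job_id server-side as a typed parameter.
        return await conn.fetchrow(
            "SELECT id, file_name, status FROM jobs WHERE id = $1", job_id
        )
```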
PostgreSQL is not exposed to the host by default (no port mapping). To access it:
Option 1: Add a port mapping (development only):

```yaml
# docker-compose.yml
postgres:
  ports:
    - "5432:5432"
```

Option 2: Use `docker exec`:

```bash
docker exec -it meetmemo-postgres psql -U meetmemo meetmemo
```

Option 3: Port forward (if using Kubernetes):

```bash
kubectl port-forward svc/postgres 5432:5432
```

Check current connections:

```sql
SELECT
numbackends as current_connections,
datname as database
FROM pg_stat_database
WHERE datname = 'meetmemo';
```

View active queries:

```sql
SELECT
pid,
usename,
application_name,
client_addr,
state,
query,
query_start
FROM pg_stat_activity
WHERE datname = 'meetmemo'
AND state != 'idle'
ORDER BY query_start;
```

View table statistics (writes, live/dead rows, vacuum and analyze history):

```sql
SELECT
schemaname,
tablename,
n_tup_ins as inserts,
n_tup_upd as updates,
n_tup_del as deletes,
n_live_tup as live_rows,
n_dead_tup as dead_rows,
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
```