pgtrace is an always-on, low-overhead, bounded "black box" for PostgreSQL incidents. It continuously captures behavioral signals (waits, locks, query fingerprints, replication/WAL, markers) into a rewindable timeline.
- Agent (Rust): Polls the source DB (production Postgres, read-only), derives lock edges, and inserts the results into the store DB
- Store (PostgreSQL): Dedicated database with pgtrace schema, migrations, retention logic
- Server (Elixir/Phoenix): REST API + LiveView realtime incident room UI (reads from store DB)
- CLI (Rust): Terminal interface for querying store DB and rendering readable output
- source_db: Production Postgres being observed (read-only access)
- store_db: Dedicated Postgres database for pgtrace storage (write access, pgtrace schema)
- Default deployment: separate databases (recommended topology B from spec)
Authoritative list (frozen values):
- 10 – WaitSample
- 20 – LockEdgeSummary
- 30 – QueryStatsDelta
- 40 – ReplicationSample
- 41 – WALSample
- 50 – ExplainSample
- 90 – Marker
- 91 – Heartbeat
- 99 – Internal / Diagnostic
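The frozen codes above map naturally onto an enum with explicit discriminants. The following is a minimal sketch, not the project's actual type: the names `EventKind`, `code`, and `from_code` are hypothetical, and storing the code as an `i16` is an assumption.

```rust
/// Numeric codes from the frozen list above.
/// The type name and the i16 representation are assumptions made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i16)]
pub enum EventKind {
    WaitSample = 10,
    LockEdgeSummary = 20,
    QueryStatsDelta = 30,
    ReplicationSample = 40,
    WalSample = 41,
    ExplainSample = 50,
    Marker = 90,
    Heartbeat = 91,
    InternalDiagnostic = 99,
}

impl EventKind {
    /// Returns the frozen numeric code for this kind.
    pub fn code(self) -> i16 {
        self as i16
    }

    /// Maps a stored code back to a kind; unknown codes yield `None`
    /// rather than being coerced to some default.
    pub fn from_code(code: i16) -> Option<EventKind> {
        match code {
            10 => Some(EventKind::WaitSample),
            20 => Some(EventKind::LockEdgeSummary),
            30 => Some(EventKind::QueryStatsDelta),
            40 => Some(EventKind::ReplicationSample),
            41 => Some(EventKind::WalSample),
            50 => Some(EventKind::ExplainSample),
            90 => Some(EventKind::Marker),
            91 => Some(EventKind::Heartbeat),
            99 => Some(EventKind::InternalDiagnostic),
            _ => None,
        }
    }
}
```

Because the values are frozen, spelling out each discriminant (instead of relying on declaration order) keeps a reordered or newly inserted variant from silently changing a stored code.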
- Agent polls source DB at configured intervals
- Agent derives lock edges and computes rollups
- Agent inserts data into store DB
- Server/CLI reads from store DB for queries and UI
- Retention jobs clean up old data
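The poll, derive, insert flow above can be sketched as a single agent cycle. This is an illustration under stated assumptions, not the agent's real code: every name here (`LockWait`, `LockEdge`, `Source`, `Store`, `run_cycle`, and the in-memory stand-ins) is hypothetical, the `i64` type for `node_id` is assumed, and the input shape assumes each blocked backend is reported together with the backends blocking it. Rollups and retention are left out.

```rust
/// One blocked backend and the backends it is waiting on (assumed input shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LockWait {
    pub blocked_pid: i32,
    pub blocking_pids: Vec<i32>,
}

/// A single "blocked -> blocking" edge in the lock graph.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct LockEdge {
    pub blocked_pid: i32,
    pub blocking_pid: i32,
}

/// Read-only view of the source DB (hypothetical abstraction).
pub trait Source {
    fn poll_lock_waits(&mut self) -> Vec<LockWait>;
}

/// Write side of the store DB (hypothetical abstraction).
pub trait Store {
    fn insert_edges(&mut self, node_id: i64, edges: &[LockEdge]);
}

/// Flattens waits into edges, then sorts and removes duplicates.
pub fn derive_lock_edges(waits: &[LockWait]) -> Vec<LockEdge> {
    let mut edges: Vec<LockEdge> = waits
        .iter()
        .flat_map(|w| {
            w.blocking_pids.iter().map(move |&blocking_pid| LockEdge {
                blocked_pid: w.blocked_pid,
                blocking_pid,
            })
        })
        .collect();
    edges.sort();
    edges.dedup();
    edges
}

/// One cycle: poll the source, derive edges, insert into the store.
/// Returns the number of edges written.
pub fn run_cycle(source: &mut dyn Source, store: &mut dyn Store, node_id: i64) -> usize {
    let waits = source.poll_lock_waits();
    let edges = derive_lock_edges(&waits);
    store.insert_edges(node_id, &edges);
    edges.len()
}

/// In-memory stand-in for the source DB, for illustration only.
pub struct FixedSource(pub Vec<LockWait>);

impl Source for FixedSource {
    fn poll_lock_waits(&mut self) -> Vec<LockWait> {
        self.0.clone()
    }
}

/// In-memory stand-in for the store DB, for illustration only.
#[derive(Default)]
pub struct RecordingStore {
    pub rows: Vec<(i64, LockEdge)>,
}

impl Store for RecordingStore {
    fn insert_edges(&mut self, node_id: i64, edges: &[LockEdge]) {
        self.rows.extend(edges.iter().map(|e| (node_id, *e)));
    }
}
```

Keeping the source and store behind separate traits mirrors the read-only/write split between source_db and store_db, and lets the derivation step be exercised without a database.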
- Each agent instance observes one source_db and writes to one store_db
- Multiple agents can write to the same store_db (multi-node monitoring)
- All tables include node_id for scoping
- Primary keys include node_id where appropriate
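To illustrate why node_id belongs in the key when several agents share one store_db, here is a small in-memory sketch. It is not the pgtrace schema: `RowKey`, `ScopedRows`, and the fields `ts_micros` and `seq` are hypothetical, and the `i64` type for `node_id` is an assumption.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Composite key with node_id first, so rows from one node sort together.
/// Field names other than node_id are assumptions for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct RowKey {
    pub node_id: i64,
    pub ts_micros: i64,
    pub seq: u32,
}

/// In-memory stand-in for a table whose primary key includes node_id.
pub struct ScopedRows {
    rows: BTreeMap<RowKey, String>,
}

impl ScopedRows {
    pub fn new() -> Self {
        ScopedRows { rows: BTreeMap::new() }
    }

    /// Inserts a row; returns false (and keeps the existing row) on a key collision.
    pub fn insert(&mut self, key: RowKey, payload: &str) -> bool {
        match self.rows.entry(key) {
            Entry::Vacant(slot) => {
                slot.insert(payload.to_string());
                true
            }
            Entry::Occupied(_) => false,
        }
    }

    /// Returns the payloads for one node in key order, using a range scan
    /// over the node_id prefix of the key.
    pub fn for_node(&self, node_id: i64) -> Vec<&str> {
        let lo = RowKey { node_id, ts_micros: i64::MIN, seq: 0 };
        let hi = RowKey { node_id, ts_micros: i64::MAX, seq: u32::MAX };
        self.rows.range(lo..=hi).map(|(_, v)| v.as_str()).collect()
    }
}
```

With node_id in the key, two agents can record rows with the same timestamp and sequence number without colliding, and a per-node query becomes a prefix scan instead of a filter over every row.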