pgtrace architecture

Overview

pgtrace is an always-on, low-overhead, bounded "black box" for PostgreSQL incidents. It continuously captures behavioral signals (waits, locks, query fingerprints, replication/WAL, markers) into a rewindable timeline.

Components

Agent (Rust): Polls the source DB (production Postgres, read-only), derives lock edges, and inserts the results into the store DB
Store (PostgreSQL): Dedicated database with pgtrace schema, migrations, retention logic
Server (Elixir/Phoenix): REST API + LiveView realtime incident room UI (reads from store DB)
CLI (Rust): Terminal interface for querying store DB and rendering readable output

Database separation

source_db: Production Postgres being observed (read-only access)
store_db: Dedicated Postgres database for pgtrace storage (write access, pgtrace schema)
Default deployment: separate databases (recommended topology B from spec)

Event Kind Enum

Authoritative list (frozen values):

10 – WaitSample
20 – LockEdgeSummary
30 – QueryStatsDelta
40 – ReplicationSample
41 – WALSample
50 – ExplainSample
90 – Marker
91 – Heartbeat
99 – Internal / Diagnostic
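
Because the values are frozen, a reader can map them onto a fixed enum. The sketch below is illustrative only and is not taken from the pgtrace codebase: the member names and the `decode_kind` helper are assumptions, and only the numeric values come from the list above.

```python
from enum import IntEnum
from typing import Optional


class EventKind(IntEnum):
    """Event kinds, using the frozen numeric values from the list above."""

    WAIT_SAMPLE = 10
    LOCK_EDGE_SUMMARY = 20
    QUERY_STATS_DELTA = 30
    REPLICATION_SAMPLE = 40
    WAL_SAMPLE = 41
    EXPLAIN_SAMPLE = 50
    MARKER = 90
    HEARTBEAT = 91
    INTERNAL = 99


def decode_kind(value: int) -> Optional[EventKind]:
    """Hypothetical helper: return the kind, or None if the value is not in the list."""
    try:
        return EventKind(value)
    except ValueError:
        return None
```

Returning `None` for unlisted values, rather than raising, is one possible choice that lets a reader skip rows it does not recognize.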

Data flow

Agent polls source DB at configured intervals
Agent derives lock edges and computes rollups
Agent inserts data into store DB
Server and CLI read from the store DB for queries and the UI
Retention jobs clean up old data
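
The first three steps can be sketched as a single poll cycle. This is a minimal illustration under stated assumptions, not the agent's actual (Rust) implementation: the `LockSample` shape (a waiting pid plus the pids blocking it, as PostgreSQL's `pg_blocking_pids()` could supply), the `LockEdge` type, and the `poll`/`insert` callables are all hypothetical stand-ins for the source DB query and the store DB write.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class LockSample:
    """Assumed shape of one polled row: a backend and the pids blocking it."""

    pid: int
    blocked_by: Tuple[int, ...]


@dataclass(frozen=True)
class LockEdge:
    """One waiter -> blocker edge derived from a sample."""

    waiter_pid: int
    blocker_pid: int


def derive_lock_edges(samples: Iterable[LockSample]) -> List[LockEdge]:
    """Expand each sample into one edge per blocking pid."""
    return [
        LockEdge(waiter_pid=s.pid, blocker_pid=blocker)
        for s in samples
        for blocker in s.blocked_by
    ]


def run_cycle(
    poll: Callable[[], Iterable[LockSample]],
    insert: Callable[[List[LockEdge]], None],
) -> int:
    """Poll the source, derive edges, insert them; return the edge count."""
    edges = derive_lock_edges(poll())
    insert(edges)
    return len(edges)
```

Passing `poll` and `insert` in as callables keeps the derivation step free of database access, so it can be exercised with in-memory data.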

Multi-node support

Each agent instance observes one source_db and writes to one store_db
Multiple agents can write to the same store_db (multi-node monitoring)
All tables include node_id for scoping
Primary keys include node_id where appropriate
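
To illustrate why node_id belongs in the primary key, the sketch below lets two agents write rows with the same event id into one store without colliding. It rests on several assumptions: the `events` table, its columns, and the helper functions are hypothetical rather than the real pgtrace schema, and an in-memory SQLite database stands in for the PostgreSQL store so the example is self-contained.

```python
import sqlite3
from typing import List, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the store with a node-scoped key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE events (
            node_id  TEXT    NOT NULL,
            event_id INTEGER NOT NULL,
            kind     INTEGER NOT NULL,
            PRIMARY KEY (node_id, event_id)
        )
        """
    )
    return conn


def insert_event(conn: sqlite3.Connection, node_id: str, event_id: int, kind: int) -> None:
    """Insert one event; uniqueness is enforced per (node_id, event_id)."""
    conn.execute(
        "INSERT INTO events (node_id, event_id, kind) VALUES (?, ?, ?)",
        (node_id, event_id, kind),
    )


def events_for_node(conn: sqlite3.Connection, node_id: str) -> List[Tuple[int, int]]:
    """Return (event_id, kind) rows scoped to a single node."""
    rows = conn.execute(
        "SELECT event_id, kind FROM events WHERE node_id = ? ORDER BY event_id",
        (node_id,),
    )
    return rows.fetchall()
```

With this key, event id 1 from one node and event id 1 from another coexist, and every read filters on node_id to stay within one node's timeline.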
