pgtrace architecture

Overview

pgtrace is an always-on, low-overhead, bounded "black box" for PostgreSQL incidents. It continuously captures behavioral signals (waits, locks, query fingerprints, replication/WAL, markers) into a rewindable timeline.

Components

  1. Agent (Rust): Polls the source DB (production Postgres, read-only), derives lock edges, and inserts the results into the store DB
  2. Store (PostgreSQL): Dedicated database with pgtrace schema, migrations, retention logic
  3. Server (Elixir/Phoenix): REST API + LiveView real-time incident room UI (reads from the store DB)
  4. CLI (Rust): Terminal interface for querying store DB and rendering readable output

Database separation

  • source_db: Production Postgres being observed (read-only access)
  • store_db: Dedicated Postgres database for pgtrace storage (write access, pgtrace schema)
  • Default deployment: separate databases (the recommended topology B in the spec)
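
The separation above could be modeled in the agent's configuration roughly as follows. This is a minimal sketch, not the actual pgtrace config: the struct name `DbTargets`, its field names, and the `is_separated` helper are all hypothetical, and the check only compares connection strings rather than resolving hosts.

```rust
/// Hypothetical connection settings for one agent instance.
/// Names are illustrative and not taken from the pgtrace codebase.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DbTargets {
    /// Observed production database; the agent should only read from it.
    pub source_db_url: String,
    /// Dedicated storage database holding the pgtrace schema.
    pub store_db_url: String,
}

impl DbTargets {
    /// True when source and store use different connection strings,
    /// which matches the default deployment described above.
    /// Assumption: a plain string comparison is enough for this sketch.
    pub fn is_separated(&self) -> bool {
        self.source_db_url.trim() != self.store_db_url.trim()
    }
}
```

An agent could call `is_separated()` at startup and log a warning when it returns false, instead of refusing to start, since other topologies may be valid.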

Event Kind Enum

Authoritative list (frozen values):

  • 10 – WaitSample
  • 20 – LockEdgeSummary
  • 30 – QueryStatsDelta
  • 40 – ReplicationSample
  • 41 – WALSample
  • 50 – ExplainSample
  • 90 – Marker
  • 91 – Heartbeat
  • 99 – Internal / Diagnostic
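
Because the numeric values are frozen, the list maps naturally onto a Rust enum with explicit discriminants. The sketch below is one possible encoding: the variant names, the `i16` representation, and the `code` helper are assumptions, while the numbers come from the list above.

```rust
use std::convert::TryFrom;

/// Event kinds with the frozen numeric values from the list above.
/// Variant names and the i16 representation are assumptions.
#[repr(i16)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EventKind {
    WaitSample = 10,
    LockEdgeSummary = 20,
    QueryStatsDelta = 30,
    ReplicationSample = 40,
    WalSample = 41,
    ExplainSample = 50,
    Marker = 90,
    Heartbeat = 91,
    Internal = 99,
}

impl EventKind {
    /// Numeric code as it would be stored in a row.
    pub fn code(self) -> i16 {
        self as i16
    }
}

impl TryFrom<i16> for EventKind {
    /// The unrecognized code is returned so callers can log it.
    type Error = i16;

    fn try_from(code: i16) -> Result<Self, Self::Error> {
        match code {
            10 => Ok(EventKind::WaitSample),
            20 => Ok(EventKind::LockEdgeSummary),
            30 => Ok(EventKind::QueryStatsDelta),
            40 => Ok(EventKind::ReplicationSample),
            41 => Ok(EventKind::WalSample),
            50 => Ok(EventKind::ExplainSample),
            90 => Ok(EventKind::Marker),
            91 => Ok(EventKind::Heartbeat),
            99 => Ok(EventKind::Internal),
            other => Err(other),
        }
    }
}
```

Decoding through `TryFrom` instead of an unchecked cast means a row with an unknown code surfaces as an error rather than undefined behavior.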

Data flow

  1. Agent polls source DB at configured intervals
  2. Agent derives lock edges and computes rollups
  3. Agent inserts data into store DB
  4. Server/CLI reads from store DB for queries and UI
  5. Retention jobs clean up old data
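
The flow above can be sketched with two traits and in-memory stand-ins. Everything here is hypothetical (the `Event` fields, the `Source` and `Store` traits, `MemStore`, `FakeSource`, `run_cycle`). Lock-edge derivation and rollups from step 2 are omitted, and a real agent would talk to Postgres rather than to a `Vec`.

```rust
/// One stored event row. Field names are illustrative, not the pgtrace schema.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub node_id: u32,
    pub ts_ms: u64,
    pub kind: i16,
    pub payload: String,
}

/// Read-only view of the observed database (step 1).
pub trait Source {
    fn poll(&mut self, now_ms: u64) -> Vec<Event>;
}

/// Write side of the dedicated store database (steps 3 and 5).
pub trait Store {
    fn insert(&mut self, events: Vec<Event>);
    /// Deletes rows older than the cutoff and returns how many were removed.
    fn prune_older_than(&mut self, cutoff_ms: u64) -> usize;
}

/// In-memory stand-in for the store, only here to make the sketch runnable.
#[derive(Default)]
pub struct MemStore {
    pub rows: Vec<Event>,
}

impl Store for MemStore {
    fn insert(&mut self, events: Vec<Event>) {
        self.rows.extend(events);
    }

    fn prune_older_than(&mut self, cutoff_ms: u64) -> usize {
        let before = self.rows.len();
        self.rows.retain(|e| e.ts_ms >= cutoff_ms);
        before - self.rows.len()
    }
}

/// Fake source that emits one Heartbeat (kind 91) per poll.
pub struct FakeSource {
    pub node_id: u32,
}

impl Source for FakeSource {
    fn poll(&mut self, now_ms: u64) -> Vec<Event> {
        vec![Event {
            node_id: self.node_id,
            ts_ms: now_ms,
            kind: 91,
            payload: String::from("heartbeat"),
        }]
    }
}

/// One agent cycle: poll the source, then insert into the store.
/// Returns the number of rows written.
pub fn run_cycle<S: Source, T: Store>(source: &mut S, store: &mut T, now_ms: u64) -> usize {
    let events = source.poll(now_ms);
    let written = events.len();
    store.insert(events);
    written
}
```

Keeping the source behind a read-only trait and the store behind a separate write trait mirrors the database separation: the agent never needs write access to the observed database.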

Multi-node support

  • Each agent instance observes one source_db and writes to one store_db
  • Multiple agents can write to the same store_db (multi-node monitoring)
  • All tables include node_id for scoping
  • Primary keys include node_id where appropriate
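
To illustrate why node_id belongs in the key, the sketch below models a shared store as an ordered map keyed by (node_id, timestamp, sequence). The `RowKey` columns and the `SharedStore` type are assumptions for illustration only; the real tables and their keys are defined by the pgtrace migrations.

```rust
use std::collections::BTreeMap;

/// Composite key with node_id first. Columns are illustrative.
/// Derived Ord compares fields in declaration order, so all rows
/// from one node are contiguous in the map.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct RowKey {
    pub node_id: u32,
    pub ts_ms: u64,
    pub seq: u32,
}

/// In-memory stand-in for a store_db shared by several agents.
#[derive(Default)]
pub struct SharedStore {
    rows: BTreeMap<RowKey, String>,
}

impl SharedStore {
    /// Inserts a row; returns false if the key already exists,
    /// similar to a primary-key conflict.
    pub fn insert(&mut self, key: RowKey, payload: &str) -> bool {
        if self.rows.contains_key(&key) {
            return false;
        }
        self.rows.insert(key, payload.to_string());
        true
    }

    /// Returns the payloads written by one node, in key order.
    pub fn for_node(&self, node_id: u32) -> Vec<&str> {
        let lo = RowKey { node_id, ts_ms: 0, seq: 0 };
        let hi = RowKey { node_id, ts_ms: u64::MAX, seq: u32::MAX };
        self.rows.range(lo..=hi).map(|(_, v)| v.as_str()).collect()
    }
}
```

Two agents can write rows with the same timestamp and sequence without colliding, because their keys differ in node_id, and per-node queries become a single range scan.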