postgres-query-performance-tracker

Connects to a live PostgreSQL instance, pulls execution statistics from pg_stat_statements, and surfaces the top slow queries with performance metrics and actionable optimization hints. Delivers results as a rich console table, a standalone HTML report, or a scheduled email.


What It Does

  • Queries pg_stat_statements for real execution data (mean time, std dev, cache hit ratio, temp block usage)
  • Ranks queries by mean execution time and computes derived metrics:
    • Cache hit ratio — identifies queries bypassing the buffer cache
    • Coefficient of variation — flags unstable queries with inconsistent runtimes
    • Rows per call — surfaces sequential scan candidates
    • Temp block usage — detects sort/hash spills to disk
  • Generates optimization hints per query (index suggestions, work_mem tuning, plan instability warnings)
  • Produces a self-contained HTML report with a summary scorecard and per-query detail cards
  • Optionally emails the report via any SMTP relay
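
The fetch and the derived metrics above might look roughly like this. This is a sketch, not the repo's actual code: column names follow the PostgreSQL 13+ `pg_stat_statements` schema, and the `min_calls`/`top_n` parameters mirror the keys in `config/settings.yaml`.

```python
# Sketch only: columns follow the PostgreSQL 13+ pg_stat_statements schema.
TOP_QUERIES_SQL = """
SELECT
    queryid,
    query,
    calls,
    mean_exec_time,
    stddev_exec_time,
    rows,
    shared_blks_hit,
    shared_blks_read,
    temp_blks_written
FROM pg_stat_statements
WHERE calls >= %(min_calls)s
ORDER BY mean_exec_time DESC
LIMIT %(top_n)s;
"""

def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Fraction of block fetches served from the buffer cache."""
    total = blks_hit + blks_read
    return blks_hit / total if total else 1.0

def rows_per_call(rows: int, calls: int) -> float:
    """High values can indicate sequential-scan candidates."""
    return rows / calls if calls else 0.0
```

The query would be executed through a DB-API driver such as psycopg2, e.g. `cursor.execute(TOP_QUERIES_SQL, {"min_calls": 5, "top_n": 10})`.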

Requirements

  • Python 3.11+
  • PostgreSQL 13+ with pg_stat_statements extension enabled

Enable the extension (run once as superuser):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Ensure pg_stat_statements is preloaded in postgresql.conf (changing shared_preload_libraries requires a server restart):

shared_preload_libraries = 'pg_stat_statements'
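
A quick sanity check before running the analyzer could look like this sketch, where `cur` is any DB-API cursor (e.g. from psycopg2); the helper name is hypothetical, not part of this repo.

```python
# Verify the extension is installed in the target database (sketch).
EXTENSION_CHECK_SQL = (
    "SELECT count(*) FROM pg_extension WHERE extname = 'pg_stat_statements';"
)

def pg_stat_statements_available(cur) -> bool:
    """Return True if the pg_stat_statements extension is installed."""
    cur.execute(EXTENSION_CHECK_SQL)
    (installed,) = cur.fetchone()
    return installed == 1
```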

Setup

pip install -r requirements.txt
cp .env.example .env
# edit .env with your PG_PASSWORD (and SMTP_* if using email)
# edit config/settings.yaml with your database host, name, user

Usage

Analyze (console output)

python main.py analyze
python main.py analyze --config path/to/settings.yaml

Generate HTML report

python main.py report
python main.py report --output weekly_report.html

Email the report

python main.py email

Configuration

Edit config/settings.yaml:

database:
  host: localhost
  port: 5432
  dbname: your_database
  user: postgres

analysis:
  top_n: 10
  min_calls: 5
  slow_query_threshold_ms: 100.0

email:
  smtp_host: smtp.gmail.com
  smtp_port: 587
  use_tls: true
  from_address: alerts@example.com
  to_addresses:
    - dba@example.com

Secrets go in .env (never in settings.yaml):

PG_PASSWORD=your_postgres_password
SMTP_USER=alerts@example.com
SMTP_PASSWORD=your_smtp_password

Architecture

main.py (typer CLI)
    ├── PostgresClient      — connects to PG, fetches pg_stat_statements rows
    ├── QueryAnalyzer       — ranks queries, computes metrics, generates hints
    │     └── pandas        — percentile stats (p50/p95/p99) across full query set
    ├── ReportGenerator     — Jinja2 HTML template → report.html
    └── EmailSender         — smtplib: sends HTML report via SMTP
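
The EmailSender leg of the diagram can be sketched with the standard library alone. The function names here are illustrative, not the repo's actual API; it builds a multipart message with a plain-text fallback so clients that cannot render HTML still show something.

```python
import smtplib
from email.message import EmailMessage

def build_report_email(html: str, from_addr: str, to_addrs: list[str],
                       subject: str = "PostgreSQL query performance report") -> EmailMessage:
    """Build a message with a plain-text fallback and an HTML body."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = ", ".join(to_addrs)
    msg["Subject"] = subject
    msg.set_content("Your mail client does not display HTML reports.")
    msg.add_alternative(html, subtype="html")
    return msg

def send_report(msg: EmailMessage, host: str, port: int,
                user: str, password: str, use_tls: bool = True) -> None:
    """Send the report via an SMTP relay, upgrading to TLS if configured."""
    with smtplib.SMTP(host, port) as smtp:
        if use_tls:
            smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```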

Key design decisions

pg_stat_statements over pg_stat_activity — stat_activity shows live sessions; stat_statements gives historical aggregates across all executions, which is what performance analysis needs.

Mean execution time as primary rank key — total time skews toward high-frequency cheap queries; mean time surfaces the genuinely slow ones regardless of call volume.

Coefficient of variation as instability signal — stddev alone is hard to interpret. CV (stddev / mean) normalizes it: a CV > 0.5 on a slow query points to lock waits, plan flips, or parameter sniffing rather than uniformly slow execution.
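
The CV check reduces to a few lines; this sketch shows the calculation and the 0.5 threshold described above (function names are illustrative).

```python
def coefficient_of_variation(mean_ms: float, stddev_ms: float) -> float:
    """CV = stddev / mean; dimensionless, so comparable across queries."""
    return stddev_ms / mean_ms if mean_ms > 0 else 0.0

def is_unstable(mean_ms: float, stddev_ms: float, threshold: float = 0.5) -> bool:
    """Flag queries whose runtime varies too much relative to their mean."""
    return coefficient_of_variation(mean_ms, stddev_ms) > threshold

# A query averaging 200 ms with a 150 ms stddev has CV 0.75: likely lock
# waits or plan flips rather than uniformly slow execution.
```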

pandas for percentile stats — a one-liner quantile() across the full query set gives p50/p95/p99 that would require a window function to compute in SQL.
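
That one-liner looks roughly like this (assuming pandas is installed; the row values here are made up for illustration):

```python
import pandas as pd

rows = [  # hypothetical mean_exec_time values in ms
    {"query": "q1", "mean_exec_time": 10.0},
    {"query": "q2", "mean_exec_time": 20.0},
    {"query": "q3", "mean_exec_time": 30.0},
    {"query": "q4", "mean_exec_time": 40.0},
    {"query": "q5", "mean_exec_time": 100.0},
]
df = pd.DataFrame(rows)

# quantile() interpolates linearly across the full query set.
p50, p95, p99 = df["mean_exec_time"].quantile([0.50, 0.95, 0.99])
# For these values, p50 == 30.0
```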


Running Tests

pytest

All tests mock psycopg2 and smtplib — no live database or SMTP server required.
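
The mocking pattern is roughly the following sketch, where `fetch_top_queries` is a hypothetical stand-in for the PostgresClient fetch method: a `MagicMock` connection fakes the cursor context manager, so no driver or database is needed.

```python
from unittest import mock

def fetch_top_queries(conn, limit: int = 10):
    """Stand-in for the PostgresClient fetch (hypothetical name)."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT query, calls, mean_exec_time FROM pg_stat_statements "
            "ORDER BY mean_exec_time DESC LIMIT %s",
            (limit,),
        )
        return cur.fetchall()

def test_fetch_top_queries_uses_cursor():
    conn = mock.MagicMock()
    # The cursor is used as a context manager, so stub __enter__.
    cur = conn.cursor.return_value.__enter__.return_value
    cur.fetchall.return_value = [("q1", 5, 120.0)]
    rows = fetch_top_queries(conn, limit=1)
    assert rows == [("q1", 5, 120.0)]
    cur.execute.assert_called_once()
```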


What I Learned

pg_stat_statements exposes more signal than most engineers realize. The stddev_exec_time / mean_exec_time ratio turned out to be a cleaner instability detector than I expected — a CV above 0.5 almost always pointed to something interesting (lock contention, plan instability, or stale statistics) rather than just noisy data. Building the optimization hint logic against real PostgreSQL documentation made me much more concrete about what each metric actually indicates versus what I assumed it meant.
