Connects to a live PostgreSQL instance, pulls execution statistics from pg_stat_statements, and surfaces the top slow queries with performance metrics and actionable optimization hints. Delivers results as a rich console table, a standalone HTML report, or a scheduled email.
- Queries pg_stat_statements for real execution data (mean time, std dev, cache hit ratio, temp block usage)
- Ranks queries by mean execution time and computes derived metrics:
- Cache hit ratio — identifies queries bypassing the buffer cache
- Coefficient of variation — flags unstable queries with inconsistent runtimes
- Rows per call — surfaces sequential scan candidates
- Temp block usage — detects sort/hash spills to disk
- Generates optimization hints per query (index suggestions, work_mem tuning, plan instability warnings)
- Produces a self-contained HTML report with a summary scorecard and per-query detail cards
- Optionally emails the report via any SMTP relay
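The derived metrics listed above can be computed directly from raw pg_stat_statements columns. A minimal sketch (not the project's actual code; column names follow PostgreSQL 13+):

```python
def derive_metrics(row: dict) -> dict:
    """Compute cache hit ratio, rows per call, and a temp-spill flag for one query row."""
    blks_total = row["shared_blks_hit"] + row["shared_blks_read"]
    return {
        # fraction of block reads served from the buffer cache
        "cache_hit_ratio": row["shared_blks_hit"] / blks_total if blks_total else 1.0,
        # high values hint at sequential scans returning large result sets
        "rows_per_call": row["rows"] / row["calls"] if row["calls"] else 0.0,
        # any temp blocks written means a sort/hash spilled to disk
        "spills_to_disk": row["temp_blks_written"] > 0,
    }

sample = {
    "shared_blks_hit": 900, "shared_blks_read": 100,
    "rows": 50_000, "calls": 10, "temp_blks_written": 128,
}
m = derive_metrics(sample)
```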
- Python 3.11+
- PostgreSQL 13+ with the pg_stat_statements extension enabled
Enable the extension (run once as superuser):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```

Ensure pg_stat_statements is loaded in postgresql.conf:

```
shared_preload_libraries = 'pg_stat_statements'
```
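Once loaded, the extension exposes per-query aggregates that the analyzer reads. A query along these lines pulls the raw rows (illustrative only; column names are the PostgreSQL 13+ ones):

```sql
SELECT queryid,
       query,
       calls,
       mean_exec_time,
       stddev_exec_time,
       rows,
       shared_blks_hit,
       shared_blks_read,
       temp_blks_written
FROM pg_stat_statements
WHERE calls >= 5
ORDER BY mean_exec_time DESC
LIMIT 10;
```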
```shell
pip install -r requirements.txt
cp .env.example .env
# edit .env with your PG_PASSWORD (and SMTP_* if using email)
# edit config/settings.yaml with your database host, name, user
```

```shell
python main.py analyze
python main.py analyze --config path/to/settings.yaml
python main.py report
python main.py report --output weekly_report.html
python main.py email
```

Edit config/settings.yaml:
```yaml
database:
  host: localhost
  port: 5432
  dbname: your_database
  user: postgres

analysis:
  top_n: 10
  min_calls: 5
  slow_query_threshold_ms: 100.0

email:
  smtp_host: smtp.gmail.com
  smtp_port: 587
  use_tls: true
  from_address: alerts@example.com
  to_addresses:
    - dba@example.com
```

Secrets go in .env (never in settings.yaml):
```
PG_PASSWORD=your_postgres_password
SMTP_USER=alerts@example.com
SMTP_PASSWORD=your_smtp_password
```
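A minimal sketch of how environment secrets can be merged with the non-secret connection settings at runtime (the `db_params` helper is hypothetical, not the project's code; the variable name matches the .env above):

```python
import os

def db_params(settings: dict) -> dict:
    """Merge non-secret connection settings with the PG_PASSWORD secret from the environment."""
    password = os.environ.get("PG_PASSWORD")
    if not password:
        raise RuntimeError("PG_PASSWORD is not set; check your .env file")
    return {**settings, "password": password}

os.environ["PG_PASSWORD"] = "example-only"  # normally loaded from .env, never hardcoded
params = db_params({"host": "localhost", "port": 5432,
                    "dbname": "your_database", "user": "postgres"})
```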
```
main.py (typer CLI)
├── PostgresClient  — connects to PG, fetches pg_stat_statements rows
├── QueryAnalyzer   — ranks queries, computes metrics, generates hints
│   └── pandas      — percentile stats (p50/p95/p99) across full query set
├── ReportGenerator — Jinja2 HTML template → report.html
└── EmailSender     — smtplib: sends HTML report via SMTP
```
pg_stat_statements over pg_stat_activity — stat_activity shows live sessions; stat_statements gives historical aggregates across all executions, which is what performance analysis needs.
Mean execution time as primary rank key — total time skews toward high-frequency cheap queries; mean time surfaces the genuinely slow ones regardless of call volume.
Coefficient of variation as instability signal — stddev alone is hard to interpret. CV (stddev / mean) normalizes it: a CV > 0.5 on a slow query points to lock waits, plan flips, or parameter sniffing rather than uniformly slow execution.
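The CV heuristic reduces to a few lines (the function name and default threshold here are illustrative, not the project's code):

```python
def is_unstable(mean_ms: float, stddev_ms: float, cv_threshold: float = 0.5) -> bool:
    """Flag a query whose runtime variance is high relative to its mean (CV = stddev / mean)."""
    if mean_ms == 0:
        return False  # no executions worth flagging
    return (stddev_ms / mean_ms) > cv_threshold

# A 200 ms query with 150 ms stddev (CV = 0.75) gets flagged;
# one with 40 ms stddev (CV = 0.2) does not.
```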
pandas for percentile stats — a one-liner quantile() across the full query set gives p50/p95/p99 that would require a window function to compute in SQL.
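That one-liner looks roughly like this (the data is invented for illustration; only the `quantile()` call reflects the approach described above):

```python
import pandas as pd

# Hypothetical mean execution times (ms) across an analyzed query set
df = pd.DataFrame({"mean_exec_time": [12.0, 35.0, 80.0, 150.0, 420.0, 95.0, 60.0, 30.0]})

# One call yields the p50/p95/p99 summary for the report scorecard
percentiles = df["mean_exec_time"].quantile([0.50, 0.95, 0.99])
```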
```shell
pytest
```

All tests mock psycopg2 and smtplib — no live database or SMTP server required.
pg_stat_statements exposes more signal than most engineers realize. The stddev_exec_time / mean_exec_time ratio turned out to be a cleaner instability detector than I expected — a CV above 0.5 almost always pointed to something interesting (lock contention, plan instability, or stale statistics) rather than just noisy data. Building the optimization hint logic against real PostgreSQL documentation made me much more concrete about what each metric actually indicates versus what I assumed it meant.