Modular diagnostic and performance analyzer for CockroachDB clusters. Works with CockroachDB via SQL (crdb_internal tables) or HTTP admin API, and can also analyze offline JSON range dumps.
- Hot Ranges -- Find the largest or busiest ranges in the cluster
- Hot Nodes -- Identify overloaded nodes by lease count and data size
- Data Skew -- Detect disproportionately large ranges
- Table Stats -- Per-table breakdown of range count and total size
- Contention -- Surface lock contention events and slow statements
- Range Details -- Inspect specific ranges by ID
- Index Usage -- Find unused and hottest indexes
- Lease Balance -- Check replica/lease distribution across nodes
- Statement Fingerprints -- Identify slowest queries, most executed, highest rows read
- Cluster Health -- Node liveness, capacity, version skew detection
- Node Hotspot -- Deep-dive into why a specific node is running hot
- Rebalance Status -- Check if cluster rebalancing is complete with per-node distribution diagnostics
- Job Status -- Detect stuck GC jobs, coordinator imbalance, and failed jobs
- Statement Errors -- Failing queries, retry errors, per-node failure distribution, contention correlation
- Snapshot / History / Compare -- Save point-in-time snapshots and diff them later
All commands support table, JSON, and CSV output formats.
- Python 3.10+
- Network access to a CockroachDB cluster (SQL port 26257 and/or HTTP admin port 8080)
This project uses uv for fast, reproducible dependency management.
# Install uv (if you don't have it)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone and install
git clone https://github.com/cockroachdb/cockroachdb-analyzer.git
cd cockroachdb-analyzer
uv sync # installs runtime deps into .venv
uv sync --dev # includes dev tools (ruff, mypy, pytest)The crdb-analyzer CLI is available inside the managed venv. Run it with:
uv run crdb-analyzer --helpAlternative: install with pip
pip install -e .
pip install -e ".[dev]" # with dev tools# Local insecure cluster
export CRDB_SQL_URL="postgresql://root@localhost:26257/defaultdb?sslmode=disable"
# CockroachDB Cloud (TLS)
export CRDB_SQL_URL="postgresql://myuser:changeme@host:26257/mydb?sslmode=verify-full&sslrootcert=/path/to/ca.crt"# Top 10 ranges by size
crdb-analyzer hot-ranges --limit 10
# Node load distribution
crdb-analyzer hot-nodes
# Data skew detection
crdb-analyzer data-skew --limit 20
# Per-table range breakdown (filter by database/table)
crdb-analyzer table-stats --database mydb --table users
# Contention and slow statements
crdb-analyzer contention --limit 10
# Unused and hot indexes
crdb-analyzer index-usage --limit 30
# Replica/lease balance across nodes
crdb-analyzer lease-balance
# Slow queries, most executed, highest rows read
crdb-analyzer stmt-fingerprints --limit 20
# Cluster health: node liveness, capacity, version skew
crdb-analyzer cluster-health
# Deep-dive: why is node 2 running hot?
crdb-analyzer node-hotspot --node-id 2
# Check if cluster rebalancing is complete
crdb-analyzer rebalance-status
# With a custom balance threshold (default: 5%)
crdb-analyzer rebalance-status --balance-threshold 10
# Detect stuck GC jobs, coordinator imbalance, failed jobs
crdb-analyzer job-status
# Statement failures, retries, and contention errors (last 1h default)
crdb-analyzer stmt-errors
# Look back further
crdb-analyzer stmt-errors --since 6h --limit 50When you see one node with more leases, replicas, or data than others,
node-hotspot explains why:
crdb-analyzer node-hotspot --node-id 2This shows:
- Node vs Cluster Average -- leases, replicas, and data size compared to the mean
- Top Tables by Lease Count -- which tables have disproportionately many leases on this node
- Largest Ranges -- the biggest ranges whose lease sits on this node
- Store Capacity -- disk usage across all nodes to spot hardware asymmetry
- Zone Configs -- any lease preferences or constraints that could be pinning to this node
- Locality Match -- whether the node's locality matches any zone config lease preferences
crdb-analyzer rebalance-statusThis checks:
- Replication Stats -- under/over-replicated and unavailable ranges (must all be 0)
- Store Balance -- range count spread across stores (configurable threshold, default 5%)
- Per-Node Range Distribution -- expected vs actual replicas per node with RF breakdown, flags OVER/UNDER nodes
- Rebalance Direction -- net add/remove voter events per store in the last hour
- Rangelog Activity -- detects active
add_voter/remove_voterpairs (rebalancing) vsadd_voter-only (initial placement) - Cluster Settings -- shows current values and adjustment commands for
kv.snapshot_rebalance.max_rate,kv.range_split.load_qps_threshold, andrange_max_bytes
Works on self-hosted and CockroachDB Cloud clusters (virtual clusters get a reduced view).
crdb-analyzer job-statusThis detects:
- Schema Change GC backlog -- stuck "waiting for MVCC GC" jobs from repeated
TRUNCATE TABLEoperations that occupy dead ranges and inflate node storage - Coordinator imbalance -- one node coordinating disproportionately many jobs (adds CPU/IO load)
- Failed/reverting jobs -- recent job failures with error details
- GC TTL configuration -- shows
gc.ttlsecondsper zone with the exactALTER ... CONFIGURE ZONEcommand to adjust it
Range IDs appear in the output of hot-ranges, data-skew, and table-stats. Run one of those first, then use the range_id column values:
# Step 1: find interesting ranges
crdb-analyzer hot-ranges --limit 5
# Output shows range_id column: 16895, 18879, 14166, ...
# Step 2: drill into specific ranges
crdb-analyzer range-details --range-ids 16895,18879,14166Any analysis can be saved as a point-in-time snapshot for later review or comparison.
# Save a snapshot automatically when running any analyzer
crdb-analyzer hot-nodes --save
crdb-analyzer hot-ranges --limit 20 --save
# Or explicitly capture a snapshot
crdb-analyzer snapshot hot-nodes
crdb-analyzer snapshot cluster-health
# List all snapshots
crdb-analyzer history
# Filter by type or time window
crdb-analyzer history --type hot-nodes
crdb-analyzer history --since 2h
crdb-analyzer history --since 1d
# View a specific snapshot
crdb-analyzer history --show abc123def456
# Compare two snapshots (shows added/removed/changed rows)
crdb-analyzer compare abc123 def456Snapshots are stored in SQLite by default (~/.crdb-analyzer/snapshots.db). To store them in the CockroachDB cluster instead:
crdb-analyzer --snapshot-store crdb snapshot hot-nodes
crdb-analyzer --snapshot-store crdb history --type hot-nodesThe daemon command runs snapshot cycles on a schedule so you always have
historical data to look back on. It captures all (or selected) analyses every
N seconds and optionally prunes old snapshots.
# Default: all 9 analyses every 15 minutes
crdb-analyzer daemon
# Every 5 minutes, only hot-ranges and hot-nodes
crdb-analyzer daemon --interval 300 --analyses hot-ranges,hot-nodes
# Every 10 minutes, keep only the last 7 days of data
crdb-analyzer daemon --interval 600 --retention-days 7
# Track specific nodes for hotspot analysis each cycle
crdb-analyzer daemon --node-ids 1,2,3
# Combine: every 5 min, track nodes 2 and 3, keep 14 days
crdb-analyzer daemon --interval 300 --node-ids 2,3 --retention-days 14
# Store snapshots in CockroachDB instead of local SQLite
crdb-analyzer --snapshot-store crdb daemon --interval 900The daemon prints a summary each cycle and responds to Ctrl+C / SIGTERM for graceful shutdown.
Running as a background service (systemd example):
# /etc/systemd/system/crdb-analyzer.service
[Unit]
Description=CockroachDB Analyzer Daemon
After=network.target
[Service]
Type=simple
EnvironmentFile=/etc/crdb-analyzer/env
ExecStart=/usr/local/bin/crdb-analyzer daemon --interval 900 --retention-days 30
Restart=on-failure
RestartSec=30
[Install]
WantedBy=multi-user.targetCreate the environment file (keep credentials out of unit files):
# /etc/crdb-analyzer/env
CRDB_SQL_URL=postgresql://myuser:changeme@host:26257/defaultdb?sslmode=verify-fullsudo systemctl enable --now crdb-analyzerRunning with nohup:
nohup crdb-analyzer daemon --interval 900 --retention-days 14 > /var/log/crdb-analyzer.log 2>&1 &Once the daemon has been collecting data, you can look back at any point:
# What did hot-ranges look like 24 hours ago?
crdb-analyzer history --type hot-ranges --since 24h
# View a specific snapshot
crdb-analyzer history --show <snapshot_id>
# Compare morning vs evening
crdb-analyzer compare <morning_id> <evening_id>crdb-analyzer hot-ranges --limit 5 # table (default)
crdb-analyzer --format json hot-nodes # JSON
crdb-analyzer --format csv data-skew --limit 100 > skew.csv # CSVcrdb-analyzer --sql-url "postgresql://root@localhost:26257/defaultdb?sslmode=disable" hot-rangescrdb-analyzer hot-ranges --from-file ranges.json
crdb-analyzer hot-nodes --from-file ranges.json
crdb-analyzer data-skew --from-file ranges.json# Self-hosted (insecure)
crdb-analyzer --admin-url "http://localhost:8080" hot-ranges
# With authentication (CLI flags)
crdb-analyzer \
--admin-url "https://your-cluster:8080" \
--admin-user "myuser" \
--admin-password "changeme" \
hot-ranges
# With authentication (environment variables)
export CRDB_ADMIN_USER="myuser"
export CRDB_ADMIN_PASSWORD="changeme"
crdb-analyzer --admin-url "https://your-cluster:8080" hot-rangesCredentials can also be set in the YAML config file (see below).
Note: CockroachDB Cloud admin UIs often use session-based auth which may not work with basic auth. For Cloud clusters, prefer
--sql-urlwhich all analyzers support.
Create ~/.crdb-analyzer.yaml or ./crdb-analyzer.yaml:
sql_url: "postgresql://root@localhost:26257/defaultdb?sslmode=disable"
admin_url: "http://localhost:8080"
admin_user: "myuser" # optional
admin_password: "changeme" # optional
ca_cert: "/path/to/ca.crt" # optional, for TLS
client_cert: "/path/to/client.crt" # optional
client_key: "/path/to/client.key" # optional
timeout: 30crdb-analyzer -v hot-ranges --limit 5| Command | Description |
|---|---|
hot-ranges |
Find hottest ranges by size (SQL) or QPS (API/file) |
hot-nodes |
Show load distribution across nodes |
data-skew |
Detect data skew by finding the largest ranges |
table-stats |
Per-table range distribution and size breakdown |
contention |
Lock contention and slow statements |
range-details |
Inspect specific ranges by ID |
index-usage |
Unused and hot indexes |
lease-balance |
Replica/lease distribution across nodes |
stmt-fingerprints |
Slow queries, most executed, highest rows read |
cluster-health |
Node liveness, capacity, version skew |
node-hotspot |
Diagnose why a specific node is running hot |
rebalance-status |
Check if cluster rebalancing is complete |
job-status |
Detect stuck GC jobs, coordinator imbalance |
stmt-errors |
Failing queries, retries, contention errors |
snapshot |
Capture point-in-time snapshot of any analysis |
history |
Browse and view historical snapshots |
compare |
Diff two snapshots side by side |
daemon |
Run continuous snapshots on a schedule |
crdb_analyzer/
├── cli.py # Click CLI with 19 commands
├── config.py # Config resolution (CLI > env > YAML file)
├── retry.py # CockroachDB-aware retry with exponential backoff
├── clients/
│ ├── http_client.py # HTTP client for /_status/ranges, /hotranges, etc.
│ └── sql_client.py # psycopg 3 client with retry and reconnect
├── analyzers/
│ ├── base.py # Base class with shared range data extraction
│ ├── hot_ranges.py # Top ranges by size (SQL) or QPS (API)
│ ├── hot_nodes.py # Per-node load aggregation
│ ├── data_skew.py # Largest ranges detection
│ ├── table_stats.py # Per-table/index breakdown
│ ├── contention.py # Lock contention + slow statements
│ ├── index_usage.py # Unused and hot indexes
│ ├── lease_balance.py # Replica/lease distribution
│ ├── stmt_fingerprints.py # Statement fingerprint analysis
│ ├── cluster_health.py # Cluster health overview
│ ├── node_hotspot.py # Per-node hotspot deep-dive
│ ├── rebalance_status.py # Rebalance completion check
│ ├── job_status.py # Stuck jobs, GC backlog, coordinator imbalance
│ └── stmt_errors.py # Statement failures, retries, contention
├── storage/
│ ├── base.py # Abstract snapshot store interface
│ ├── sqlite_store.py # SQLite backend (default, local)
│ └── crdb_store.py # CockroachDB backend (centralized)
└── formatters/
└── output.py # Table, JSON, CSV output
- Automatic retry with exponential backoff and jitter on CockroachDB serialization failures (40001), connection resets, and timeouts
- Auto-reconnect on stale or broken SQL connections
uv sync --dev
# Run tests
uv run pytest
# Lint
uv run ruff check .
# Type check
uv run mypy crdb_analyzer/
# Run a specific command
uv run crdb-analyzer hot-ranges --limit 5Apache License 2.0 - see LICENSE for details.
This project was inspired by Andrew Deally's article Troubleshooting Hot Ranges — CockroachDB.