| 1 | +# Lock Waits Metric Testing |
| 2 | + |
| 3 | +This directory contains tests and scripts to verify that the `lock_waits` metric is working correctly. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The `lock_waits` metric collects detailed information about lock waits in PostgreSQL, including: |
| 8 | +- Waiting and blocking process IDs |
| 9 | +- User names and application names |
| 10 | +- Lock modes and types |
| 11 | +- Affected tables |
| 12 | +- Query IDs (PostgreSQL 14+) |
| 13 | +- Wait durations and blocker transaction durations |
| 14 | + |
| 15 | +## Test Components |
| 16 | + |
| 17 | +### 1. Python Test Script (`test_lock_waits_metric.py`) |
| 18 | + |
| 19 | +Automated test that: |
| 20 | +- Creates lock contention scenarios in the target database |
| 21 | +- Waits for pgwatch to collect metrics |
| 22 | +- Verifies the metric is collected in Prometheus/VictoriaMetrics |
| 23 | +- Validates the metric structure and labels |
| 24 | + |
| 25 | +### 2. SQL Script (`create_lock_contention.sql`) |
| 26 | + |
| 27 | +A manual SQL script that creates lock contention for testing. It is meant to be run from multiple concurrent psql sessions. |
| 28 | + |
| 29 | +## Prerequisites |
| 30 | + |
| 31 | +1. Docker Compose stack running: |
| 32 | +   ```bash |
| 33 | +   docker-compose up -d |
| 34 | +   ``` |
| 35 | + |
| 36 | +2. Python dependencies: |
| 37 | +   ```bash |
| 38 | +   pip install psycopg requests |
| 39 | +   ``` |
| 40 | + |
| 41 | +3. Ensure the `lock_waits` metric is enabled in the pgwatch configuration: |
| 42 | +   - Check that `config/pgwatch-prometheus/metrics.yml` includes `lock_waits` |
| 43 | +   - Verify that pgwatch is collecting metrics from the target database |
| 44 | + |
| 45 | +## Running the Automated Test |
| 46 | + |
| 47 | +### Basic Usage |
| 48 | + |
| 49 | +```bash |
| 50 | +# From the project root |
| 51 | +python tests/lock_waits/test_lock_waits_metric.py |
| 52 | +``` |
| 53 | + |
| 54 | +### With Custom Configuration |
| 55 | + |
| 56 | +```bash |
| 57 | +python tests/lock_waits/test_lock_waits_metric.py \ |
| 58 | + --target-db-url "postgresql://postgres:postgres@localhost:55432/target_database" \ |
| 59 | + --prometheus-url "http://localhost:59090" \ |
| 60 | + --test-dbname "target_database" \ |
| 61 | + --collection-wait 90 |
| 62 | +``` |
| 63 | + |
| 64 | +### Environment Variables |
| 65 | + |
| 66 | +You can also set these via environment variables: |
| 67 | + |
| 68 | +```bash |
| 69 | +export TARGET_DB_URL="postgresql://postgres:postgres@localhost:55432/target_database" |
| 70 | +export PROMETHEUS_URL="http://localhost:59090" |
| 71 | +export TEST_DBNAME="target_database" |
| 72 | +export COLLECTION_WAIT_SECONDS=90 |
| 73 | + |
| 74 | +python tests/lock_waits/test_lock_waits_metric.py |
| 75 | +``` |
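
How the script resolves flags against environment variables is not shown in this README. The sketch below is one plausible arrangement, assuming command-line flags take precedence and the environment variables act as defaults. The `build_parser` helper and the fallback of 60 seconds are illustrative assumptions, not taken from `test_lock_waits_metric.py`.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults come from environment variables.

    Hypothetical helper: the real script may resolve its options differently.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="lock_waits metric test (sketch)")
    parser.add_argument("--target-db-url", default=env.get("TARGET_DB_URL"))
    parser.add_argument("--prometheus-url", default=env.get("PROMETHEUS_URL"))
    parser.add_argument("--test-dbname", default=env.get("TEST_DBNAME"))
    parser.add_argument(
        "--collection-wait",
        type=int,
        # Assumed fallback of 60 seconds when the variable is unset.
        default=int(env.get("COLLECTION_WAIT_SECONDS", "60")),
    )
    return parser
```

With this layout, `--collection-wait 120` on the command line overrides `COLLECTION_WAIT_SECONDS=90` from the environment.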
| 76 | + |
| 77 | +## Manual Testing |
| 78 | + |
| 79 | +### Step 1: Create Lock Contention |
| 80 | + |
| 81 | +Open two psql sessions to the target database: |
| 82 | + |
| 83 | +**Session 1 (Blocker):** |
| 84 | +```sql |
| 85 | +BEGIN; |
| 86 | +SELECT * FROM lock_test_table WHERE id = 1 FOR UPDATE; |
| 87 | +-- Keep this transaction open |
| 88 | +``` |
| 89 | + |
| 90 | +**Session 2 (Waiter):** |
| 91 | +```sql |
| 92 | +BEGIN; |
| 93 | +SELECT * FROM lock_test_table WHERE id = 1 FOR UPDATE; |
| 94 | +-- This will wait for Session 1 to release the lock |
| 95 | +``` |
| 96 | + |
| 97 | +### Step 2: Verify Metric Collection |
| 98 | + |
| 99 | +Wait for pgwatch to collect metrics (check collection interval in pgwatch config, typically 15-30 seconds), then query Prometheus: |
| 100 | + |
| 101 | +```bash |
| 102 | +# Query the Prometheus API for lock_waits metrics (-G with --data-urlencode keeps curl from globbing the braces) |
| 103 | +curl -G "http://localhost:59090/api/v1/query" --data-urlencode 'query=pgwatch_lock_waits_waiting_ms{datname="target_database"}' |
| 104 | + |
| 105 | +# Or paste these PromQL expressions into Grafana Explore (they are not shell commands) |
| 106 | +pgwatch_lock_waits_waiting_ms{datname="target_database"} |
| 107 | +pgwatch_lock_waits_blocker_tx_ms{datname="target_database"} |
| 108 | +``` |
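
The same check can be scripted. The following is a minimal sketch using only the standard library, assuming the endpoint and metric names shown above. The helper names (`build_query_url`, `extract_samples`, `query_lock_waits`) are hypothetical and are not taken from the test script.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, metric, datname):
    """Return an instant-query URL with the PromQL selector percent-encoded."""
    promql = f'{metric}{{datname="{datname}"}}'
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def extract_samples(payload):
    """Return (labels, value) pairs from an instant-query JSON response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]


def query_lock_waits(base_url, datname, metric="pgwatch_lock_waits_waiting_ms"):
    """Fetch current samples for one lock_waits series (needs a running stack)."""
    url = build_query_url(base_url, metric, datname)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return extract_samples(json.load(resp))
```

An empty list from `query_lock_waits("http://localhost:59090", "target_database")` means the query succeeded but no lock wait is currently recorded for that database.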
| 109 | + |
| 110 | +### Step 3: Check Grafana Dashboard |
| 111 | + |
| 112 | +1. Open Grafana: http://localhost:3000 |
| 113 | +2. Navigate to "Lock waits details" dashboard |
| 114 | +3. Select the database from the dropdown |
| 115 | +4. Verify that lock wait events appear in the panels |
| 116 | + |
| 117 | +## Expected Results |
| 118 | + |
| 119 | +### Successful Test Output |
| 120 | + |
| 121 | +``` |
| 122 | +Setting up test environment... |
| 123 | +✓ Test table created |
| 124 | + |
| 125 | +Creating lock contention for 30 seconds... |
| 126 | +✓ Blocker transaction started (holding lock on row id=1) |
| 127 | +✓ Waiter transaction started (waiting for lock on row id=1) |
| 128 | + Holding locks for 30 seconds... |
| 129 | +✓ Lock contention ended |
| 130 | + |
| 131 | +Verifying metric collection... |
| 132 | + Waiting 60 seconds for pgwatch to collect metrics... |
| 133 | + ✓ Found 5 lock_waits records |
| 134 | + |
| 135 | +Validating metric structure... |
| 136 | + |
| 137 | + Record 1: |
| 138 | + ✓ All required data fields present |
| 139 | + ✓ waiting_ms is numeric: 25000 ms |
| 140 | + ✓ blocker_tx_ms is numeric: 30000 ms |
| 141 | + |
| 142 | +✅ Test PASSED: lock_waits metric is working correctly |
| 143 | +``` |
| 144 | + |
| 145 | +## Troubleshooting |
| 146 | + |
| 147 | +### No Records Found |
| 148 | + |
| 149 | +- **Check pgwatch is running**: `docker ps | grep pgwatch-prometheus` |
| 150 | +- **Check pgwatch logs**: `docker logs pgwatch-prometheus` |
| 151 | +- **Verify metric is enabled**: Check `config/pgwatch-prometheus/metrics.yml` |
| 152 | +- **Check Prometheus is accessible**: `curl http://localhost:59090/api/v1/status/config` |
| 153 | +- **Increase wait time**: Use `--collection-wait 120` to wait longer |
| 154 | +- **Check database name**: Ensure `--test-dbname` matches the monitored database |
| 155 | +- **Verify metrics exist**: `curl "http://localhost:59090/api/v1/label/__name__/values" | grep lock_waits` |
| 156 | + |
| 157 | +### Invalid Data Structure |
| 158 | + |
| 159 | +- **Check PostgreSQL version**: The `query_id` label requires PostgreSQL 14+, and query IDs are only populated when `compute_query_id` is enabled |
| 160 | +- **Verify metric SQL**: Check the SQL query in `metrics.yml` is correct |
| 161 | +- **Check pgwatch version**: Ensure pgwatch version supports the metric format |
| 162 | +- **Check Prometheus labels**: Verify metrics have expected labels (datname, waiting_pid, blocker_pid, etc.) |
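
The label and value checks above can be expressed as a small validator. This is a sketch under assumptions: the required label set below contains only the labels named in this README, and `validate_record` is a hypothetical helper, not the validation performed by the test script.

```python
def validate_record(labels, value, required_labels=("datname", "waiting_pid", "blocker_pid")):
    """Return a list of problems; an empty list means the record looks valid."""
    # Treat absent and empty labels the same way.
    problems = [f"missing label: {name}" for name in required_labels if not labels.get(name)]
    try:
        if float(value) < 0:
            problems.append(f"negative value: {value}")
    except (TypeError, ValueError):
        # Prometheus returns sample values as strings, so parse before comparing.
        problems.append(f"non-numeric value: {value!r}")
    return problems
```

Extend `required_labels` to match the labels your metric definition actually exports.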
| 163 | + |
| 164 | +### Connection Errors |
| 165 | + |
| 166 | +- **Verify Docker containers**: `docker-compose ps` |
| 167 | +- **Check connection strings**: Verify URLs match your docker-compose configuration |
| 168 | +- **Check Prometheus URL**: Ensure Prometheus/VictoriaMetrics is accessible at the specified URL |
| 169 | +- **Check network**: Ensure containers can communicate (same Docker network) |
| 170 | + |
| 171 | +## Integration with CI/CD |
| 172 | + |
| 173 | +The test can be integrated into CI/CD pipelines: |
| 174 | + |
| 175 | +```yaml |
| 176 | +# Example GitLab CI |
| 177 | +test_lock_waits: |
| 178 | +  stage: test |
| 179 | +  script: |
| 180 | +    - docker-compose up -d |
| 181 | +    - sleep 30 # Wait for services to start |
| 182 | +    - pip install psycopg requests |
| 183 | +    - python tests/lock_waits/test_lock_waits_metric.py |
| 184 | +      --target-db-url "$TARGET_DB_URL" |
| 185 | +      --prometheus-url "$PROMETHEUS_URL" |
| 186 | +      --collection-wait 90 |
| 187 | +  only: |
| 188 | +    - merge_requests |
| 189 | +    - main |
| 190 | +``` |
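
A fixed `sleep 30` can be too short on a slow runner and wasteful on a fast one. One alternative is to poll until the services respond. The sketch below assumes Prometheus answers on `/api/v1/status/config` (the endpoint used under Troubleshooting); `wait_until` and `prometheus_ready` are hypothetical helpers, not part of the repository.

```python
import time
import urllib.request


def wait_until(check, timeout=60.0, interval=2.0):
    """Call check() until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except OSError:
            # Connection refused, timeouts, and HTTP errors are all OSError subclasses.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def prometheus_ready(base_url):
    """Return True once the Prometheus status endpoint answers with HTTP 200."""
    with urllib.request.urlopen(f"{base_url}/api/v1/status/config", timeout=5) as resp:
        return resp.status == 200
```

In a pipeline step this could run as `wait_until(lambda: prometheus_ready("http://localhost:59090"), timeout=120)` before the test script starts.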
| 191 | + |
| 192 | +## Additional Test Scenarios |
| 193 | + |
| 194 | +### Test Different Lock Types |
| 195 | + |
| 196 | +Modify the test to create different types of locks: |
| 197 | + |
| 198 | +```sql |
| 199 | +-- Table-level lock (LOCK TABLE is only allowed inside a transaction block) |
| 200 | +BEGIN; LOCK TABLE lock_test_table IN EXCLUSIVE MODE; |
| 201 | + |
| 202 | +-- Advisory lock |
| 203 | +SELECT pg_advisory_lock(12345); |
| 204 | +``` |
| 205 | + |
| 206 | +### Test Multiple Concurrent Waits |
| 207 | + |
| 208 | +Create multiple waiting transactions to test the `LIMIT` clause in the metric's SQL query: |
| 209 | + |
| 210 | +```sql |
| 211 | +-- Session 1: Blocker |
| 212 | +BEGIN; |
| 213 | +SELECT * FROM lock_test_table WHERE id = 1 FOR UPDATE; |
| 214 | + |
| 215 | +-- Sessions 2-10: Multiple waiters |
| 216 | +-- Each in separate psql session |
| 217 | +BEGIN; |
| 218 | +SELECT * FROM lock_test_table WHERE id = 1 FOR UPDATE; |
| 219 | +``` |
| 220 | + |
| 221 | +## Related Files |
| 222 | + |
| 223 | +- `config/pgwatch-prometheus/metrics.yml` - Metric definition |
| 224 | +- `config/grafana/dashboards/Dashboard_13_Lock_waits.json` - Grafana dashboard |
| 225 | +- `workload_examples/lock_wait_test.sql` - Basic lock test SQL |
| 226 | + |