diff --git a/pgcopydb-helpers/AGENTS.md b/pgcopydb-helpers/AGENTS.md
index 42ae5ac..9fefb35 100644
--- a/pgcopydb-helpers/AGENTS.md
+++ b/pgcopydb-helpers/AGENTS.md
@@ -320,6 +320,38 @@ psql "$PGCOPYDB_SOURCE_PGURI" -t -A -c "SELECT pg_current_wal_lsn();"
 
 ---
 
+### Post-Migration
+
+#### `compare-bloat.sh`
+
+Compares database bloat between source and target by breaking down table heap, TOAST, and index sizes. Uses only catalog queries — no table scans, no writes, no locks, safe for production. Outputs per-table and per-index size comparisons with reduction percentages, plus an overall bloat reduction summary.
+
+```bash
+~/compare-bloat.sh
+~/compare-bloat.sh --min-size-mb 500 --top-indexes 30
+```
+
+**Output includes:**
+- Database overview with total size comparison
+- Per-table breakdown of heap, TOAST, and index sizes with reduction percentages
+- Top N indexes ranked by absolute size reduction
+- Summary table with component-level and total bloat reduction
+
+**Options:**
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--min-size-mb <N>` | 100 | Only include tables larger than N MB on the source |
+| `--top-indexes <N>` | 20 | Number of indexes to show in the top-indexes-by-reduction section |
+
+**When to use:** After the migration is complete, to quantify how much bloat was eliminated by the fresh copy.
+
+**Requires:** `PGCOPYDB_SOURCE_PGURI`, `PGCOPYDB_TARGET_PGURI`
+
+**Read-only** — makes no modifications to either database.
+
+---
+
 ## Troubleshooting with the Migration Log
 
 The migration log (`~/migration_*/migration.log`) is the single most valuable troubleshooting artifact. It contains the full pgcopydb output including:
diff --git a/pgcopydb-helpers/README.md b/pgcopydb-helpers/README.md
index 3b85e55..12d7863 100644
--- a/pgcopydb-helpers/README.md
+++ b/pgcopydb-helpers/README.md
@@ -195,7 +195,29 @@ When `check-cdc-status.sh` reports **"CDC IS CAUGHT UP"** (apply backlog < 100 M
 
 6. **Switch** your application to the PlanetScale target.
 
-### 5. Clean Up
+### 5. Post-Migration (Optional)
+
+These steps are optional and can be run after the migration is complete and traffic has been switched.
+
+**Compare bloat reduction** between source and target to see how much space was reclaimed by the fresh copy. This breaks down heap, TOAST, and index sizes per table and reports overall reduction:
+
+```bash
+~/compare-bloat.sh # tables > 100 MB (default)
+~/compare-bloat.sh --min-size-mb 500 # only tables > 500 MB
+```
+
+The script uses catalog queries only — no table scans, no writes, no locks — and is safe to run against production databases.
+
+**Verify migration data** by comparing schema, row counts, sequences, and data spot-checks between source and target. Safe for multi-TB databases — uses catalog statistics and index seeks instead of full table scans:
+
+```bash
+~/verify_migration.sh
+~/verify_migration.sh --row-count-tolerance 1 --exact-count-tables 20
+```
+
+The script runs 12 checks and reports PASS/WARN/FAIL for each. Run it multiple times for higher confidence — exact-count tables are chosen randomly, so repeated runs cover more of the database. See `verify_migration.sh --help` for all options.
+
+### 6. Clean Up
 
 After the migration is complete (or abandoned), clean up replication artifacts:
 
@@ -389,6 +411,8 @@ sqlite3 ~/migration_*/schema/filter.db "SELECT COUNT(*) FROM s_depend;"
 | `target-clean.sh` | Recovery | Wipe target database for re-migration (prompts for confirmation) |
 | `drop-replication-slots.sh` | Cleanup | Remove replication slots and origins |
 | `stop_cdc.sh` | Cutover | Set CDC endpoint via SQLite to initiate cutover |
+| `compare-bloat.sh` | Post-migration | Compare heap, TOAST, and index bloat between source and target |
+| `verify_migration.sh` | Post-migration | Verify schema, row counts, sequences, and data between source and target |
 
 ## Critical Warnings
 
diff --git a/pgcopydb-helpers/compare-bloat.sh b/pgcopydb-helpers/compare-bloat.sh
new file mode 100755
index 0000000..fb6bd85
--- /dev/null
+++ b/pgcopydb-helpers/compare-bloat.sh
@@ -0,0 +1,294 @@
+#!/bin/bash
+#
+# compare-bloat.sh — Compare database bloat between SOURCE and TARGET
+#
+# Run on the migration instance where both databases are accessible.
+# Reads connection strings from ~/.env (PGCOPYDB_SOURCE_PGURI, PGCOPYDB_TARGET_PGURI)
+#
+# Compares table heap, TOAST, and index sizes between source and target to
+# quantify bloat reduction after migration. Uses only catalog queries —
+# no table scans, no writes, no locks, safe for production.
+#
+# Usage: ./compare-bloat.sh [--min-size-mb N] [--top-indexes N]
+#
+set -euo pipefail
+
+# --- Load environment ---
+set +u
+set -a
+source ~/.env
+set +a
+set -u
+
+if [ -z "${PGCOPYDB_SOURCE_PGURI:-}" ] || [ -z "${PGCOPYDB_TARGET_PGURI:-}" ]; then
+    echo "ERROR: PGCOPYDB_SOURCE_PGURI and PGCOPYDB_TARGET_PGURI must be set in ~/.env"
+    exit 1
+fi
+# --- loaded ---
+
+# --- Colors ---
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BOLD='\033[1m'
+NC='\033[0m'
+
+# --- Config (overridable via flags) ---
+MIN_TABLE_SIZE_MB=100
+TOP_INDEX_COUNT=20
+
+while [[ $# -gt 0 ]]; do
+    case "$1" in
+        --min-size-mb) MIN_TABLE_SIZE_MB="$2"; shift 2 ;;
+        --top-indexes) TOP_INDEX_COUNT="$2"; shift 2 ;;
+        *) echo "Unknown option: $1"; exit 1 ;;
+    esac
+done
+
+MIN_TABLE_SIZE_BYTES=$((MIN_TABLE_SIZE_MB * 1024 * 1024))
+
+# --- Helper functions ---
+src_query() {
+    psql "$PGCOPYDB_SOURCE_PGURI" -t -A -F'|' -c "$1" 2>/dev/null || echo ""
+}
+
+tgt_query() {
+    psql "$PGCOPYDB_TARGET_PGURI" -t -A -F'|' -c "$1" 2>/dev/null || echo ""
+}
+
+human_size() {
+    local bytes="${1:-0}"
+    if [ "$bytes" -eq 0 ] 2>/dev/null; then
+        echo "0 B"
+        return
+    fi
+    numfmt --to=iec-i --suffix=B "$bytes" 2>/dev/null || echo "${bytes} B"
+}
+
+pct() {
+    local num="${1:-0}"
+    local den="${2:-0}"
+    if [ "$den" -eq 0 ] 2>/dev/null; then
+        echo "—"
+    else
+        echo "$((num * 100 / den))%"
+    fi
+}
+
+# ══════════════════════════════════════════════════════════════════
+NOW=$(date -u '+%Y-%m-%d %H:%M:%S UTC')
+
+echo ""
+echo "══════════════════════════════════════════════════════════════════"
+echo " Database Bloat Comparison — $NOW"
+echo "══════════════════════════════════════════════════════════════════"
+
+# --- Section 1: Database Overview ---
+echo ""
+echo " DATABASE OVERVIEW"
+echo " ────────────────────────────────────────────────────────────────"
+
+SRC_VER=$(psql "$PGCOPYDB_SOURCE_PGURI" -t -A -c "SHOW server_version;" 2>/dev/null || echo "unknown")
+TGT_VER=$(psql "$PGCOPYDB_TARGET_PGURI" -t -A -c "SHOW server_version;" 2>/dev/null || echo "unknown")
+
+SRC_DB_SIZE=$(psql "$PGCOPYDB_SOURCE_PGURI" -t -A -c "SELECT pg_database_size(current_database());" 2>/dev/null || echo "0")
+TGT_DB_SIZE=$(psql "$PGCOPYDB_TARGET_PGURI" -t -A -c "SELECT pg_database_size(current_database());" 2>/dev/null || echo "0")
+
+DB_DIFF=$((SRC_DB_SIZE - TGT_DB_SIZE))
+DB_PCT=$(pct "$DB_DIFF" "$SRC_DB_SIZE")
+
+echo ""
+printf " %-12s %-20s %s\n" "" "SOURCE" "TARGET"
+printf " %-12s %-20s %s\n" "Version" "PostgreSQL $SRC_VER" "PostgreSQL $TGT_VER"
+printf " %-12s %-20s %s\n" "Total size" "$(human_size "$SRC_DB_SIZE")" "$(human_size "$TGT_DB_SIZE")"
+echo ""
+echo -e " ${BOLD}Size reduction: $(human_size "$DB_DIFF") ($DB_PCT)${NC}"
+
+# --- Section 2: Per-Table Comparison ---
+echo ""
+echo ""
+echo " PER-TABLE COMPARISON (tables > ${MIN_TABLE_SIZE_MB} MB on source)"
+echo " ────────────────────────────────────────────────────────────────"
+
+TABLE_QUERY="
+SELECT
+    n.nspname || '.' || c.relname,
+    pg_relation_size(c.oid, 'main'),
+    COALESCE(pg_relation_size(c.reltoastrelid), 0),
+    pg_indexes_size(c.oid),
+    c.reltuples::bigint
+FROM pg_class c
+JOIN pg_namespace n ON n.oid = c.relnamespace
+WHERE c.relkind = 'r'
+  AND n.nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
+  AND pg_relation_size(c.oid, 'main') > ${MIN_TABLE_SIZE_BYTES}
+ORDER BY pg_total_relation_size(c.oid) DESC;
+"
+
+SRC_TABLES=$(src_query "$TABLE_QUERY")
+TGT_TABLES=$(tgt_query "$TABLE_QUERY")
+
+# Parse target into associative arrays
+declare -A TGT_HEAP TGT_TOAST TGT_IDX TGT_ROWS
+while IFS='|' read -r name heap toast idx rows; do
+    [ -z "$name" ] && continue
+    TGT_HEAP["$name"]="$heap"
+    TGT_TOAST["$name"]="$toast"
+    TGT_IDX["$name"]="$idx"
+    TGT_ROWS["$name"]="$rows"
+done <<< "$TGT_TABLES"
+
+# Accumulators for summary
+TOTAL_SRC_HEAP=0
+TOTAL_TGT_HEAP=0
+TOTAL_SRC_TOAST=0
+TOTAL_TGT_TOAST=0
+TOTAL_SRC_IDX=0
+TOTAL_TGT_IDX=0
+TABLE_COUNT=0
+
+echo ""
+printf " ${BOLD}%-40s %10s %10s %6s %10s %10s %6s %10s %10s %6s${NC}\n" \
+    "Table" "Src Heap" "Tgt Heap" "Heap%" "Src TOAST" "Tgt TOAST" "TOAST%" "Src Idx" "Tgt Idx" "Idx%"
+printf " %-40s %10s %10s %6s %10s %10s %6s %10s %10s %6s\n" \
+    "────────────────────────────────────────" "──────────" "──────────" "──────" "──────────" "──────────" "──────" "──────────" "──────────" "──────"
+
+while IFS='|' read -r name src_heap src_toast src_idx src_rows; do
+    [ -z "$name" ] && continue
+
+    tgt_heap="${TGT_HEAP[$name]:-0}"
+    tgt_toast="${TGT_TOAST[$name]:-0}"
+    tgt_idx="${TGT_IDX[$name]:-0}"
+
+    TOTAL_SRC_HEAP=$((TOTAL_SRC_HEAP + src_heap))
+    TOTAL_TGT_HEAP=$((TOTAL_TGT_HEAP + tgt_heap))
+    TOTAL_SRC_TOAST=$((TOTAL_SRC_TOAST + src_toast))
+    TOTAL_TGT_TOAST=$((TOTAL_TGT_TOAST + tgt_toast))
+    TOTAL_SRC_IDX=$((TOTAL_SRC_IDX + src_idx))
+    TOTAL_TGT_IDX=$((TOTAL_TGT_IDX + tgt_idx))
+    TABLE_COUNT=$((TABLE_COUNT + 1))
+
+    # Truncate long table names
+    display_name="$name"
+    if [ ${#display_name} -gt 40 ]; then
+        display_name="${display_name:0:37}..."
+    fi
+
+    heap_diff=$((src_heap - tgt_heap))
+    toast_diff=$((src_toast - tgt_toast))
+    idx_diff=$((src_idx - tgt_idx))
+
+    printf " %-40s %10s %10s %6s %10s %10s %6s %10s %10s %6s\n" \
+        "$display_name" \
+        "$(human_size "$src_heap")" "$(human_size "$tgt_heap")" "$(pct "$heap_diff" "$src_heap")" \
+        "$(human_size "$src_toast")" "$(human_size "$tgt_toast")" "$(pct "$toast_diff" "$src_toast")" \
+        "$(human_size "$src_idx")" "$(human_size "$tgt_idx")" "$(pct "$idx_diff" "$src_idx")"
+done <<< "$SRC_TABLES"
+
+echo ""
+HEAP_DIFF_TOTAL=$((TOTAL_SRC_HEAP - TOTAL_TGT_HEAP))
+TOAST_DIFF_TOTAL=$((TOTAL_SRC_TOAST - TOTAL_TGT_TOAST))
+IDX_DIFF_TOTAL=$((TOTAL_SRC_IDX - TOTAL_TGT_IDX))
+
+printf " ${BOLD}%-40s %10s %10s %6s %10s %10s %6s %10s %10s %6s${NC}\n" \
+    "TOTALS ($TABLE_COUNT tables)" \
+    "$(human_size "$TOTAL_SRC_HEAP")" "$(human_size "$TOTAL_TGT_HEAP")" "$(pct "$HEAP_DIFF_TOTAL" "$TOTAL_SRC_HEAP")" \
+    "$(human_size "$TOTAL_SRC_TOAST")" "$(human_size "$TOTAL_TGT_TOAST")" "$(pct "$TOAST_DIFF_TOTAL" "$TOTAL_SRC_TOAST")" \
+    "$(human_size "$TOTAL_SRC_IDX")" "$(human_size "$TOTAL_TGT_IDX")" "$(pct "$IDX_DIFF_TOTAL" "$TOTAL_SRC_IDX")"
+
+# --- Section 3: Top Indexes by Size Difference ---
+echo ""
+echo ""
+echo " TOP ${TOP_INDEX_COUNT} INDEXES BY SIZE DIFFERENCE"
+echo " ────────────────────────────────────────────────────────────────"
+
+INDEX_QUERY="
+SELECT
+    n.nspname || '.' || ci.relname,
+    ct.relname,
+    pg_relation_size(ci.oid)
+FROM pg_class ci
+JOIN pg_index i ON i.indexrelid = ci.oid
+JOIN pg_class ct ON ct.oid = i.indrelid
+JOIN pg_namespace n ON n.oid = ci.relnamespace
+WHERE ci.relkind = 'i'
+  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
+ORDER BY pg_relation_size(ci.oid) DESC;
+"
+
+SRC_INDEXES=$(src_query "$INDEX_QUERY")
+TGT_INDEXES=$(tgt_query "$INDEX_QUERY")
+
+# Parse target indexes
+declare -A TGT_IDX_SIZE
+while IFS='|' read -r idx_name tbl_name idx_size; do
+    [ -z "$idx_name" ] && continue
+    TGT_IDX_SIZE["$idx_name"]="$idx_size"
+done <<< "$TGT_INDEXES"
+
+# Build array of (diff, name, src_size, tgt_size, table) and sort
+declare -a IDX_DIFFS=()
+while IFS='|' read -r idx_name tbl_name src_size; do
+    [ -z "$idx_name" ] && continue
+    tgt_size="${TGT_IDX_SIZE[$idx_name]:-0}"
+    diff=$((src_size - tgt_size))
+    IDX_DIFFS+=("${diff}|${idx_name}|${src_size}|${tgt_size}|${tbl_name}")
+done <<< "$SRC_INDEXES"
+
+# Sort by diff descending and take top N
+SORTED_IDXS=$(printf '%s\n' "${IDX_DIFFS[@]}" | sort -t'|' -k1 -rn | head -n "$TOP_INDEX_COUNT")
+
+echo ""
+printf " ${BOLD}%-50s %-20s %10s %10s %10s${NC}\n" \
+    "Index" "Table" "Source" "Target" "Reduction"
+printf " %-50s %-20s %10s %10s %10s\n" \
+    "──────────────────────────────────────────────────" "────────────────────" "──────────" "──────────" "──────────"
+
+while IFS='|' read -r diff idx_name src_size tgt_size tbl_name; do
+    [ -z "$idx_name" ] && continue
+
+    display_idx="$idx_name"
+    if [ ${#display_idx} -gt 50 ]; then
+        display_idx="${display_idx:0:47}..."
+    fi
+    display_tbl="$tbl_name"
+    if [ ${#display_tbl} -gt 20 ]; then
+        display_tbl="${display_tbl:0:17}..."
+    fi
+
+    printf " %-50s %-20s %10s %10s %10s\n" \
+        "$display_idx" "$display_tbl" \
+        "$(human_size "$src_size")" "$(human_size "$tgt_size")" \
+        "$(human_size "$diff")"
+done <<< "$SORTED_IDXS"
+
+# --- Section 4: Summary ---
+HEAP_DIFF=$((TOTAL_SRC_HEAP - TOTAL_TGT_HEAP))
+TOAST_DIFF=$((TOTAL_SRC_TOAST - TOTAL_TGT_TOAST))
+IDX_DIFF=$((TOTAL_SRC_IDX - TOTAL_TGT_IDX))
+TOTAL_DIFF=$((HEAP_DIFF + TOAST_DIFF + IDX_DIFF))
+TOTAL_SRC=$((TOTAL_SRC_HEAP + TOTAL_SRC_TOAST + TOTAL_SRC_IDX))
+
+echo ""
+echo ""
+echo " ══════════════════════════════════════════════════════════════════"
+echo -e " ${BOLD}BLOAT REDUCTION SUMMARY${NC} (tables > ${MIN_TABLE_SIZE_MB} MB)"
+echo " ══════════════════════════════════════════════════════════════════"
+echo ""
+printf " %-20s %12s %12s %12s %8s\n" "Component" "Source" "Target" "Reduction" "Pct"
+printf " %-20s %12s %12s %12s %8s\n" "────────────────────" "────────────" "────────────" "────────────" "────────"
+printf " %-20s %12s %12s %12s %8s\n" \
+    "Table heap" "$(human_size "$TOTAL_SRC_HEAP")" "$(human_size "$TOTAL_TGT_HEAP")" "$(human_size "$HEAP_DIFF")" "$(pct "$HEAP_DIFF" "$TOTAL_SRC_HEAP")"
+printf " %-20s %12s %12s %12s %8s\n" \
+    "TOAST data" "$(human_size "$TOTAL_SRC_TOAST")" "$(human_size "$TOTAL_TGT_TOAST")" "$(human_size "$TOAST_DIFF")" "$(pct "$TOAST_DIFF" "$TOTAL_SRC_TOAST")"
+printf " %-20s %12s %12s %12s %8s\n" \
+    "Indexes" "$(human_size "$TOTAL_SRC_IDX")" "$(human_size "$TOTAL_TGT_IDX")" "$(human_size "$IDX_DIFF")" "$(pct "$IDX_DIFF" "$TOTAL_SRC_IDX")"
+printf " %-20s %12s %12s %12s %8s\n" \
+    "────────────────────" "────────────" "────────────" "────────────" "────────"
+printf " ${BOLD}%-20s %12s %12s %12s %8s${NC}\n" \
+    "TOTAL" "$(human_size "$TOTAL_SRC")" "$(human_size "$((TOTAL_SRC - TOTAL_DIFF))")" "$(human_size "$TOTAL_DIFF")" "$(pct "$TOTAL_DIFF" "$TOTAL_SRC")"
+echo ""
+echo " Database-level: $(human_size "$SRC_DB_SIZE") → $(human_size "$TGT_DB_SIZE") ($(human_size "$DB_DIFF") / $DB_PCT reduction)"
+echo ""
+echo "══════════════════════════════════════════════════════════════════"
+echo ""
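
Review note on the per-table comparison in `compare-bloat.sh`: `TABLE_QUERY` applies the `--min-size-mb` filter to both the source and the target query. A table whose heap is above the threshold on the source but below it on the target would be absent from `TGT_TABLES`, fall back to `0`, and print as a 100% reduction. Below is a minimal sketch of one possible fix, assuming the target rows are fetched without the size filter and the threshold is applied to source rows only. `pct_reduction` and `compare_heaps` are hypothetical helper names that are not part of the patch, and the sample rows are made up for illustration.

```bash
#!/bin/bash
# Hypothetical helpers, not part of compare-bloat.sh.

# Print the integer percent reduction from SRC to TGT, or "n/a" when SRC is 0.
pct_reduction() {
    local src="$1" tgt="$2"
    if (( src == 0 )); then
        echo "n/a"
    else
        echo "$(( (src - tgt) * 100 / src ))%"
    fi
}

# compare_heaps MIN_BYTES SRC_ROWS TGT_ROWS
# Rows are "name|heap_bytes" lines. Only SRC_ROWS is filtered by MIN_BYTES;
# TGT_ROWS is looked up by name, so a table that shrank below the threshold
# on the target still reports its real target size.
compare_heaps() {
    local min_bytes="$1" src_rows="$2" tgt_rows="$3"
    local -A tgt_heap=()
    local name heap tgt

    # Index every target row by table name (no size filter here).
    while IFS='|' read -r name heap; do
        [ -z "$name" ] && continue
        tgt_heap["$name"]="$heap"
    done <<< "$tgt_rows"

    # Walk source rows, keeping only those above the threshold.
    while IFS='|' read -r name heap; do
        [ -z "$name" ] && continue
        (( heap > min_bytes )) || continue
        tgt="${tgt_heap[$name]:-}"
        if [ -z "$tgt" ]; then
            # Table genuinely absent on the target: flag it instead of assuming 0.
            echo "${name}|${heap}|missing|n/a"
        else
            echo "${name}|${heap}|${tgt}|$(pct_reduction "$heap" "$tgt")"
        fi
    done <<< "$src_rows"
}

# Made-up sample data: orders shrinks from 200 MiB to 80 MiB.
SAMPLE_SRC=$'public.orders|209715200\npublic.tiny|1048576'
SAMPLE_TGT=$'public.orders|83886080\npublic.tiny|1048576'

compare_heaps $((100 * 1024 * 1024)) "$SAMPLE_SRC" "$SAMPLE_TGT"
# prints: public.orders|209715200|83886080|60%
```

With the 100 MB default, the sample `public.orders` row reports a 60% reduction, whereas a target list filtered by the same threshold would have dropped the 80 MiB table and reported it as fully eliminated. In the script itself this would amount to building the target query without the `pg_relation_size(...) > ${MIN_TABLE_SIZE_BYTES}` predicate; printing `missing` for tables that do not exist on the target is a design choice of this sketch, not something the patch does.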