diff --git a/pgcopydb-helpers/AGENTS.md b/pgcopydb-helpers/AGENTS.md
index 036d747..7237744 100644
--- a/pgcopydb-helpers/AGENTS.md
+++ b/pgcopydb-helpers/AGENTS.md
@@ -201,9 +201,9 @@ Resumes a previously interrupted `pgcopydb clone --follow` migration. Backs up t
 ~/resume-migration.sh ~/migration_YYYYMMDD-HHMMSS  # specify explicitly
 ```
 
-**Important:** This script intentionally does NOT use `--split-tables-larger-than` with `--resume`. pgcopydb truncates the entire table before checking split parts on resume, which causes data loss.
+**Important:** The script passes `--split-tables-larger-than` to match `run-migration.sh`. pgcopydb requires catalog consistency — if the original run used split tables, the resume must pass the same value.
 
-**When to use:** After pgcopydb crashes, the instance reboots, or the migration is interrupted. Do NOT use after a successful migration — use `run-migration.sh` to start fresh.
+**When to use:** After pgcopydb crashes, the instance reboots, or the migration is interrupted. To start completely over instead, run `~/target-clean.sh` + `~/drop-replication-slots.sh` first, then `~/start-migration-screen.sh`.
 
 **Requires:** `PGCOPYDB_SOURCE_PGURI`, `PGCOPYDB_TARGET_PGURI`, existing migration directory
 
@@ -396,12 +396,11 @@ All scripts use variables at the top that can be adjusted per migration. See [Cl
 | `TABLE_JOBS` | 16 | run-migration.sh, resume-migration.sh |
 | `INDEX_JOBS` | 12 | run-migration.sh, resume-migration.sh |
 | `FILTER_FILE` | ~/filters.ini | run-migration.sh, resume-migration.sh |
-| `--split-tables-larger-than` | 50GB | run-migration.sh only (not resume) |
+| `--split-tables-larger-than` | 50GB | run-migration.sh, resume-migration.sh |
 
 ## Critical Warnings
 
-- **Never use `--split-tables-larger-than` with `--resume`** — pgcopydb truncates the entire table before checking parts, causing data loss.
-- **Never use `pgcopydb --restart`** without backing up first — it wipes the CDC directory AND SQLite catalogs.
+- **Do not use `pgcopydb --restart`** — it wipes the CDC directory and SQLite catalogs without cleaning the target database or correcting previous failures. To start over, use `~/target-clean.sh` + `~/drop-replication-slots.sh` + `~/start-migration-screen.sh` instead.
 - **Always clean up replication slots** after a migration — unconsumed slots cause WAL accumulation on the source.
 - **Verify extension filtering after STEP 1** — check `SELECT COUNT(*) FROM s_depend;` in `filter.db`. If it's 0, extension-owned objects in `public` won't be filtered.
 - **pg_restore error tolerance** — pgcopydb allows up to 10 restore errors by default. If your migration has more, you may need a custom build with a higher `MAX_TOLERATED_RESTORE_ERRORS`.
diff --git a/pgcopydb-helpers/README.md b/pgcopydb-helpers/README.md
index 3b85e55..1cadd89 100644
--- a/pgcopydb-helpers/README.md
+++ b/pgcopydb-helpers/README.md
@@ -215,7 +215,7 @@ If pgcopydb crashes, the instance reboots, or the migration is interrupted:
 ~/resume-migration.sh ~/migration_YYYYMMDD-HHMMSS  # or specify explicitly
 ```
 
-This backs up the SQLite catalog before resuming. It uses `--not-consistent` to allow resuming from a mid-transaction state, and intentionally omits `--split-tables-larger-than` because pgcopydb truncates the entire table before checking split parts on resume, which causes data loss.
+This backs up the SQLite catalog before resuming and uses `--not-consistent` to allow resuming from a mid-transaction state. The script passes `--split-tables-larger-than` to match `run-migration.sh` — pgcopydb requires catalog consistency, so the resume must use the same split value as the original run.
 
 To start completely over, wipe the target and clean up replication:
 
@@ -392,7 +392,6 @@ sqlite3 ~/migration_*/schema/filter.db "SELECT COUNT(*) FROM s_depend;"
 
 ## Critical Warnings
 
-- **Never use `--split-tables-larger-than` with `--resume`** — pgcopydb truncates the entire table before checking parts, causing data loss.
-- **Never use `pgcopydb --restart`** without backing up first — it wipes the CDC directory AND SQLite catalogs.
+- **Do not use `pgcopydb --restart`** — it wipes the CDC directory and SQLite catalogs without cleaning the target database or correcting previous failures. To start over, use `~/target-clean.sh` + `~/drop-replication-slots.sh` + `~/start-migration-screen.sh` instead.
 - **Always clean up replication slots** when done — unconsumed slots cause unbounded WAL growth on the source.
 - **Verify extension filtering after STEP 1** — if `s_depend` count is 0, extension-owned objects won't be excluded.
diff --git a/pgcopydb-helpers/resume-migration.sh b/pgcopydb-helpers/resume-migration.sh
index 2675efa..8bd2c90 100755
--- a/pgcopydb-helpers/resume-migration.sh
+++ b/pgcopydb-helpers/resume-migration.sh
@@ -5,8 +5,11 @@
 #
 # Resumes a previously interrupted pgcopydb clone --follow migration.
 # If no directory is given, uses the most recent ~/migration_* directory.
-# Backs up the SQLite catalog before resuming. Does NOT use
-# --split-tables-larger-than (unsafe with --resume).
+# Backs up the SQLite catalog before resuming.
+#
+# Uses --split-tables-larger-than to match run-migration.sh. pgcopydb
+# requires catalog consistency — if the original run used split tables,
+# the resume must pass the same value.
 #
 
 set -eo pipefail
@@ -57,8 +60,6 @@ cp "$MIGRATION_DIR/schema/source.db" "$MIGRATION_DIR/schema/source.db.bak.$(date
 echo "Migration dir: $MIGRATION_DIR"
 echo "=========================================="
 
-# NOTE: Do NOT use --split-tables-larger-than with --resume.
-# pgcopydb truncates the entire table before checking parts, causing data loss.
 /usr/lib/postgresql/17/bin/pgcopydb clone \
   --follow \
   --plugin wal2json \
@@ -73,6 +74,8 @@ cp "$MIGRATION_DIR/schema/source.db" "$MIGRATION_DIR/schema/source.db.bak.$(date
   --skip-db-properties \
   --table-jobs "$TABLE_JOBS" \
   --index-jobs "$INDEX_JOBS" \
+  --split-tables-larger-than 50GB \
+  --split-max-parts "$TABLE_JOBS" \
   --dir "$MIGRATION_DIR"
 
 EXIT_CODE=$?
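
The script header above says `resume-migration.sh` falls back to the most recent `~/migration_*` directory when no argument is given. As a minimal sketch of that auto-detection (the directory names and temp base here are illustrative, not taken from the actual script):

```shell
#!/bin/sh
# Sketch: pick the newest migration_* directory by modification time,
# as resume-migration.sh is described to do when no argument is given.
# BASE stands in for $HOME; the timestamps below are made up.
set -eu

BASE="$(mktemp -d)"
mkdir "$BASE/migration_20240101-000000"
sleep 1
mkdir "$BASE/migration_20240102-000000"

# ls -td lists directories newest-first; head -n 1 takes the most recent.
MIGRATION_DIR="$(ls -td "$BASE"/migration_* | head -n 1)"
echo "Resuming from: $MIGRATION_DIR"
```

In the real script the glob would be `~/migration_*`, and the chosen directory is what ends up in `--dir "$MIGRATION_DIR"` on the `pgcopydb clone` invocation shown in the patch.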