These scripts migrate data from PostgreSQL tables to ClickHouse tables with:
- Incremental batch processing
- Resume capability on errors
- Progress tracking
- Schema validation
PostgreSQL (tutorial database) → ClickHouse (wsprdaemon database):
- `wsprdaemon_spots` → `wsprdaemon.spots_extended`
- `wsprdaemon_noise` → `wsprdaemon.noise`
- `test_pg_to_ch_migration.sh` - Test migration with a few rows
- `migrate_pg_to_ch.sh` - Full incremental migration (single table)
- `migrate_all_tables.sh` - Migrate both tables sequentially
Run this first to validate schemas and test with 10 rows:
```bash
chmod +x test_pg_to_ch_migration.sh
./test_pg_to_ch_migration.sh
```
This will:
- Check PostgreSQL and ClickHouse table schemas
- Compare column names and types
- Export 10 test rows from PostgreSQL
- Import to temporary ClickHouse test table
- Verify data integrity
- Report if migration is ready
Only proceed if tests pass!
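The column comparison performed by the test can be sketched with `comm` over sorted column lists. The column names below are invented for illustration; the real lists would come from psql (`\d`) and clickhouse-client (`DESCRIBE TABLE`):

```bash
# Sketch of a column-name comparison between the two schemas. The names
# below are made up; in practice they would be extracted from psql and
# clickhouse-client output.
pg_cols=$(printf '%s\n' band freq rx_sign snr time | sort)
ch_cols=$(printf '%s\n' band drift freq rx_sign time | sort)

# comm -3 keeps only lines present in exactly one of the two sorted lists
diff_cols=$(comm -3 <(printf '%s\n' "$pg_cols") <(printf '%s\n' "$ch_cols"))

if [ -z "$diff_cols" ]; then
  echo "column names match"
else
  echo "column differences found:"
  printf '%s\n' "$diff_cols"
fi
```

Sorting both lists first matters: `comm` assumes sorted input, and it makes the comparison independent of column order in the two databases.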
After successful testing, migrate the tables:
```bash
chmod +x migrate_all_tables.sh
./migrate_all_tables.sh
```
Or migrate each table individually:
```bash
chmod +x migrate_pg_to_ch.sh

# Migrate wsprdaemon_spots to spots_extended
./migrate_pg_to_ch.sh wsprdaemon_spots spots_extended

# Migrate wsprdaemon_noise to noise
./migrate_pg_to_ch.sh wsprdaemon_noise noise
```
Watch the migration in real-time:
```bash
tail -f /tmp/pg_to_ch_migration.log
```
Check progress files:
```bash
cat /tmp/pg_to_ch_progress/wsprdaemon_spots.offset
cat /tmp/pg_to_ch_progress/wsprdaemon_noise.offset
```
Environment variables (with defaults):
```bash
# Batch size (rows per batch)
BATCH_SIZE=10000

# PostgreSQL connection
PG_HOST=localhost
PG_USER=wdread
PG_PASSWORD=JTWSPR2008
PG_DB=tutorial

# ClickHouse connection
CH_HOST=localhost
CH_USER=default
CH_DB=wsprdaemon

# Progress tracking
PROGRESS_DIR=/tmp/pg_to_ch_progress
LOG_FILE=/tmp/pg_to_ch_migration.log
```
Example with custom settings:
```bash
BATCH_SIZE=5000 PG_HOST=wd1 ./migrate_pg_to_ch.sh wsprdaemon_spots spots_extended
```
If migration fails or is interrupted:
- Fix the issue (network, disk space, etc.)
- Simply run the same command again
- Migration will resume from last saved offset
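How the saved offset would feed the next batch can be sketched as query construction. Deterministic paging needs a stable `ORDER BY`; the `time` column and the `COPY ... TO STDOUT` form here are assumptions for illustration, not the script's actual SQL:

```bash
# Sketch: build the paged export query for the next batch. The ORDER BY
# column ("time") and the COPY form are assumed, not taken from the script.
next_export_query() {
  local table=$1 batch_size=$2 offset=$3
  echo "COPY (SELECT * FROM $table ORDER BY time LIMIT $batch_size OFFSET $offset) TO STDOUT WITH CSV"
}

# e.g. with the offset read back from the progress file:
next_export_query wsprdaemon_spots 10000 30000
```

Without a stable ordering, LIMIT/OFFSET pages could overlap or skip rows between resumes, so whatever column the real script orders by must be consistent across runs.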
```bash
# Will automatically resume from where it stopped
./migrate_pg_to_ch.sh wsprdaemon_spots spots_extended
```
Progress is saved after each batch in:
```
/tmp/pg_to_ch_progress/wsprdaemon_spots.offset
/tmp/pg_to_ch_progress/wsprdaemon_noise.offset
```
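A saved offset can be turned into a rough progress figure. The total row count is hard-coded here; in practice it would come from a `COUNT(*)` query against PostgreSQL:

```bash
# Sketch: convert a saved offset into an approximate progress percentage.
# Both numbers are hard-coded for illustration.
offset=250000     # e.g. $(cat /tmp/pg_to_ch_progress/wsprdaemon_spots.offset)
total=1000000     # e.g. from: SELECT COUNT(*) FROM wsprdaemon_spots
awk -v o="$offset" -v t="$total" 'BEGIN { printf "%.1f%% migrated\n", 100 * o / t }'
# prints: 25.0% migrated
```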
To start over from scratch:
```bash
# Remove progress files
rm -f /tmp/pg_to_ch_progress/wsprdaemon_spots.*
rm -f /tmp/pg_to_ch_progress/wsprdaemon_noise.*

# Then re-run migration
./migrate_pg_to_ch.sh wsprdaemon_spots spots_extended
```
After migration, verify row counts:
```bash
# PostgreSQL
psql -U wdread -d tutorial -c "SELECT COUNT(*) FROM wsprdaemon_spots;"
psql -U wdread -d tutorial -c "SELECT COUNT(*) FROM wsprdaemon_noise;"

# ClickHouse
clickhouse-client --query="SELECT COUNT(*) FROM wsprdaemon.spots_extended"
clickhouse-client --query="SELECT COUNT(*) FROM wsprdaemon.noise"
```
If the test shows column name differences:
- Check the PostgreSQL schema with `\d table_name` in psql
- Check the ClickHouse schema with `DESCRIBE TABLE wsprdaemon.table_name`
- Ensure the columns match exactly
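The row-count verification above can be wrapped in a small helper. The counts are passed as literals here; in practice they would come from the psql and clickhouse-client queries shown:

```bash
# check_counts: compare a PostgreSQL and a ClickHouse row count. In practice:
#   pg=$(psql -U wdread -d tutorial -tAc "SELECT COUNT(*) FROM wsprdaemon_spots;")
#   ch=$(clickhouse-client --query="SELECT COUNT(*) FROM wsprdaemon.spots_extended")
check_counts() {
  if [ "$1" = "$2" ]; then
    echo "row counts match: $1"
  else
    echo "MISMATCH: PostgreSQL=$1 ClickHouse=$2" >&2
    return 1
  fi
}

check_counts 1000 1000   # prints: row counts match: 1000
```

The non-zero return on mismatch makes the helper usable as a gate in a larger script (e.g. only write the `.complete` marker if it succeeds).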
If ClickHouse import fails:
- Check data types are compatible
- Look for NULL values if columns are NOT NULL
- Check for special characters in data
- Review `/tmp/pg_to_ch_migration.log`

Performance tuning:
- Increase `BATCH_SIZE` for faster migration (at the cost of more memory)
- Decrease `BATCH_SIZE` if hitting memory limits
- The default of 10,000 rows per batch is usually a good balance
Monitor temp directory:
```bash
df -h /tmp
```
Clean up if needed:
```bash
rm -f /tmp/pg_to_ch_temp/*
```
Approximate migration times (depends on data size and system):
- 1 million rows: ~5-15 minutes
- 10 million rows: ~30-90 minutes
- 100 million rows: ~5-15 hours
Typical rate: 1,000-10,000 rows/second
- Migration log: `/tmp/pg_to_ch_migration.log`
- Test log: `/tmp/pg_to_ch_test.log`
- Progress: `/tmp/pg_to_ch_progress/*.offset`
- Completion markers: `/tmp/pg_to_ch_progress/*.complete`
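A quick status check over these files might look like the following sketch; the `migration_status` helper is illustrative, not part of the scripts:

```bash
# Sketch: report a table's migration status from its progress files
# (the .offset/.complete layout is taken from this README).
PROGRESS_DIR="${PROGRESS_DIR:-/tmp/pg_to_ch_progress}"

migration_status() {
  local table=$1
  if [ -f "$PROGRESS_DIR/$table.complete" ]; then
    echo "complete"
  elif [ -f "$PROGRESS_DIR/$table.offset" ]; then
    echo "in progress, offset $(cat "$PROGRESS_DIR/$table.offset")"
  else
    echo "not started"
  fi
}

migration_status wsprdaemon_spots
```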
- No data loss: Only reads from PostgreSQL, never deletes source data
- Incremental: Processes in small batches
- Resumable: Saves progress after each batch
- Verified: Compares row counts before marking complete
- Logged: All actions logged with timestamps
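These guarantees rest on a simple batch-plus-offset loop. A self-contained sketch, with a plain text file standing in for both databases (this is not the scripts' actual code):

```bash
# Minimal sketch of an incremental, resumable batch loop. A text file stands
# in for both databases; the real scripts export each batch from PostgreSQL
# and insert it into ClickHouse instead.
BATCH_SIZE=3
WORK_DIR=$(mktemp -d)
OFFSET_FILE="$WORK_DIR/demo.offset"

seq 1 10 > "$WORK_DIR/source.txt"                 # mock source table, 10 rows
TOTAL=$(wc -l < "$WORK_DIR/source.txt" | tr -d ' ')

# Resume from the saved offset if one exists, otherwise start at 0
OFFSET=$(cat "$OFFSET_FILE" 2>/dev/null || echo 0)

while [ "$OFFSET" -lt "$TOTAL" ]; do
  # "Export" one batch: rows OFFSET+1 .. OFFSET+BATCH_SIZE
  sed -n "$((OFFSET + 1)),$((OFFSET + BATCH_SIZE))p" "$WORK_DIR/source.txt" \
    >> "$WORK_DIR/dest.txt"                       # stand-in for the insert
  OFFSET=$((OFFSET + BATCH_SIZE))
  if [ "$OFFSET" -gt "$TOTAL" ]; then OFFSET=$TOTAL; fi
  echo "$OFFSET" > "$OFFSET_FILE"                 # progress saved per batch
done

echo "migrated $(wc -l < "$WORK_DIR/dest.txt" | tr -d ' ') of $TOTAL rows"
```

Because the offset is written only after a batch lands in the destination, re-running after an interruption repeats at most one batch and never skips rows.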