- Introduction
- Environment Setup
- Database Initialization
- Prepare pgstream Configuration
- Validate pgstream Status
- Run pgstream
- Verify Replication
- Troubleshooting
- Summary
This tutorial demonstrates how to use pgstream to replicate data from one PostgreSQL database to another. It covers setting up the environment, configuring pgstream, and verifying the replication of both data and schema changes.
- A source PostgreSQL database
- A target PostgreSQL database
- pgstream (see installation instructions for more details)
The first step is to start the two PostgreSQL databases that will be used as the source and target for replication. The pgstream repository provides a Docker setup for this purpose, but any PostgreSQL servers with wal2json installed can be used.
To start the Docker-provided PostgreSQL servers, run the following command:

```sh
docker-compose -f build/docker/docker-compose.yml --profile pg2pg up
```

This will start two PostgreSQL databases:
- Source database on port `5432`
- Target database on port `7654`
Once both PostgreSQL servers are up and running, initialise pgstream on the source database. This step will create the pgstream schema in the configured Postgres database, along with the tables/functions/triggers required to keep track of the schema changes. See Tracking schema changes section for more details. It will also create a replication slot on the source database which will be used by the pgstream service.
The initialisation step allows you to provide both the URL of the PostgreSQL database and the name of the replication slot to be created. The PostgreSQL URL is required, but the replication slot name is optional. If not provided, it defaults to `pgstream_<dbname>_slot`, where `<dbname>` is the name of the PostgreSQL database. The configuration can be provided either via CLI parameters or environment variables.
For this tutorial, we'll create a replication slot with the name pgstream_tutorial_slot.
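The default slot-naming convention described above can be sketched as a tiny helper (hypothetical illustration, not part of pgstream):

```python
def default_slot_name(dbname: str) -> str:
    """Mirror the documented default: pgstream_<dbname>_slot."""
    return f"pgstream_{dbname}_slot"

# For the default "postgres" database:
print(default_slot_name("postgres"))  # pgstream_postgres_slot
```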
The initialisation can be performed in several ways:

- Using the `--init` flag in the `run` command
- Using CLI parameters:

```sh
pgstream init --postgres-url "postgres://postgres:postgres@localhost:5432?sslmode=disable" --replication-slot pgstream_tutorial_slot
```

- Using environment variables:

```sh
PGSTREAM_POSTGRES_REPLICATION_SLOT_NAME=pgstream_tutorial_slot \
PGSTREAM_POSTGRES_LISTENER_URL=postgres://postgres:postgres@localhost:5432?sslmode=disable \
pgstream init
```

After initialization, you should see the following message:

```
SUCCESS pgstream initialisation complete
```
If at any point the initialisation performed by pgstream needs to be reverted, all state can be removed by running the `destroy` CLI command.

```sh
pgstream destroy --postgres-url "postgres://postgres:postgres@localhost:5432?sslmode=disable" --replication-slot pgstream_tutorial_slot
```

The listener reads changes from the source database's WAL. Configure it as follows:
```sh
PGSTREAM_POSTGRES_LISTENER_URL="postgres://postgres:postgres@localhost:5432?sslmode=disable"
PGSTREAM_POSTGRES_REPLICATION_SLOT_NAME=pgstream_tutorial_slot
```

If you want to perform an initial snapshot of existing tables, add:
```sh
# URL of the PostgreSQL database we want to snapshot
PGSTREAM_POSTGRES_SNAPSHOT_STORE_URL="postgres://postgres:postgres@localhost:5432?sslmode=disable"
# List of tables we want to snapshot. If the tables are not schema qualified, the public schema will be assumed.
# Wildcards are supported.
#
# The following example will snapshot all tables in the `test_schema` schema and the table `test` from the public schema.
PGSTREAM_POSTGRES_SNAPSHOT_TABLES="test_schema.* test"
```

Further configuration can be provided to optimize the performance of the snapshot process. For more information, check the snapshot tutorial.
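The table-selection rules for `PGSTREAM_POSTGRES_SNAPSHOT_TABLES` (unqualified names default to the `public` schema; wildcards are supported) can be sketched as follows. This is an illustration of the documented semantics, not pgstream's actual matcher:

```python
import fnmatch

def matches(pattern: str, schema: str, table: str) -> bool:
    """Return True if schema.table is selected by a snapshot pattern.
    Unqualified patterns are assumed to target the public schema."""
    if "." in pattern:
        pat_schema, pat_table = pattern.split(".", 1)
    else:
        pat_schema, pat_table = "public", pattern
    return fnmatch.fnmatch(schema, pat_schema) and fnmatch.fnmatch(table, pat_table)

patterns = "test_schema.* test".split()
print(any(matches(p, "test_schema", "orders") for p in patterns))  # True
print(any(matches(p, "public", "test") for p in patterns))         # True
print(any(matches(p, "public", "other") for p in patterns))        # False
```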
The processor writes changes to the target database. The only required value is the URL of the target database. Configure it as follows:
```sh
PGSTREAM_POSTGRES_WRITER_TARGET_URL="postgres://postgres:postgres@localhost:7654?sslmode=disable"
```

If we need to disable triggers on the target database during the replication process (e.g., to avoid foreign key constraint violations), we can do so by setting the following variable:

```sh
PGSTREAM_POSTGRES_WRITER_DISABLE_TRIGGERS=true
```

For further granularity, we can also configure the action that should be taken when there's a conflict on an insert.
```sh
# Insert on conflict action. Options are update, nothing or error (error is the default behaviour)
PGSTREAM_POSTGRES_WRITER_ON_CONFLICT_ACTION=nothing
```

The PostgreSQL writer batches operations under the hood to reduce the number of IO calls to the target database and improve performance. The batch size and send timeout can both be configured to better fit different traffic patterns. The writer sends a batch when either the timeout or the batch size is reached, whichever happens first.
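The flush rule just described (send when either the batch size or the timeout is reached, whichever happens first) can be sketched as below. This is an illustration of the behaviour, not pgstream's actual implementation:

```python
import time

class Batcher:
    """Illustrative batcher: flushes when max_size messages accumulate
    or max_delay seconds pass since the first queued message."""

    def __init__(self, max_size=25, max_delay=5.0, send=print):
        self.max_size = max_size
        self.max_delay = max_delay
        self.send = send
        self.batch = []
        self.first_queued_at = None

    def add(self, msg):
        if not self.batch:
            self.first_queued_at = time.monotonic()
        self.batch.append(msg)
        if len(self.batch) >= self.max_size:
            self.flush()

    def maybe_flush_on_timeout(self):
        # Called periodically: flush a partial batch once the delay elapses.
        if self.batch and time.monotonic() - self.first_queued_at >= self.max_delay:
            self.flush()

    def flush(self):
        if self.batch:
            self.send(self.batch)
            self.batch = []
            self.first_queued_at = None

sent = []
b = Batcher(max_size=3, max_delay=5.0, send=sent.append)
for i in range(7):
    b.add(i)
# Two full batches were sent by size; one partial batch awaits the timeout.
print(sent)  # [[0, 1, 2], [3, 4, 5]]
```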
```sh
# Number of DML queries that will be batched and sent together in a given transaction. It defaults to 100.
PGSTREAM_POSTGRES_WRITER_BATCH_SIZE=25
# Max delay between batch sends. Batches will be sent every 5s by default.
PGSTREAM_POSTGRES_WRITER_BATCH_TIMEOUT=5s
```

For the PostgreSQL writer to replicate DDL changes, it needs access to the schema log. To enable this behaviour, an environment variable needs to point to the `pgstream.schema_log` store database. In this case, it is the same as the source PostgreSQL database, since that's where we've initialised pgstream.
```sh
PGSTREAM_POSTGRES_WRITER_SCHEMALOG_STORE_URL="postgres://postgres:postgres@localhost:5432?sslmode=disable"
```

Save the configuration in a file named `pg2pg_tutorial.env`. An equivalent `pg2pg_tutorial.yaml` configuration can be found below the environment one, and the two can be used interchangeably.
```sh
# Listener config
PGSTREAM_POSTGRES_LISTENER_URL="postgres://postgres:postgres@localhost:5432?sslmode=disable"
PGSTREAM_POSTGRES_REPLICATION_SLOT_NAME=pgstream_tutorial_slot

# Processor config
PGSTREAM_POSTGRES_WRITER_TARGET_URL="postgres://postgres:postgres@localhost:7654?sslmode=disable"
PGSTREAM_POSTGRES_WRITER_BATCH_SIZE=25
PGSTREAM_POSTGRES_WRITER_BATCH_TIMEOUT=5s
PGSTREAM_POSTGRES_WRITER_SCHEMALOG_STORE_URL="postgres://postgres:postgres@localhost:5432?sslmode=disable"
PGSTREAM_POSTGRES_WRITER_DISABLE_TRIGGERS=true
PGSTREAM_POSTGRES_WRITER_ON_CONFLICT_ACTION=nothing
```

```yaml
source:
  postgres:
    url: "postgres://postgres:postgres@localhost:5432?sslmode=disable"
    mode: replication # options are replication, snapshot or snapshot_and_replication
    replication:
      replication_slot: pgstream_tutorial_slot
target:
  postgres:
    url: "postgres://postgres:postgres@localhost:7654?sslmode=disable"
    batch:
      timeout: 5000 # batch timeout in milliseconds
      size: 25 # number of messages in a batch
    schema_log_store_url: "postgres://postgres:postgres@localhost:5432?sslmode=disable" # URL of the postgres database where the schema log is stored, used when performing schema change diffs
    disable_triggers: true # whether to disable triggers on the target database
    on_conflict_action: "nothing" # options are update, nothing or error
```

We can validate that the initialisation and the configuration are valid by running the `status` command before starting pgstream.
```sh
# using the yaml configuration file
./pgstream status -c pg2pg_tutorial.yaml
# using the env configuration file
./pgstream status -c pg2pg_tutorial.env
```

```
SUCCESS pgstream status check encountered no issues
Initialisation status:
 - Pgstream schema exists: true
 - Pgstream schema_log table exists: true
 - Migration current version: 7
 - Migration status: success
 - Replication slot name: pgstream_tutorial_slot
 - Replication slot plugin: wal2json
 - Replication slot database: postgres
Config status:
 - Valid: true
Transformation rules status:
 - Valid: true
Source status:
 - Reachable: true
```

Run pgstream with the prepared configuration. In this case we set the log level to trace to provide more context for debugging and more visibility into what pgstream is doing under the hood.
```sh
# with initialization
pgstream run -c pg2pg_tutorial.env --init --log-level trace
# with the environment configuration
pgstream run -c pg2pg_tutorial.env --log-level trace
# with the yaml configuration
pgstream run -c pg2pg_tutorial.yaml --log-level trace
# with CLI flags, relying on defaults
pgstream run --source postgres --source-url "postgres://postgres:postgres@localhost:5432?sslmode=disable" --target postgres --target-url "postgres://postgres:postgres@localhost:7654?sslmode=disable" --log-level trace
```
- Connect to the source database:

  ```sh
  psql postgresql://postgres:postgres@localhost:5432/postgres
  ```

- Create a table:

  ```sql
  CREATE TABLE test(id SERIAL PRIMARY KEY, name TEXT);
  ```

- Connect to the target database:

  ```sh
  psql postgresql://postgres:postgres@localhost:7654/postgres
  ```

- Verify the table was replicated:

  ```
  \d+ test
  +--------+---------+-----------+----------+--------------+-------------+
  | Column | Type    | Modifiers | Storage  | Stats target | Description |
  |--------+---------+-----------+----------+--------------+-------------|
  | id     | integer | not null  | plain    | <null>       | <null>      |
  | name   | text    |           | extended | <null>       | <null>      |
  +--------+---------+-----------+----------+--------------+-------------+
  ```
Similarly, other DDL operations should be properly replicated on the target database.

```sql
ALTER TABLE test RENAME TO tutorial_test;
ALTER TABLE tutorial_test ADD COLUMN age INT DEFAULT 0;
ALTER TABLE tutorial_test ALTER COLUMN age TYPE bigint;
ALTER TABLE tutorial_test RENAME COLUMN age TO new_age;
ALTER TABLE tutorial_test DROP COLUMN new_age;
DROP TABLE tutorial_test;
```
- Insert data into the source database:

  ```sql
  INSERT INTO test(name) VALUES('alice'),('bob'),('charlie');
  ```

- Verify the data was replicated to the target database:

  ```sql
  SELECT * FROM test;
  ```
Here are some common issues you might encounter while following this tutorial and how to resolve them:
- Cause: The PostgreSQL database is not running or the connection URL is incorrect.
- Solution:
  - Ensure the Docker containers for the source and target databases are running.
  - Verify the database URLs in the configuration file (`pg2pg_tutorial.env`).
  - Test the connection using `psql`:

    ```sh
    psql postgresql://postgres:postgres@localhost:5432/postgres
    ```
- Cause: The replication slot was not created during initialization.
- Solution:
  - Reinitialize `pgstream` or manually create the replication slot.
  - Run the `pgstream status` command to validate the initialisation was successful.
  - Verify the replication slot exists by running:

    ```sql
    SELECT slot_name FROM pg_replication_slots;
    ```
- Cause: The replication process did not complete successfully or the target database URL is incorrect.
- Solution:
  - Verify the target database URL in the configuration file.
  - Check the `pgstream` logs to confirm the replication process completed without errors.
  - Query the target database to ensure the data was replicated:

    ```sql
    SELECT * FROM test;
    ```
- Cause: The database user does not have sufficient privileges.
- Solution:
  - Grant the required privileges to the database user:

    ```sql
    GRANT ALL PRIVILEGES ON DATABASE postgres TO postgres;
    ```
- Cause: The configuration file contains invalid or missing values.
- Solution:
  - Double-check the `pg2pg_tutorial.env` file for typos or missing variables.
  - Refer to the pgstream configuration documentation for details on required variables.
  - Run the `pgstream status` command to validate the configuration is correct.
- Cause: The initial snapshot process encountered an issue.
- Solution:
  - Ensure the `PGSTREAM_POSTGRES_SNAPSHOT_TABLES` variable is set if a snapshot is required.
  - Check the `pgstream` logs for detailed error messages:

    ```sh
    pgstream run -c pg2pg_tutorial.env --log-level trace
    ```
If you encounter issues not listed here, consult the pgstream documentation or open an issue on the project's GitHub repository.
In this tutorial, we successfully configured pgstream to replicate data from a source PostgreSQL database to a target PostgreSQL database. We:
- Set up the source and target PostgreSQL databases using Docker.
- Initialized `pgstream` on the source database, creating the necessary schema and replication slot.
- Configured the listener to capture changes from the source database's WAL.
- Configured the processor to replicate changes to the target database.
- Verified that both DML (data manipulation) and DDL (schema changes) were replicated correctly.
This tutorial demonstrates how pgstream can be used for real-time replication between PostgreSQL databases. For more advanced use cases, such as transformations or webhook integration, refer to the pgstream tutorials.