- Introduction
- Environment Setup
- Database Initialization
- Prepare pgstream Configuration
- Validate pgstream Status
- Run pgstream
- Verify Replication
- Troubleshooting
- Summary
This tutorial demonstrates how to use pgstream to replicate data from one PostgreSQL database to another. It covers setting up the environment, configuring pgstream, and verifying the replication of both data and schema changes.
- A source PostgreSQL database
- A target PostgreSQL database
- pgstream (see installation instructions for more details)
The first step is to start the two PostgreSQL databases that will be used as the source and target for replication. The pgstream repository provides a Docker setup for this purpose, but any PostgreSQL servers with wal2json installed can be used.
To start the Docker-provided PostgreSQL servers, run the following command:

```sh
docker-compose -f build/docker/docker-compose.yml --profile pg2pg up
```

This will start two PostgreSQL databases:
- Source database on port `5432`
- Target database on port `7654`
Once both PostgreSQL servers are up and running, initialise pgstream on the source database. This step will create the pgstream schema in the configured Postgres database, along with the tables/functions/triggers required to keep track of the schema changes. See Tracking schema changes section for more details. It will also create a replication slot on the source database which will be used by the pgstream service.
The initialisation step allows you to provide both the URL of the PostgreSQL database and the name of the replication slot to be created. The PostgreSQL URL is required, but the replication slot name is optional. If not provided, it defaults to `pgstream_<dbname>_slot`, where `<dbname>` is the name of the PostgreSQL database. The configuration can be provided either via CLI parameters or environment variables.
For this tutorial, we'll create a replication slot with the name pgstream_tutorial_slot.
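The default slot-naming convention described above can be sketched as a tiny helper (hypothetical illustration, not part of pgstream):

```python
def default_slot_name(dbname: str) -> str:
    """Mirror the documented default: pgstream_<dbname>_slot."""
    return f"pgstream_{dbname}_slot"

# For the default "postgres" database:
print(default_slot_name("postgres"))  # pgstream_postgres_slot
```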
The initialisation can be performed in several ways:

- Using the `--init` flag in the `run` command
- Using CLI parameters:

```sh
pgstream init --postgres-url "postgres://postgres:postgres@localhost:5432?sslmode=disable" --replication-slot pgstream_tutorial_slot
```

- Using environment variables:

```sh
PGSTREAM_POSTGRES_REPLICATION_SLOT_NAME=pgstream_tutorial_slot \
PGSTREAM_POSTGRES_LISTENER_URL=postgres://postgres:postgres@localhost:5432?sslmode=disable \
pgstream init
```

After initialization, you should see the following message:

```
SUCCESS pgstream initialisation complete
```
If at any point the initialisation performed by pgstream needs to be reverted, all state can be removed by running the `destroy` CLI command.

```sh
pgstream destroy --postgres-url "postgres://postgres:postgres@localhost:5432?sslmode=disable" --replication-slot pgstream_tutorial_slot
```

The listener reads changes from the source database's WAL. Configure it as follows:
```sh
PGSTREAM_POSTGRES_LISTENER_URL="postgres://postgres:postgres@localhost:5432?sslmode=disable"
PGSTREAM_POSTGRES_REPLICATION_SLOT_NAME=pgstream_tutorial_slot
```

If you want to perform an initial snapshot of existing tables, add:
```sh
# URL of the PostgreSQL database we want to snapshot
PGSTREAM_POSTGRES_SNAPSHOT_STORE_URL="postgres://postgres:postgres@localhost:5432?sslmode=disable"
# List of tables we want to snapshot. If the tables are not schema qualified, the public schema will be assumed.
# Wildcards are supported.
#
# The following example will snapshot all tables in the `test_schema` schema and the table `test` from the public schema.
PGSTREAM_POSTGRES_SNAPSHOT_TABLES="test_schema.* test"
```

Further configuration can be provided to optimize the performance of the snapshot process. For more information, check the snapshot tutorial.
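The table-selection rules for `PGSTREAM_POSTGRES_SNAPSHOT_TABLES` (unqualified names default to the `public` schema; wildcards are supported) can be sketched as follows. This is an illustration of the documented semantics, not pgstream's actual matcher:

```python
import fnmatch

def matches(pattern: str, schema: str, table: str) -> bool:
    """Return True if schema.table is selected by a snapshot pattern.
    Unqualified patterns are assumed to target the public schema."""
    if "." in pattern:
        pat_schema, pat_table = pattern.split(".", 1)
    else:
        pat_schema, pat_table = "public", pattern
    return fnmatch.fnmatch(schema, pat_schema) and fnmatch.fnmatch(table, pat_table)

patterns = "test_schema.* test".split()
print(any(matches(p, "test_schema", "orders") for p in patterns))  # True
print(any(matches(p, "public", "test") for p in patterns))         # True
print(any(matches(p, "public", "other") for p in patterns))        # False
```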
The processor writes changes to the target database. The only required value is the URL of the target database. Configure it as follows:
```sh
PGSTREAM_POSTGRES_WRITER_TARGET_URL="postgres://postgres:postgres@localhost:7654?sslmode=disable"
```

If we need to disable triggers on the target database during the replication process (e.g., to avoid foreign key constraint violations), we can do so by setting the following variable:

```sh
PGSTREAM_POSTGRES_WRITER_DISABLE_TRIGGERS=true
```

For further granularity, we can also configure the action that should be taken when there's a conflict on an insert.
```sh
# Insert on conflict action. Options are update, nothing or error (error is the default behaviour)
PGSTREAM_POSTGRES_WRITER_ON_CONFLICT_ACTION=nothing
```

The PostgreSQL writer batches operations under the hood to reduce the number of IO calls to the target database and improve performance. The batch size and send timeout can both be configured to better fit different traffic patterns. The writer sends a batch when either the timeout or the batch size is reached, whichever happens first.
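The flush rule just described (send when either the batch size or the timeout is reached, whichever happens first) can be sketched as below. This is an illustration of the behaviour, not pgstream's actual implementation:

```python
import time

class Batcher:
    """Illustrative batcher: flushes when max_size messages accumulate
    or max_delay seconds pass since the first queued message."""

    def __init__(self, max_size=25, max_delay=5.0, send=print):
        self.max_size = max_size
        self.max_delay = max_delay
        self.send = send
        self.batch = []
        self.first_queued_at = None

    def add(self, msg):
        if not self.batch:
            self.first_queued_at = time.monotonic()
        self.batch.append(msg)
        if len(self.batch) >= self.max_size:
            self.flush()

    def maybe_flush_on_timeout(self):
        # Called periodically: flush a partial batch once the delay elapses.
        if self.batch and time.monotonic() - self.first_queued_at >= self.max_delay:
            self.flush()

    def flush(self):
        if self.batch:
            self.send(self.batch)
            self.batch = []
            self.first_queued_at = None

sent = []
b = Batcher(max_size=3, max_delay=5.0, send=sent.append)
for i in range(7):
    b.add(i)
# Two full batches were sent by size; one partial batch awaits the timeout.
print(sent)  # [[0, 1, 2], [3, 4, 5]]
```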
```sh
# Number of DML queries that will be batched and sent together in a given transaction. It defaults to 100.
PGSTREAM_POSTGRES_WRITER_BATCH_SIZE=25
# Max delay between batch sends. Batches will be sent every 5s by default.
PGSTREAM_POSTGRES_WRITER_BATCH_TIMEOUT=5s
```

For the PostgreSQL writer to replicate DDL changes, it needs access to the schema log. To enable this behaviour, an environment variable needs to point to the `pgstream.schema_log` store database. In this case, it is the same as the source PostgreSQL database, since that's where we've initialised pgstream.
```sh
PGSTREAM_POSTGRES_WRITER_SCHEMALOG_STORE_URL="postgres://postgres:postgres@localhost:5432?sslmode=disable"
```

Save the configuration in a file named `pg2pg_tutorial.env`. An equivalent `pg2pg_tutorial.yaml` configuration can be found below the environment one, and the two can be used interchangeably.
```sh
# Listener config
PGSTREAM_POSTGRES_LISTENER_URL="postgres://postgres:postgres@localhost:5432?sslmode=disable"
PGSTREAM_POSTGRES_REPLICATION_SLOT_NAME=pgstream_tutorial_slot

# Processor config
PGSTREAM_POSTGRES_WRITER_TARGET_URL="postgres://postgres:postgres@localhost:7654?sslmode=disable"
PGSTREAM_POSTGRES_WRITER_BATCH_SIZE=25
PGSTREAM_POSTGRES_WRITER_BATCH_TIMEOUT=5s
PGSTREAM_POSTGRES_WRITER_SCHEMALOG_STORE_URL="postgres://postgres:postgres@localhost:5432?sslmode=disable"
PGSTREAM_POSTGRES_WRITER_DISABLE_TRIGGERS=true
PGSTREAM_POSTGRES_WRITER_ON_CONFLICT_ACTION=nothing
```

```yaml
source:
  postgres:
    url: "postgres://postgres:postgres@localhost:5432?sslmode=disable"
    mode: replication # options are replication, snapshot or snapshot_and_replication
    replication:
      replication_slot: pgstream_tutorial_slot
target:
  postgres:
    url: "postgres://postgres:postgres@localhost:7654?sslmode=disable"
    batch:
      timeout: 5000 # batch timeout in milliseconds
      size: 25 # number of messages in a batch
    schema_log_store_url: "postgres://postgres:postgres@localhost:5432?sslmode=disable" # URL of the postgres database where the schema log is stored, used when performing schema change diffs
    disable_triggers: true # whether to disable triggers on the target database
    on_conflict_action: "nothing" # options are update, nothing or error
```

We can validate that the initialisation and the configuration are valid by running the `status` command before starting pgstream.
```sh
# using the yaml configuration file
./pgstream status -c pg2pg_tutorial.yaml
# using the env configuration file
./pgstream status -c pg2pg_tutorial.env
```

```
SUCCESS pgstream status check encountered no issues
Initialisation status:
 - Pgstream schema exists: true
 - Pgstream schema_log table exists: true
 - Migration current version: 7
 - Migration status: success
 - Replication slot name: pgstream_tutorial_slot
 - Replication slot plugin: wal2json
 - Replication slot database: postgres
Config status:
 - Valid: true
Transformation rules status:
 - Valid: true
Source status:
 - Reachable: true
```

Run pgstream with the prepared configuration. In this case we set the log level to trace to provide more context for debugging and more visibility into what pgstream is doing under the hood.
```sh
# with initialization
pgstream run -c pg2pg_tutorial.env --init --log-level trace
# with the environment configuration
pgstream run -c pg2pg_tutorial.env --log-level trace
# with the yaml configuration
pgstream run -c pg2pg_tutorial.yaml --log-level trace
# with CLI flags, relying on defaults
pgstream run --source postgres --source-url "postgres://postgres:postgres@localhost:5432?sslmode=disable" --target postgres --target-url "postgres://postgres:postgres@localhost:7654?sslmode=disable" --log-level trace
```
- Connect to the source database:

  ```sh
  psql postgresql://postgres:postgres@localhost:5432/postgres
  ```

- Create a table:

  ```sql
  CREATE TABLE test(id SERIAL PRIMARY KEY, name TEXT);
  ```

- Connect to the target database:

  ```sh
  psql postgresql://postgres:postgres@localhost:7654/postgres
  ```

- Verify the table was replicated:

  ```
  \d+ test
  +--------+---------+-----------+----------+--------------+-------------+
  | Column | Type    | Modifiers | Storage  | Stats target | Description |
  |--------+---------+-----------+----------+--------------+-------------|
  | id     | integer | not null  | plain    | <null>       | <null>      |
  | name   | text    |           | extended | <null>       | <null>      |
  +--------+---------+-----------+----------+--------------+-------------+
  ```
Similarly, other DDL operations should be properly replicated on the target database.

```sql
ALTER TABLE test RENAME TO tutorial_test;
ALTER TABLE tutorial_test ADD COLUMN age INT DEFAULT 0;
ALTER TABLE tutorial_test ALTER COLUMN age TYPE bigint;
ALTER TABLE tutorial_test RENAME COLUMN age TO new_age;
ALTER TABLE tutorial_test DROP COLUMN new_age;
DROP TABLE tutorial_test;
```
- Insert data into the source database:

  ```sql
  INSERT INTO test(name) VALUES('alice'),('bob'),('charlie');
  ```

- Verify the data was replicated to the target database:

  ```sql
  SELECT * FROM test;
  ```
Here are some common issues you might encounter while following this tutorial and how to resolve them:
- Cause: The PostgreSQL database is not running or the connection URL is incorrect.
- Solution:
  - Ensure the Docker containers for the source and target databases are running.
  - Verify the database URLs in the configuration file (`pg2pg_tutorial.env`).
  - Test the connection using `psql`:

    ```sh
    psql postgresql://postgres:postgres@localhost:5432/postgres
    ```
- Cause: The replication slot was not created during initialization.
- Solution:
  - Reinitialize `pgstream` or manually create the replication slot.
  - Run the `pgstream status` command to validate the initialisation was successful.
  - Verify the replication slot exists by running:

    ```sql
    SELECT slot_name FROM pg_replication_slots;
    ```
- Cause: The replication process did not complete successfully or the target database URL is incorrect.
- Solution:
  - Verify the target database URL in the configuration file.
  - Check the `pgstream` logs to confirm the replication process completed without errors.
  - Query the target database to ensure the data was replicated:

    ```sql
    SELECT * FROM test;
    ```
- Cause: The database user does not have sufficient privileges.
- Solution:
  - Grant the required privileges to the database user:

    ```sql
    GRANT ALL PRIVILEGES ON DATABASE postgres TO postgres;
    ```
- Cause: The configuration file contains invalid or missing values.
- Solution:
  - Double-check the `pg2pg_tutorial.env` file for typos or missing variables.
  - Refer to the pgstream configuration documentation for details on required variables.
  - Run the `pgstream status` command to validate the configuration is correct.
- Cause: The initial snapshot process encountered an issue.
- Solution:
  - Ensure the `PGSTREAM_POSTGRES_SNAPSHOT_TABLES` variable is set if a snapshot is required.
  - Check the `pgstream` logs for detailed error messages:

    ```sh
    pgstream run -c pg2pg_tutorial.env --log-level trace
    ```
If you encounter issues not listed here, consult the pgstream documentation or open an issue on the project's GitHub repository.
In this tutorial, we successfully configured pgstream to replicate data from a source PostgreSQL database to a target PostgreSQL database. We:
- Set up the source and target PostgreSQL databases using Docker.
- Initialized `pgstream` on the source database, creating the necessary schema and replication slot.
- Configured the listener to capture changes from the source database's WAL.
- Configured the processor to replicate changes to the target database.
- Verified that both DML (data manipulation) and DDL (schema changes) were replicated correctly.
This tutorial demonstrates how pgstream can be used for real-time replication between PostgreSQL databases. For more advanced use cases, such as transformations or webhook integration, refer to the pgstream tutorials.