diff --git a/docs/cli/clone.mdx b/docs/cli/clone.mdx
index 52137c4..c0fb3a5 100644
--- a/docs/cli/clone.mdx
+++ b/docs/cli/clone.mdx
@@ -1,6 +1,6 @@
 ---
 title: Clone Command
-description: Command for cloning Xata databases and external PostgreSQL databases
+description: "Commands for cloning Xata databases and external PostgreSQL databases, including continuous streaming via logical replication"
 ---
 
 The `clone` command helps you create a copy of your Xata database or clone an external PostgreSQL database into Xata. It supports data anonymization and advanced configuration for complex migration scenarios.
@@ -9,7 +9,7 @@ The `clone` command helps you create a copy of your Xata database or clone an ex
 
 ### start
 
-Snapshot performs a snapshot of the configured source Postgres database into the configured target.
+Start performs a snapshot of the configured source Postgres database into the configured target.
 
 ```bash
 xata clone start [--source-url ] [--config ] [--log-level ] [--dump-file ] [--postgres-url ] [--profile] [--reset] [--tables ] [--target ] [--target-url ] [--organization ] [--project ] [--branch ] [--filter-tables ] [--validation-mode ] [--role ] [-h|--help]
@@ -35,7 +35,7 @@ xata clone start [--source-url ] [--config ] [--log-level ] [-
 
 ### config
 
-Automatically configure the transforms for the clone command.
+Automatically configure the transformations for the clone command.
 
 ```bash
 xata clone config [--source-url ] [--mode ] [--validation-mode ] [--organization ] [--project ] [--branch ] [-h|--help]
@@ -49,6 +49,33 @@ xata clone config [--source-url ] [--mode ] [--validation-mode
 - `--branch`: Branch ID (default: "")
 - `-h, --help`: Print help information and exit
 
+### stream
+
+Start a continuous data stream from the configured source to the configured target using Postgres's logical replication.
+
+```bash
+xata clone stream --source-url [--config ] [--log-level ] [--init] [--profile] [--replication-slot ] [--reset] [--snapshot-tables ] [--source ] [--target ] [--target-url ] [--organization ] [--project ] [--branch ] [--filter-tables ] [--validation-mode ] [--role ] [-h|--help]
+```
+
+- `--source-url`: The source URL of the database to stream from (required)
+- `--config`: .env or .yaml config file to use with pgstream, if any
+- `--log-level`: Log level for pgstream (trace|debug|info|warn|error|fatal|panic, default: info)
+- `--init`: Whether to initialize pgstream before starting replication
+- `--profile`: Whether to expose a `/debug/pprof` endpoint on `localhost:6060`
+- `--replication-slot`: Name of the Postgres replication slot for pgstream to connect to
+- `--reset`: Whether to reset the target before snapshotting (only for a postgres target)
+- `--snapshot-tables`: List of tables to snapshot if an initial snapshot is required, in the format `.`. If no schema is specified, the `public` schema is assumed. Wildcards are supported.
+- `--source`: Source type. One of postgres, kafka
+- `--target`: Target type. One of postgres, opensearch, elasticsearch, kafka
+- `--target-url`: Target URL
+- `--organization`: Organization ID
+- `--project`: Project ID
+- `--branch`: Branch ID
+- `--filter-tables`: Tables to filter (default: `*.*`)
+- `--validation-mode`: Anonymization validation mode; `strict` requires all tables and columns to be specified (strict|relaxed|prompt, default: prompt)
+- `--role`: Postgres role to use for streaming (it must have at least the REPLICATION privilege)
+- `-h, --help`: Print help information and exit
+
 ## Global Flags
 
 - `-h, --help` - Print help information and exit
diff --git a/docs/cli/stream.mdx b/docs/cli/stream.mdx
new file mode 100644
index 0000000..9180ac7
--- /dev/null
+++ b/docs/cli/stream.mdx
@@ -0,0 +1,28 @@
+---
+title: Stream Command
+description: Commands for managing logical streaming replication operations
+---
+
+The `stream` command helps you manage database streaming operations performed by `pgstream` through logical replication.
+
+## Subcommands
+
+### destroy
+
+Destroy any pgstream setup, removing the replication slot, all related tables, functions, and triggers, and the internal pgstream schema.
+
+```bash
+xata stream destroy --source-url [--config ] [--log-level ] [--postgres-url ] [--replication-slot ] [-h|--help]
+```
+
+- `--source-url`: The source URL of the database where pgstream was set up (required)
+- `--config`: .env or .yaml config file to use with pgstream, if any
+- `--log-level`: Log level for pgstream (trace|debug|info|warn|error|fatal|panic, default: info)
+- `--postgres-url`: Source Postgres URL where `pgstream destroy` will be run
+- `--replication-slot`: Name of the Postgres replication slot that pgstream will delete from the source database
+- `-h, --help`: Print help information and exit
+
+## Global Flags
+
+- `-h, --help` - Print help information and exit
+- `--json` - Output in JSON format
diff --git a/docs/config.json b/docs/config.json
index b6fe2e8..1348f78 100644
--- a/docs/config.json
+++ b/docs/config.json
@@ -52,6 +52,11 @@
         "href": "/tutorials/create-staging-replica",
         "file": "docs/tutorials/create-staging-replica.mdx"
       },
+      {
+        "title": "Set up streaming replication",
+        "href": "/tutorials/streaming-replication",
+        "file": "docs/tutorials/streaming-replication.mdx"
+      },
       {
         "title": "Schema changes",
         "href": "/tutorials/schema-change",
@@ -288,6 +293,11 @@
         "href": "/cli/status",
         "file": "docs/cli/status.mdx"
       },
+      {
+        "title": "stream",
+        "href": "/cli/stream",
+        "file": "docs/cli/stream.mdx"
+      },
       {
         "title": "upgrade",
         "href": "/cli/upgrade",
diff --git a/docs/tutorials/streaming-replication.mdx b/docs/tutorials/streaming-replication.mdx
new file mode 100644
index 0000000..19c00e7
--- /dev/null
+++ b/docs/tutorials/streaming-replication.mdx
@@ -0,0 +1,162 @@
+---
+title: Set up a logical streaming replica
+description: Use Xata's streaming replication to keep your database continuously synchronized with real-time changes.
+---
+
+This guide shows you how to set up continuous logical streaming replication from your production PostgreSQL database to Xata, enabling real-time data synchronization with optional anonymization.
+
+![Setting up streaming replication to Xata](assets/images/xata-streaming-replication.png)
+
+## 1. Prerequisites
+
+- A Xata account ([sign up here](https://console.xata.io))
+- The [Xata CLI](/cli) installed:
+  ```bash
+  curl -fsSL https://xata.io/install.sh | bash
+  ```
+- A PostgreSQL database with:
+  - Logical replication enabled
+  - A role with permission to create a replication slot (the `xata clone stream` command creates the slot automatically)
+  - Network connectivity from Xata to your database
+
+## 2. Enable logical replication on the source database
+
+First, ensure your source PostgreSQL database has logical replication enabled by checking these parameters:
+
+```sql
+-- Check current settings
+SHOW wal_level;
+SHOW max_replication_slots;
+SHOW max_wal_senders;
+```
+
+If they are not already configured, update your PostgreSQL configuration:
+
+```sql
+ALTER SYSTEM SET wal_level = logical;
+ALTER SYSTEM SET max_replication_slots = 10;
+ALTER SYSTEM SET max_wal_senders = 10;
+```
+
+Restart your PostgreSQL instance for the changes to take effect.
+
+## 3. Create a Xata project and branch
+
+In the Console, create a new project and then click the **Create main branch** button to create the PostgreSQL instance.
+
+For streaming replication, consider using at least one replica to ensure high availability during continuous synchronization. Select an instance size that can handle your expected write throughput.
+
+> **Note:** Streaming replication maintains a persistent connection to your source database. Ensure your network allows stable, long-lived connections between Xata and your PostgreSQL instance.
+
+## 4. Configure the Xata CLI
+
+Authenticate the CLI by running:
+
+```sh
+xata auth login
+```
+
+Initialize the project by running:
+
+```sh
+xata init
+```
+
+## 5. Configure streaming replication
+
+Generate a configuration for the streaming process:
+
+```bash
+xata clone config --source-url $CONN_STRING
+```
+
+Here, `CONN_STRING` is the connection string of your source PostgreSQL database, using a role with replication permissions.
+
+The configuration prompt will ask you to:
+
+- Select tables to replicate
+- Set up transformation pipelines, such as anonymization rules
+
+This creates a configuration file at `.xata/clone.yaml` that you can customize further.
+
+## 6. Initialize and start streaming
+
+```bash
+xata clone stream --source-url $CONN_STRING
+```
+
+This command will:
+
+- Create an initial snapshot of your specified tables
+- Set up the streaming pipeline
+- Begin continuous replication
+
+## 7. Advanced configuration
+
+### Filtering specific tables
+
+To stream only specific tables, use the `--filter-tables` flag:
+
+```bash
+xata clone stream --source-url $CONN_STRING \
+  --filter-tables "users.*,orders.*,products.*"
+```
+
+If this option is not specified, it defaults to `*.*`.
+
+### Custom transformations
+
+Edit your `.xata/clone.yaml` file to add custom transformations:
+
+```yaml
+transforms:
+  - table: users
+    columns:
+      - name: email
+        transformer: mask_email
+      - name: phone
+        transformer: redact
+  - table: orders
+    columns:
+      - name: credit_card
+        transformer: mask_credit_card
+```
+
+### Running with Docker
+
+For production deployments, consider running the streaming process in a containerized environment:
+
+```bash
+docker run -d \
+  --name xata-stream \
+  --restart unless-stopped \
+  -v $(pwd)/.xata:/config \
+  xata/cli clone stream \
+  --source-url $CONN_STRING
+```
+
+## 8. Handling failures and recovery
+
+If the streaming connection is interrupted, the replication slot ensures no data is lost. Restart the streaming command:
+
+```bash
+xata clone stream --source-url $CONN_STRING
+```
+
+The process will resume from where it left off, catching up with any changes that occurred during the downtime.
+However, if too much lag accumulates, the source Postgres server may slow down, because it has to process the backlog in addition to its normal workload.
+
+If you terminate the `xata clone stream` process and do not plan to run streaming replication again, clean up the replication slot and
+other `pgstream` objects using the `xata stream destroy` command.
+
+An unused replication slot prevents Postgres from removing old WAL segments, so WAL keeps accumulating until the disk fills up. Use settings such as `max_slot_wal_keep_size`
+to cap the amount of WAL that a replication slot can retain.
+
+## Summary
+
+- You now have real-time logical streaming replication from your PostgreSQL database to Xata
+- Changes in your source database are automatically synchronized
+- Your data can be anonymized in transit using configurable transformers
+- The replication slot ensures no data loss during network interruptions
+
+For more details on advanced streaming configurations and monitoring, see the [clone command documentation](/cli/clone).