Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
120 changes: 115 additions & 5 deletions website/docs/guides/troubleshooting.md
Original file line number Diff line number Diff line change
Expand Up @@ -173,15 +173,125 @@ The problem will resolve itself as client/proxy caches empty. You can force this

### WAL growth — why is my Postgres database storage filling up?

Electric creates a durable replication slot in Postgres to prevent data loss during downtime.
Electric creates a logical replication slot in Postgres to stream changes. This slot tracks a position in the Write-Ahead Log (WAL) and prevents Postgres from removing WAL segments that Electric hasn't yet processed. If the slot doesn't advance, WAL accumulates and consumes disk space.

During normal execution, Electric consumes the WAL file and keeps advancing `confirmed_flush_lsn`. However, if Electric is disconnected, the WAL file accumulates the changes that haven't been delivered to Electric.
#### Understanding replication slot status

##### Solution — Remove replication slot after Electric is gone
Run this query to check your replication slot's health:

If you're stopping Electric for the weekend, we recommend removing the `electric_slot_default` replication slot to prevent unbounded WAL growth. When Electric restarts, if it doesn't find the replication slot at resume point, it will recreate the replication slot and drop all shape logs.
```sql
SELECT
slot_name,
active,
wal_status,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal,
pg_size_pretty(safe_wal_size) AS safe_wal_remaining,
restart_lsn,
confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name LIKE 'electric%';
```

**Key columns:**

| Column | Meaning |
|--------|---------|
| `active` | `true` if Electric is currently connected |
| `wal_status` | Current WAL retention state (see below) |
| `retained_wal` | Total WAL size held by this slot |
| `confirmed_flush_lsn` | Last position Electric confirmed processing |

**Understanding `wal_status` values:**

| Status | Meaning | Action |
|--------|---------|--------|
| `reserved` | Normal — WAL is within `max_wal_size` | None required |
| `extended` | Warning — exceeded `max_wal_size` but protected by slot limits | Monitor closely |
| `unreserved` | Danger — WAL may be removed at next checkpoint | Urgent: slot will be invalidated |
| `lost` | Critical — required WAL was removed, slot is invalid | Must recreate slot |

#### Common causes and solutions

##### Electric is disconnected

When Electric isn't running, its replication slot remains but becomes inactive. WAL accumulates indefinitely until Electric reconnects or the slot is removed.

**Solution:** If stopping Electric for an extended period, remove the replication slot:

```sql
SELECT pg_drop_replication_slot('electric_slot_default');
```

When Electric restarts, it will recreate the slot and rebuild shape logs from scratch.

##### Slot is active but not advancing

If `active = true` but `confirmed_flush_lsn` isn't advancing, verify Electric is processing changes:

1. **Check for errors in Electric logs** — storage issues or database connectivity problems can prevent processing

2. **Verify shaped tables are in the publication:**
```sql
SELECT * FROM pg_publication_tables
WHERE pubname LIKE 'electric_publication%';
```

3. **Test that changes flow through** — make a change to a shaped table and check if `confirmed_flush_lsn` advances:
```sql
-- Note the current position
SELECT confirmed_flush_lsn FROM pg_replication_slots
WHERE slot_name = 'electric_slot_default';

-- Make a change to a table with an active shape
UPDATE your_shaped_table SET updated_at = now() WHERE id = 1;

-- After a few seconds, check if position advanced
SELECT confirmed_flush_lsn FROM pg_replication_slots
WHERE slot_name = 'electric_slot_default';
```

4. **Check Electric's storage** — if [`ELECTRIC_STORAGE_DIR`](/docs/api/config#electric-storage-dir) has disk space or permission issues, Electric can't flush data and won't acknowledge progress

##### High write volume

If your database has heavy write activity, there will always be some lag between writes and Electric's acknowledgment. This is normal, but you should configure limits to prevent unbounded growth.

**Solution:** Set `max_slot_wal_keep_size` to cap WAL retention:

```sql
-- Limit each slot to 10GB of WAL (adjust based on your needs)
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();
```

> [!WARNING]
> If a slot exceeds this limit, Postgres will invalidate it at the next checkpoint. Electric will detect this, drop all shapes, and recreate the slot. This is generally preferable to filling your disk.

#### Recommended PostgreSQL settings

| Setting | Recommended Value | Purpose |
|---------|-------------------|---------|
| `max_slot_wal_keep_size` | `10GB` - `50GB` | Prevents any single slot from causing unbounded WAL growth. Default is `-1` (unlimited). |
| `wal_keep_size` | `2GB` (RDS default) | Minimum WAL retained regardless of slots |

For AWS RDS, these can be set in your parameter group. Note that `max_slot_wal_keep_size` requires PostgreSQL 13+.

#### Monitoring replication health

Electric exposes metrics for monitoring replication slot health. If you have [Prometheus configured](/docs/api/config#electric-prometheus-port), watch these metrics:

- `electric.postgres.replication.slot_retained_wal_size` — bytes of WAL retained by the slot
- `electric.postgres.replication.slot_confirmed_flush_lsn_lag` — bytes between Electric's confirmed position and current WAL

Set alerts when retained WAL exceeds your threshold or when lag grows continuously.

#### Quick diagnostic checklist

You can also control the size of the WAL with [`wal_keep_size`](https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-WAL-KEEP-SIZE). On restart, Electric will detect if the WAL is past the resume point too.
1. **Is the slot active?** — `active = true` means Electric is connected
2. **Is `confirmed_flush_lsn` advancing?** — should increase after changes to shaped tables
3. **What's the `wal_status`?** — `reserved` is healthy, `extended` needs attention
4. **Is `max_slot_wal_keep_size` set?** — prevents unbounded growth (default is unlimited)
5. **Any errors in Electric logs?** — storage or connectivity issues prevent processing

### Database permissions — how do I configure PostgreSQL users for Electric?

Expand Down