Description
After a failover/switchover on a running cluster (bootstrapped with initdb, not a recovery/restore), the plugin-barman-cloud sidecar on replica pods keeps failing with "WAL archive check failed: Expected empty archive". This prevents the replica from becoming fully ready (1/2 containers).
The root cause is that barman-cloud-check-wal-archive is executed on every WAL Archive gRPC call, including on replica pods, and it fails because the S3 bucket already contains WAL files from previous timelines (which is expected after a switchover).
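To illustrate why the archive is legitimately non-empty: each WAL segment name encodes its timeline in the first 8 hex digits, so after timelines 1 → 2 → 3 the bucket necessarily holds segments from earlier timelines. A minimal sketch (the function name `timelineOf` is hypothetical, not part of the plugin):

```go
package main

import (
	"fmt"
	"strconv"
)

// timelineOf extracts the timeline ID from a 24-character WAL
// segment name: the first 8 hex digits encode the timeline.
func timelineOf(walName string) (uint64, error) {
	if len(walName) != 24 {
		return 0, fmt.Errorf("not a WAL segment name: %q", walName)
	}
	return strconv.ParseUint(walName[:8], 16, 32)
}

func main() {
	// After a switchover from timeline 1 to 2, the archive holds
	// segments from both timelines, so it is legitimately non-empty.
	for _, w := range []string{
		"000000010000000000000003", // archived while on timeline 1
		"000000020000000000000004", // archived after the switchover
	} {
		tl, _ := timelineOf(w)
		fmt.Printf("%s -> timeline %d\n", w, tl)
	}
}
```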
Environment
- CloudNativePG operator: v1.28.1
- plugin-barman-cloud: v0.11.0
- PostgreSQL: 18.3 (ghcr.io/cloudnative-pg/postgresql:18.3)
- Kubernetes: v1.31
- Object storage: Ceph RGW (S3-compatible, via Rook)
Cluster configuration
The cluster is bootstrapped with initdb (no recovery, no externalClusters):
spec:
  instances: 3
  bootstrap:
    initdb:
      database: system
      encoding: UTF8
      owner: app
  plugins:
    - name: barman-cloud.cloudnative-pg.io
      enabled: true
      isWALArchiver: true
      parameters:
        barmanObjectName: my-object-store
Steps to reproduce
- Create a 3-instance CNPG cluster with initdb bootstrap and plugin-barman-cloud with isWALArchiver: true
- Wait for all 3 pods to be 2/2 Running
- A failover or switchover occurs (automatic or manual), changing the timeline (e.g., timeline 1 → 2 → 3)
- After the failover, one or more replica pods get stuck at 1/2: the plugin-barman-cloud sidecar blocks WAL archiving
What happens
After the switchover, the former primary is demoted to replica. The instance manager detects leftover WAL files and explicitly triggers archiving on the demoted pod:
{"msg":"Detected ready WAL files in a former primary, triggering WAL archiving"}
This causes the plugin-barman-cloud sidecar to attempt WAL archiving. However, the S3 bucket legitimately contains WAL files from previous timelines (archived by this same pod when it was primary). The plugin executes barman-cloud-check-wal-archive, finds the bucket is not empty, and fails:
barman-cloud-check-wal-archive: ERROR: WAL archive check failed for server <cluster-name>: Expected empty archive
plugin-barman-cloud sidecar log (replica pod):
{"level":"info","msg":"barman-cloud-check-wal-archive checking the first wal"}
{"level":"info","logger":"barman-cloud-check-wal-archive","msg":"ERROR: WAL archive check failed for server <name>: Expected empty archive","pipe":"stderr"}
{"level":"error","msg":"Error invoking barman-cloud-check-wal-archive",
"options":["--endpoint-url","http://<ceph-rgw>","--cloud-provider","aws-s3","s3://<bucket>","<server-name>"],
"exitCode":-1,"error":"exit status 1"}
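A possible direction for a fix (a sketch only; shouldCheckEmptyArchive and its parameters are hypothetical names, not the plugin's actual API) would be to gate the empty-archive check so it only runs where "expected empty" is meaningful: the first WAL archived by a freshly bootstrapped primary on timeline 1, never on replicas or demoted former primaries:

```go
package main

import "fmt"

// shouldCheckEmptyArchive is a hypothetical guard: the
// barman-cloud-check-wal-archive "expected empty" check is only
// meaningful for the very first WAL archived by a fresh primary
// on timeline 1. Replicas, and any instance on a later timeline
// (i.e. after a failover/switchover), should skip it.
func shouldCheckEmptyArchive(isPrimary bool, timeline uint32, firstWAL bool) bool {
	return isPrimary && timeline == 1 && firstWAL
}

func main() {
	// A demoted former primary on timeline 2 re-archiving leftover
	// WALs must not require an empty bucket.
	fmt.Println(shouldCheckEmptyArchive(false, 2, false)) // false
	// A fresh primary archiving its first WAL should still be checked.
	fmt.Println(shouldCheckEmptyArchive(true, 1, true)) // true
}
```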
Expected behavior
All cluster pods return to fully ready (2/2 Running) after a switchover/failover; WAL archiving on replicas must not fail just because the archive already contains WALs from previous timelines.