@@ -90,6 +90,7 @@ Use this field only in case of
## Environment variables

!!!important

Environment variables reserved for operator usage (names starting with `PG` or
`CNP_`, plus `POD_NAME`, `NAMESPACE`, and `CLUSTER_NAME`) cannot be set
through the `env` and `envFrom` fields and are rejected at admission time.
@@ -379,6 +379,7 @@ The operator manages most of the [configuration options for PgBouncer](https://w
allowing you to modify only a subset of them.

!!!warning

The operator passes these settings directly to PgBouncer without validation.
To prevent configuration errors or crash loops, ensure each parameter is
supported by your specific PgBouncer image version.
@@ -667,6 +668,42 @@ spec:
- port: metrics
```

### TLS for the Metrics Endpoint

Set `.spec.monitoring.tls.enabled: true` to serve the metrics endpoint over
HTTPS. By default, the metrics server uses the cluster's server certificate.
The certificate is reloaded on every TLS handshake, so rotations are
picked up without restarting the pod.

```yaml
spec:
  monitoring:
    tls:
      enabled: true
```

When `.spec.pgbouncer.clientTLSSecret` is set, the metrics server presents
that certificate instead.

```yaml
spec:
  pgbouncer:
    clientTLSSecret:
      name: <CLIENT_TLS_SECRET>
  monitoring:
    tls:
      enabled: true
```

The generated `PodMonitor` scrapes with `insecureSkipVerify=true` because
Prometheus scrapes pods by IP and the certificate's SANs do not generally
cover the pod IP.

If you need strict verification, set `.spec.monitoring.enablePodMonitor: false`
and manage the `PodMonitor` yourself: the operator-generated one is hardcoded
to `insecureSkipVerify=true` and overwrites its spec on every reconcile, so a
manual patch on the generated `PodMonitor` would not survive.
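
As a rough sketch of what a self-managed `PodMonitor` with strict
verification could look like, the manifest below scrapes the pooler over HTTPS
and validates the presented certificate against a CA stored in a secret. The
pooler name, the `k8s.enterprisedb.io/poolerName` selector label, the CA secret
name and key, and the `serverName` value are all assumptions made for
illustration; in particular, `serverName` has to match a SAN that is actually
present in the certificate the metrics server presents.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: pooler-example-rw            # hypothetical name
spec:
  selector:
    matchLabels:
      # Assumed pod label; check the labels on your pooler pods
      k8s.enterprisedb.io/poolerName: pooler-example-rw
  podMetricsEndpoints:
    - port: metrics
      scheme: https
      tlsConfig:
        insecureSkipVerify: false
        # Assumed value; must match a SAN in the presented certificate
        serverName: pooler-example-rw
        ca:
          secret:
            name: cluster-example-ca   # hypothetical secret holding the CA
            key: ca.crt
```

Because Prometheus still connects by pod IP, verification succeeds only when
`serverName` names an identity the certificate covers. This is most likely to
hold when you supply your own certificate through
`.spec.pgbouncer.clientTLSSecret`.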

### Deprecation of Automatic `PodMonitor` Creation

!!!warning Feature Deprecation Notice
@@ -27,11 +27,11 @@ data:
, state
, usename
, COALESCE(application_name, '') AS application_name
, COUNT(*)
, COALESCE(EXTRACT (EPOCH FROM (max(now() - xact_start))), 0) AS max_tx_secs
, pg_catalog.count(*)
, COALESCE(EXTRACT (EPOCH FROM (pg_catalog.max(pg_catalog.now() OPERATOR(pg_catalog.-) xact_start))), 0) AS max_tx_secs
FROM pg_catalog.pg_stat_activity
GROUP BY datname, state, usename, application_name
) sa ON states.state = sa.state
) sa ON states.state OPERATOR(pg_catalog.=) sa.state
WHERE sa.usename IS NOT NULL
metrics:
- datname:
@@ -55,10 +55,10 @@ data:

backends_waiting:
query: |
SELECT count(*) AS total
SELECT pg_catalog.count(*) AS total
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
ON blocking_locks.locktype OPERATOR(pg_catalog.=) blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
@@ -68,8 +68,8 @@ data:
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
AND blocking_locks.pid OPERATOR(pg_catalog.<>) blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid OPERATOR(pg_catalog.=) blocking_locks.pid
WHERE NOT blocked_locks.granted
metrics:
- total:
@@ -110,14 +110,14 @@ data:
pg_replication:
query: "SELECT CASE WHEN (
NOT pg_catalog.pg_is_in_recovery()
OR pg_catalog.pg_last_wal_receive_lsn() = pg_catalog.pg_last_wal_replay_lsn())
OR pg_catalog.pg_last_wal_receive_lsn() OPERATOR(pg_catalog.=) pg_catalog.pg_last_wal_replay_lsn())
THEN 0
ELSE GREATEST (0,
EXTRACT(EPOCH FROM (now() - pg_catalog.pg_last_xact_replay_timestamp())))
EXTRACT(EPOCH FROM (pg_catalog.now() OPERATOR(pg_catalog.-) pg_catalog.pg_last_xact_replay_timestamp())))
END AS lag,
pg_catalog.pg_is_in_recovery() AS in_recovery,
EXISTS (TABLE pg_stat_wal_receiver) AS is_wal_receiver_up,
(SELECT count(*) FROM pg_catalog.pg_stat_replication) AS streaming_replicas"
EXISTS (TABLE pg_catalog.pg_stat_wal_receiver) AS is_wal_receiver_up,
(SELECT pg_catalog.count(*) FROM pg_catalog.pg_stat_replication) AS streaming_replicas"
metrics:
- lag:
usage: "GAUGE"
@@ -165,17 +165,17 @@ data:
query: |
SELECT archived_count
, failed_count
, COALESCE(EXTRACT(EPOCH FROM (now() - last_archived_time)), -1) AS seconds_since_last_archival
, COALESCE(EXTRACT(EPOCH FROM (now() - last_failed_time)), -1) AS seconds_since_last_failure
, COALESCE(EXTRACT(EPOCH FROM (pg_catalog.now() OPERATOR(pg_catalog.-) last_archived_time)), -1) AS seconds_since_last_archival
, COALESCE(EXTRACT(EPOCH FROM (pg_catalog.now() OPERATOR(pg_catalog.-) last_failed_time)), -1) AS seconds_since_last_failure
, COALESCE(EXTRACT(EPOCH FROM last_archived_time), -1) AS last_archived_time
, COALESCE(EXTRACT(EPOCH FROM last_failed_time), -1) AS last_failed_time
, COALESCE(CAST(CAST('x'||pg_catalog.right(pg_catalog.split_part(last_archived_wal, '.', 1), 16) AS pg_catalog.bit(64)) AS pg_catalog.int8), -1) AS last_archived_wal_start_lsn
, COALESCE(CAST(CAST('x'||pg_catalog.right(pg_catalog.split_part(last_failed_wal, '.', 1), 16) AS pg_catalog.bit(64)) AS pg_catalog.int8), -1) AS last_failed_wal_start_lsn
, COALESCE(CAST(CAST('x' OPERATOR(pg_catalog.||) pg_catalog.right(pg_catalog.split_part(last_archived_wal, '.', 1), 16) AS pg_catalog.bit(64)) AS pg_catalog.int8), -1) AS last_archived_wal_start_lsn
, COALESCE(CAST(CAST('x' OPERATOR(pg_catalog.||) pg_catalog.right(pg_catalog.split_part(last_failed_wal, '.', 1), 16) AS pg_catalog.bit(64)) AS pg_catalog.int8), -1) AS last_failed_wal_start_lsn
, EXTRACT(EPOCH FROM stats_reset) AS stats_reset_time
FROM pg_catalog.pg_stat_archiver
predicate_query: |
SELECT NOT pg_catalog.pg_is_in_recovery()
OR pg_catalog.current_setting('archive_mode') = 'always'
OR pg_catalog.current_setting('archive_mode') OPERATOR(pg_catalog.=) 'always'
metrics:
- archived_count:
usage: "COUNTER"
@@ -461,12 +461,12 @@ data:
pg_extensions:
query: |
SELECT
current_database() as datname,
pg_catalog.current_database() as datname,
name as extname,
default_version,
installed_version,
CASE
WHEN default_version = installed_version THEN 0
WHEN default_version OPERATOR(pg_catalog.=) installed_version THEN 0
ELSE 1
END AS update_available
FROM pg_catalog.pg_available_extensions
43 changes: 43 additions & 0 deletions product_docs/docs/postgres_for_kubernetes/1/failover.mdx
@@ -101,6 +101,49 @@ expected outage.
Enabling a new configuration option to delay failover provides a mechanism to
prevent premature failover for short-lived network or node instability.

## Detection of node-level failures

When the node hosting the primary becomes unreachable (for example, due to a
kubelet crash or a network partition between the node and the Kubernetes API
server), the operator relies on the pod's `Ready` condition to decide that the
primary is no longer serviceable. While the node is healthy, the kubelet keeps
that condition up to date from the readiness probe; once the node stops
reporting, the Kubernetes node lifecycle controller flips the condition to
`False` as soon as it marks the node's status as `Unknown`.

With stock kube-controller-manager settings, the transition is governed by
`--node-monitor-grace-period` (default `40s` on Kubernetes 1.29-1.31, raised
to `50s` in 1.32 and later): after that window the controller marks the node
`Unknown` and, in the same monitoring pass, issues a patch per pod on that
node to flip the `Ready` condition. In practice the operator observes the
primary as unready about **40 to 55 seconds** after the node becomes
unreachable (the grace period plus up to one `--node-monitor-period` poll,
default `5s`). Managed Kubernetes distributions (GKE, EKS, AKS) may tune
these values; consult the provider's documentation if the observed timing
does not match. After that, the failover procedure starts (further gated by
`.spec.failoverDelay`).
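
To make the last point concrete, here is a minimal sketch of a `Cluster` that
adds a delay on top of the detection window described above. The cluster name,
instance count, storage size, and the 10-second value are illustrative
assumptions, and the field is assumed to take a whole number of seconds;
verify this against the API reference for your release.

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: cluster-example        # hypothetical name
spec:
  instances: 3
  # Assumed to be expressed in seconds; 10 is an arbitrary example value
  failoverDelay: 10
  storage:
    size: 1Gi
```

With stock controller settings, this sketch would push the start of failover
to roughly 50 to 65 seconds after the node becomes unreachable, since the
delay is added to the 40 to 55 second detection window.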

The `Ready` condition flip is not subject to the rate limiters that throttle
pod *eviction* during partial-zonal or large-cluster disruptions
(`--node-eviction-rate`, `--secondary-node-eviction-rate`,
`--unhealthy-zone-threshold`). The operator reacts to the condition flip as
soon as the controller emits the patch, regardless of the zone or cluster-wide
health state.

Pod *eviction* (actual deletion from the unreachable node) is a separate
mechanism, driven by `tolerationSeconds` on the
`node.kubernetes.io/unreachable` `NoExecute` taint (`300s` by default). That
timer does not hold up the operator's failover decision; {{name.ln}}
promotes a new primary as soon as the `Ready` condition flips. By that point
the kubelet on the isolated node has already stopped the old PostgreSQL
container locally: with the default
`.spec.probes.liveness.isolationCheck.enabled: true`, the instance manager
fails its own liveness probe once it can reach neither the API server nor
the rest of the cluster, and the kubelet kills the container within
approximately three probe periods (`~30s`). Full high availability
(recreation of the old primary on a healthy node by the operator) is still
gated on the taint-based eviction actually deleting the pod.
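
The two mechanisms in this paragraph can be sketched in a single manifest.
This is an illustration under stated assumptions, not a recommended
configuration: the isolation check is shown explicitly even though it is
enabled by default, and the toleration assumes that `.spec.affinity.tolerations`
accepts standard Kubernetes tolerations and that a shorter `tolerationSeconds`
(here an arbitrary `60`) is acceptable in your environment.

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: cluster-example        # hypothetical name
spec:
  instances: 3
  probes:
    liveness:
      isolationCheck:
        enabled: true          # default, shown here for clarity
  affinity:
    tolerations:
      # Assumed placement of tolerations; 60 is an example value that
      # shortens the default 300s wait before the pod is evicted
      - key: node.kubernetes.io/unreachable
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 60
  storage:
    size: 1Gi
```

Lowering `tolerationSeconds` does not affect when the new primary is promoted.
It only shortens the wait before the old primary's pod is deleted and can be
recreated on a healthy node.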

## Failover Quorum (Quorum-based Failover)

Failover quorum is a mechanism that enhances data durability and safety during
@@ -67,6 +67,7 @@ Use this annotation **with extreme caution** and only during emergency
operations.

!!!warning

This annotation should be removed as soon as the issue is resolved. Leaving
it in place prevents the operator from managing the annotated resource. On a
Cluster, this includes self-healing actions and failover.
34 changes: 33 additions & 1 deletion product_docs/docs/postgres_for_kubernetes/1/image_catalog.mdx
@@ -12,7 +12,7 @@ catalog entry is updated, all associated clusters automatically
[roll out the new image](rolling_update.md).

While you can build custom catalogs, {{name.ln}} provides
[official catalogs](#edb-cloudnativepg-cluster-catalogs) as `ClusterImageCatalog`
[official catalogs](#cloudnativepg-catalogs) as `ClusterImageCatalog`
resources, covering all official Community PostgreSQL container images.

## Catalog scoping
@@ -34,6 +34,7 @@ Both resources share a common schema:
PostgreSQL 18+ via `extension_control_path`).

!!!warning

While the operator trusts the user-defined `major` version without performing
image detection, the official {{name.ln}} catalogs are pre-validated by the
community to ensure that every extension and operand image entry correctly
@@ -132,6 +133,36 @@ API schema and structure.
Clusters referencing an image catalog can load any of its associated extensions
by name.

!!!info

Refer to the [documentation of image volume extensions](imagevolume_extensions.md)
for details on the internal image structure, configuration options, and
instructions on how to select or override catalog extensions within a cluster.
!!!

[Image Volume Extensions](imagevolume_extensions.md) allow you to bundle
containers for extensions directly within the catalog entry:

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: ImageCatalog
metadata:
  name: postgresql
spec:
  images:
    - major: 18
      image: docker.enterprisedb.com/k8s_enterprise/postgresql:18.3-minimal-ubi9
      extensions:
        - name: foo
          image:
            reference: # registry path for your `foo` extension image
```

The `extensions` section follows the [`ExtensionConfiguration`](pg4k.v1.md#extensionconfiguration)
API schema and structure.
Clusters referencing an image catalog can load any of its associated extensions
by name.

!!!info
Refer to the [documentation of image volume extensions](imagevolume_extensions.md)
for details on the internal image structure, configuration options, and
@@ -158,6 +189,7 @@ release (e.g., `trixie`). It lists the most up-to-date container images for
every supported PostgreSQL major version.

!!!important

To ensure maximum security and immutability, all images within official
{{name.ln}} catalogs are identified by their **SHA256 digests** rather than
just tags.