EnterpriseDB · github-actions · May 9, 2026
@@ -90,6 +90,7 @@ Use this field only in case of
 ## Environment variables
 
 !!!important
+
 Environment variables reserved for operator usage (names starting with `PG` or
 `CNP_`, plus `POD_NAME`, `NAMESPACE`, and `CLUSTER_NAME`) cannot be set
 through the `env` and `envFrom` fields and are rejected at admission time.

@@ -379,6 +379,7 @@ The operator manages most of the [configuration options for PgBouncer](https://w
 allowing you to modify only a subset of them.
 
 !!!warning
+
 The operator passes these settings directly to PgBouncer without validation.
 To prevent configuration errors or crash loops, ensure each parameter is
 supported by your specific PgBouncer image version.
@@ -667,6 +668,42 @@ spec:
   - port: metrics
 ```
 
+### TLS for the Metrics Endpoint
+
+Set `.spec.monitoring.tls.enabled: true` to serve the metrics endpoint over
+HTTPS. By default, the cluster's server certificate is used.
+The certificate is reloaded on every TLS handshake, so rotations are
+picked up without restarting the pod.
+
+```yaml
+spec:
+  monitoring:
+    tls:
+      enabled: true
+```
+
+When `.spec.pgbouncer.clientTLSSecret` is set, the metrics server presents
+that certificate instead.
+
+```yaml
+spec:
+  pgbouncer:
+    clientTLSSecret:
+      name: <CLIENT_TLS_SECRET>
+  monitoring:
+    tls:
+      enabled: true
+```
+
+The generated `PodMonitor` scrapes with `insecureSkipVerify=true` because
+Prometheus scrapes pods by IP and the certificate's SANs do not generally
+cover the pod IP.
+
+If you need strict verification, set `.spec.monitoring.enablePodMonitor: false`
+and manage the `PodMonitor` yourself. The operator hardcodes
+`insecureSkipVerify=true` in the generated `PodMonitor` and overwrites its
+spec on every reconcile, so a manual patch on it would not survive.
+
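+A self-managed `PodMonitor` with strict verification could look like the
+following minimal sketch. It assumes the Prometheus Operator `PodMonitor` API,
+that pooler pods carry the `k8s.enterprisedb.io/poolerName` label, and that
+the CA is stored under the `ca.crt` key of a secret in the same namespace.
+The `<POOLER_NAME>`, `<CA_SECRET>`, and `<CERTIFICATE_SAN>` placeholders are
+hypothetical. `serverName` must match a SAN of the certificate the metrics
+server presents, since the pod IP normally does not.
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PodMonitor
+metadata:
+  name: <POOLER_NAME>
+spec:
+  selector:
+    matchLabels:
+      # Assumed label key; check the labels on your pooler pods.
+      k8s.enterprisedb.io/poolerName: <POOLER_NAME>
+  podMetricsEndpoints:
+    - port: metrics
+      scheme: https
+      tlsConfig:
+        # CA that signed the certificate presented by the metrics server.
+        ca:
+          secret:
+            name: <CA_SECRET>
+            key: ca.crt
+        # Name to verify against the certificate SANs instead of the pod IP.
+        serverName: <CERTIFICATE_SAN>
+```
+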
 ### Deprecation of Automatic `PodMonitor` Creation
 
 !!!warning Feature Deprecation Notice

@@ -27,11 +27,11 @@ data:
                    , state
                    , usename
                    , COALESCE(application_name, '') AS application_name
-                   , COUNT(*)
-                   , COALESCE(EXTRACT (EPOCH FROM (max(now() - xact_start))), 0) AS max_tx_secs
+                   , pg_catalog.count(*)
+                   , COALESCE(EXTRACT (EPOCH FROM (pg_catalog.max(pg_catalog.now() OPERATOR(pg_catalog.-) xact_start))), 0) AS max_tx_secs
                FROM pg_catalog.pg_stat_activity
                GROUP BY datname, state, usename, application_name
-           ) sa ON states.state = sa.state
+           ) sa ON states.state OPERATOR(pg_catalog.=) sa.state
            WHERE sa.usename IS NOT NULL
       metrics:
         - datname:
@@ -55,10 +55,10 @@ data:
 
     backends_waiting:
       query: |
-       SELECT count(*) AS total
+       SELECT pg_catalog.count(*) AS total
        FROM pg_catalog.pg_locks blocked_locks
        JOIN pg_catalog.pg_locks blocking_locks
-         ON blocking_locks.locktype = blocked_locks.locktype
+         ON blocking_locks.locktype OPERATOR(pg_catalog.=) blocked_locks.locktype
          AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
          AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
          AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
@@ -68,8 +68,8 @@ data:
          AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
          AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
          AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
-         AND blocking_locks.pid != blocked_locks.pid
-       JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
+         AND blocking_locks.pid OPERATOR(pg_catalog.<>) blocked_locks.pid
+       JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid OPERATOR(pg_catalog.=) blocking_locks.pid
        WHERE NOT blocked_locks.granted
       metrics:
         - total:
@@ -110,14 +110,14 @@ data:
     pg_replication:
       query: "SELECT CASE WHEN (
                 NOT pg_catalog.pg_is_in_recovery()
-                OR pg_catalog.pg_last_wal_receive_lsn() = pg_catalog.pg_last_wal_replay_lsn())
+                OR pg_catalog.pg_last_wal_receive_lsn() OPERATOR(pg_catalog.=) pg_catalog.pg_last_wal_replay_lsn())
               THEN 0
               ELSE GREATEST (0,
-                EXTRACT(EPOCH FROM (now() - pg_catalog.pg_last_xact_replay_timestamp())))
+                EXTRACT(EPOCH FROM (pg_catalog.now() OPERATOR(pg_catalog.-) pg_catalog.pg_last_xact_replay_timestamp())))
               END AS lag,
               pg_catalog.pg_is_in_recovery() AS in_recovery,
-              EXISTS (TABLE pg_stat_wal_receiver) AS is_wal_receiver_up,
-              (SELECT count(*) FROM pg_catalog.pg_stat_replication) AS streaming_replicas"
+              EXISTS (TABLE pg_catalog.pg_stat_wal_receiver) AS is_wal_receiver_up,
+              (SELECT pg_catalog.count(*) FROM pg_catalog.pg_stat_replication) AS streaming_replicas"
       metrics:
         - lag:
             usage: "GAUGE"
@@ -165,17 +165,17 @@ data:
       query: |
         SELECT archived_count
           , failed_count
-          , COALESCE(EXTRACT(EPOCH FROM (now() - last_archived_time)), -1) AS seconds_since_last_archival
-          , COALESCE(EXTRACT(EPOCH FROM (now() - last_failed_time)), -1) AS seconds_since_last_failure
+          , COALESCE(EXTRACT(EPOCH FROM (pg_catalog.now() OPERATOR(pg_catalog.-) last_archived_time)), -1) AS seconds_since_last_archival
+          , COALESCE(EXTRACT(EPOCH FROM (pg_catalog.now() OPERATOR(pg_catalog.-) last_failed_time)), -1) AS seconds_since_last_failure
           , COALESCE(EXTRACT(EPOCH FROM last_archived_time), -1) AS last_archived_time
           , COALESCE(EXTRACT(EPOCH FROM last_failed_time), -1) AS last_failed_time
-          , COALESCE(CAST(CAST('x'||pg_catalog.right(pg_catalog.split_part(last_archived_wal, '.', 1), 16) AS pg_catalog.bit(64)) AS pg_catalog.int8), -1) AS last_archived_wal_start_lsn
-          , COALESCE(CAST(CAST('x'||pg_catalog.right(pg_catalog.split_part(last_failed_wal, '.', 1), 16) AS pg_catalog.bit(64)) AS pg_catalog.int8), -1) AS last_failed_wal_start_lsn
+          , COALESCE(CAST(CAST('x' OPERATOR(pg_catalog.||) pg_catalog.right(pg_catalog.split_part(last_archived_wal, '.', 1), 16) AS pg_catalog.bit(64)) AS pg_catalog.int8), -1) AS last_archived_wal_start_lsn
+          , COALESCE(CAST(CAST('x' OPERATOR(pg_catalog.||) pg_catalog.right(pg_catalog.split_part(last_failed_wal, '.', 1), 16) AS pg_catalog.bit(64)) AS pg_catalog.int8), -1) AS last_failed_wal_start_lsn
           , EXTRACT(EPOCH FROM stats_reset) AS stats_reset_time
         FROM pg_catalog.pg_stat_archiver
       predicate_query: |
         SELECT NOT pg_catalog.pg_is_in_recovery()
-          OR pg_catalog.current_setting('archive_mode') = 'always'
+          OR pg_catalog.current_setting('archive_mode') OPERATOR(pg_catalog.=) 'always'
       metrics:
         - archived_count:
             usage: "COUNTER"
@@ -461,12 +461,12 @@ data:
     pg_extensions:
       query: |
         SELECT
-         current_database() as datname,
+         pg_catalog.current_database() as datname,
          name as extname,
          default_version,
          installed_version,
          CASE
-           WHEN default_version = installed_version THEN 0
+           WHEN default_version OPERATOR(pg_catalog.=) installed_version THEN 0
            ELSE 1
         END AS update_available
         FROM pg_catalog.pg_available_extensions

@@ -101,6 +101,49 @@ expected outage.
 Enabling a new configuration option to delay failover provides a mechanism to
 prevent premature failover for short-lived network or node instability.
 
+## Detection of node-level failures
+
+When the node hosting the primary becomes unreachable (for example, due to a
+kubelet crash or a network partition between the node and the Kubernetes API
+server), the operator relies on the pod's `Ready` condition to decide that the
+primary is no longer serviceable. While the node is healthy, the kubelet keeps
+that condition up to date from the readiness probe. Once the node stops
+reporting, the Kubernetes node lifecycle controller takes over and flips the
+condition to `False` as soon as it marks the node `Unknown`.
+
+With stock kube-controller-manager settings, the transition is governed by
+`--node-monitor-grace-period` (default `40s` on Kubernetes 1.29-1.31, raised
+to `50s` in 1.32 and later): after that window the controller marks the node
+`Unknown` and, in the same monitoring pass, issues a patch per pod on that
+node to flip the `Ready` condition. In practice the operator observes the
+primary as unready about **40 to 55 seconds** after the node becomes
+unreachable: the grace period plus up to one `--node-monitor-period` poll
+(default `5s`), that is, 40 to 45 seconds on 1.29-1.31 and 50 to 55 seconds
+on 1.32 and later. Managed Kubernetes distributions (GKE, EKS, AKS) may tune
+these values; consult the provider's documentation if the observed timing
+does not match. After that, the failover procedure starts (further gated by
+`.spec.failoverDelay`).
+
+The `Ready` condition flip is not subject to the rate limiters that throttle
+pod *eviction* during partial-zonal or large-cluster disruptions
+(`--node-eviction-rate`, `--secondary-node-eviction-rate`,
+`--unhealthy-zone-threshold`). The operator reacts to the condition flip as
+soon as the controller emits the patch, regardless of the zone or cluster-wide
+health state.
+
+Pod *eviction* (actual deletion from the unreachable node) is a separate
+mechanism, driven by `tolerationSeconds` on the
+`node.kubernetes.io/unreachable` `NoExecute` taint (`300s` by default). That
+timer does not hold up the operator's failover decision; {{name.ln}}
+promotes a new primary as soon as the `Ready` condition flips. By that point
+the kubelet on the isolated node has already stopped the old PostgreSQL
+container locally: with the default
+`.spec.probes.liveness.isolationCheck.enabled: true`, the instance manager
+fails its own liveness probe once it can reach neither the API server nor
+the rest of the cluster, and the kubelet kills the container within
+approximately three probe periods (`~30s`). Full high availability
+(recreation of the old primary on a healthy node by the operator) is still
+gated on the taint-based eviction actually deleting the pod.
+
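+As an illustration, the two cluster-level settings named in this section can
+be set as in the following sketch. The cluster name, instance count, storage
+size, and the 10-second `failoverDelay` are hypothetical values chosen for the
+example. `isolationCheck.enabled` already defaults to `true` and is spelled
+out only for clarity. The node-level timings (`--node-monitor-grace-period`,
+`--node-monitor-period`) are kube-controller-manager flags and cannot be set
+from the `Cluster` resource.
+
+```yaml
+apiVersion: postgresql.k8s.enterprisedb.io/v1
+kind: Cluster
+metadata:
+  name: cluster-example
+spec:
+  instances: 3
+  # Illustrative value, in seconds: extra wait after the primary is
+  # observed as unready before the failover procedure starts.
+  failoverDelay: 10
+  probes:
+    liveness:
+      isolationCheck:
+        # Default; lets an isolated primary fail its own liveness probe.
+        enabled: true
+  storage:
+    size: 1Gi
+```
+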
 ## Failover Quorum (Quorum-based Failover)
 
 Failover quorum is a mechanism that enhances data durability and safety during

@@ -67,6 +67,7 @@ Use this annotation **with extreme caution** and only during emergency
 operations.
 
 !!!warning
+
 This annotation should be removed as soon as the issue is resolved. Leaving
 it in place prevents the operator from managing the annotated resource. On a
 Cluster, this includes self-healing actions and failover.

@@ -12,7 +12,7 @@ catalog entry is updated, all associated clusters automatically
 [roll out the new image](rolling_update.md).
 
 While you can build custom catalogs, {{name.ln}} provides
-[official catalogs](#edb-cloudnativepg-cluster-catalogs) as `ClusterImageCatalog`
+[official catalogs](#cloudnativepg-catalogs) as `ClusterImageCatalog`
 resources, covering all official Community PostgreSQL container images.
 
 ## Catalog scoping
@@ -34,6 +34,7 @@ Both resources share a common schema:
     PostgreSQL 18+ via `extension_control_path`).
 
 !!!warning
+
 While the operator trusts the user-defined `major` version without performing
 image detection, the official {{name.ln}} catalogs are pre-validated by the
 community to ensure that every extension and operand image entry correctly
@@ -132,6 +133,36 @@ API schema and structure.
 Clusters referencing an image catalog can load any of its associated extensions
 by name.
 
+!!!info
+
+Refer to the [documentation of image volume extensions](imagevolume_extensions.md)
+for details on the internal image structure, configuration options, and
+instructions on how to select or override catalog extensions within a cluster.
+!!!
+
+[Image Volume Extensions](imagevolume_extensions.md) allow you to bundle
+containers for extensions directly within the catalog entry:
+
+```yaml
+apiVersion: postgresql.k8s.enterprisedb.io/v1
+kind: ImageCatalog
+metadata:
+  name: postgresql
+spec:
+  images:
+    - major: 18
+      image: docker.enterprisedb.com/k8s_enterprise/postgresql:18.3-minimal-ubi9
+      extensions:
+        - name: foo
+          image:
+            reference: # registry path for your `foo` extension image
+```
+
+The `extensions` section follows the [`ExtensionConfiguration`](pg4k.v1.md#extensionconfiguration)
+API schema and structure.
+Clusters referencing an image catalog can load any of its associated extensions
+by name.
+
 !!!info
 Refer to the [documentation of image volume extensions](imagevolume_extensions.md)
 for details on the internal image structure, configuration options, and
@@ -158,6 +189,7 @@ release (e.g., `trixie`). It lists the most up-to-date container images for
 every supported PostgreSQL major version.
 
 !!!important
+
 To ensure maximum security and immutability, all images within official
 {{name.ln}} catalogs are identified by their **SHA256 digests** rather than
 just tags.