PKI CRL health check failure should not cause overall health status to be DOWN #4151

@reinkrul

Problem

When a CRL endpoint is unavailable (e.g. returns 404), the pki.crl health check fails, which causes the overall /health endpoint to return a non-healthy status. In Kubernetes environments that point liveness/readiness probes at /health, this causes the pod to be killed and restarted, but a restart cannot resolve an external CRL endpoint being down.

Example warning seen in such a scenario:

time="2026-04-07T10:49:19Z" level=warning msg="Health check status is not UP, failing components:\n - pki.crl: [{CN=TEST Zorg CSP Private Root CA G1,O=CIBG,C=NL http://www.uzi-register-test.nl/cdp/test_zorg_csp_private_root_ca_g1.crl 0001-01-01 00:00:00 +0000 UTC} ...]"

Root cause

External resources (CRL endpoints from CAs) are outside the operator's control. Including them in the top-level health check creates a feedback loop: the node is reported unhealthy and gets restarted, but the restart cannot fix the external endpoint, so the node stays unhealthy even though the application itself is perfectly functional.

Proposed solution

Separate the health check into two layers, similar to the Spring Boot Actuator pattern:

  • /health — aggregates only components where a restart could help (i.e. internal, self-recoverable issues). This is what Kubernetes liveness/readiness probes should target.
  • /health/<component> — exposes the full detail per component, including external dependencies like CRL availability. Operators can wire these into their monitoring/alerting stack.

This way, a CA's CRL endpoint being temporarily unavailable is still visible through monitoring, but it does not put the pod into an indefinite crash loop.
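A minimal sketch of the two layers in Go, to make the proposal concrete. It is not the node's actual health code: the `Component` type, the `Restartable` flag, the `Checker` function type, and the choice of HTTP 503 for a non-UP status are all assumptions made for illustration. Only components marked `Restartable` count towards the `/health` aggregate, while `/health/<component>` reports every component, including `pki.crl`.

```go
package main

import (
	"encoding/json"
	"net/http"
	"sort"
	"strings"
)

// Status is a component's health state.
type Status string

const (
	StatusUp   Status = "UP"
	StatusDown Status = "DOWN"
)

// Component is a single health check result (hypothetical type).
// Restartable marks checks where restarting the process could help;
// external dependencies such as pki.crl would set it to false.
type Component struct {
	Name        string `json:"name"`
	Status      Status `json:"status"`
	Restartable bool   `json:"-"`
	Details     string `json:"details,omitempty"`
}

// Checker returns the current result of every health check (hypothetical).
type Checker func() []Component

// Overall aggregates only restartable components. It returns the overall
// status and the sorted names of the components that caused a DOWN.
func Overall(components []Component) (Status, []string) {
	status := StatusUp
	var failing []string
	for _, c := range components {
		if c.Restartable && c.Status != StatusUp {
			status = StatusDown
			failing = append(failing, c.Name)
		}
	}
	sort.Strings(failing)
	return status, failing
}

// Handler serves the aggregate on /health and per-component detail
// on /health/<component>.
func Handler(check Checker) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		status, failing := Overall(check())
		writeJSON(w, status, map[string]any{"status": status, "failing": failing})
	})
	mux.HandleFunc("/health/", func(w http.ResponseWriter, r *http.Request) {
		name := strings.TrimPrefix(r.URL.Path, "/health/")
		for _, c := range check() {
			if c.Name == name {
				writeJSON(w, c.Status, c)
				return
			}
		}
		http.NotFound(w, r)
	})
	return mux
}

// writeJSON responds with 200 when the status is UP and 503 otherwise
// (the 503 is an assumption, not something the issue specifies).
func writeJSON(w http.ResponseWriter, status Status, body any) {
	w.Header().Set("Content-Type", "application/json")
	if status != StatusUp {
		w.WriteHeader(http.StatusServiceUnavailable)
	}
	_ = json.NewEncoder(w).Encode(body)
}
```

With this split, a failing `pki.crl` check marked as non-restartable leaves `/health` at UP, while `/health/pki.crl` still returns a non-UP result that an alerting rule can watch.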

Trade-off

A component added in a future release would not be picked up automatically; operators would have to add it to their monitoring setup explicitly. This is acceptable since the alternative (one external failure killing the entire pod) is worse.
