Problem
When a CRL endpoint is unavailable (e.g. returns 404), the pki.crl health check reports as failing, which causes the overall /health endpoint to return a non-healthy status. In Kubernetes environments using liveness/readiness probes on /health, this causes the pod to be killed and restarted — but a restart cannot resolve an external CRL endpoint being down.
Example warning seen in such a scenario:
time="2026-04-07T10:49:19Z" level=warning msg="Health check status is not UP, failing components:\n - pki.crl: [{CN=TEST Zorg CSP Private Root CA G1,O=CIBG,C=NL http://www.uzi-register-test.nl/cdp/test_zorg_csp_private_root_ca_g1.crl 0001-01-01 00:00:00 +0000 UTC} ...]"
Root cause
External resources (CRL endpoints from CAs) are outside the operator's control. Including them in the top-level health check creates a feedback loop: the node becomes unhealthy, gets restarted, and will never recover on its own even though the application itself is perfectly functional.
Proposed solution
Separate the health check into two layers, similar to the Spring Boot Actuator pattern:
/health — aggregates only components where a restart could help (i.e. internal, self-recoverable issues). This is what Kubernetes liveness/readiness probes should target.
/health/<component> — exposes the full detail per component, including external dependencies like CRL availability. Operators can wire these into their monitoring/alerting stack.
This way, a CA's CRL endpoint being temporarily unavailable remains observable through monitoring, but does not cause the pod to be crash-looped indefinitely.
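The two-layer split could be sketched as follows. This is a minimal illustration, not the node's actual implementation: the `Component`, `CheckResult`, and `aggregate` names are hypothetical, and the real health checks return richer structures than shown here. The key idea is that only components marked as internal contribute to the /health aggregate, while /health/<component> exposes every component's detail.

```go
package main

import (
	"fmt"
	"net/http"
)

// CheckResult is the outcome of a single health check (illustrative type).
type CheckResult struct {
	Status  string // "UP" or "DOWN"
	Details string
}

// Component wraps a named health check. Internal marks checks where a
// restart could help; only those feed the top-level /health aggregate.
type Component struct {
	Name     string
	Internal bool
	Check    func() CheckResult
}

// aggregate returns "UP" only if every internal component is up.
// External dependencies (e.g. pki.crl) are skipped on purpose, so an
// unreachable CRL endpoint cannot fail the liveness/readiness probe.
func aggregate(components []Component) string {
	for _, c := range components {
		if !c.Internal {
			continue
		}
		if c.Check().Status != "UP" {
			return "DOWN"
		}
	}
	return "UP"
}

func main() {
	components := []Component{
		{Name: "store", Internal: true, Check: func() CheckResult {
			return CheckResult{Status: "UP"}
		}},
		// CRL endpoint down: visible via /health/pki.crl, ignored by /health.
		{Name: "pki.crl", Internal: false, Check: func() CheckResult {
			return CheckResult{Status: "DOWN", Details: "CRL fetch returned 404"}
		}},
	}

	// /health: the liveness/readiness probe target, internal components only.
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		status := aggregate(components)
		if status != "UP" {
			w.WriteHeader(http.StatusServiceUnavailable)
		}
		fmt.Fprint(w, status)
	})

	// /health/<component>: per-component detail, for monitoring/alerting.
	http.HandleFunc("/health/", func(w http.ResponseWriter, r *http.Request) {
		name := r.URL.Path[len("/health/"):]
		for _, c := range components {
			if c.Name == name {
				res := c.Check()
				fmt.Fprintf(w, "%s: %s", res.Status, res.Details)
				return
			}
		}
		http.NotFound(w, r)
	})

	// Server startup omitted; the aggregate alone demonstrates the point:
	fmt.Println(aggregate(components)) // "UP" despite pki.crl being down
}
```

With this layering, Kubernetes probes point at /health, while an alerting rule watching /health/pki.crl fires when the CRL endpoint misbehaves, without triggering a restart.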
Trade-off
If a future release adds a new component, operators must explicitly add its /health/<component> endpoint to their monitoring setup; it won't be picked up automatically. This is acceptable, since the alternative (one external failure killing the entire pod) is worse.