If an error happens during application startup or runtime, the container logs the error but the pod is not restarted automatically.
This leaves the pod in a broken state, and a manual restart is required.
The logs show "ERROR: Application startup failed. Exiting.", but the container does not actually exit. The pod continues running and appears healthy when checked through the Kubernetes API.
Scenario
stac-auth-proxy runs in Azure AKS. When a node scales up or down, both the stac-auth-proxy pod and its dependency go-api pod are recreated simultaneously if they were running on the node that was scaled down.
Because the dependent service goadmin-stage.ifrc.org is temporarily unavailable during that time, the startup health check fails; stac-auth-proxy logs an error but doesn't exit.
The pod then stays in this failed state and does not restart automatically. Recovery only happens if:
the node scales again, or
someone manually deletes the pod.
Possible Solution
Exit the container when a startup error occurs so Kubernetes can restart the pod automatically.
Add proper liveness and readiness probes to ensure Kubernetes can detect unhealthy pods and restart them when needed.
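To illustrate the first suggestion, here is a minimal sketch in Python of a startup check that turns a failure into a non-zero exit code, so the container actually terminates and Kubernetes can apply its restart policy. How stac-auth-proxy really performs its check is not shown in this issue; the function names `upstream_is_healthy`, `exit_code_for_startup` and `main`, and the URL passed in, are hypothetical and only serve the example.

```python
import sys
import urllib.error
import urllib.request


def upstream_is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error, or bad URL.
        return False


def exit_code_for_startup(url: str, check=upstream_is_healthy) -> int:
    """Map the startup check result to a process exit code (0 = ok, 1 = failed)."""
    if check(url):
        return 0
    print(f"ERROR: startup check failed for {url}", file=sys.stderr)
    return 1


def main() -> int:
    # Assumption: the upstream URL is the dependency named in this issue.
    code = exit_code_for_startup("https://goadmin-stage.ifrc.org/")
    if code != 0:
        return code
    # The real server would be started here once the check has passed.
    return 0
```

An entrypoint would call `sys.exit(main())`, so a failed check ends the process with status 1 instead of leaving it running. With the pod's default `restartPolicy: Always`, the kubelet then restarts the container with back-off until the dependency is reachable again.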
Reference
@batpad @alukach @pantierra