Summary
Implement a robust Kubernetes preStop lifecycle feature in the interlink-slurm-plugin with the following requirements:
Key Features:
- PreStop handlers run only on SIGTERM (never on normal exit or EXIT trap cleanup)
- PreStop execution is synchronously ordered: run preStops in the order containers are declared (matches Kubernetes container shutdown order)
- PreStop always runs to completion (with per-container timeout) before container kill and probe cleanup
- Both HTTP and Exec preStop lifecycle handlers supported (same as probe subsystem, no TCP for now)
- Each preStop action runs in its own dedicated bash function and is called via a global runner for all containers
- Job shell script (job.sh) traps SIGTERM and invokes all preStop handlers before probe cleanup
- New config flags in SlurmConfig.yaml:
  - EnablePreStop (bool): enables/disables preStop lifecycle processing
  - PreStopTimeoutSeconds (int): per-preStop max time (default 5 seconds)
- References and basic how-to/example added to docs/README-probes.md
- Backwards compatible: default is disabled, so jobs behave exactly as before until enabled.
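As a rough illustration of the SIGTERM path described above, the generated job.sh might look something like the sketch below. The function names runPreStop_<container> and runAllPreStops follow this issue. Everything else is an assumption made for the example: the container names (web, sidecar), the PRESTOP_CONTAINERS list, and the runWithTimeout, cleanupProbes and onSigterm helpers. The watchdog kills only the handler's own shell process, so a real implementation may need to signal a whole process group.

```shell
#!/usr/bin/env bash
# Sketch only: container names, handler bodies and helper names are hypothetical.

PRESTOP_TIMEOUT_SECONDS="${PRESTOP_TIMEOUT_SECONDS:-5}"  # would mirror PreStopTimeoutSeconds
PRESTOP_CONTAINERS="web sidecar"                         # pod-spec declaration order

# One dedicated function per container that declares a preStop handler.
runPreStop_web() {
  echo "web: flushing state"        # stands in for an exec handler
}
runPreStop_sidecar() {
  echo "sidecar: notifying peer"    # an httpGet handler could call curl here
}

# Run one handler in the background and SIGKILL it if it exceeds the timeout.
runWithTimeout() {
  "$1" &
  handler_pid=$!
  # Watchdog: detached from stdout/stderr so it never holds a pipe open.
  ( sleep "$PRESTOP_TIMEOUT_SECONDS"; kill -KILL "$handler_pid" ) >/dev/null 2>&1 &
  watchdog_pid=$!
  handler_rc=0
  wait "$handler_pid" 2>/dev/null || handler_rc=$?
  kill "$watchdog_pid" 2>/dev/null || true
  wait "$watchdog_pid" 2>/dev/null || true
  return "$handler_rc"
}

# Global runner: every preStop in order; failures are logged, never fatal.
runAllPreStops() {
  for c in $PRESTOP_CONTAINERS; do
    if ! runWithTimeout "runPreStop_$c"; then
      echo "preStop for $c failed or timed out; continuing" >&2
    fi
  done
}

cleanupProbes() {
  echo "probes: stopped"            # placeholder for killing background probe processes
}

# preStops fire only on SIGTERM, and always before probe cleanup.
onSigterm() {
  runAllPreStops
  cleanupProbes
  exit 143                          # 128 + SIGTERM (15)
}
trap onSigterm TERM
```

Because the trap is registered for TERM only, a normal exit never reaches runAllPreStops. Running the handlers one after another, rather than in parallel, is what gives the declaration-order guarantee.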
Implementation Tasks
- Update SlurmConfig to support EnablePreStop and PreStopTimeoutSeconds (default: 5)
- Map container.Lifecycle.PreStop → PreStopCommand at container parsing, only if config enabled
- Generate runPreStop_<container>() shell functions for each container with preStop
- Add runAllPreStops() that runs all preStops in order, each with configured timeout; log errors but do not abort on failure
Acceptance Criteria
- When preStop is set in a pod (exec or httpGet), it is run only if the SLURM job is signaled with SIGTERM, before background probe processes are killed
- PreStops execute in the same order as containers are defined in the pod spec
- Each preStop is forcibly killed after the configured timeout (default 5s, configurable)
- If no preStop is defined or the config disables the feature, the job shell script and its behavior are unchanged, preserving backward compatibility
- Example(s) in docs/README-probes.md demonstrate usage and config
References
PRs/branches: Please associate any PR implementing this feature to this issue.
CC: @dciangot