
Graceful termination/restart when Slurm job hits time limit #111

@kondratyevd

Description


Currently (plugin v0.6.0), when the underlying Slurm job reaches its time limit, the Kubernetes pod goes into `Error: 15` status.

It would be great to have a Kubernetes-native way to handle this, for example:

  • treat it as a crash of the pod's containers: run the pod termination routine and let the user handle resubmission via a Deployment restart policy, etc.
  • OR treat it as a node failure and evict/restart the pod, or whatever Kubernetes normally does in that case.
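The first option above could be sketched as ordinary Kubernetes configuration: if the plugin reported the time-limit termination as a normal container exit, a Deployment's controller would already handle resubmission. A hedged sketch, assuming that behavior — the image name and labels below are placeholders, not part of the plugin's API:

```yaml
# Hypothetical sketch: run the Slurm-backed workload under a Deployment so
# Kubernetes replaces the pod when its container exits at the time limit.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slurm-workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: slurm-workload
  template:
    metadata:
      labels:
        app: slurm-workload
    spec:
      containers:
        - name: worker
          image: my-workload:latest   # placeholder image
      # If the pod terminates when the Slurm job hits its time limit,
      # the Deployment controller creates a replacement pod, which the
      # plugin would then submit as a fresh Slurm job.
      restartPolicy: Always
```

For batch-style workloads, a `Job` with `backoffLimit` would be the analogous resubmission mechanism.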
