💪 Motivation
I'd like to be able to set the backoffLimit property of KDP nodes to something other than the default of 6 when I deploy the operator.
📖 Additional Details
Nodes in KDP fail for several reasons, from running on EC2 spot instances that were evicted to the host node running out of memory (OOM). In most cases, failing Pods shouldn't bring the entire pipeline to a halt.
⚖️ Acceptance Criteria
- When I deploy the operator, I can specify what the backoffLimit of Jobs should be
- Processing doesn't come to a standstill when any Node has its Pod die 6 times
⚙️ Engineering Details
backoffLimit
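One possible shape for this, sketched under assumptions: the operator reads the limit from an environment variable at deploy time and falls back to the Kubernetes Job default of 6. The variable name `KDP_JOB_BACKOFF_LIMIT` and the helper `parseBackoffLimit` are hypothetical and not part of KDP today; the operator would still need to copy the resulting value into `spec.backoffLimit` of each Job it creates.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultBackoffLimit mirrors the Kubernetes Job default of 6 retries.
const defaultBackoffLimit int32 = 6

// backoffLimitEnv is a hypothetical variable name, not an existing KDP setting.
const backoffLimitEnv = "KDP_JOB_BACKOFF_LIMIT"

// parseBackoffLimit converts raw into a backoffLimit value.
// An empty string yields the default; negative or non-numeric input is an error.
func parseBackoffLimit(raw string) (int32, error) {
	if raw == "" {
		return defaultBackoffLimit, nil
	}
	n, err := strconv.ParseInt(raw, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid %s %q: %w", backoffLimitEnv, raw, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("invalid %s %q: must be >= 0", backoffLimitEnv, raw)
	}
	return int32(n), nil
}

func main() {
	limit, err := parseBackoffLimit(os.Getenv(backoffLimitEnv))
	if err != nil {
		// Fail fast at startup rather than creating Jobs with a bad limit.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// The operator would set this value as spec.backoffLimit on each Job.
	fmt.Printf("backoffLimit=%d\n", limit)
}
```

With this sketch, deploying the operator with `KDP_JOB_BACKOFF_LIMIT=20` would let a Pod fail 20 times before its Job is marked failed, while leaving the variable unset keeps the current behavior.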