feat: Expected Runtime Plugin for Soft Eviction via Requeue Action #941
Description
Adds the Expected Runtime plugin: running jobs that exceed their configured expected runtime get nominated as requeue candidates. The plugin only does nomination; eviction is done by the Requeue action (elsewhere). Soft eviction: jobs become eligible when runtime ≥ expected, but are only evicted when a higher-priority workload needs the slot.
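The nomination rule described above can be sketched in a few lines of Go. This is a minimal illustration, not the plugin's actual code: the `Job` type, the `Nominate` function, and the skip-reason strings are hypothetical, and the annotation value formats (a Go duration string for `kai.scheduler/expected-runtime`, an RFC 3339 timestamp for `kai.scheduler/requeue-not-before`) are assumptions. How the real plugin treats a malformed cooldown value is not stated in this description; the sketch skips the job in that case.

```go
package main

import "time"

// Annotation keys named in the PR description.
const (
	expectedRuntimeKey  = "kai.scheduler/expected-runtime"
	requeueNotBeforeKey = "kai.scheduler/requeue-not-before"
)

// Job is a hypothetical, simplified stand-in for the scheduler's job info.
type Job struct {
	Name        string
	Running     bool
	Preemptible bool
	LastStart   time.Time // stands in for LastStartTimestamp
	Annotations map[string]string
}

// Nominate reports whether job is a requeue candidate at time now.
// When it is not, the second return value names the check that failed.
// It only nominates; eviction would be decided elsewhere.
func Nominate(job Job, now time.Time) (bool, string) {
	if !job.Running {
		return false, "not running"
	}
	if !job.Preemptible {
		return false, "not preemptible"
	}
	raw, ok := job.Annotations[expectedRuntimeKey]
	if !ok {
		return false, "not opted in"
	}
	// Assumption: the value is a Go duration string such as "2h".
	expected, err := time.ParseDuration(raw)
	if err != nil || expected <= 0 {
		return false, "invalid expected-runtime"
	}
	// Eligible once runtime >= expected.
	if now.Sub(job.LastStart) < expected {
		return false, "runtime below expected"
	}
	// Assumption: the cooldown is an RFC 3339 timestamp.
	if raw, ok := job.Annotations[requeueNotBeforeKey]; ok {
		notBefore, err := time.Parse(time.RFC3339, raw)
		if err != nil {
			return false, "invalid requeue-not-before"
		}
		if now.Before(notBefore) {
			return false, "cooldown active"
		}
	}
	return true, ""
}
```

Under these assumptions, a running, preemptible job annotated with `kai.scheduler/expected-runtime: "2h"` that started three hours ago is nominated, while the same job one hour after start is skipped with the reason "runtime below expected".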
Why: Time-aware fairness (requeue only when there’s contention), opt-in via `kai.scheduler/expected-runtime`, cooldown via `kai.scheduler/requeue-not-before` to avoid thrashing.
What changed:
- `expectedruntime`: registers `RequeueCandidateNominationFn`, nominates jobs that pass checks (running, preemptible, valid expected-runtime, runtime ≥ expected, cooldown expired).
- `RequeueCandidateNominationFn`, `AddRequeueCandidateNominationFn`, `CollectRequeueCandidates()` (dedup by PodGroup UID).
- `kai.scheduler/expected-runtime`, `requeue-delay`, `requeue-not-before`.
- `kai_requeue_nominations_total`, `kai_requeue_nomination_skipped_total` (prefix from `--metrics-namespace`, default `kai`).
- `expectedruntime` in default plugin list; docs in `docs/plugins/expectedruntime.md`.
Uses existing `LastStartTimestamp`; MinRuntime stays in Requeue action filters.
Related Issues
Closes #904
Checklist
Breaking Changes
Additional Notes