@rich7420
Contributor

Description

Adds the Expected Runtime plugin, which nominates running jobs that exceed their configured expected runtime as requeue candidates. The plugin only nominates; eviction is performed by the Requeue action, which lives outside this plugin. Eviction is soft: a job becomes eligible once its runtime ≥ expected runtime, but it is evicted only when a higher-priority workload needs the slot.

Why: This gives time-aware fairness, since jobs are requeued only when there is contention. The behavior is opt-in via the kai.scheduler/expected-runtime annotation, and a cooldown via kai.scheduler/requeue-not-before avoids thrashing.
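For reviewers, here is a minimal sketch of the eligibility check described above. It is not the plugin's actual code. The `Job` struct, the `Eligible` function, the skip-reason strings, and the `mustParse` helper are hypothetical names for illustration. It also assumes that expected-runtime is a Go duration string (for example `2h`) and that requeue-not-before is an RFC 3339 timestamp; the real annotation formats may differ.

```go
package main

import "time"

// Annotation keys as named in this PR description.
const (
	expectedRuntimeKey  = "kai.scheduler/expected-runtime"
	requeueNotBeforeKey = "kai.scheduler/requeue-not-before"
)

// Job is a hypothetical, simplified stand-in for the scheduler's job info.
type Job struct {
	Running            bool
	Preemptible        bool
	Annotations        map[string]string
	LastStartTimestamp *time.Time
}

// Eligible reports whether a job may be nominated as a requeue candidate,
// and returns a skip reason when it may not.
// Assumption: expected-runtime parses with time.ParseDuration and
// requeue-not-before parses as RFC 3339.
func Eligible(job Job, now time.Time) (bool, string) {
	if !job.Running || job.LastStartTimestamp == nil {
		return false, "not running"
	}
	if !job.Preemptible {
		return false, "not preemptible"
	}
	raw, ok := job.Annotations[expectedRuntimeKey]
	if !ok {
		return false, "not opted in"
	}
	expected, err := time.ParseDuration(raw)
	if err != nil || expected <= 0 {
		return false, "invalid expected-runtime"
	}
	// Soft eviction: eligible once runtime >= expected.
	if now.Sub(*job.LastStartTimestamp) < expected {
		return false, "runtime below expected"
	}
	// Cooldown: skip while now is before requeue-not-before.
	// Assumption: an unparsable timestamp is treated as no cooldown.
	if nb, ok := job.Annotations[requeueNotBeforeKey]; ok {
		if t, err := time.Parse(time.RFC3339, nb); err == nil && now.Before(t) {
			return false, "cooldown active"
		}
	}
	return true, ""
}

// mustParse builds an example timestamp from an RFC 3339 string.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}
```

In this sketch a job with expected-runtime `1h` that started at 00:00 is skipped at 00:30 and eligible from 01:00 onward, unless requeue-not-before is still in the future. Being eligible only makes the job a candidate; whether it is actually evicted remains the Requeue action's decision.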

What changed:

  • Plugin expectedruntime: registers a RequeueCandidateNominationFn and nominates jobs that pass all checks (running, preemptible, valid expected-runtime, runtime ≥ expected, cooldown expired).
  • Session API: RequeueCandidateNominationFn, AddRequeueCandidateNominationFn, and CollectRequeueCandidates(), which deduplicates candidates by PodGroup UID.
  • Annotations: kai.scheduler/expected-runtime, requeue-delay, requeue-not-before.
  • Metrics: kai_requeue_nominations_total, kai_requeue_nomination_skipped_total (prefix from --metrics-namespace, default kai).
  • Operator: expectedruntime in default plugin list; docs in docs/plugins/expectedruntime.md.
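The deduplication mentioned for CollectRequeueCandidates() in the list above can be sketched roughly as follows. The types and signatures here (`PodGroup`, `NominationFn`, `CollectCandidates`) are hypothetical simplifications for illustration and do not reproduce the actual Session API added in this PR.

```go
package main

// PodGroup is a hypothetical, simplified stand-in for a nominated PodGroup.
type PodGroup struct {
	UID  string
	Name string
}

// NominationFn plays the role of a registered nomination callback.
// Assumption: the real RequeueCandidateNominationFn signature may differ.
type NominationFn func() []PodGroup

// CollectCandidates calls every nomination function in registration order
// and keeps only the first occurrence of each PodGroup UID.
func CollectCandidates(fns []NominationFn) []PodGroup {
	seen := make(map[string]struct{})
	var out []PodGroup
	for _, fn := range fns {
		for _, pg := range fn() {
			if _, dup := seen[pg.UID]; dup {
				continue // already nominated by an earlier plugin
			}
			seen[pg.UID] = struct{}{}
			out = append(out, pg)
		}
	}
	return out
}
```

Keying on UID rather than name means that two plugins nominating the same PodGroup produce a single candidate, while distinct PodGroups that happen to share a name stay separate.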

Uses existing LastStartTimestamp; MinRuntime stays in Requeue action filters.

Related Issues

Closes #904

Checklist

Note: Ensure your PR title follows the Conventional Commits format (e.g., feat(scheduler): add new feature)

  • Self-reviewed
  • Added/updated tests (if needed)
  • Updated documentation (if needed)

Breaking Changes

Additional Notes

@rich7420
Contributor Author

cc @itsomri, @romanbaron

Development

Successfully merging this pull request may close these issues.

Expected Runtime Plugin for Soft Eviction via Requeue Action