
Conversation

@grandeit

Fix static pod pruning logic for non-contiguous set of revisions

Problem

The PruneController contains a logic bug in revisionsToKeep() that prevents pruning when the protected revision set is non-contiguous but spans from revision 1 to LatestAvailableRevision.

Scenario that triggers the bug:

- Node has very old LastFailedRevision: 5
- Cluster is now at LatestAvailableRevision: 100
- Limits are failedRevisionLimit: 5, succeededRevisionLimit: 5
- Protected set becomes {1,2,3,4,5,96,97,98,99,100} (10 revisions)

The buggy logic sees:

- First element: 1
- Last element: 100
- Returns keepAll = true -> no pruning happens.

This causes a lot of revision-status-* ConfigMaps (and their owned ConfigMaps) to accumulate until a later failed revision eventually removes the first revision from the set.
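Roughly sketched below (this is not the exact library code, only the shape of the check described above), the shortcut looks only at the smallest and largest protected revision, so the gap between 5 and 96 goes unnoticed:

```go
// Rough sketch of the described shortcut; not the actual revisionsToKeep code.
package prune

import "k8s.io/apimachinery/pkg/util/sets"

func keepAllShortcut(keep sets.Set[int32], latest int32) bool {
	ordered := sets.List(keep) // sorted, e.g. [1 2 3 4 5 96 97 98 99 100]
	// Only the endpoints are inspected, so the gap 6-95 is never checked.
	return len(ordered) > 0 && ordered[0] == 1 && ordered[len(ordered)-1] == latest
}
```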

Solution

Check whether the set has exactly LatestAvailableRevision elements before triggering the keepAll optimization. This ensures that the set has no gaps and is in fact contiguous.
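A minimal sketch of the proposed condition, continuing the illustration above (names are illustrative, not the exact patch):

```go
// Minimal sketch of the fixed condition; names are illustrative, not the exact patch.
func keepAllContiguous(keep sets.Set[int32], latest int32) bool {
	// All protected revisions lie in 1..latest, so only the full range
	// {1, ..., latest} can contain exactly `latest` elements. A non-contiguous
	// set like {1-5, 96-100} has 10 elements and no longer short-circuits pruning.
	return keep.Len() == int(latest) && keep.Has(1) && keep.Has(latest)
}
```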

Testing

Added test case "prunes non-contiguous set (keeps 1-10 and 96-100, prunes 11-95)", which verifies (the protected-set arithmetic is sketched below):

- Two nodes with LastFailedRevision: 5 and LastFailedRevision: 10
- CurrentRevision: 100 on both nodes
- LatestAvailableRevision: 100
- Protected set: {1,2,3,4,5,6,7,8,9,10,96,97,98,99,100} (15 revisions)
- Revisions 11-95 are pruned.
- The keepAll optimization does not trigger.
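For illustration, the protected set for this scenario can be recomputed standalone with the same limits; this is only a sketch of the expected arithmetic, not the actual unit test added in this PR:

```go
// Standalone illustration of the protected set for the test scenario above;
// not the actual PruneController unit test.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/sets"
)

func main() {
	const latest, failedLimit, succeededLimit = int32(100), 5, 5

	keep := sets.New[int32]()
	protect := func(rev int32, limit int) {
		for i := rev; i > rev-int32(limit) && i > 0; i-- {
			keep.Insert(i)
		}
	}

	// LatestAvailableRevision and the max(failedLimit, succeededLimit)-1 revisions before it.
	protect(latest, max(failedLimit, succeededLimit))
	// Two nodes, both at CurrentRevision 100, with LastFailedRevision 5 and 10.
	for _, lastFailed := range []int32{5, 10} {
		protect(lastFailed, failedLimit)
		protect(100, succeededLimit)
	}

	fmt.Println(sets.List(keep))           // [1 2 ... 10 96 ... 100], i.e. 15 revisions
	fmt.Println(keep.Len() == int(latest)) // false -> keepAll must not trigger, 11-95 get pruned
}
```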

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 27, 2025
@openshift-ci-robot

@grandeit: This pull request explicitly references no jira issue.


In response to this:

Fix static pod pruning logic for non-contiguous set of revisions


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 27, 2025
@openshift-ci
Contributor

openshift-ci bot commented Nov 27, 2025

Hi @grandeit. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Contributor

openshift-ci bot commented Nov 27, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: grandeit
Once this PR has been reviewed and has the lgtm label, please assign dgrisonnet for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@grandeit
Author

Friendly ping to some active reviewers:
@p0lyn0mial @JoelSpeed @damdo 👋

@JoelSpeed
Contributor

@grandeit Correct me if I'm wrong, but doesn't the static pod controller here upgrade through sequential versions?

Therefore if I have protected version 5, this implies to me that some pod is still stuck on this version, meanwhile others are on a later version, say 100. That stuck pod needs to upgrade through every version from 5 through 100 to catch up. Therefore, if we were to prune the intermediate versions, that pod would become perma-stuck, as we would no longer have the intermediate versions for it to iterate through?

@grandeit
Author

grandeit commented Jan 2, 2026


Hey @JoelSpeed
The static pods do not upgrade sequentially through the versions. Each node jumps directly to the target revision.

The controller uses getRevisionToStart to determine which revision to install on a given node. When the previous node (the one that was just upgraded) has a newer revision and the current node did not fail upgrading to that revision, it sets that revision as the new upgrade target (around line 900):

```go
// getRevisionToStart returns the revision we need to start or zero if none
func (c *InstallerController) getRevisionToStart(currNodeState, prevNodeState *operatorv1.NodeStatus, operatorStatus *operatorv1.StaticPodOperatorStatus) int32 {
	if prevNodeState == nil {
		currentAtLatest := currNodeState.CurrentRevision == operatorStatus.LatestAvailableRevision
		if !currentAtLatest {
			return operatorStatus.LatestAvailableRevision
		}
		return 0
	}

	prevFinished := prevNodeState.TargetRevision == 0
	prevInTransition := prevNodeState.CurrentRevision != prevNodeState.TargetRevision
	if prevInTransition && !prevFinished {
		return 0
	}

	prevAhead := prevNodeState.CurrentRevision > currNodeState.CurrentRevision
	failedAtPrev := currNodeState.LastFailedRevision == prevNodeState.CurrentRevision
	if prevAhead && !failedAtPrev {
		return prevNodeState.CurrentRevision
	}

	return 0
}
```

In your example, the node with currentRevision=5 will jump to revision 100.
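Concretely, with the values from your example (an illustrative trace, not output from the real controller):

```go
// Illustrative trace of getRevisionToStart for the example above.
//   prevNodeState: CurrentRevision=100, TargetRevision=0 (finished)
//   currNodeState: CurrentRevision=5,   LastFailedRevision=5
//
//   prevFinished     = true   (TargetRevision == 0)
//   prevInTransition = true   (100 != 0), but !prevFinished is false, so no early return
//   prevAhead        = true   (100 > 5)
//   failedAtPrev     = false  (5 != 100)
//   => returns 100: the node is told to install revision 100 directly, skipping 6-99.
```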

I think the idea behind keeping the failed revisions is to leave a way to debug them later on.

Even without my proposed fix, the intermediate revisions are pruned as long as there is no failed revision from a very early point in time still hanging around; such a leftover prevents pruning because it imho incorrectly triggers the shortcut.
There are some details of the current behaviour described in the comment of the revisionsToKeep function:

```go
// revisionsToKeep approximates the set of revisions to keep: spec.failedRevisionsLimit for failed revisions,
// spec.succeededRevisionsLimit for succeed revisions (for all nodes). The approximation goes by:
// - don't prune LatestAvailableRevision and the max(spec.failedRevisionLimit, spec.succeededRevisionLimit) - 1 revisions before it.
// - don't prune a node's CurrentRevision and the spec.succeededRevisionLimit - 1 revisions before it.
// - don't prune a node's TargetRevision and the spec.failedRevisionLimit - 1 revisions before it.
// - don't prune a node's LastFailedRevision and the spec.failedRevisionLimit - 1 revisions before it.
func (c *PruneController) revisionsToKeep(status *operatorv1.StaticPodOperatorStatus, failedLimit, succeededLimit int) (all bool, keep sets.Set[int32]) {
```

Hope this helps :)

@JoelSpeed
Contributor

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 2, 2026
@openshift-ci
Contributor

openshift-ci bot commented Jan 2, 2026

@grandeit: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
