
OCPBUGS-78148: [release-4.21] block device plugin until SR-IOV config applied#1178

Merged
openshift-merge-bot[bot] merged 5 commits into openshift:release-4.21 from zeeke:ds/421/OCPBUGS-66342 on Apr 20, 2026

@zeeke
Contributor

@zeeke zeeke commented Mar 10, 2026

ykulazhenkov and others added 3 commits March 10, 2026 12:49
Add blockDevicePluginUntilConfigured feature gate that prevents the
SR-IOV device plugin from starting until the sriov-config-daemon
has applied the configuration for the node.

When enabled, the device plugin daemonset runs an init container
that sets a wait-for-config annotation on its pod. The init
container then waits until the sriov-config-daemon removes this
annotation, which happens after the daemon has applied the SR-IOV
configuration for the node.

This feature addresses the race condition where the device plugin
starts and reports available resources before the configuration
is actually applied, which can lead to pods being scheduled
prematurely.

Key changes:
- Add wait-for-config subcommand to sriov-network-config-daemon
- Add init container to device plugin daemonset (when feature enabled)
- Add logic in daemon to remove annotation after config is applied
- Add Role/RoleBinding for device plugin pod access

Signed-off-by: Yury Kulazhenkov <ykulazhenkov@nvidia.com>
When the blockDevicePluginUntilConfigured feature gate is enabled and
there are no SriovNetworkNodePolicy resources targeting a node, the
config-daemon's apply() function calls waitForDevicePluginPodAndTryUnblock
which polls for up to 2 minutes waiting for a device plugin pod that
will never arrive. The device plugin daemonset is only scheduled on
nodes with policies (SriovDevicePluginLabel=Enabled), so this wait
always times out when Spec.Interfaces is empty.

Skip the device plugin wait and the periodic unblock API call when the
desired node state has no interfaces configured. This matches the
existing guard in tryUnblockDevicePlugin() which already checks for
empty interfaces before removing the wait-for-config annotation.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
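
The guard described in this commit can be sketched as follows. This is a hypothetical simplification: `Interface`, `NodeStateSpec`, `shouldWaitForDevicePlugin` and `apply` are illustrative stand-ins, not the operator's real types or functions. Only the rule itself comes from the commit message: skip the device plugin wait (and the unblock call) when the desired node state has no interfaces.

```go
package main

import "fmt"

// Interface and NodeStateSpec are HYPOTHETICAL, heavily simplified stand-ins
// for the desired node state; the operator's real types carry much more.
type Interface struct {
	PciAddress string
	NumVfs     int
}

type NodeStateSpec struct {
	Interfaces []Interface
}

// shouldWaitForDevicePlugin encodes the rule from the commit message: the
// device plugin daemonset is only scheduled on nodes that have policies, so
// when the desired state has no interfaces no device plugin pod will appear
// and waiting for it could only time out.
func shouldWaitForDevicePlugin(featureGateEnabled bool, spec NodeStateSpec) bool {
	return featureGateEnabled && len(spec.Interfaces) > 0
}

// apply sketches the tail of a config-daemon apply step. waitAndUnblock
// stands in for "wait for the device plugin pod, then remove its
// wait-for-config annotation" and is skipped when there is nothing to unblock.
func apply(featureGateEnabled bool, spec NodeStateSpec, waitAndUnblock func() error) error {
	// ... the node configuration itself would be applied here ...
	if !shouldWaitForDevicePlugin(featureGateEnabled, spec) {
		return nil
	}
	return waitAndUnblock()
}

func main() {
	waits := 0
	wait := func() error { waits++; return nil }

	empty := NodeStateSpec{}
	configured := NodeStateSpec{Interfaces: []Interface{{PciAddress: "0000:3b:00.0", NumVfs: 4}}}

	// No policies target this node: the wait is skipped.
	_ = apply(true, empty, wait)
	// Policies present: the daemon waits for the device plugin pod.
	_ = apply(true, configured, wait)
	// Feature gate disabled: no wait at all.
	_ = apply(false, configured, wait)
	fmt.Println("device plugin waits:", waits)
}
```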
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 10, 2026
@zeeke
Contributor Author

zeeke commented Mar 10, 2026

/jira cherrypick OCPBUGS-66342

@openshift-ci-robot
Contributor

@zeeke: Jira Issue OCPBUGS-66342 has been cloned as Jira Issue OCPBUGS-78148. Will retitle bug to link to clone.
/retitle OCPBUGS-78148: Ds/421/ocpbugs 66342

In response to this:

/jira cherrypick OCPBUGS-66342

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot changed the title Ds/421/ocpbugs 66342 OCPBUGS-78148: Ds/421/ocpbugs 66342 Mar 10, 2026
@zeeke zeeke changed the title OCPBUGS-78148: Ds/421/ocpbugs 66342 [release-4.21]: block device plugin until SR-IOV config applied Mar 10, 2026
@zeeke zeeke force-pushed the ds/421/OCPBUGS-66342 branch from 55e0da9 to 4758925 March 10, 2026 13:50
@SchSeba
Contributor

SchSeba commented Mar 10, 2026

Hi @zeeke can you check this one? it's failing in the CI

zeeke added 2 commits March 10, 2026 17:36
to align to `/deploy/role.yaml`

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
…0.20`

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
@zeeke zeeke force-pushed the ds/421/OCPBUGS-66342 branch from 4758925 to 1390679 March 10, 2026 16:36
@zeeke zeeke changed the title [release-4.21]: block device plugin until SR-IOV config applied OCPBUGS-78148: [release-4.21] block device plugin until SR-IOV config applied Mar 10, 2026
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 10, 2026
@openshift-ci-robot
Contributor

@zeeke: This pull request references Jira Issue OCPBUGS-78148, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

backport of

Conflicts faced and solved in pkg/daemon/daemon_test.go

@zeeke
Contributor Author

zeeke commented Mar 10, 2026

/test e2e-telco5g-sriov

@openshift-ci
Contributor

openshift-ci Bot commented Mar 10, 2026

@zeeke: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-telco5g-sriov | 1390679 | link | false | /test e2e-telco5g-sriov |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@zeeke
Contributor Author

zeeke commented Mar 11, 2026

The failing job is not related to this backport:

[It] [sriov] NetworkPool Check rdma metrics inside a pod in exclusive mode should run pod with RDMA cni and expose nic metrics and another one without rdma info 

@zeeke
Contributor Author

zeeke commented Mar 23, 2026

@SchSeba please take another look

@zeeke
Contributor Author

zeeke commented Mar 24, 2026

/jira backport release-4.20

@zeeke
Contributor Author

zeeke commented Mar 24, 2026

/jira refresh
/label backport-risk-assessed

@openshift-ci openshift-ci Bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label Mar 24, 2026
@openshift-ci-robot
Contributor

@zeeke: This pull request references Jira Issue OCPBUGS-78148, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh
/label backport-risk-assessed

@zeeke
Contributor Author

zeeke commented Apr 7, 2026

@SchSeba please take another look

@SchSeba
Contributor

SchSeba commented Apr 13, 2026

/lgtm
/approve
/verified by ci

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Apr 13, 2026
@openshift-ci-robot
Contributor

@SchSeba: This PR has been marked as verified by ci.

In response to this:

/lgtm
/approve
/verified by ci

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 13, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Apr 13, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SchSeba, zeeke

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zeeke
Contributor Author

zeeke commented Apr 20, 2026

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 20, 2026
@openshift-ci-robot
Contributor

@zeeke: This pull request references Jira Issue OCPBUGS-78148, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.z) matches configured target version for branch (4.21.z)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-66342 is in the state Verified, which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-66342 targets the "4.22.0" version, which is one of the valid target versions: 4.22.0
  • bug has dependents

Requesting review from QA contact:
/cc @zhiqiangf

In response to this:

/jira refresh

@openshift-ci openshift-ci Bot requested a review from zhiqiangf April 20, 2026 07:30
@openshift-merge-bot openshift-merge-bot Bot merged commit 91d9df4 into openshift:release-4.21 Apr 20, 2026
11 of 12 checks passed
@openshift-ci-robot
Contributor

@zeeke: Jira Issue OCPBUGS-78148: All pull requests linked via external trackers have merged:

All linked pull requests have the verified tag. Jira Issue OCPBUGS-78148 has been moved to the VERIFIED state.

In response to this:

backport of

Conflicts faced and solved in pkg/daemon/daemon_test.go

@zeeke
Contributor Author

zeeke commented Apr 20, 2026

/cherrypick release-4.20

@openshift-cherrypick-robot

@zeeke: #1178 failed to apply on top of branch "release-4.20":

Applying: feat: block device plugin until SR-IOV config applied
Using index info to reconstruct a base tree...
M	README.md
M	bindata/manifests/plugins/sriov-device-plugin.yaml
M	pkg/consts/constants.go
M	pkg/daemon/daemon.go
M	pkg/daemon/daemon_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/daemon/daemon_test.go
Auto-merging pkg/daemon/daemon.go
Auto-merging pkg/consts/constants.go
Auto-merging bindata/manifests/plugins/sriov-device-plugin.yaml
Auto-merging README.md
Applying: review comments
Using index info to reconstruct a base tree...
M	README.md
M	bindata/manifests/plugins/sriov-device-plugin.yaml
M	cmd/sriov-network-config-daemon/start.go
M	controllers/sriovoperatorconfig_controller.go
M	pkg/daemon/daemon.go
M	pkg/daemon/daemon_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/daemon/daemon_test.go
CONFLICT (content): Merge conflict in pkg/daemon/daemon_test.go
Auto-merging pkg/daemon/daemon.go
CONFLICT (content): Merge conflict in pkg/daemon/daemon.go
Auto-merging controllers/sriovoperatorconfig_controller.go
Auto-merging cmd/sriov-network-config-daemon/start.go
Auto-merging bindata/manifests/plugins/sriov-device-plugin.yaml
Auto-merging README.md
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0002 review comments

In response to this:

/cherrypick release-4.20

Labels: approved, backport-risk-assessed, jira/valid-bug, jira/valid-reference, lgtm, verified

9 participants