Conversation
5ff4ce5 to
52526fd
Compare
52526fd to
032b056
Compare
There was a problem hiding this comment.
Pull request overview
Adds VFIO (KubeVirt) support to the Intel GPU device plugin by introducing a VFIO run mode, a GPU initcontainer for binding devices to vfio-pci, and shared PCI/VFIO scanning utilities.
Changes:
- Add
vfioModeto the GPU DevicePlugin CRD/webhook validation and wire VFIO behavior into the GPU controller/DaemonSet generation. - Introduce
-run-mode(default/wsl/vfio) to the GPU plugin and implement BDF-based VFIO scanning + KubeVirt env var injection. - Add shared PCI helper utilities (
pkg/pluginutils/pci.go) plus tests; adjust existing VFIO plugin tests accordingly.
Reviewed changes
Copilot reviewed 32 out of 36 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| test/envtest/gpudeviceplugin_controller_test.go | Updates controller envtest expectations for VFIO mode args/initcontainer behavior. |
| pkg/vfio/plugin_test.go | Adjusts VFIO plugin tests to match updated scan and env behavior. |
| pkg/vfio/plugin.go | Refactors VFIO scan logic to shared PCI scanner + KubeVirt env injection. |
| pkg/pluginutils/pci_test.go | Adds unit tests for new PCI/VFIO helper utilities. |
| pkg/pluginutils/pci.go | Introduces shared PCI GPU compatibility checks, VFIO binding, and PCI scanning utility. |
| pkg/controllers/gpu/controller.go | Adds VFIO mode DaemonSet mutations, initcontainer support, and -run-mode=vfio arg handling. |
| pkg/apis/deviceplugin/v1/gpudeviceplugin_webhook.go | Adds VFIO-specific validation and refactors allow/deny ID validation. |
| pkg/apis/deviceplugin/v1/gpudeviceplugin_types.go | Adds vfioMode field to the GPU DevicePlugin spec. |
| deployments/operator/crd/bases/deviceplugin.intel.com_gpudeviceplugins.yaml | Exposes vfioMode in the CRD schema. |
| deployments/gpu_plugin/overlays/wsl/wsl_args.yaml | Switches WSL overlay to -run-mode=wsl. |
| deployments/gpu_plugin/overlays/kubevirt/vfio_mounts.yaml | Adds VFIO-related mounts and removes default DRM mounts for KubeVirt overlay. |
| deployments/gpu_plugin/overlays/kubevirt/kustomization.yaml | Adds new KubeVirt overlay patches. |
| deployments/gpu_plugin/overlays/kubevirt/initcontainer.yaml | Adds initcontainer for VFIO bind/unbind in the KubeVirt overlay. |
| deployments/gpu_plugin/overlays/kubevirt/add-args.yaml | Configures VFIO run mode args in the KubeVirt overlay. |
| cmd/xpumanager_sidecar/main.go | Switches xpumanager sidecar to shared pkg/pluginutils. |
| cmd/internal/pluginutils/devicedriver_test.go | Removes old internal pluginutils tests (migrated/replaced). |
| cmd/internal/pluginutils/devicedriver.go | Removes old internal device driver helper (migrated/replaced). |
| cmd/internal/labeler/labeler_test.go | Switches labeler tests to shared pkg/pluginutils. |
| cmd/internal/labeler/labeler.go | Switches labeler to shared pkg/pluginutils. |
| cmd/gpu_plugin/kubevirt.md | Adds documentation for KubeVirt/VFIO usage. |
| cmd/gpu_plugin/gpu_plugin_test.go | Adds VFIO scan + PostAllocate coverage; updates WSL and device compatibility test data. |
| cmd/gpu_plugin/gpu_plugin.go | Implements -run-mode, VFIO scan path, and VFIO PostAllocate env injection. |
| cmd/gpu_plugin/device_props.go | Removes old device properties helper in favor of shared pluginutils. |
| cmd/gpu_plugin/README.md | Documents VFIO resource and -run-mode. |
| cmd/gpu_nfdhook/main.go | Removes deprecated GPU NFD hook binary entrypoint. |
| cmd/gpu_nfdhook/README.md | Removes deprecated GPU NFD hook documentation. |
| cmd/gpu_init/main.go | Adds new GPU initcontainer binary to bind Intel GPUs to vfio-pci. |
| cmd/gpu_init/README.md | Documents new initcontainer behavior and manual rebind example. |
| cmd/dlb_plugin/dlb_plugin.go | Switches DLB plugin to shared pkg/pluginutils. |
| build/docker/templates/intel-gpu-initcontainer.Dockerfile.in | Updates initcontainer template to build/run new gpu_init binary. |
| build/docker/intel-gpu-initcontainer.Dockerfile | Updates generated Dockerfile entrypoint/labels for new initcontainer. |
| README.md | Updates top-level resource summary to include gpu.intel.com/vfio. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
032b056 to
a21af7d
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 32 out of 36 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
cmd/gpu_plugin/gpu_plugin.go:1
- This changes scan error handling from a warning/continue pattern to
klog.Fatalf, which terminates the device plugin process on any transient scan issue. For long-running kubelet device plugins, it's usually safer to log the error and retry on the next tick (or implement backoff), rather than exiting.
// Copyright 2017-2026 Intel Corporation. All Rights Reserved.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
a21af7d to
d33c5e1
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 32 out of 36 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
bacace0 to
7899ee5
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 34 out of 38 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
cmd/gpu_plugin/gpu_plugin.go:1
- The scan loop now exits the whole plugin process on any scan error. Since scan errors can be transient (e.g., momentary sysfs/devfs read issues or mounts not ready at startup), crashing the plugin can cause unnecessary restart loops. Prefer logging a warning/error and retrying on the next tick (as the previous implementation did), unless the error is truly unrecoverable and should fail fast.
// Copyright 2017-2026 Intel Corporation. All Rights Reserved.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| const timeout = time.Second * 30 | ||
| const interval = time.Second * 1 | ||
| const timeout = time.Second * 5 |
There was a problem hiding this comment.
Reducing the envtest timeout from 30s to 5s increases the chance of flakes on slower CI nodes (controller reconciliation + cache sync + API server latency can exceed 5s intermittently). Consider keeping a higher timeout (or deriving it from an env/config) while keeping the shorter polling interval if desired.
| const timeout = time.Second * 5 | |
| const timeout = time.Second * 30 |
| @@ -0,0 +1,48 @@ | |||
| # Using Intel GPU plugin with KubeVirt | |||
There was a problem hiding this comment.
nit: KubeVirt is not the only use case for VFIO devices since Kata containers could consume those too. Can we update the changes in this PR to not use KubeVirt in places that are generic to virt/vfio
7899ee5 to
66f6509
Compare
eero-t
left a comment
There was a problem hiding this comment.
Few documentation comments.
| | gpu.intel.com/i915_monitoring | Monitoring resource for the `i915` KMD provided devices | | ||
| | gpu.intel.com/xe | `xe` KMD provided GPU instance | | ||
| | gpu.intel.com/xe_monitoring | Monitoring resource for the `xe` KMD provided devices | | ||
| | gpu.intel.com/vfio | VFIO-PCI bound GPU devices (KubeVirt)| |
There was a problem hiding this comment.
Can the plugin provide them also as resource corresponding to required KMD? I mean, do workloads need to be modified to query different resource under kubevirt?
=> IMHO that should be explained a bit more in the doc (not necessarily here).
There was a problem hiding this comment.
I'm not sure I follow. The vfio resources are essentially just handles for the PCI pass-through to occur. If an integrated/older GPU is passed through, it will use i915. A newer one will then use xe.
Plugin could register the vfio resources with a kmd suffix, but I think that using the allow/denyIDs one can aid which GPUs to use for vfio mode.
There was a problem hiding this comment.
So GPU workload resource requests do not need to change; they are not intend to use this resource? What then requests it, Kubevirt? If yes, please make it clearer.
There was a problem hiding this comment.
Well, it does say "KubeVirt" in the resource line. That's the only known user for the vfio resources. In theory a container could request them, but without binding a GPU driver to the device, there's little the container could do with the device.
With other devices, like DSA, the user space driver can use the vfio-pci bound device. But that's not the case for GPU.
I'll add a link to the kubevirt md file.
There was a problem hiding this comment.
The problem is that it does not say what is supposed to use it.
"To be used with KubeVirt" is better, but could still be interpreted that workload should request that resource when KubeVirt is used.
I suggest rephrasing it as "used by KubeVirt".
There was a problem hiding this comment.
I changed it to:
| gpu.intel.com/vfio | VFIO-PCI bound GPU devices. To be used with KubeVirt |
66f6509 to
c937046
Compare
I removed the dsa portion and one operator fix. This now only includes plugin, initcontainer and operator changes. |
7d2fa91 to
7bc9480
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 31 out of 35 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
7bc9480 to
4848342
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 30 out of 34 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| t.Errorf("Expected %t, got %t for device %s", tc.expectPass, bindOk, dpath) | ||
| } |
There was a problem hiding this comment.
This test never asserts that an error is returned when expectPass is false (so a nil error would still pass). Also, the error is formatted with %t even though bindOk is an error. Add an explicit failure-case assertion (e.g., if !tc.expectPass && bindOk == nil) and use %v for the error value.
| t.Errorf("Expected %t, got %t for device %s", tc.expectPass, bindOk, dpath) | |
| } | |
| t.Errorf("Expected success %t, got error %v for device %s", tc.expectPass, bindOk, dpath) | |
| } | |
| if !tc.expectPass && bindOk == nil { | |
| t.Errorf("Expected failure (non-nil error) for device %s, but got nil", dpath) | |
| } |
| changed := false | ||
|
|
||
| if initConts[0].Image != dp.Spec.InitImage { | ||
| initConts[0].Image = dp.Spec.InitImage | ||
| changed = true | ||
| } | ||
|
|
||
| args := getInitArgs(dp) | ||
| if !changed { | ||
| changed = slices.Compare(args, initConts[0].Args) != 0 | ||
| } | ||
|
|
||
| initConts[0].Args = args | ||
|
|
||
| return changed |
There was a problem hiding this comment.
Reconciliation here only self-heals initcontainer Image and Args. If an existing DaemonSet has drifted in other initcontainer fields you rely on (e.g., SecurityContext.Privileged, VolumeMounts for /sys/bus/pci, SELinux options), this won't correct it even though VFIO mode is intended to be idempotent/self-healing. Consider reconciling the full initcontainer spec (or at least the required securityContext + mount) instead of only image/args.
| changed := false | |
| if initConts[0].Image != dp.Spec.InitImage { | |
| initConts[0].Image = dp.Spec.InitImage | |
| changed = true | |
| } | |
| args := getInitArgs(dp) | |
| if !changed { | |
| changed = slices.Compare(args, initConts[0].Args) != 0 | |
| } | |
| initConts[0].Args = args | |
| return changed | |
| // Reconcile the full init container spec using the same helper that is | |
| // used when creating it, so that any drift in securityContext, mounts, | |
| // etc. is also corrected. | |
| desiredSpec := ds.Spec.Template.Spec | |
| setInitContainer(&desiredSpec, dp.Spec.InitImage, getInitArgs(dp)) | |
| // Defensive check: setInitContainer should always create exactly one init container. | |
| if len(desiredSpec.InitContainers) != 1 { | |
| return false | |
| } | |
| if !reflect.DeepEqual(initConts[0], desiredSpec.InitContainers[0]) { | |
| ds.Spec.Template.Spec.InitContainers[0] = desiredSpec.InitContainers[0] | |
| return true | |
| } | |
| return false |
| // Unbinds Intel GPU devices from xe and i915 drivers and binds them to vfio-pci driver. | ||
| func main() { | ||
| var ( | ||
| denyIds string | ||
| allowIds string | ||
| ) | ||
|
|
||
| flag.StringVar(&denyIds, "deny-ids", "", "Comma-separated list of device IDs to deny (0x1234 format)") | ||
| flag.StringVar(&allowIds, "allow-ids", "", "Comma-separated list of device IDs to allow (0x1234 format)") | ||
|
|
||
| flag.Parse() | ||
|
|
||
| if err := pluginutils.ValidatePCIDeviceIDs(allowIds); err != nil { | ||
| klog.Fatalf("allow ID validation failed: %+v", err) | ||
| } | ||
| if err := pluginutils.ValidatePCIDeviceIDs(denyIds); err != nil { | ||
| klog.Fatalf("deny ID validation failed: %+v", err) | ||
| } |
There was a problem hiding this comment.
This introduces a new command that performs host driver rebinding and option validation, but there are no direct tests for its mode/validation behavior (e.g., allow+deny mutual exclusion, invalid ID handling). Consider extracting the core logic into a testable function and adding unit tests using a fake sysfs tree (similar to pkg/pluginutils/pci_test.go) to cover success and failure paths.
Move common code under pluginutils and modify the old nfdhook to rebind GPU devices from xe/i915 to vfio-pci. New resource will be gpu.intel.com/vfio. Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
Also fix "pref allocation policy" check Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
And add a kubevirt specific file. Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
13d3234 to
1a775ae
Compare
Introduces VFIO support to GPU plugin. Follows the example set by DSA VFIO plugin.
The old nfdhook initcontainer is reused to do the unbind-bind operation for the GPU devices.