GPU plugin KubeVirt support #2219

Open
tkatila wants to merge 3 commits into intel:main from tkatila:gpu-kubevirt-support

Conversation

@tkatila
Contributor

@tkatila tkatila commented Feb 5, 2026

Introduces VFIO support to the GPU plugin, following the example set by the DSA VFIO plugin.
The old nfdhook initcontainer is reused to do the unbind-bind operation for the GPU devices.
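
For readers unfamiliar with the unbind-bind operation, the following is a minimal sketch of the usual Linux sysfs sequence for handing a PCI device to vfio-pci. It is not the code from this PR: the names `rebindToVFIO` and `demoRebind` and the `sysfsRoot` parameter are hypothetical, and the root is a parameter only so that the sketch can run against a fake directory tree instead of a real `/sys`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// rebindToVFIO is a hypothetical helper, not the function from this PR.
// sysfsRoot is "/sys" on a real host; a test can pass a temporary directory.
func rebindToVFIO(sysfsRoot, bdf string) error {
	devDir := filepath.Join(sysfsRoot, "bus", "pci", "devices", bdf)

	// Step 1: detach the currently bound driver (for example i915 or xe), if any.
	unbind := filepath.Join(devDir, "driver", "unbind")
	if _, err := os.Stat(unbind); err == nil {
		if err := os.WriteFile(unbind, []byte(bdf), 0o644); err != nil {
			return fmt.Errorf("unbind %s: %w", bdf, err)
		}
	}

	// Step 2: restrict the device so that only vfio-pci may claim it.
	override := filepath.Join(devDir, "driver_override")
	if err := os.WriteFile(override, []byte("vfio-pci"), 0o644); err != nil {
		return fmt.Errorf("driver_override %s: %w", bdf, err)
	}

	// Step 3: ask the PCI core to probe the device again.
	probe := filepath.Join(sysfsRoot, "bus", "pci", "drivers_probe")
	if err := os.WriteFile(probe, []byte(bdf), 0o644); err != nil {
		return fmt.Errorf("drivers_probe %s: %w", bdf, err)
	}

	return nil
}

// demoRebind runs rebindToVFIO against a throwaway fake sysfs tree and
// returns what ended up in driver_override.
func demoRebind() (string, error) {
	root, err := os.MkdirTemp("", "fake-sysfs")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)

	const bdf = "0000:03:00.0"
	driverDir := filepath.Join(root, "bus", "pci", "devices", bdf, "driver")
	if err := os.MkdirAll(driverDir, 0o755); err != nil {
		return "", err
	}
	if err := os.WriteFile(filepath.Join(driverDir, "unbind"), nil, 0o644); err != nil {
		return "", err
	}

	if err := rebindToVFIO(root, bdf); err != nil {
		return "", err
	}

	data, err := os.ReadFile(filepath.Join(root, "bus", "pci", "devices", bdf, "driver_override"))
	return string(data), err
}

func main() {
	override, err := demoRebind()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("driver_override:", override)
}
```

On a real host the same writes need root privileges and a loaded vfio-pci module, which is presumably why the work is delegated to a privileged initcontainer.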

@tkatila tkatila force-pushed the gpu-kubevirt-support branch 2 times, most recently from 5ff4ce5 to 52526fd Compare February 6, 2026 09:58
@tkatila tkatila marked this pull request as ready for review February 6, 2026 11:18
@tkatila tkatila requested review from bart0sh, kad and mythi as code owners February 6, 2026 11:18
@tkatila tkatila force-pushed the gpu-kubevirt-support branch from 52526fd to 032b056 Compare February 6, 2026 12:03
@tkatila tkatila changed the title Gpu kubevirt support GPU plugin KubeVirt support Feb 6, 2026
@mythi mythi requested a review from Copilot February 9, 2026 06:26
Contributor

Copilot AI left a comment

Pull request overview

Adds VFIO (KubeVirt) support to the Intel GPU device plugin by introducing a VFIO run mode, a GPU initcontainer for binding devices to vfio-pci, and shared PCI/VFIO scanning utilities.
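
As an illustration of what a PCI/VFIO scanning utility might look like, here is a small sketch that lists Intel display-class devices currently bound to vfio-pci. It is an assumption-laden example, not the contents of `pkg/pluginutils/pci.go`: `scanVFIOGPUs`, `readTrimmed`, and `demoScan` are hypothetical names, and the filter (vendor `0x8086`, class prefix `0x03`, driver link named `vfio-pci`) is one plausible selection rule.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// readTrimmed returns a sysfs attribute without its trailing newline,
// or an empty string if the attribute cannot be read.
func readTrimmed(path string) string {
	data, err := os.ReadFile(path)
	if err != nil {
		return ""
	}
	return strings.TrimSpace(string(data))
}

// scanVFIOGPUs is a hypothetical helper: it returns the BDF addresses of
// Intel display-class PCI devices whose driver link points at vfio-pci.
func scanVFIOGPUs(sysfsRoot string) ([]string, error) {
	devicesDir := filepath.Join(sysfsRoot, "bus", "pci", "devices")

	entries, err := os.ReadDir(devicesDir)
	if err != nil {
		return nil, err
	}

	var bdfs []string
	for _, entry := range entries {
		devDir := filepath.Join(devicesDir, entry.Name())

		// 0x8086 is Intel's PCI vendor ID; class 0x03xxxx is a display controller.
		if readTrimmed(filepath.Join(devDir, "vendor")) != "0x8086" {
			continue
		}
		if !strings.HasPrefix(readTrimmed(filepath.Join(devDir, "class")), "0x03") {
			continue
		}

		// The "driver" symlink points at the bound driver's sysfs directory.
		target, err := os.Readlink(filepath.Join(devDir, "driver"))
		if err != nil || filepath.Base(target) != "vfio-pci" {
			continue
		}

		bdfs = append(bdfs, entry.Name())
	}

	return bdfs, nil
}

// demoScan builds a fake sysfs tree with one i915 device and one vfio-pci
// device, then scans it.
func demoScan() ([]string, error) {
	root, err := os.MkdirTemp("", "fake-sysfs")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)

	drivers := map[string]string{"0000:00:02.0": "i915", "0000:03:00.0": "vfio-pci"}
	for bdf, driver := range drivers {
		devDir := filepath.Join(root, "bus", "pci", "devices", bdf)
		if err := os.MkdirAll(devDir, 0o755); err != nil {
			return nil, err
		}
		if err := os.WriteFile(filepath.Join(devDir, "vendor"), []byte("0x8086\n"), 0o644); err != nil {
			return nil, err
		}
		if err := os.WriteFile(filepath.Join(devDir, "class"), []byte("0x030000\n"), 0o644); err != nil {
			return nil, err
		}
		// A dangling symlink is enough here: only the link target's name is inspected.
		if err := os.Symlink("../../drivers/"+driver, filepath.Join(devDir, "driver")); err != nil {
			return nil, err
		}
	}

	return scanVFIOGPUs(root)
}

func main() {
	bdfs, err := demoScan()
	fmt.Println(bdfs, err)
}
```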

Changes:

  • Add vfioMode to the GPU DevicePlugin CRD/webhook validation and wire VFIO behavior into the GPU controller/DaemonSet generation.
  • Introduce -run-mode (default/wsl/vfio) to the GPU plugin and implement BDF-based VFIO scanning + KubeVirt env var injection.
  • Add shared PCI helper utilities (pkg/pluginutils/pci.go) plus tests; adjust existing VFIO plugin tests accordingly.
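
A run-mode flag of this kind is typically validated right after parsing. The sketch below shows one way to do that with the standard `flag` package; only the flag name `-run-mode` and its three values come from the summary above, while `parseRunMode` is a hypothetical helper and not the plugin's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseRunMode is a hypothetical helper; only the flag name and its three
// values are taken from the change summary.
func parseRunMode(args []string) (string, error) {
	fs := flag.NewFlagSet("gpu_plugin", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output

	mode := fs.String("run-mode", "default", "one of: default, wsl, vfio")
	if err := fs.Parse(args); err != nil {
		return "", err
	}

	// Reject anything outside the documented set instead of guessing.
	switch *mode {
	case "default", "wsl", "vfio":
		return *mode, nil
	default:
		return "", fmt.Errorf("unsupported run mode %q", *mode)
	}
}

func main() {
	for _, args := range [][]string{{}, {"-run-mode=vfio"}, {"-run-mode=bogus"}} {
		mode, err := parseRunMode(args)
		fmt.Printf("args=%v mode=%q err=%v\n", args, mode, err)
	}
}
```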

Reviewed changes

Copilot reviewed 32 out of 36 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| test/envtest/gpudeviceplugin_controller_test.go | Updates controller envtest expectations for VFIO mode args/initcontainer behavior. |
| pkg/vfio/plugin_test.go | Adjusts VFIO plugin tests to match updated scan and env behavior. |
| pkg/vfio/plugin.go | Refactors VFIO scan logic to shared PCI scanner + KubeVirt env injection. |
| pkg/pluginutils/pci_test.go | Adds unit tests for new PCI/VFIO helper utilities. |
| pkg/pluginutils/pci.go | Introduces shared PCI GPU compatibility checks, VFIO binding, and PCI scanning utility. |
| pkg/controllers/gpu/controller.go | Adds VFIO mode DaemonSet mutations, initcontainer support, and -run-mode=vfio arg handling. |
| pkg/apis/deviceplugin/v1/gpudeviceplugin_webhook.go | Adds VFIO-specific validation and refactors allow/deny ID validation. |
| pkg/apis/deviceplugin/v1/gpudeviceplugin_types.go | Adds vfioMode field to the GPU DevicePlugin spec. |
| deployments/operator/crd/bases/deviceplugin.intel.com_gpudeviceplugins.yaml | Exposes vfioMode in the CRD schema. |
| deployments/gpu_plugin/overlays/wsl/wsl_args.yaml | Switches WSL overlay to -run-mode=wsl. |
| deployments/gpu_plugin/overlays/kubevirt/vfio_mounts.yaml | Adds VFIO-related mounts and removes default DRM mounts for KubeVirt overlay. |
| deployments/gpu_plugin/overlays/kubevirt/kustomization.yaml | Adds new KubeVirt overlay patches. |
| deployments/gpu_plugin/overlays/kubevirt/initcontainer.yaml | Adds initcontainer for VFIO bind/unbind in the KubeVirt overlay. |
| deployments/gpu_plugin/overlays/kubevirt/add-args.yaml | Configures VFIO run mode args in the KubeVirt overlay. |
| cmd/xpumanager_sidecar/main.go | Switches xpumanager sidecar to shared pkg/pluginutils. |
| cmd/internal/pluginutils/devicedriver_test.go | Removes old internal pluginutils tests (migrated/replaced). |
| cmd/internal/pluginutils/devicedriver.go | Removes old internal device driver helper (migrated/replaced). |
| cmd/internal/labeler/labeler_test.go | Switches labeler tests to shared pkg/pluginutils. |
| cmd/internal/labeler/labeler.go | Switches labeler to shared pkg/pluginutils. |
| cmd/gpu_plugin/kubevirt.md | Adds documentation for KubeVirt/VFIO usage. |
| cmd/gpu_plugin/gpu_plugin_test.go | Adds VFIO scan + PostAllocate coverage; updates WSL and device compatibility test data. |
| cmd/gpu_plugin/gpu_plugin.go | Implements -run-mode, VFIO scan path, and VFIO PostAllocate env injection. |
| cmd/gpu_plugin/device_props.go | Removes old device properties helper in favor of shared pluginutils. |
| cmd/gpu_plugin/README.md | Documents VFIO resource and -run-mode. |
| cmd/gpu_nfdhook/main.go | Removes deprecated GPU NFD hook binary entrypoint. |
| cmd/gpu_nfdhook/README.md | Removes deprecated GPU NFD hook documentation. |
| cmd/gpu_init/main.go | Adds new GPU initcontainer binary to bind Intel GPUs to vfio-pci. |
| cmd/gpu_init/README.md | Documents new initcontainer behavior and manual rebind example. |
| cmd/dlb_plugin/dlb_plugin.go | Switches DLB plugin to shared pkg/pluginutils. |
| build/docker/templates/intel-gpu-initcontainer.Dockerfile.in | Updates initcontainer template to build/run new gpu_init binary. |
| build/docker/intel-gpu-initcontainer.Dockerfile | Updates generated Dockerfile entrypoint/labels for new initcontainer. |
| README.md | Updates top-level resource summary to include gpu.intel.com/vfio. |

Comment thread cmd/gpu_plugin/gpu_plugin.go
Comment thread pkg/pluginutils/pci.go
Comment thread pkg/apis/deviceplugin/v1/gpudeviceplugin_webhook.go Outdated
Comment thread pkg/apis/deviceplugin/v1/gpudeviceplugin_webhook.go Outdated
Comment thread pkg/controllers/gpu/controller.go Outdated
Comment thread pkg/pluginutils/pci.go Outdated
Comment thread deployments/gpu_plugin/overlays/kubevirt/vfio_mounts.yaml Outdated
Comment thread pkg/pluginutils/pci_test.go Outdated
Comment thread cmd/gpu_plugin/kubevirt.md Outdated
Comment thread cmd/gpu_plugin/kubevirt.md Outdated
@tkatila tkatila force-pushed the gpu-kubevirt-support branch from 032b056 to a21af7d Compare February 9, 2026 08:38
@tkatila tkatila requested a review from Copilot February 9, 2026 08:40
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 32 out of 36 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

cmd/gpu_plugin/gpu_plugin.go:1

  • This changes scan error handling from a warning/continue pattern to klog.Fatalf, which terminates the device plugin process on any transient scan issue. For long-running kubelet device plugins, it's usually safer to log the error and retry on the next tick (or implement backoff), rather than exiting.
// Copyright 2017-2026 Intel Corporation. All Rights Reserved.

Comment thread test/envtest/gpudeviceplugin_controller_test.go Outdated
Comment thread pkg/apis/deviceplugin/v1/gpudeviceplugin_webhook.go
Comment thread pkg/pluginutils/pci.go Outdated
Comment thread pkg/pluginutils/pci.go
Comment thread pkg/pluginutils/pci.go Outdated
Comment thread cmd/gpu_plugin/gpu_plugin.go
@tkatila tkatila force-pushed the gpu-kubevirt-support branch from a21af7d to d33c5e1 Compare February 9, 2026 11:02
@tkatila tkatila requested a review from Copilot February 9, 2026 11:03
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 32 out of 36 changed files in this pull request and generated 4 comments.


Comment thread pkg/vfio/plugin.go Outdated
Comment thread pkg/pluginutils/pci.go
Comment thread cmd/gpu_plugin/gpu_plugin_test.go
Comment thread pkg/controllers/gpu/controller.go
@tkatila tkatila force-pushed the gpu-kubevirt-support branch 2 times, most recently from bacace0 to 7899ee5 Compare February 9, 2026 13:24
@tkatila tkatila requested a review from Copilot February 9, 2026 13:27
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 34 out of 38 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

cmd/gpu_plugin/gpu_plugin.go:1

  • The scan loop now exits the whole plugin process on any scan error. Since scan errors can be transient (e.g., momentary sysfs/devfs read issues or mounts not ready at startup), crashing the plugin can cause unnecessary restart loops. Prefer logging a warning/error and retrying on the next tick (as the previous implementation did), unless the error is truly unrecoverable and should fail fast.
// Copyright 2017-2026 Intel Corporation. All Rights Reserved.
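
The log-and-retry pattern recommended in the suppressed comment can be sketched as follows. This is an illustrative example only: `scanFunc`, `scanWithRetry`, and `newFlakyScan` are hypothetical names, the retry is bounded so the example terminates, and a real plugin loop would more likely keep ticking indefinitely.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// scanFunc stands in for the plugin's device scan; it is a placeholder type.
type scanFunc func() ([]string, error)

// scanWithRetry logs a failed scan and tries again after interval instead
// of terminating the process. It gives up after maxAttempts failures.
func scanWithRetry(scan scanFunc, interval time.Duration, maxAttempts int) ([]string, error) {
	var lastErr error

	for attempt := 1; attempt <= maxAttempts; attempt++ {
		devices, err := scan()
		if err == nil {
			return devices, nil
		}

		lastErr = err
		fmt.Printf("scan attempt %d failed, will retry: %v\n", attempt, err)

		// Only wait when another attempt will follow.
		if attempt < maxAttempts {
			time.Sleep(interval)
		}
	}

	return nil, fmt.Errorf("scan failed %d times in a row: %w", maxAttempts, lastErr)
}

// newFlakyScan returns a scan that fails the given number of times and
// then succeeds, imitating a transient sysfs problem.
func newFlakyScan(failures int) scanFunc {
	calls := 0
	return func() ([]string, error) {
		calls++
		if calls <= failures {
			return nil, errors.New("sysfs not ready")
		}
		return []string{"0000:03:00.0"}, nil
	}
}

func main() {
	devices, err := scanWithRetry(newFlakyScan(2), 10*time.Millisecond, 5)
	fmt.Println(devices, err)
}
```

Whether to fail fast or retry is a design choice: retrying tolerates mounts that are not ready at startup, while exiting surfaces a persistent misconfiguration sooner through the pod's restart count.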

Comment thread pkg/pluginutils/pci_test.go Outdated
Comment thread pkg/apis/deviceplugin/v1/gpudeviceplugin_webhook.go
Comment thread pkg/controllers/gpu/controller.go
Comment thread cmd/gpu_plugin/gpu_plugin.go

const timeout = time.Second * 30
const interval = time.Second * 1
const timeout = time.Second * 5

Copilot AI Feb 9, 2026

Reducing the envtest timeout from 30s to 5s increases the chance of flakes on slower CI nodes (controller reconciliation + cache sync + API server latency can exceed 5s intermittently). Consider keeping a higher timeout (or deriving it from an env/config) while keeping the shorter polling interval if desired.

Suggested change
-const timeout = time.Second * 5
+const timeout = time.Second * 30
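
The alternative mentioned in the comment, deriving the timeout from the environment, could look roughly like this. The variable name `ENVTEST_TIMEOUT` and the helper `parseTimeout` are assumptions made for the sketch; neither appears in the PR.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// parseTimeout is a hypothetical helper. It returns fallback when raw is
// empty, malformed, or not a positive duration.
func parseTimeout(raw string, fallback time.Duration) time.Duration {
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return fallback
	}
	return d
}

func main() {
	// ENVTEST_TIMEOUT is an assumed variable name, for example "45s" or "2m".
	timeout := parseTimeout(os.Getenv("ENVTEST_TIMEOUT"), 30*time.Second)
	fmt.Println("using envtest timeout:", timeout)
}
```

This keeps a conservative default for slow CI nodes while letting a developer shorten the wait locally.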

Contributor

@mythi mythi left a comment

seems pretty big for a single PR. maybe skip at least 2a5e5b8 for now

Comment thread cmd/gpu_init/README.md Outdated
@@ -0,0 +1,48 @@
# Using Intel GPU plugin with KubeVirt
Contributor

nit: KubeVirt is not the only use case for VFIO devices, since Kata Containers could consume them too. Can we update the changes in this PR so they do not use KubeVirt in places that are generic to virt/vfio?

Comment thread cmd/gpu_init/main.go Outdated
@tkatila tkatila force-pushed the gpu-kubevirt-support branch from 7899ee5 to 66f6509 Compare February 10, 2026 09:15
Contributor

@eero-t eero-t left a comment

A few documentation comments.

Comment thread cmd/gpu_plugin/kubevirt.md
Comment thread cmd/gpu_plugin/README.md Outdated
| gpu.intel.com/i915_monitoring | Monitoring resource for the `i915` KMD provided devices |
| gpu.intel.com/xe | `xe` KMD provided GPU instance |
| gpu.intel.com/xe_monitoring | Monitoring resource for the `xe` KMD provided devices |
| gpu.intel.com/vfio | VFIO-PCI bound GPU devices (KubeVirt)|
Contributor

Can the plugin also provide them as a resource corresponding to the required KMD? I mean, do workloads need to be modified to query a different resource under KubeVirt?

=> IMHO that should be explained a bit more in the doc (not necessarily here).

Contributor Author

I'm not sure I follow. The vfio resources are essentially just handles for the PCI pass-through to occur. If an integrated/older GPU is passed through, it will use i915. A newer one will then use xe.
The plugin could register the vfio resources with a KMD suffix, but I think that with the allow/denyIDs one can select which GPUs to use for vfio mode.

Contributor

So GPU workload resource requests do not need to change; they are not intended to use this resource? What requests it then, KubeVirt? If yes, please make it clearer.

Contributor Author

Well, it does say "KubeVirt" in the resource line. That's the only known user for the vfio resources. In theory a container could request them, but without binding a GPU driver to the device, there's little the container could do with the device.

With other devices, like DSA, the user space driver can use the vfio-pci bound device. But that's not the case for GPU.

I'll add a link to the kubevirt md file.

Contributor

The problem is that it does not say what is supposed to use it.

"To be used with KubeVirt" is better, but could still be interpreted that workload should request that resource when KubeVirt is used.

I suggest rephrasing it as "used by KubeVirt".

Contributor Author

@tkatila tkatila Feb 10, 2026

I changed it to:
| gpu.intel.com/vfio | VFIO-PCI bound GPU devices. To be used with KubeVirt |

Comment thread README.md Outdated
@tkatila tkatila force-pushed the gpu-kubevirt-support branch from 66f6509 to c937046 Compare February 10, 2026 09:25
@tkatila
Contributor Author

tkatila commented Feb 10, 2026

> seems pretty big for a single PR. maybe skip at least 2a5e5b8 for now

I removed the dsa portion and one operator fix. This now only includes plugin, initcontainer and operator changes.

@tkatila tkatila force-pushed the gpu-kubevirt-support branch 2 times, most recently from 7d2fa91 to 7bc9480 Compare February 10, 2026 11:06
@tkatila tkatila requested a review from Copilot February 11, 2026 09:22
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 31 out of 35 changed files in this pull request and generated 6 comments.


Comment thread pkg/apis/deviceplugin/v1/gpudeviceplugin_webhook.go
Comment thread pkg/pluginutils/pci.go Outdated
Comment thread pkg/pluginutils/pci.go
Comment thread cmd/gpu_plugin/gpu_plugin.go
Comment thread cmd/gpu_plugin/gpu_plugin.go
Comment thread pkg/pluginutils/pci_test.go
@tkatila tkatila force-pushed the gpu-kubevirt-support branch from 7bc9480 to 4848342 Compare February 11, 2026 11:04
@tkatila tkatila requested a review from Copilot February 11, 2026 11:05
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 34 changed files in this pull request and generated 7 comments.


Comment on lines +436 to +437
t.Errorf("Expected %t, got %t for device %s", tc.expectPass, bindOk, dpath)
}

Copilot AI Feb 11, 2026

This test never asserts that an error is returned when expectPass is false (so a nil error would still pass). Also, the error is formatted with %t even though bindOk is an error. Add an explicit failure-case assertion (e.g., if !tc.expectPass && bindOk == nil) and use %v for the error value.

Suggested change
-	t.Errorf("Expected %t, got %t for device %s", tc.expectPass, bindOk, dpath)
-}
+	t.Errorf("Expected success %t, got error %v for device %s", tc.expectPass, bindOk, dpath)
+}
+if !tc.expectPass && bindOk == nil {
+	t.Errorf("Expected failure (non-nil error) for device %s, but got nil", dpath)
+}
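
The two-sided assertion described in this comment can be expressed as a small helper that covers both the unexpected-error and the missing-error case. The sketch below is standalone and hypothetical: `checkOutcome` and `errBind` are illustrative names, and the helper returns a message instead of calling `t.Errorf` so it can run outside a test binary.

```go
package main

import (
	"errors"
	"fmt"
)

// errBind is a sample error used only by this example.
var errBind = errors.New("bind failed")

// checkOutcome is a hypothetical helper. It returns a failure message, or
// an empty string when err matches the expectation.
func checkOutcome(expectPass bool, err error, device string) string {
	switch {
	case expectPass && err != nil:
		// %v is the appropriate verb for an error value.
		return fmt.Sprintf("expected success for device %s, got error: %v", device, err)
	case !expectPass && err == nil:
		return fmt.Sprintf("expected an error for device %s, got nil", device)
	default:
		return ""
	}
}

func main() {
	cases := []struct {
		expectPass bool
		err        error
	}{
		{true, nil}, {true, errBind}, {false, nil}, {false, errBind},
	}

	for _, tc := range cases {
		msg := checkOutcome(tc.expectPass, tc.err, "0000:03:00.0")
		fmt.Printf("expectPass=%t err=%v failure=%q\n", tc.expectPass, tc.err, msg)
	}
}
```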

Comment thread pkg/pluginutils/pci.go Outdated
Comment thread pkg/apis/deviceplugin/v1/gpudeviceplugin_webhook.go
Comment thread pkg/apis/deviceplugin/v1/gpudeviceplugin_webhook.go
Comment thread cmd/gpu_plugin/gpu_plugin.go
Comment on lines +248 to +262
changed := false

if initConts[0].Image != dp.Spec.InitImage {
	initConts[0].Image = dp.Spec.InitImage
	changed = true
}

args := getInitArgs(dp)
if !changed {
	changed = slices.Compare(args, initConts[0].Args) != 0
}

initConts[0].Args = args

return changed

Copilot AI Feb 11, 2026

Reconciliation here only self-heals initcontainer Image and Args. If an existing DaemonSet has drifted in other initcontainer fields you rely on (e.g., SecurityContext.Privileged, VolumeMounts for /sys/bus/pci, SELinux options), this won't correct it even though VFIO mode is intended to be idempotent/self-healing. Consider reconciling the full initcontainer spec (or at least the required securityContext + mount) instead of only image/args.

Suggested change
-changed := false
-if initConts[0].Image != dp.Spec.InitImage {
-	initConts[0].Image = dp.Spec.InitImage
-	changed = true
-}
-args := getInitArgs(dp)
-if !changed {
-	changed = slices.Compare(args, initConts[0].Args) != 0
-}
-initConts[0].Args = args
-return changed
+// Reconcile the full init container spec using the same helper that is
+// used when creating it, so that any drift in securityContext, mounts,
+// etc. is also corrected.
+desiredSpec := ds.Spec.Template.Spec
+setInitContainer(&desiredSpec, dp.Spec.InitImage, getInitArgs(dp))
+// Defensive check: setInitContainer should always create exactly one init container.
+if len(desiredSpec.InitContainers) != 1 {
+	return false
+}
+if !reflect.DeepEqual(initConts[0], desiredSpec.InitContainers[0]) {
+	ds.Spec.Template.Spec.InitContainers[0] = desiredSpec.InitContainers[0]
+	return true
+}
+return false
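
To show the whole-spec comparison in isolation, here is a self-contained sketch that uses a simplified stand-in type rather than the Kubernetes API types. The `initContainer` struct, its fields, and `reconcile` are all hypothetical and exist only to demonstrate the `reflect.DeepEqual` pattern from the suggestion.

```go
package main

import (
	"fmt"
	"reflect"
)

// initContainer is a simplified stand-in for the Kubernetes container type;
// the fields are illustrative only.
type initContainer struct {
	Image        string
	Args         []string
	Privileged   bool
	VolumeMounts []string
}

// reconcile overwrites current with desired when they differ in any field
// and reports whether a change was made.
func reconcile(current *initContainer, desired initContainer) bool {
	if reflect.DeepEqual(*current, desired) {
		return false
	}

	*current = desired

	return true
}

func main() {
	desired := initContainer{
		Image:        "example/gpu-init:devel",
		Args:         []string{"--allow-ids=0x56a0"},
		Privileged:   true,
		VolumeMounts: []string{"/sys/bus/pci"},
	}

	// The live object has the right image and args but has lost Privileged.
	live := desired
	live.Privileged = false

	fmt.Println("first pass changed:", reconcile(&live, desired))
	fmt.Println("second pass changed:", reconcile(&live, desired))
}
```

One caveat worth keeping in mind with this approach: `reflect.DeepEqual` treats a nil slice and an empty slice as different, so a desired spec built with empty slices can report drift against a stored object that holds nil.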

Comment thread cmd/gpu_init/main.go
Comment on lines +35 to +52
// Unbinds Intel GPU devices from xe and i915 drivers and binds them to vfio-pci driver.
func main() {
	var (
		denyIds  string
		allowIds string
	)

	flag.StringVar(&denyIds, "deny-ids", "", "Comma-separated list of device IDs to deny (0x1234 format)")
	flag.StringVar(&allowIds, "allow-ids", "", "Comma-separated list of device IDs to allow (0x1234 format)")

	flag.Parse()

	if err := pluginutils.ValidatePCIDeviceIDs(allowIds); err != nil {
		klog.Fatalf("allow ID validation failed: %+v", err)
	}
	if err := pluginutils.ValidatePCIDeviceIDs(denyIds); err != nil {
		klog.Fatalf("deny ID validation failed: %+v", err)
	}

Copilot AI Feb 11, 2026

This introduces a new command that performs host driver rebinding and option validation, but there are no direct tests for its mode/validation behavior (e.g., allow+deny mutual exclusion, invalid ID handling). Consider extracting the core logic into a testable function and adding unit tests using a fake sysfs tree (similar to pkg/pluginutils/pci_test.go) to cover success and failure paths.
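
As a sketch of what "extracting the core logic into a testable function" might mean here, the example below separates option validation from `main`. It rests on assumptions: `validateIDList` and `validateOptions` are hypothetical names (not `pluginutils.ValidatePCIDeviceIDs`), the ID format is taken to be `0x` followed by exactly four hex digits based on the flag help text, and the allow/deny mutual exclusion is the rule named in the comment rather than confirmed behavior.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// pciIDPattern assumes IDs look like 0x1234: "0x" plus four hex digits.
var pciIDPattern = regexp.MustCompile(`^0x[0-9a-fA-F]{4}$`)

// validateIDList is a hypothetical helper that checks a comma-separated
// list of PCI device IDs. An empty list is valid.
func validateIDList(list string) error {
	if list == "" {
		return nil
	}

	for _, id := range strings.Split(list, ",") {
		if !pciIDPattern.MatchString(id) {
			return fmt.Errorf("invalid PCI device ID %q", id)
		}
	}

	return nil
}

// validateOptions holds the logic that would otherwise live in main, so it
// can be unit tested without touching flags or sysfs.
func validateOptions(allowIDs, denyIDs string) error {
	// Assumed rule: the two lists may not be combined.
	if allowIDs != "" && denyIDs != "" {
		return errors.New("allow-ids and deny-ids are mutually exclusive")
	}
	if err := validateIDList(allowIDs); err != nil {
		return fmt.Errorf("allow IDs: %w", err)
	}
	if err := validateIDList(denyIDs); err != nil {
		return fmt.Errorf("deny IDs: %w", err)
	}

	return nil
}

func main() {
	fmt.Println(validateOptions("0x56a0,0x56a1", ""))
	fmt.Println(validateOptions("0x56a0", "0x56a1"))
	fmt.Println(validateOptions("", "56a0"))
}
```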

Move common code under pluginutils and modify the
old nfdhook to rebind GPU devices from xe/i915 to vfio-pci.
New resource will be gpu.intel.com/vfio.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
Also fix "pref allocation policy" check

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
And add a kubevirt specific file.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
@tkatila tkatila force-pushed the gpu-kubevirt-support branch from 13d3234 to 1a775ae Compare February 11, 2026 11:44
4 participants