
Kubernetes DRA CDI devices are not injected when using cri-dockerd + Docker runtime #558

@xwonec


Environment

  • Kubernetes: v1.35.3
  • Runtime: cri-dockerd
  • Docker Engine: v29.4.0
  • NVIDIA DRA Driver: v25.12.0
  • NVIDIA Driver: 595.71.05
  • OS: Ubuntu 22.04

Docker daemon configuration:

{
    "data-root": "/data/docker",
    "default-runtime": "nvidia",
    "exec-opts": [
        "native.cgroupdriver=systemd"
    ],
    "features": {
        "cdi": true
    },
    "log-opts": {
        "max-file": "3",
        "max-size": "5m"
    },
    "registry-mirrors": [
        "https://docker.1ms.run"
    ],
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

Problem

I am testing Kubernetes DRA with the NVIDIA DRA driver and CDI enabled.

The DRA driver appears to work correctly:

  • The ResourceClaim is allocated successfully
  • CDI spec (YAML) files are generated under /var/run/cdi
  • The kubelet plugin logs report that the CDI devices were prepared successfully

Example plugin log:

Returning newly prepared devices for claim 'gpu-test2/pod-shared-gpu-xxxx':
[{[gpu] node2 gpu-0 [k8s.gpu.nvidia.com/claim=xxxx-gpu-0]}]
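The device ID in this log line appears to follow the standard CDI fully qualified naming convention, vendor/class=name. As a minimal sketch, assuming that convention, the hypothetical bash helper below splits such a name into its parts. This makes it easier to match the claim against the kind declared in a spec file under /var/run/cdi.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a fully qualified CDI device name of the form
# vendor/class=name into its three parts, printed space-separated.
# Returns non-zero if the input does not look like a qualified name.
split_cdi_name() {
  local fq="$1"
  local kind="${fq%%=*}"   # vendor/class: everything before the first '='
  local name="${fq#*=}"    # device name: everything after the first '='
  if [[ "$fq" != *=* || "$kind" != */* || -z "$name" ]]; then
    return 1
  fi
  local vendor="${kind%%/*}"
  local class="${kind#*/}"
  if [[ -z "$vendor" || -z "$class" ]]; then
    return 1
  fi
  printf '%s %s %s\n' "$vendor" "$class" "$name"
}
```

For the name in the log above, `split_cdi_name "k8s.gpu.nvidia.com/claim=xxxx-gpu-0"` prints `k8s.gpu.nvidia.com claim xxxx-gpu-0`.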

Generated CDI specs exist:

ls /var/run/cdi/

k8s.gpu.nvidia.com-claim_xxx.yaml
nvidia.yaml
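To confirm which CDI kinds these files declare, the hypothetical bash helper below prints each YAML spec file alongside its top-level kind: value. It is a rough sketch that assumes every spec is YAML with an unindented kind: line (optionally quoted), as CDI specs normally have. It does not fully parse YAML.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list "<file>\t<kind>" for each YAML CDI spec in a
# directory (default /var/run/cdi). Only an unindented "kind:" line is read;
# surrounding double quotes on the value are stripped.
list_cdi_kinds() {
  local dir="${1:-/var/run/cdi}" f kind
  for f in "$dir"/*.yaml; do
    [[ -e "$f" ]] || continue   # no matches: the glob stays literal, skip it
    sed -n 's/^kind:[[:space:]]*"\{0,1\}\([^"[:space:]]*\)"\{0,1\}[[:space:]]*$/\1/p' "$f" |
      while read -r kind; do
        printf '%s\t%s\n' "${f##*/}" "$kind"
      done
  done
}
```

Running `list_cdi_kinds` on the node should show the claim spec's kind next to its file name. That kind should match the vendor/class prefix of the device ID reported in the plugin log.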

However, inside the Kubernetes Pod:

  • /dev/nvidia* devices are missing
  • nvidia-smi is unavailable
  • no GPU devices are injected into the container

Example, from inside the container:

ls /dev/

core fd null pts random shm stderr stdin stdout tty urandom zero

No NVIDIA devices are present.
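The same check can be scripted so it is easy to repeat across Pods. Below is a minimal sketch of a hypothetical bash helper. It assumes bash is available in the container image, which may not hold for minimal images. The helper succeeds only if at least one nvidia* node exists in the given directory (default /dev).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if any NVIDIA device node (e.g. /dev/nvidia0,
# /dev/nvidiactl) exists in the given directory, which defaults to /dev.
has_nvidia_devices() {
  local dir="${1:-/dev}"
  # compgen -G expands the glob and returns success only if something matches.
  compgen -G "${dir}/nvidia*" > /dev/null
}
```

Inside the affected Pod, `has_nvidia_devices || echo "no NVIDIA devices injected"` would print the message, matching the ls output above.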

Observation

This seems to indicate that:

  1. Kubernetes DRA allocation succeeds
  2. CDI specs are generated correctly
  3. However, the CDI devices are not propagated through cri-dockerd to the Docker runtime

It looks like cri-dockerd may not forward the CRI CDIDevices field to Docker Engine.
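For comparison, standalone Docker with CDI enabled accepts fully qualified CDI names through `docker run --device`. A shim that forwards the CRI CDIDevices field would need to produce the equivalent of these flags, or the corresponding Engine API request. The hypothetical bash helper below is only an illustration of that mapping on the CLI side and is not how cri-dockerd is implemented. It validates each name loosely against the vendor/class=name shape and emits one argument per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert CDI device names (as a DRA driver reports them
# via CRI CDIDevices) into "docker run" --device arguments, one per line.
# Stops with an error on the first name that is not vendor/class=name.
cdi_device_flags() {
  local name
  for name in "$@"; do
    if [[ ! "$name" =~ ^[^/=]+/[^/=]+=.+$ ]]; then
      echo "not a fully qualified CDI name: $name" >&2
      return 1
    fi
    printf '%s\n' "--device" "$name"
  done
}
```

A possible standalone reproduction, assuming nvidia.yaml defines `nvidia.com/gpu=all`, is `mapfile -t flags < <(cdi_device_flags "nvidia.com/gpu=all")` followed by `docker run --rm "${flags[@]}" ubuntu:22.04 ls /dev`. If the Pod path behaved the same way, the nvidia* nodes would appear there too.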

Additional Notes

Docker CDI itself appears functional.

For example, CDI device injection with standalone Docker works correctly outside Kubernetes.

The issue only occurs in the Kubernetes + cri-dockerd path.

Questions

  1. Does cri-dockerd currently support Kubernetes DRA CDI device propagation?
  2. Is the CRI CDIDevices field implemented and forwarded to Docker Engine?
  3. Is Kubernetes DRA officially supported with cri-dockerd?
  4. If it is not supported yet, are there plans to add support?

Thanks.
