
nvidia-ctk cdi generate: libdxcore.so not found on WSL2 despite being present #1739

@tyeth-ai-assisted

Description


nvidia-ctk cdi generate on WSL2 correctly auto-detects WSL mode but fails to locate libdxcore.so, logging:

level=warning msg="Could not locate libdxcore.so: pattern libdxcore.so not found"

The library IS present on the system (at /usr/lib/x86_64-linux-gnu/libdxcore.so in the gateway container, and at the WSL mount that ldconfig resolves to) and is registered in ldconfig:

$ ldconfig -p | grep dxcore
libdxcore.so (libc6,x86-64) => /usr/lib/wsl/lib/libdxcore.so

Without libdxcore.so in the CDI spec, NVML initialization fails inside containers with Failed to initialize NVML: N/A, because libdxcore.so bridges the Linux NVML API to the Windows DirectX GPU Kernel via /dev/dxg.

Environment

  • OS: Windows WSL2 (Linux 6.6.87.2-microsoft-standard-WSL2)
  • GPU: NVIDIA GeForce RTX 5070 Laptop GPU
  • Driver: NVIDIA 595.71, CUDA 13.2
  • nvidia-container-toolkit: installed from nvidia apt repo (bundled in OpenShell cluster image)
  • libdxcore.so locations:
    • /usr/lib/wsl/lib/libdxcore.so (WSL mount)
    • /usr/lib/x86_64-linux-gnu/libdxcore.so (symlink/copy in gateway container)

Steps to Reproduce

  1. Run on WSL2 with NVIDIA GPU
  2. nvidia-ctk cdi generate --output=/tmp/nvidia.yaml
  3. Observe warning: "Could not locate libdxcore.so: pattern libdxcore.so not found"
  4. Inspect generated CDI spec — libdxcore.so not in mounts
  5. Start a container using the CDI spec
  6. nvidia-smi inside container fails: Failed to initialize NVML: N/A

Expected Behavior

nvidia-ctk cdi generate should find libdxcore.so at /usr/lib/wsl/lib/libdxcore.so (or via ldconfig) and include it in the CDI spec mounts.

Workaround

Manually add libdxcore.so to the generated CDI spec:

# Add to containerEdits.mounts:
- hostPath: /usr/lib/x86_64-linux-gnu/libdxcore.so
  containerPath: /usr/lib/x86_64-linux-gnu/libdxcore.so
  options:
    - ro
    - nosuid
    - nodev
    - rbind
    - rprivate

And add --folder /usr/lib/x86_64-linux-gnu to the update-ldcache hook args.
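For reference, the update-ldcache hook in specs generated by nvidia-ctk takes roughly the following shape; the extra --folder argument is appended to its args list. This is a sketch based on a typical generated spec, with the path mirroring the workaround above; exact contents may differ per system.

```yaml
# Fragment of the generated CDI spec (e.g. /tmp/nvidia.yaml);
# the trailing --folder entry is the manual addition described above.
containerEdits:
  hooks:
    - hookName: createContainer
      path: /usr/bin/nvidia-ctk
      args:
        - nvidia-ctk
        - hook
        - update-ldcache
        - --folder
        - /usr/lib/x86_64-linux-gnu
```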

After applying this fix and setting mode = "cdi" in the NVIDIA container runtime config, nvidia-smi and NVML work correctly inside containers on WSL2.
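For completeness, enabling CDI mode is a one-line change in the toolkit's runtime config (typically /etc/nvidia-container-toolkit/config.toml; the path may vary by install):

```toml
# /etc/nvidia-container-toolkit/config.toml
[nvidia-container-runtime]
# Force CDI mode instead of "auto" so the patched CDI spec is used
mode = "cdi"
```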

Impact

This bug breaks the entire NVIDIA k8s device plugin stack on WSL2 when using CDI mode, since the device plugin requires NVML initialization which depends on libdxcore.so.
