Add CDI device selector and Nvidia GPU passthrough for Linux #301

Open
Macbucheron1 wants to merge 10 commits into ThePorgs:dev from Macbucheron1:add-cdi-support

Macbucheron1 commented Mar 8, 2026

Description

Note

AI tooling was used extensively while developing this PR. I tested and reviewed the resulting implementation.

Note

The idea of exposing GPUs to Exegol containers using CDI was inspired by https://github.com/p3ta00/exegol-gpu

This PR adds support for Docker CDI device selectors in Exegol.

It keeps the existing --device CLI interface, but distinguishes internally between:

  • classic host device paths (/dev/...)
  • CDI selectors (vendor.com/class=name)

The two cases are forwarded to the appropriate Docker API fields:

  • path devices → devices
  • CDI selectors → device_requests

This keeps existing behaviour unchanged for traditional device mappings while making it easier to use CDI-exposed devices, such as GPUs.
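
To illustrate the routing described above, here is a minimal sketch of how --device values could be split into the two groups. It is not the code from this PR: the function name split_devices and the CDI_SELECTOR pattern are hypothetical, and the pattern is a simplified approximation of CDI naming rules rather than the official grammar.

```python
import re

# Simplified pattern for a fully qualified CDI device name (vendor/class=name).
# Assumption: this is looser than the naming rules of the CDI specification.
CDI_SELECTOR = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9_-]*=[A-Za-z0-9][A-Za-z0-9_.:-]*$"
)


def split_devices(raw_devices):
    """Split --device values into host path mappings and CDI selectors."""
    path_devices, cdi_selectors = [], []
    for value in raw_devices:
        if value.startswith("/"):
            # Classic host device path, e.g. /dev/ttyUSB0 (goes to `devices`).
            path_devices.append(value)
        elif CDI_SELECTOR.match(value):
            # CDI selector, e.g. nvidia.com/gpu=all (goes to `device_requests`).
            cdi_selectors.append(value)
        else:
            raise ValueError(f"Unrecognized device specification: {value!r}")
    return path_devices, cdi_selectors
```

With this sketch, split_devices(["/dev/ttyUSB0", "nvidia.com/gpu=all"]) returns one list for each Docker API field, and anything that is neither a path nor a selector is rejected early.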

This PR also adds a --gpu nvidia convenience flag on Linux hosts, acting as a shortcut for:

  • -d nvidia.com/gpu=all

The implementation relies on CDI device selectors (e.g. nvidia.com/gpu=all) rather than the Docker --gpus flag.
CDI provides a vendor-neutral mechanism for exposing hardware devices to containers. In theory, any GPU supported by a CDI specification (e.g. amd.com/gpu or intel.com/gpu) could be exposed in the same way using --device, without requiring changes in Exegol.
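
The --gpu shortcut can be pictured as a small expansion step that runs before the device values are processed. The sketch below is only an illustration under assumptions: expand_gpu_flag and its parameters are hypothetical names, not the implementation in this PR.

```python
NVIDIA_CDI_SELECTOR = "nvidia.com/gpu=all"


def expand_gpu_flag(gpu, devices, system):
    """Return the device list after applying a --gpu shortcut.

    gpu: value of the --gpu option (None when the flag is unset).
    devices: values already passed with -d/--device.
    system: host OS name, as returned by platform.system().
    """
    result = list(devices)  # never mutate the caller's list
    if gpu is None:
        return result
    if system != "Linux":
        raise ValueError("--gpu is only supported on Linux hosts")
    if gpu != "nvidia":
        raise ValueError(f"Unsupported GPU vendor: {gpu!r}")
    # Equivalent to passing -d nvidia.com/gpu=all, without duplicating it.
    if NVIDIA_CDI_SELECTOR not in result:
        result.append(NVIDIA_CDI_SELECTOR)
    return result
```

Because the shortcut only appends a selector, other vendors would keep using -d with their own CDI names in this design.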

Test

Before
image

After
image

This has only been tested on Linux x86_64 with an NVIDIA GPU.

Related issues

No related issue, but this is related to this pull request and is narrower in scope.

Instead of adding GPU detection or automatic GPU enablement, this patch adds generic CDI selector support to the existing --device input, while routing it to the proper Docker API field internally.

Point of attention

The --gpu flag is currently limited to Linux hosts and is implemented as a convenience shortcut for NVIDIA CDI passthrough.

This has only been validated on Linux Docker hosts. In theory it could also work on Docker Desktop for Windows with WSL2 GPU support (see), but I have not tried it.

Macbucheron1 changed the title from "Add CDI device selector and GPU support for Linux" to "Add CDI device selector and Nvidia GPU support for Linux" on Mar 8, 2026
Macbucheron1 changed the title from "Add CDI device selector and Nvidia GPU support for Linux" to "Add CDI device selector and Nvidia GPU passthrough for Linux" on Mar 8, 2026
Member

@Dramelac Dramelac left a comment

Thank you for this PR!
Here is a first review, with some easy-to-fix comments and a few broader questions to discuss later with the team.

Comment thread exegol/model/ExegolContainer.py Outdated
message = message.replace('[', '\\[')
logger.error(f"Docker raised a critical error when starting the container [green]{self.name}[/green], error message is: {message}")
if "cdi device injection failed" in lower_message and "nvidia.com/gpu=all" in lower_message:
    logger.warning("Hint: verify NVIDIA CDI is configured (e.g. nvidia-container-toolkit installed and Docker CDI enabled).")
Member

Can we check, with docker info or through the Docker daemon SDK, whether the NVIDIA toolkit is enabled?

PS: can we point users to the NVIDIA documentation on how to install the NVIDIA toolkit, for those who are not familiar with it?

Author

We can only check whether Docker currently sees NVIDIA CDI devices or not. That does not strictly tell us whether the NVIDIA toolkit is enabled, since the CDI spec may simply not be generated or discovered yet. Docker exposes CDI support and discovered devices in docker info, so this is more a runtime visibility check than a toolkit check.

If we want to handle NVIDIA separately, we could also check for the presence of nvidia-ctk, since that is the tool NVIDIA provides to configure the toolkit and generate CDI specs.

$ docker info
Client:
 Version:    29.2.1
 Context:    default
...
 CDI spec directories:
  /etc/cdi
  /var/run/cdi
 Discovered Devices:
  cdi: nvidia.com/gpu=0
  cdi: nvidia.com/gpu=all
 ...

And using the SDK:

$ python3
Python 3.13.12 (main, Feb  3 2026, 17:53:27) [GCC 15.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import docker
>>> info = docker.from_env().info()
>>> print(info.get("DiscoveredDevices",[]))
[{'Source': 'cdi', 'ID': 'nvidia.com/gpu=0'}, {'Source': 'cdi', 'ID': 'nvidia.com/gpu=all'}]

Added the link to the NVIDIA documentation in 24cc238. Also removed the mention of enabling Docker CDI, since it has been enabled by default since v27.
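
As a rough sketch of how the two checks discussed in this thread could be combined, the helper below inspects the dictionary returned by the SDK's info() call and looks for nvidia-ctk on the PATH. The function names are hypothetical and this is not the code from the PR; it also assumes the "DiscoveredDevices" layout shown in the output above.

```python
import shutil


def discovered_cdi_devices(info):
    """Return the CDI device IDs listed in a Docker info dictionary."""
    devices = info.get("DiscoveredDevices") or []
    return [d["ID"] for d in devices if d.get("Source") == "cdi" and "ID" in d]


def nvidia_cdi_status(info):
    """Summarize runtime visibility of NVIDIA CDI devices.

    cdi_visible only says Docker currently sees an NVIDIA CDI device;
    nvidia_ctk_found only says the nvidia-ctk binary is on the PATH.
    Neither proves the toolkit is fully configured.
    """
    visible = any(
        dev.startswith("nvidia.com/gpu=") for dev in discovered_cdi_devices(info)
    )
    return {
        "cdi_visible": visible,
        "nvidia_ctk_found": shutil.which("nvidia-ctk") is not None,
    }
```

It could be called as nvidia_cdi_status(docker.from_env().info()), for example before starting a container, so that a hint can be shown earlier than the Docker error.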

Author

Macbucheron1 commented Mar 11, 2026

Ok, after a more careful reading of the Docker documentation, CDI is currently Linux-only. On Windows, Docker Desktop documents GPU access through WSL2, but the documented path is --gpus, not CDI selectors.

--gpus is also available on Linux.

As mentioned in my initial message, CDI selectors are more generic since they can target any vendor/device (not only GPUs) that exposes a CDI spec. That is still the main reason I went with the CDI path here. However, as of Docker Engine 29.3 (5 March 2026), --gpus is no longer strictly NVIDIA-only on Linux, since Docker now uses CDI-based injection for AMD GPUs as well. We can expect support for more GPU vendors to be added that way.

So to summarize:

  • -d with a CDI selector only works on Linux and should allow selecting any vendor/device that provides a CDI spec
  • --gpus works on Linux and on Windows Docker Desktop, but on Windows the documented path is still the regular GPU flow through WSL2 rather than CDI selectors (even though CDI selectors are used under the hood)

So I think this PR is still useful as a generic CDI foundation, but for GPU-specific UX, a --gpus abstraction may be simpler in the long run since it maps better to the documented Docker flow on both Linux and Windows.
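
To make the two options concrete, here is a hedged sketch of the keyword arguments that could be passed to the Docker SDK's docker.types.DeviceRequest for each path. The helper name gpu_request_kwargs is hypothetical, and the "cdi" driver mapping is my understanding of how a CDI selector is represented in a device request; it has not been validated against every Docker version.

```python
def gpu_request_kwargs(use_cdi, cdi_selector="nvidia.com/gpu=all"):
    """Build keyword arguments for docker.types.DeviceRequest.

    use_cdi=True  : target an explicit CDI selector (Linux only).
    use_cdi=False : mirror `--gpus all` (count=-1 requests every GPU).
    """
    if use_cdi:
        # Assumption: CDI selectors travel as device IDs under the "cdi" driver.
        return {"driver": "cdi", "device_ids": [cdi_selector]}
    # Capability-based request, as used by the regular --gpus flow.
    return {"count": -1, "capabilities": [["gpu"]]}
```

A caller could then build docker.types.DeviceRequest(**gpu_request_kwargs(use_cdi=True)) and pass it in the device_requests list when creating the container.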

If you think this is a better idea, I can rework the PR to keep the generic CDI support but use --gpus for the GPU support.
