Add CDI device selector and Nvidia GPU passthrough for Linux #301
Macbucheron1 wants to merge 10 commits into ThePorgs:dev from
Dramelac
left a comment
Thank you for this PR!
Here is a first review, with some easy-to-fix comments and some broader questions to revisit later with the team.
message = message.replace('[', '\\[')
logger.error(f"Docker raised a critical error when starting the container [green]{self.name}[/green], error message is: {message}")
if "cdi device injection failed" in lower_message and "nvidia.com/gpu=all" in lower_message:
    logger.warning("Hint: verify NVIDIA CDI is configured (e.g. nvidia-container-toolkit installed and Docker CDI enabled).")
Can we check, with docker info or through the Docker daemon SDK, whether the NVIDIA toolkit is enabled?
PS: can we link to the NVIDIA documentation on how to install the NVIDIA toolkit, for users who don't know how?
We can only check whether Docker currently sees NVIDIA CDI devices or not. That does not strictly tell us whether the NVIDIA toolkit is enabled, since the CDI spec may simply not be generated or discovered yet. Docker exposes CDI support and discovered devices in docker info, so this is more a runtime visibility check than a toolkit check.
If we want to handle NVIDIA separately, we could also check for the presence of nvidia-ctk, since that is the tool NVIDIA provides to configure the toolkit and generate CDI specs.
$ docker info
Client:
Version: 29.2.1
Context: default
...
CDI spec directories:
/etc/cdi
/var/run/cdi
Discovered Devices:
cdi: nvidia.com/gpu=0
cdi: nvidia.com/gpu=all
...
And using the SDK:
$ python3
Python 3.13.12 (main, Feb 3 2026, 17:53:27) [GCC 15.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import docker
>>> info = docker.from_env().info()
>>> print(info.get("DiscoveredDevices",[]))
[{'Source': 'cdi', 'ID': 'nvidia.com/gpu=0'}, {'Source': 'cdi', 'ID': 'nvidia.com/gpu=all'}]
Added the link to the NVIDIA doc in 24cc238. Also removed the mention of Docker CDI being enabled, since it has been enabled by default since v27.
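The runtime visibility check discussed in this thread could be sketched roughly as follows. This is only an illustration, not the code in this PR: the helper names `nvidia_cdi_devices` and `nvidia_ctk_installed` are hypothetical, and the structure of `DiscoveredDevices` is assumed to match the output shown in the Python session above.

```python
import shutil


def nvidia_cdi_devices(info):
    """Return the NVIDIA CDI device IDs that Docker currently reports.

    `info` is the dict returned by docker.from_env().info(). The
    "DiscoveredDevices" layout is assumed from the session output above.
    """
    ids = []
    for device in info.get("DiscoveredDevices") or []:
        device_id = device.get("ID", "")
        if device.get("Source") == "cdi" and device_id.startswith("nvidia.com/gpu="):
            ids.append(device_id)
    return ids


def nvidia_ctk_installed():
    """Check whether the nvidia-ctk binary is on PATH (toolkit presence only)."""
    return shutil.which("nvidia-ctk") is not None


# Sample data copied from the session above.
sample_info = {
    "DiscoveredDevices": [
        {"Source": "cdi", "ID": "nvidia.com/gpu=0"},
        {"Source": "cdi", "ID": "nvidia.com/gpu=all"},
    ]
}
print(nvidia_cdi_devices(sample_info))
```

An empty list here only means Docker does not currently see NVIDIA CDI devices. As noted earlier, that does not prove the toolkit is missing, which is why the nvidia-ctk lookup is kept as a separate check.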
Ok, after a more careful reading of the Docker documentation, CDI is currently Linux-only. On Windows, Docker Desktop documents GPU access through WSL2, but the documented path is
As mentioned in my initial message, CDI selectors are more generic since they can target any vendor/device (not only GPU) exposing a CDI spec. That is still the main reason I went with the CDI path here. But, as of Docker Engine 29.3 (5 March 2026), So to summarize:
So I think this PR is still useful as a generic CDI foundation, but for GPU-specific UX, a If you think this is a better idea I can modify to keep the CDI support but use
Description
Note
AI tooling was used extensively while developing this PR. I tested and reviewed the resulting implementation.
Note
The idea of exposing GPUs to Exegol containers using CDI was inspired by https://github.com/p3ta00/exegol-gpu
This PR adds support for Docker CDI device selectors in Exegol.
It keeps the existing --device CLI interface, but distinguishes internally between:
- traditional device mappings (/dev/...)
- CDI device selectors (vendor.com/class=name)

The two cases are forwarded to the appropriate Docker API fields:
- traditional device mappings: devices
- CDI device selectors: device_requests

This keeps existing behaviour unchanged for traditional device mappings while making it easier to use CDI-exposed devices, such as GPUs.
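As a rough illustration of the routing described above, here is a minimal sketch. It is not the code from this PR: the function names, the regular expression, and the rule "anything starting with / is a host device path" are assumptions made for the example. The `driver` and `device_ids` keys mirror the parameters of `docker.types.DeviceRequest` in the Docker SDK for Python.

```python
import re

# Assumed shape of a CDI selector: vendor/class=name (e.g. nvidia.com/gpu=all).
CDI_SELECTOR = re.compile(r"^[A-Za-z0-9.\-_]+/[A-Za-z0-9\-_]+=[A-Za-z0-9.\-_:]+$")


def split_device_specs(specs):
    """Separate host device paths from CDI selectors (hypothetical helper)."""
    host_paths, cdi_selectors = [], []
    for spec in specs:
        if spec.startswith("/"):
            # Traditional mapping such as /dev/ttyUSB0 or /dev/a:/dev/b:rwm
            host_paths.append(spec)
        elif CDI_SELECTOR.match(spec):
            cdi_selectors.append(spec)
        else:
            raise ValueError(f"Unrecognized device specification: {spec}")
    return host_paths, cdi_selectors


def build_device_kwargs(specs):
    """Build keyword arguments for the two Docker API fields."""
    host_paths, cdi_selectors = split_device_specs(specs)
    kwargs = {}
    if host_paths:
        kwargs["devices"] = host_paths
    if cdi_selectors:
        # A single request using the "cdi" driver carries all selectors.
        kwargs["device_requests"] = [{"driver": "cdi", "device_ids": cdi_selectors}]
    return kwargs


print(build_device_kwargs(["/dev/ttyUSB0", "nvidia.com/gpu=all"]))
```

Keeping the classification in one small function means a new vendor selector such as amd.com/gpu=0 needs no extra handling, as long as it follows the vendor/class=name form.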
This PR also adds a --gpu nvidia convenience flag on Linux hosts, acting as a shortcut for:
-d nvidia.com/gpu=all

The implementation relies on CDI device selectors (e.g. nvidia.com/gpu=all) rather than the Docker --gpus flag.
CDI provides a vendor-neutral mechanism for exposing hardware devices to containers. In theory, any GPU supported by a CDI specification (e.g. amd.com/gpu or intel.com/gpu) could be exposed in the same way using --device, without requiring changes in Exegol.
Test
Before

After

This has only been tested on Linux x86_64 with an NVIDIA GPU.
Related issues
No related issue, but this is related to this pull request and is narrower in scope.
Instead of adding GPU detection or automatic GPU enablement, this patch adds generic CDI selector support to the existing --device input, while routing it to the proper Docker API field internally.
Point of attention
The --gpu flag is currently limited to Linux hosts and is implemented as a convenience shortcut for NVIDIA CDI passthrough.
This has only been validated on Linux Docker hosts. In theory it could also work on Docker Desktop for Windows with WSL2 GPU support (see), but I did not try it.