Add CUDA toolkit check to `rapids doctor` by jayavenkatesh19 · Pull Request #141 · rapidsai/rapids-cli

jayavenkatesh19 · 2026-03-11T23:52:15Z

Adds a new rapids doctor check that verifies that the CUDA toolkit (will refer to this as CTK from here on) is findable and version-compatible with the GPU driver.

These are the things the check does:

Library discoverability: Uses cuda-pathfinder to verify that CUDA libraries can be loaded at runtime. The CTK has many libraries, some of which are not needed for every RAPIDS operation. For now, this check verifies that libcudart.so, libnvrtc.so and libnvvm.so can be loaded. These three were chosen because they are the most commonly used (cudart is required for all CUDA operations, while nvrtc and nvvm are used in JIT compilation). The check can be extended to other CTK libraries of interest later.
Toolkit vs driver version: Detects when the CTK major version is newer than the driver's. An older CTK on a newer driver is fine, since backward compatibility is supported. Version detection tries header parsing first (taken from Add CUDA toolkit major version check #140, thanks @jacobtomlinson) and falls back to cudaRuntimeGetVersion (snippet from @ncclementi's comment on that PR) for conda/pip environments, as they do not ship dev headers.
System installation checks: When the CTK is not installed via conda/pip, checks the /usr/local/cuda symlink and the CUDA_HOME/CUDA_PATH variables for version mismatches.
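
To illustrate the toolkit-vs-driver comparison, here is a minimal sketch rather than the code in this PR. It assumes the header being parsed is `cuda.h` and that the version is read from its `CUDA_VERSION` macro, which encodes the version as `1000 * major + 10 * minor`. The function names are hypothetical.

```python
import re
from typing import Optional

# Matches a line such as "#define CUDA_VERSION 12040" (assumed to come from cuda.h).
_CUDA_VERSION_RE = re.compile(r"^\s*#\s*define\s+CUDA_VERSION\s+(\d+)", re.MULTILINE)


def toolkit_major_from_header(header_text: str) -> Optional[int]:
    """Return the CTK major version found in the header text, or None if absent."""
    match = _CUDA_VERSION_RE.search(header_text)
    if match is None:
        return None
    # CUDA encodes versions as 1000 * major + 10 * minor, so 12040 means 12.4.
    return int(match.group(1)) // 1000


def toolkit_newer_than_driver(toolkit_major: int, driver_major: int) -> bool:
    """Only a toolkit that is newer than the driver is flagged; older or equal is fine."""
    return toolkit_major > driver_major
```

When the header is missing (as in conda/pip environments), `toolkit_major_from_header` has nothing to parse, which is where the fallback described above would take over.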

I based the order and the checks themselves on the load_nvidia_dynamic_lib documentation page for cuda-pathfinder, where the search order is specified as site-packages (pip) -> conda -> OS defaults -> CUDA_HOME.
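
The discoverability step could be sketched roughly as follows. This is an illustration, not the PR's implementation: the loader is passed in as a plain callable so the sketch does not depend on any particular library, and the short library names (`cudart`, `nvrtc`, `nvvm`) and the function name are assumptions.

```python
from typing import Callable, Dict, Iterable

# Short names for the three libraries listed in the description (assumed spelling).
DEFAULT_LIBS = ("cudart", "nvrtc", "nvvm")


def find_missing_libraries(
    loader: Callable[[str], object],
    libnames: Iterable[str] = DEFAULT_LIBS,
) -> Dict[str, str]:
    """Return {libname: error message} for every library the loader failed to load."""
    missing: Dict[str, str] = {}
    for name in libnames:
        try:
            loader(name)
        except Exception as exc:  # broad on purpose: the loader's error types are not assumed
            missing[name] = str(exc)
    return missing
```

In a real environment the loader would presumably be `cuda.pathfinder.load_nvidia_dynamic_lib`, which applies the search order above. Injecting it also keeps the check testable without heavy mocking.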

One scenario which isn't covered by these tests is described in this comment. This check was originally only meant to test compatibility and discoverability between the CTK and the GPU driver, not whether the Python packages match the CTK. For pip packages, reading the suffixes seems like an easy enough way to do it, but I'm not sure how we would do that for conda packages.
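
As a rough idea of what reading the pip suffixes might look like (not part of this PR), the sketch below pulls the CUDA major version out of package names that end in `-cuNN`, such as `cudf-cu12`. The function name is hypothetical, and names that follow a different convention simply return `None`.

```python
import re
from typing import Optional

# Matches a trailing "-cu12" or "_cu12" style suffix on a distribution name.
_CU_SUFFIX_RE = re.compile(r"[-_]cu(\d+)$")


def cuda_major_from_package_name(name: str) -> Optional[int]:
    """Return the CUDA major version encoded in a pip package suffix, if any."""
    match = _CU_SUFFIX_RE.search(name.lower())
    return int(match.group(1)) if match else None
```

The result could then be compared against the detected CTK major version. Conda packages carry no such suffix in their names, so this approach does not transfer to them directly.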

Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>

jacobtomlinson

Overall this looks great.

I left a comment about trying to use cuda.core.system instead of pynvml. I'm not sure if it supports enough features for us, but if it does, we should use it.

I also notice the tests have a lot of mocking in them. Perhaps the dependency injection approach @mmccarty was exploring in #137 would help clean these up?

Also it looks like CI is failing because coverage has dropped below 95%.

jacobtomlinson · 2026-03-12T11:05:26Z

rapids_cli/doctor/checks/cuda_toolkit.py

+def _extract_major_from_cuda_path(path: Path) -> int | None:
+    """Extract CUDA major version from a path like /usr/local/cuda-12.4 or its version.txt."""
+    match = re.search(r"cuda-(\d+)", str(path))
+    if match:
+        return int(match.group(1))
+    version_file = path / "version.txt"
+    if version_file.exists():
+        match = re.search(r"(\d+)\.", version_file.read_text())
+        if match:
+            return int(match.group(1))
+    return None


There may be situations where multiple CTKs are installed. In this case we need to check which one /usr/local/cuda is symlinked to, as that will be the active one.
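
One way to handle the multiple-CTK case is to resolve the symlink before reading the version. This is a minimal sketch under the assumption that versioned installs live in directories named like `cuda-12.4`; the function name is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional


def active_cuda_major(cuda_link: Path = Path("/usr/local/cuda")) -> Optional[int]:
    """Follow the symlink and read the major version from the resolved directory name."""
    if not cuda_link.exists():
        return None
    # resolve() follows symlinks, e.g. /usr/local/cuda -> /usr/local/cuda-12.4
    target = cuda_link.resolve()
    match = re.search(r"cuda-(\d+)", target.name)
    return int(match.group(1)) if match else None
```

If the resolved directory name carries no version, a fallback such as the `version.txt` lookup in the diff above would still be needed.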

jacobtomlinson · 2026-03-12T11:07:53Z

rapids_cli/doctor/checks/cuda_toolkit.py

+        pynvml.nvmlInit()
+        driver_major = pynvml.nvmlSystemGetCudaDriverVersion() // 1000


Could we use cuda.core.system.get_driver_version() here instead? If we can, it would be more future-proof.
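
For context on the `// 1000` in the diff: the integer reported by NVML encodes the CUDA version as `1000 * major + 10 * minor`. A small helper like the one below (hypothetical name, independent of which library supplies the integer) makes that decoding explicit.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Split an encoded CUDA version such as 12040 into (major, minor)."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor
```

Keeping the decoding in one place would also make it easier to swap the source of the driver version later.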

jayavenkatesh19 added 5 commits March 11, 2026 11:46

added cuda toolkit check

6df7ef4

Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>

fixed formatting and error messages

bff8cb3

Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>

fixed system_path checks

f358112

Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>

fixed docstring

5a44201

Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>

made the code more modular

4885ac9

Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>

jayavenkatesh19 self-assigned this Mar 11, 2026

jayavenkatesh19 requested review from a team as code owners March 11, 2026 23:52

jayavenkatesh19 requested a review from msarahan March 11, 2026 23:52

fixed formatting

ea65345

Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>

jayavenkatesh19 removed the request for review from msarahan March 11, 2026 23:57

jacobtomlinson reviewed Mar 12, 2026

jayavenkatesh19 wants to merge 6 commits into rapidsai:main from jayavenkatesh19:feat/cuda-toolkit-check
