Add CUDA toolkit major version check #140

Draft
jacobtomlinson wants to merge 2 commits into rapidsai:main from jacobtomlinson:check/cuda-major-mismatch

jacobtomlinson commented Mar 5, 2026

Adds a check that uses cuda.pathfinder to find your CUDA Toolkit and then compares its major version with the CUDA major version supported by the driver.

xref #139

jacobtomlinson requested review from a team as code owners March 5, 2026 10:58
jacobtomlinson requested a review from jameslamb March 5, 2026 10:58
Comment on lines +33 to +34
get_driver_cuda_major=_get_driver_cuda_major,
get_toolkit_cuda_major=_get_toolkit_cuda_major,
Member Author

I went with a dependency injection approach here after chatting about it with @mmccarty to make testing easier.

I haven't refactored other checks to reuse this to keep this PR simpler, but we could do that in the future.
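
For anyone skimming the thread, the dependency injection idea can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `check_cuda_major` is a hypothetical name, only the two keyword names come from the diff, and the sketch only covers the case where the toolkit is newer than the driver supports.

```python
from typing import Callable, Optional


def check_cuda_major(
    get_driver_cuda_major: Callable[[], Optional[int]],
    get_toolkit_cuda_major: Callable[[], Optional[int]],
) -> bool:
    """Compare CUDA majors using injected lookups (hypothetical helper).

    Returns True when both versions were found and are compatible,
    False when either lookup returned None and the check was skipped.
    """
    driver_major = get_driver_cuda_major()
    toolkit_major = get_toolkit_cuda_major()
    if driver_major is None or toolkit_major is None:
        return False  # nothing to compare, skip the check
    if toolkit_major > driver_major:
        raise ValueError(
            f"CUDA toolkit major version ({toolkit_major}) is newer than "
            f"what the installed driver supports ({driver_major})."
        )
    return True


# In a test, the real lookups are replaced with simple fakes,
# so no GPU, driver, or toolkit is needed.
assert check_cuda_major(lambda: 13, lambda: 12) is True
assert check_cuda_major(lambda: None, lambda: 12) is False
```

Because the lookups are plain callables, each branch can be exercised on a machine with no CUDA installed at all.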

Comment on lines +24 to +28
version_file = Path(header_dir) / "cuda_runtime_version.h"
if not version_file.exists():
    return None
match = re.search(r"#define\s+CUDA_VERSION\s+(\d+)", version_file.read_text())
return int(match.group(1)) // 1000 if match else None
Member Author

I'm curious if this is the best way to get the CUDA Toolkit version.
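
For context on what the regex above extracts: the `CUDA_VERSION` macro encodes the version as `1000 * major + 10 * minor`, so integer division by 1000 gives the major. A small sketch of the same parsing on a header string, where `parse_cuda_version` is a hypothetical helper and not part of this PR:

```python
import re
from typing import Optional, Tuple


def parse_cuda_version(header_text: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from a '#define CUDA_VERSION N' line, or None."""
    match = re.search(r"#define\s+CUDA_VERSION\s+(\d+)", header_text)
    if match is None:
        return None
    value = int(match.group(1))
    # CUDA_VERSION is 1000 * major + 10 * minor, e.g. 12040 for CUDA 12.4
    return value // 1000, (value % 1000) // 10


print(parse_cuda_version("#define CUDA_VERSION 12040\n"))  # (12, 4)
```

Working on the text rather than the file path keeps the parsing testable without a toolkit installed.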

I did some digging to see whether we could pull it from cudart via the Python API, if available, since cudaRuntimeGetVersion exists:

https://docs.nvidia.com/cuda/archive/9.0/cuda-runtime-api/group__CUDART____VERSION.html#group__CUDART____VERSION_1g0e3952c7802fd730432180f1f4a6cdc6

but I wasn't able to do something like

from cuda import cudart
cudart.cudaRuntimeGetVersion()

But with the help of Perplexity, I was able to get the version by loading libcudart through ctypes.

I don't know if it's cleaner, though. It would be something like this:

import ctypes
from ctypes import byref, c_int

libcudart = ctypes.cdll.LoadLibrary("libcudart.so")  # conda cuda-cudart provides this

cudaRuntimeGetVersion = libcudart.cudaRuntimeGetVersion
cudaRuntimeGetVersion.argtypes = [ctypes.POINTER(c_int)]
cudaRuntimeGetVersion.restype = c_int

ver = c_int()
err = cudaRuntimeGetVersion(byref(ver))
if err != 0:
    raise RuntimeError(f"cudaRuntimeGetVersion failed with error code {err}")

ver_int = ver.value
major = ver_int // 1000
minor = (ver_int % 1000) // 10
print("CUDA runtime version:", ver_int, f"({major}.{minor})")

Comment on lines +63 to +65
f"CUDA toolkit major version ({toolkit_major}) is newer than what the installed driver supports "
f"({driver_major}). Update your NVIDIA driver to one that supports CUDA {toolkit_major} or "
f"downgrade your CUDA toolkit to CUDA {driver_major}."
Member Author

I think we could improve these errors. It would be nice to detect how CUDA Toolkit has been installed (system, conda, pip, etc) and provide more nuanced advice for the user.

We can do that from Python. For example, I'm in a conda environment that has cudf and cuml, and I can access that info via

>>> from cuda import pathfinder
>>> loaded = pathfinder.load_nvidia_dynamic_lib("cudart")
>>> loaded.abs_path
'/raid/myuser/conda/envs/ray-cuml/lib/libcudart.so'
>>> loaded.found_via
'conda'

and in a different conda env that has only cuda-python, without cuda-runtime installed, I get this

>>> from cuda import pathfinder
>>> loaded = pathfinder.load_nvidia_dynamic_lib("cudart")
>>> loaded.abs_path
'/usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.13'
>>> loaded.found_via
'system-search'
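
Building on the `found_via` values above, the error message could be tailored to the install method along these lines. This is only a sketch: `install_advice` is a hypothetical helper, `"conda"` and `"system-search"` are the two values seen in the output above, `"site-packages"` is an assumed value for pip installs, and the wording of the advice is a placeholder.

```python
def install_advice(found_via: str, driver_major: int) -> str:
    """Map a pathfinder `found_via` string to user-facing advice (hypothetical)."""
    advice = {
        # Seen in the conda environment example above
        "conda": (
            f"Install a CUDA {driver_major} toolkit in your conda "
            "environment, or update your NVIDIA driver."
        ),
        # Seen in the system install example above
        "system-search": (
            f"Install a system CUDA {driver_major} toolkit, "
            "or update your NVIDIA driver."
        ),
        # ASSUMPTION: value reported for pip-installed wheels
        "site-packages": (
            f"Install the CUDA {driver_major} variants of your pip "
            "packages, or update your NVIDIA driver."
        ),
    }
    fallback = (
        f"Use a CUDA {driver_major} toolkit or update your NVIDIA driver."
    )
    return advice.get(found_via, fallback)


print(install_advice("conda", 12))
```

Unknown values fall back to generic advice, so a new `found_via` string would not break the check.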

jacobtomlinson marked this pull request as draft March 5, 2026 11:05
@jacobtomlinson
Member Author

@jayavenkatesh19 I just pushed this draft up to share more broadly, but if you want to take over this I'd be more than happy.

Comment on lines +63 to +68
if toolkit_major < driver_major:
    raise ValueError(
        f"CUDA toolkit major version ({toolkit_major}) is older than the driver's supported CUDA major version "
        f"({driver_major}). Upgrade your CUDA toolkit to CUDA {driver_major} or "
        f"downgrade your NVIDIA driver to one that supports CUDA {toolkit_major}."
    )
Member Author

This shouldn't necessarily be an error. A newer driver is OK as long as the CTK major version matches all the packages. The problem would be when you have a CUDA 13 driver with CTK 12 but a foo-cu13 Python package, e.g. rapidsai/deployment#516
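
A rough sketch of what that relaxed rule might look like, assuming package CUDA majors can be read from a `-cuNN` name suffix like the `foo-cu13` example. `find_cuda_mismatches` and its arguments are hypothetical, and how the installed package names get collected is left out.

```python
import re
from typing import Iterable, List


def find_cuda_mismatches(
    driver_major: int, toolkit_major: int, package_names: Iterable[str]
) -> List[str]:
    """Return a list of problems; an empty list means the setup looks OK."""
    problems = []
    # A driver newer than the toolkit is fine; only the reverse is an error.
    if toolkit_major > driver_major:
        problems.append(
            f"CUDA toolkit {toolkit_major} is newer than the driver "
            f"supports ({driver_major})."
        )
    for name in package_names:
        # ASSUMPTION: the CUDA major is encoded as a trailing "-cuNN"
        suffix = re.search(r"-cu(\d+)$", name)
        if suffix and int(suffix.group(1)) != toolkit_major:
            problems.append(
                f"{name} targets CUDA {suffix.group(1)} but the toolkit "
                f"is CUDA {toolkit_major}."
            )
    return problems


# Driver CUDA 13 with CTK 12 is only a problem once a cu13 package appears.
print(find_cuda_mismatches(13, 12, ["foo-cu12"]))  # []
print(len(find_cuda_mismatches(13, 12, ["foo-cu13"])))  # 1
```

Returning a list instead of raising on the first problem would let the check report every mismatched package at once.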
