Standardize internal version checks in cuda.core by Andy-Jost · Pull Request #1825 · NVIDIA/cuda-python

Andy-Jost · 2026-03-25T20:07:50Z

Summary

Centralizes all internal CUDA version checking into a new cuda/core/_utils/version.{pyx,pxd} module, replacing ad-hoc patterns scattered across multiple files. Both Python (binding_version, driver_version) and Cython (cy_binding_version, cy_driver_version) accessors are provided, each returning cached (major, minor, patch) tuples.

Changes

New module cuda/core/_utils/version.{pyx,pxd} with:
- binding_version() / cy_binding_version(): returns (major, minor, patch) from cuda-bindings metadata, cached via @functools.cache (Python) and module-level globals (Cython)
- driver_version() / cy_driver_version(): returns (major, minor, patch) converted from cuDriverGetVersion, same caching strategy
Updated all internal version comparisons to use the new functions and tuple format (e.g., (12, 3, 0) instead of 12030 or (12, 3))
Removed dead code in _module.pyx, _launcher.pyx, and _launch_config.pyx:
- _lazy_init / _inited / _lock / _use_ex lazy initialization machinery
- _kernel_ctypes / _get_kernel_ctypes / _is_cukernel_get_library_supported unused helpers
- Pre-CUDA 12.0 version gates (CUDA 12.0 is the minimum supported version)
Standardized error messages to format versions as '.'.join(map(str, version_tuple))
Removed get_binding_version and get_driver_version from cuda_utils.pyx
Updated all test files to use the new version module
Left untouched: the public NVML-based system.get_driver_version API, which serves a different purpose (user-facing system info vs. internal feature gating)
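The tuple-based gate and the standardized error message listed above can be sketched in plain Python. This is a minimal illustration, not the PR's code: `format_version` and `require_binding_version` are hypothetical helpers, and the installed version is passed in as an argument instead of being queried from cuda-bindings.

```python
def format_version(version_tuple):
    """Format a version tuple as a dotted string, e.g. (12, 9, 0) -> '12.9.0'."""
    return ".".join(map(str, version_tuple))


def require_binding_version(found, minimum, feature):
    """Hypothetical gate: raise if `found` is older than `minimum`.

    Both arguments are (major, minor, patch) tuples, so the check is an
    ordinary lexicographic tuple comparison.
    """
    if found < minimum:
        raise RuntimeError(
            f"{feature} requires cuda-bindings >= {format_version(minimum)}, "
            f"but found {format_version(found)}"
        )


# A newer patch release satisfies the gate and returns silently.
require_binding_version((12, 9, 1), (12, 9, 0), "NVVM")
```

Because both sides are always three-tuples, the same helper works for binding and driver checks alike.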

Test Coverage

Updated test_device.py, test_module.py, test_program.py, test_cuda_utils.py, test_optional_dependency_imports.py, and graph/test_explicit.py to use new version functions
_program.pyx and _linker.pyx use the Python def versions (binding_version, driver_version) so that test_optional_dependency_imports.py can continue to monkeypatch them for controlled testing

Made with Cursor

Move the GraphBuilder/Graph/GraphCompleteOptions/GraphDebugPrintOptions implementation out of _graph/__init__.py into _graph/_graph_builder.pyx so it is compiled by Cython. A thin __init__.py re-exports the public names so all existing import sites continue to work unchanged.
Cython compatibility adjustments:
- Remove `from __future__ import annotations` (unsupported by Cython)
- Remove TYPE_CHECKING guard; quote annotations that reference Stream (circular import), forward-reference GraphBuilder/Graph, or use X | None union syntax
- Update _graphdef.pyx lazy imports to point directly at _graph_builder
No build_hooks.py changes needed; the build system auto-discovers .pyx files via glob.
Ref: NVIDIA#1076
Made-with: Cursor

Replace the per-module _lazy_init / _inited / _driver_ver / _py_major_minor pattern in _graph_builder.pyx with direct calls to centralized cached functions in cuda_utils:
- Add get_driver_version() with @functools.cache alongside get_binding_version
- Switch get_binding_version from @functools.lru_cache to @functools.cache (cleaner for nullary functions)
- Fix split() to return tuple(result) (Cython enforces return type annotations, unlike pure Python)
- Fix _cond_with_params annotation from -> GraphBuilder to -> tuple to match actual return value
Made-with: Cursor
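For a function with no arguments, `functools.cache` stores exactly one result, so the body runs once per process. A small sketch, with a hypothetical `query_version` standing in for the real driver query:

```python
import functools

call_count = 0


@functools.cache
def query_version():
    """Hypothetical stand-in for an expensive version query."""
    global call_count
    call_count += 1
    return (12, 3, 0)


first = query_version()
second = query_version()  # served from the cache; the body does not run again
```

`functools.cache` is equivalent to `functools.lru_cache(maxsize=None)` and is available from Python 3.9 onward.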

Implements NVIDIA#1328: host callbacks during stream capture via cuLaunchHostFunc, mirroring the existing GraphDef.callback API. Extracts shared callback infrastructure (_attach_user_object, _attach_host_callback_to_graph, trampoline/destructor) into a new _graph/_utils.pyx module to avoid circular imports between _graph_builder and _graphdef. Made-with: Cursor

Move binding and driver version queries into a dedicated cuda/core/_utils/version.{pyx,pxd} module, providing both Python (binding_version, driver_version) and Cython (cy_binding_version, cy_driver_version) entry points. All functions return version tuples ((major, minor, patch)) and are cached—Python via @functools.cache, Cython via module-level globals. Remove get_binding_version / get_driver_version from cuda_utils.pyx and update all internal call sites and tests to import from the new module. Remove version checks for CUDA < 12.0 (now the minimum) and eliminate dead code exposed by the migration: _lazy_init / _use_ex / _kernel_ctypes / _is_cukernel_get_library_supported machinery in _module.pyx, _launcher.pyx, and _launch_config.pyx. The public NVML-based system.get_driver_version API is unrelated and left unchanged. Made-with: Cursor

Made-with: Cursor
# Conflicts:
#   cuda_core/cuda/core/_graph/_graph_builder.pyx
#   cuda_core/cuda/core/_utils/cuda_utils.pyx

Remove unused imports flagged by cython-lint and ruff after resolving merge conflicts with origin/main. Made-with: Cursor

Remove the now-dead _reduce_3_tuple helper from cuda_utils.pyx. Made-with: Cursor

Initialize _use_nvjitlink_backend to None so it can serve as its own "already decided" sentinel, eliminating the redundant _driver_ver variable and the driver_version() call that was only used to set it. Made-with: Cursor

Add -> tuple[int, int, int] annotations to binding_version and driver_version. Align driver_version arithmetic with _system.pyx. Made-with: Cursor

rwgk

I think it'd be better if "driver version" is a two-tuple. See the long comment.

The cuda-bindings version is a three-tuple, but the driver version is not.

rwgk · 2026-03-25T22:30:09Z

cuda_core/cuda/core/_utils/version.pyx

+    try:
+        parts = importlib.metadata.version("cuda-bindings").split(".")[:3]
+    except importlib.metadata.PackageNotFoundError:
+        parts = importlib.metadata.version("cuda-python").split(".")[:3]


I'd put a simple sanity check:

assert len(parts) == 3

Cheap

Should never fire

But if it does, it can prevent a lot of confusion

(I'd deliberately not make this any more sophisticated.)

This is covered by the trailing [:3] at lines 15, 17.
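For reference, a slice caps the number of parts but does not pad a shorter version string, which is the case the suggested assert would catch. A quick check with made-up version strings:

```python
def split_version(text):
    """Split a dotted version string the way the snippet above does."""
    return text.split(".")[:3]


four_parts = split_version("12.9.1.post1")  # the extra component is dropped
two_parts = split_version("12.9")           # the slice does not pad to three
```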

rwgk · 2026-03-25T22:51:35Z

cuda_core/cuda/core/_utils/version.pyx

+def driver_version() -> tuple[int, int, int]:
+    """Return the CUDA driver version as a (major, minor, patch) triple."""
+    cdef int ver = handle_return(driver.cuDriverGetVersion())
+    return (ver // 1000, (ver // 10) % 100, ver % 10)


According to the below (ChatGPT Pro, instant), this is "technically correct but semantically misleading." I'd go with Option B.

User Prompt

I'm reviewing this cython code (NVIDIA/cuda-python repo):

@functools.cache
def driver_version() -> tuple[int, int, int]:
    """Return the CUDA driver version as a (major, minor, patch) triple."""
    cdef int ver = handle_return(driver.cuDriverGetVersion())
    return (ver // 1000, (ver // 10) % 100, ver % 10)

On the CTK side I see this (the last digit is always zero, irrespective of the patch version):

$ grep '#define CUDA_VERSION' $cudah
./12.0.1_525.85.12/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12000
./12.1.1_530.30.02/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12010
./12.2.2_535.104.05/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12020
./12.3.2_545.23.08/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12030
./12.4.1_550.54.15/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12040
./12.5.1_555.42.06/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12050
./12.6.3_560.35.05/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12060
./12.8.1_570.124.06/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12080
./12.9.1_575.57.08/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12090
./13.0.2_580.95.05/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 13000
./13.1.1_590.48.01/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 13010
./13.2.0_595.45.04/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 13020

But what about the driver? Does the last digit there have a significance? Is it ever different from 0?

Assistant Response

Short answer: your instinct is right to question it — the “patch” digit in cuDriverGetVersion() is not meaningful in practice and is effectively always 0.

1. What cuDriverGetVersion() actually returns

From the CUDA driver API, cuDriverGetVersion() returns an integer encoding:

version = 1000 * major + 10 * minor

That’s it. There is no patch component encoded.

Examples:

CUDA 12.6 → 12060

CUDA 13.1 → 13010

If decoded as:

(ver // 1000, (ver // 10) % 100, ver % 10)

you get (12, 6, 0) — but that last 0 is just an artifact.
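The encoding and its decoding can be checked with a short worked example. The helper names below are hypothetical; the arithmetic is the one quoted above.

```python
def decode_driver_version(ver):
    """Decode an integer encoded as 1000 * major + 10 * minor."""
    return (ver // 1000, (ver // 10) % 100, ver % 10)


def encode_driver_version(major, minor):
    """Inverse of the decoding above for the (major, minor) part."""
    return 1000 * major + 10 * minor


decoded = [decode_driver_version(v) for v in (12060, 13010)]
```

With this encoding the third component can only be zero, because every encoded value is a multiple of 10.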

2. Why your CTK observation matches

CUDA_VERSION uses the same encoding:

#define CUDA_VERSION 12060

So:

Toolkit patch versions (e.g. 12.6.3) are not represented

The last digit is always 0

3. Driver vs Toolkit versioning

CUDA Toolkit

Has real patch versions (e.g. 12.6.3)

CUDA_VERSION does not encode them

CUDA Driver

Version tied to NVIDIA driver (e.g. 560.35.05)

cuDriverGetVersion() reports only CUDA API compatibility level

4. Is the last digit ever non-zero?

No — it is structurally always zero.

5. Practical implication

Your code is technically correct but semantically misleading:

return (ver // 1000, (ver // 10) % 100, ver % 10)

It suggests a meaningful patch version, which does not exist.

6. Recommendation

Option A (preferred):

def driver_version() -> tuple[int, int]:
    return (ver // 1000, (ver // 10) % 100)

Option B (if API stability matters):
Keep the triple but document that patch is always zero.

"""Return (major, minor, patch=0). Patch is always 0 for driver."""

Bottom line

Patch digit is always zero

No semantic meaning

Treat as (major, minor)

Bonus

To get the actual driver version (e.g. 560.35.05), use NVML:

nvmlSystemGetDriverVersion()

I was going to ask whether to make bindings a pair or triple until I found the three-part format in the existing code here.

The danger with mixing these is that comparisons between two- and three-part tuples will give inconsistent and surprising results (see below). Rather than trying to remember which version uses which format, it's better to follow a uniform rule: make every version a three-tuple and always compare against a three-tuple. That's a simple rule everyone can remember, and it makes inconsistencies easy to spot.

>>> (12, 3) <= (12, 3, 0)
True
>>> (12, 3) >= (12, 3, 0)
False
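The asymmetry comes from lexicographic comparison: when one tuple is a prefix of the other, the shorter one sorts first. A sketch of the uniform-rule approach, using a hypothetical `as_triple` helper that pads to three components before comparing:

```python
def as_triple(version):
    """Pad a version tuple with zeros to exactly three components."""
    if len(version) > 3:
        raise ValueError(f"expected at most 3 components, got {version!r}")
    return tuple(version) + (0,) * (3 - len(version))


# Mixed lengths: (12, 3) is "less than" (12, 3, 0), so >= fails.
mixed = ((12, 3) <= (12, 3, 0), (12, 3) >= (12, 3, 0))

# Uniform three-tuples: both comparisons agree that the versions are equal.
uniform = (as_triple((12, 3)) <= (12, 3, 0), as_triple((12, 3)) >= (12, 3, 0))
```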

Andy-Jost · 2026-03-26T01:22:15Z

cuda_core/cuda/core/_program.pyx

+        version = binding_version()
+        if version < (12, 9, 0):
            raise RuntimeError(
-                f"NVVM bindings require cuda-bindings >= 12.9.0, but found {version[0]}.{version[1]}.x. "


Here is the old code using a three-part version for cuda-bindings.

github-actions · 2026-03-30T00:21:01Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

Andy-Jost added 6 commits March 24, 2026 10:18

Merge remote-tracking branch 'origin/main' into cythonize-graph-builder

9b19dbf

Merge remote-tracking branch 'origin/main' into cythonize-graph-builder

6d82d8f

Made-with: Cursor
# Conflicts:
#   cuda_core/cuda/core/_graph/_graph_builder.pyx
#   cuda_core/cuda/core/_utils/cuda_utils.pyx

Andy-Jost added this to the cuda.core v0.7.0 milestone Mar 25, 2026

Andy-Jost added enhancement Any code-related improvements P1 Medium priority - Should do cuda.core Everything related to the cuda.core module labels Mar 25, 2026

Andy-Jost self-assigned this Mar 25, 2026

Andy-Jost requested review from cpcloud, leofang, mdboom and rwgk March 25, 2026 20:07

Fix unused imports after merge with main

51b8f63

Remove unused imports flagged by cython-lint and ruff after resolving merge conflicts with origin/main. Made-with: Cursor

Andy-Jost force-pushed the standardize-version-checks branch from 645ccb9 to 51b8f63 Compare March 25, 2026 21:16

Andy-Jost added 2 commits March 25, 2026 14:28

Replace _reduce_3_tuple with math.prod in _launcher.pyx

110d6de

Remove the now-dead _reduce_3_tuple helper from cuda_utils.pyx. Made-with: Cursor
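Assuming the removed helper multiplied the three entries of a dimension tuple (its body is not shown on this page), the standard-library replacement looks like this, with made-up dimensions:

```python
import math

block = (8, 4, 2)  # made-up (x, y, z) dimensions
threads_per_block = math.prod(block)
```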

rparolin approved these changes Mar 25, 2026

Add return type annotations to version.pyx; fix minor arithmetic

af3d275

Add -> tuple[int, int, int] annotations to binding_version and driver_version. Align driver_version arithmetic with _system.pyx. Made-with: Cursor

Andy-Jost enabled auto-merge (squash) March 25, 2026 22:04

rwgk reviewed Mar 25, 2026

Andy-Jost commented Mar 26, 2026

Andy-Jost merged commit 4cea0a1 into NVIDIA:main Mar 26, 2026
249 of 253 checks passed

Standardize internal version checks in cuda.core #1825
Andy-Jost merged 10 commits into NVIDIA:main from Andy-Jost:standardize-version-checks
