Standardize internal version checks in cuda.core #1825

Merged
Andy-Jost merged 10 commits into NVIDIA:main from Andy-Jost:standardize-version-checks
Mar 26, 2026

@Andy-Jost
Contributor

Summary

Centralizes all internal CUDA version checking into a new cuda/core/_utils/version.{pyx,pxd} module, replacing ad-hoc patterns scattered across multiple files. Both Python (binding_version, driver_version) and Cython (cy_binding_version, cy_driver_version) accessors are provided, each returning cached (major, minor, patch) tuples.

Changes

  • New module cuda/core/_utils/version.{pyx,pxd} with:
    • binding_version() / cy_binding_version(): returns (major, minor, patch) from cuda-bindings metadata, cached via @functools.cache (Python) and module-level globals (Cython)
    • driver_version() / cy_driver_version(): returns (major, minor, patch) converted from cuDriverGetVersion, same caching strategy
  • Updated all internal version comparisons to use the new functions and tuple format (e.g., (12, 3, 0) instead of 12030 or (12, 3))
  • Removed dead code in _module.pyx, _launcher.pyx, and _launch_config.pyx:
    • _lazy_init / _inited / _lock / _use_ex lazy initialization machinery
    • _kernel_ctypes / _get_kernel_ctypes / _is_cukernel_get_library_supported unused helpers
    • Pre-CUDA 12.0 version gates (CUDA 12.0 is the minimum supported version)
  • Standardized error messages to format versions as '.'.join(map(str, version_tuple))
  • Removed get_binding_version and get_driver_version from cuda_utils.pyx
  • Updated all test files to use the new version module
  • Left untouched: the public NVML-based system.get_driver_version API, which serves a different purpose (user-facing system info vs. internal feature gating)
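
A feature gate that follows the tuple convention and the error-message format listed above might look roughly like this. The helper names `format_version` and `require_version` are hypothetical and chosen for illustration; the version values are made up.

```python
def format_version(version: tuple[int, ...]) -> str:
    """Format a version tuple the way the PR describes: '.'.join(map(str, ...))."""
    return ".".join(map(str, version))


def require_version(found: tuple[int, int, int],
                    required: tuple[int, int, int],
                    what: str) -> None:
    """Raise if `found` is older than `required`; both are three-tuples."""
    if found < required:
        raise RuntimeError(
            f"{what} requires >= {format_version(required)}, "
            f"but found {format_version(found)}"
        )


require_version((12, 9, 0), (12, 3, 0), "example feature")  # new enough, no error

try:
    require_version((12, 2, 0), (12, 3, 0), "example feature")
except RuntimeError as exc:
    message = str(exc)
    print(message)
```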

Test Coverage

  • Updated test_device.py, test_module.py, test_program.py, test_cuda_utils.py, test_optional_dependency_imports.py, and graph/test_explicit.py to use new version functions
  • _program.pyx and _linker.pyx use the Python def versions (binding_version, driver_version) so that test_optional_dependency_imports.py can continue to monkeypatch them for controlled testing
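
The monkeypatching point can be illustrated with a small, self-contained sketch. It assumes nothing about the real test file: `version` here is a hypothetical `SimpleNamespace` standing in for a module, and `backend_for_current_driver` is an invented caller. The idea it shows is that a Python-level accessor looked up on its module at call time can be swapped out by a test, which a C-level `cdef` call generally cannot.

```python
import types
from unittest import mock

# Hypothetical stand-in for a module exposing a Python-level accessor.
version = types.SimpleNamespace(driver_version=lambda: (12, 0, 0))


def backend_for_current_driver() -> str:
    # The accessor is resolved through the namespace on every call,
    # so a patched replacement is picked up immediately.
    return "new" if version.driver_version() >= (12, 3, 0) else "legacy"


before = backend_for_current_driver()
with mock.patch.object(version, "driver_version", return_value=(13, 0, 0)):
    during = backend_for_current_driver()
after = backend_for_current_driver()  # original accessor is restored

print(before, during, after)
```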

Made with Cursor

Move the GraphBuilder/Graph/GraphCompleteOptions/GraphDebugPrintOptions
implementation out of _graph/__init__.py into _graph/_graph_builder.pyx
so it is compiled by Cython. A thin __init__.py re-exports the public
names so all existing import sites continue to work unchanged.

Cython compatibility adjustments:
- Remove `from __future__ import annotations` (unsupported by Cython)
- Remove TYPE_CHECKING guard; quote annotations that reference Stream
  (circular import), forward-reference GraphBuilder/Graph, or use
  X | None union syntax
- Update _graphdef.pyx lazy imports to point directly at _graph_builder

No build_hooks.py changes needed — the build system auto-discovers .pyx
files via glob.

Ref: NVIDIA#1076
Made-with: Cursor

Replace the per-module _lazy_init / _inited / _driver_ver / _py_major_minor
pattern in _graph_builder.pyx with direct calls to centralized cached
functions in cuda_utils:

- Add get_driver_version() with @functools.cache alongside get_binding_version
- Switch get_binding_version from @functools.lru_cache to @functools.cache
  (cleaner for nullary functions)
- Fix split() to return tuple(result) — Cython enforces return type
  annotations unlike pure Python
- Fix _cond_with_params annotation from -> GraphBuilder to -> tuple
  to match actual return value

Made-with: Cursor

Implements NVIDIA#1328: host callbacks during stream capture via
cuLaunchHostFunc, mirroring the existing GraphDef.callback API.

Extracts shared callback infrastructure (_attach_user_object,
_attach_host_callback_to_graph, trampoline/destructor) into a new
_graph/_utils.pyx module to avoid circular imports between
_graph_builder and _graphdef.

Made-with: Cursor

Move binding and driver version queries into a dedicated
cuda/core/_utils/version.{pyx,pxd} module, providing both Python
(binding_version, driver_version) and Cython (cy_binding_version,
cy_driver_version) entry points. All functions return version tuples
((major, minor, patch)) and are cached—Python via @functools.cache,
Cython via module-level globals.

Remove get_binding_version / get_driver_version from cuda_utils.pyx
and update all internal call sites and tests to import from the new
module. Remove version checks for CUDA < 12.0 (now the minimum) and
eliminate dead code exposed by the migration: _lazy_init / _use_ex /
_kernel_ctypes / _is_cukernel_get_library_supported machinery in
_module.pyx, _launcher.pyx, and _launch_config.pyx.

The public NVML-based system.get_driver_version API is unrelated and
left unchanged.

Made-with: Cursor
Made-with: Cursor

# Conflicts:
#	cuda_core/cuda/core/_graph/_graph_builder.pyx
#	cuda_core/cuda/core/_utils/cuda_utils.pyx
@Andy-Jost added this to the cuda.core v0.7.0 milestone Mar 25, 2026
@Andy-Jost added the enhancement (Any code-related improvements), P1 (Medium priority - Should do), and cuda.core (Everything related to the cuda.core module) labels Mar 25, 2026
@Andy-Jost self-assigned this Mar 25, 2026
Remove unused imports flagged by cython-lint and ruff after
resolving merge conflicts with origin/main.

Made-with: Cursor
@Andy-Jost force-pushed the standardize-version-checks branch from 645ccb9 to 51b8f63 March 25, 2026 21:16
Remove the now-dead _reduce_3_tuple helper from cuda_utils.pyx.

Made-with: Cursor

Initialize _use_nvjitlink_backend to None so it can serve as its own
"already decided" sentinel, eliminating the redundant _driver_ver variable
and the driver_version() call that was only used to set it.

Made-with: Cursor
Add -> tuple[int, int, int] annotations to binding_version and
driver_version. Align driver_version arithmetic with _system.pyx.

Made-with: Cursor
@Andy-Jost enabled auto-merge (squash) March 25, 2026 22:04
Collaborator

@rwgk left a comment

I think it'd be better if "driver version" is a two-tuple. See the long comment.

The cuda-bindings version is a three-tuple, but the driver version is not.

try:
    parts = importlib.metadata.version("cuda-bindings").split(".")[:3]
except importlib.metadata.PackageNotFoundError:
    parts = importlib.metadata.version("cuda-python").split(".")[:3]
Collaborator

I'd put a simple sanity check:

    assert len(parts) == 3
  • Cheap
  • Should never fire
  • But if it does, it can prevent a lot of confusion

(I'd deliberately not make this any more sophisticated.)

Contributor Author

This is covered by the trailing [:3] at lines 15, 17.
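
For reference, a slice caps the number of parts at three but does not pad a shorter string, so the `[:3]` slice and the suggested `assert` guard against different cases. A quick sketch, using a hypothetical helper name and made-up version strings:

```python
def leading_parts(version_string: str) -> list[str]:
    """Split a dotted version string and keep at most the first three parts."""
    return version_string.split(".")[:3]


three = leading_parts("12.9.0")        # exactly three parts
four = leading_parts("12.9.0.post1")   # extra parts are dropped by the slice
two = leading_parts("12.9")            # a short string stays short

print(three, four, two)
```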

def driver_version() -> tuple[int, int, int]:
    """Return the CUDA driver version as a (major, minor, patch) triple."""
    cdef int ver = handle_return(driver.cuDriverGetVersion())
    return (ver // 1000, (ver // 10) % 100, ver % 10)
Collaborator

According to the below (ChatGPT Pro, instant), this is "technically correct but semantically misleading." I'd go with Option B.


User Prompt

I'm reviewing this cython code (NVIDIA/cuda-python repo):

@functools.cache
def driver_version() -> tuple[int, int, int]:
    """Return the CUDA driver version as a (major, minor, patch) triple."""
    cdef int ver = handle_return(driver.cuDriverGetVersion())
    return (ver // 1000, (ver // 10) % 100, ver % 10)

On the CTK side I see this (the last digit is always zero, irrespective of the patch version):

$ grep '#define CUDA_VERSION' $cudah
./12.0.1_525.85.12/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12000
./12.1.1_530.30.02/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12010
./12.2.2_535.104.05/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12020
./12.3.2_545.23.08/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12030
./12.4.1_550.54.15/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12040
./12.5.1_555.42.06/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12050
./12.6.3_560.35.05/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12060
./12.8.1_570.124.06/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12080
./12.9.1_575.57.08/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 12090
./13.0.2_580.95.05/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 13000
./13.1.1_590.48.01/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 13010
./13.2.0_595.45.04/cuda_cudart/targets/x86_64-linux/include/cuda.h:#define CUDA_VERSION 13020

But what about the driver? Does the last digit there have a significance? Is it ever different from 0?


Assistant Response

Short answer: your instinct is right to question it — the “patch” digit in cuDriverGetVersion() is not meaningful in practice and is effectively always 0.

1. What cuDriverGetVersion() actually returns

From the CUDA driver API, cuDriverGetVersion() returns an integer encoding:

version = 1000 * major + 10 * minor

That’s it. There is no patch component encoded.

Examples:

  • CUDA 12.6 → 12060
  • CUDA 13.1 → 13010

If decoded as:

(ver // 1000, (ver // 10) % 100, ver % 10)

you get (12, 6, 0) — but that last 0 is just an artifact.
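
The decoding can be checked against the `CUDA_VERSION` values from the header listing in the prompt. This is a small sketch with hypothetical function names; it only exercises the integers quoted above and says nothing about values a driver might report beyond them.

```python
def encode_cuda_version(major: int, minor: int) -> int:
    """Encode as 1000 * major + 10 * minor, the scheme described above."""
    return 1000 * major + 10 * minor


def decode_cuda_version(ver: int) -> tuple[int, int, int]:
    """Decode with the same arithmetic as the reviewed driver_version()."""
    return (ver // 1000, (ver // 10) % 100, ver % 10)


# CUDA_VERSION values taken from the grep output quoted in the prompt.
header_values = [12000, 12010, 12020, 12030, 12040, 12050,
                 12060, 12080, 12090, 13000, 13010, 13020]

decoded = [decode_cuda_version(v) for v in header_values]
for value, triple in zip(header_values, decoded):
    print(value, "->", triple)
```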


2. Why your CTK observation matches

CUDA_VERSION uses the same encoding:

#define CUDA_VERSION 12060

So:

  • Toolkit patch versions (e.g. 12.6.3) are not represented
  • The last digit is always 0

3. Driver vs Toolkit versioning

CUDA Toolkit

  • Has real patch versions (e.g. 12.6.3)
  • CUDA_VERSION does not encode them

CUDA Driver

  • Version tied to NVIDIA driver (e.g. 560.35.05)
  • cuDriverGetVersion() reports only CUDA API compatibility level

4. Is the last digit ever non-zero?

No — it is structurally always zero.


5. Practical implication

Your code is technically correct but semantically misleading:

return (ver // 1000, (ver // 10) % 100, ver % 10)

It suggests a meaningful patch version, which does not exist.


6. Recommendation

Option A (preferred):

def driver_version() -> tuple[int, int]:
    cdef int ver = handle_return(driver.cuDriverGetVersion())
    return (ver // 1000, (ver // 10) % 100)

Option B (if API stability matters):
Keep the triple but document that patch is always zero.

"""Return (major, minor, patch=0). Patch is always 0 for driver."""

Bottom line

  • Patch digit is always zero
  • No semantic meaning
  • Treat as (major, minor)

Bonus

To get the actual driver version (e.g. 560.35.05), use NVML:

  • nvmlSystemGetDriverVersion()

Contributor Author

@Andy-Jost Mar 26, 2026

I was going to ask whether to make bindings a pair or triple until I found the three-part format in the existing code here.

The danger with mixing these is that comparisons between two- and three-part tuples will give inconsistent and surprising results (see below). Rather than trying to remember which version uses which format, it's better to follow a uniform rule: make every version a three-tuple and always compare against a three-tuple. That's a simple rule everyone can remember, and it makes inconsistencies easy to spot.

>>> (12, 3) <= (12, 3, 0)
True
>>> (12, 3) >= (12, 3, 0)
False
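
The asymmetry comes from Python's lexicographic tuple comparison, where a tuple that is a strict prefix of another compares as smaller. The sketch below reproduces it and shows one way to normalize before comparing; `as_triple` is a hypothetical helper for illustration, not something this PR adds.

```python
def as_triple(version: tuple[int, ...]) -> tuple[int, ...]:
    """Hypothetical helper: pad with zeros (or truncate) to exactly three parts."""
    padded = tuple(version) + (0,) * (3 - len(version))
    return padded[:3]


mixed_le = (12, 3) <= (12, 3, 0)   # prefix tuple sorts before the longer one
mixed_ge = (12, 3) >= (12, 3, 0)   # so ">=" fails even though 12.3 "is" 12.3.0
uniform_ge = as_triple((12, 3)) >= (12, 3, 0)

print(mixed_le, mixed_ge, uniform_ge)
```

With every version held as a three-tuple from the start, no such normalization step is needed at comparison sites.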

version = binding_version()
if version < (12, 9, 0):
    raise RuntimeError(
        f"NVVM bindings require cuda-bindings >= 12.9.0, but found {version[0]}.{version[1]}.x. "
Contributor Author

Here is the old code using a three-part version for cuda-bindings.

@Andy-Jost merged commit 4cea0a1 into NVIDIA:main Mar 26, 2026
249 of 253 checks passed
@Andy-Jost deleted the standardize-version-checks branch March 26, 2026 14:45
Labels

cuda.core (Everything related to the cuda.core module), enhancement (Any code-related improvements), P1 (Medium priority - Should do)

3 participants