Commit 85705ef

[doc-only] cuda.core v0.6.0 release notes
Merge 0.6.x-notes.rst into 0.6.0-notes.rst and add comprehensive release notes covering all changes since v0.5.1.

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 5985ee1 commit 85705ef

2 files changed: +67 -55 lines changed
Lines changed: 67 additions & 13 deletions
@@ -1,33 +1,87 @@
-.. SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+.. SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 .. SPDX-License-Identifier: Apache-2.0
 
 .. currentmodule:: cuda.core
 
 ``cuda.core`` 0.6.0 Release Notes
 ==================================
 
+
+Highlights
+----------
+
+- Added the ``cuda.core.system`` module for NVML-based system and device queries.
+- Several :class:`~utils.StridedMemoryView` improvements, including bfloat16 dlpack support
+  and numpy array interoperability.
+- Improved support for Python object protocols across core API classes.
+- Performance improvements through Cythonization and reduced Python overhead.
+
+
+Breaking Changes
+----------------
+
+- Building ``cuda.core`` from source now requires ``cuda-bindings`` >= 12.9.0, due to Cython-level
+  dependencies on the NVVM bindings (``cynvvm``). Pre-built wheels are unaffected. The previous
+  minimum was 12.8.0.
+
+
 New features
 ------------
 
-- Added public access to default CUDA streams via module-level constants ``LEGACY_DEFAULT_STREAM`` and ``PER_THREAD_DEFAULT_STREAM``
+- Added the ``cuda.core.system`` module for NVML-based system and device queries, including
+  device attributes, clocks, temperatures, fans, events, and PCI information.
 
-Users can now access default streams directly from the ``cuda.core`` namespace:
+- :class:`~utils.StridedMemoryView` improvements:
 
-.. code-block:: python
+  - Added ``from_array_interface`` constructor for creating views from numpy arrays.
+  - Improved structured dtype array support.
+  - Relaxed the power-of-two itemsize check in ``StridedLayout``.
+  - Added bfloat16 dlpack support when the optional ``ml_dtypes`` package is installed.
 
-    from cuda.core import LEGACY_DEFAULT_STREAM, PER_THREAD_DEFAULT_STREAM
+- Added public access to default CUDA streams via module-level constants
+  ``LEGACY_DEFAULT_STREAM`` and ``PER_THREAD_DEFAULT_STREAM``, replacing the previous
+  workaround of using ``Stream.from_handle(0)``.
 
-    # Use legacy default stream (synchronizes with all blocking streams)
-    LEGACY_DEFAULT_STREAM.sync()
+- Added :meth:`Kernel.from_handle` for wrapping an existing ``CUfunction`` handle into a
+  :class:`Kernel` object, enabling interoperability with foreign CUDA handles.
 
-    # Use per-thread default stream (non-blocking, thread-local)
-    PER_THREAD_DEFAULT_STREAM.sync()
+- Added ``__eq__``, ``__hash__``, ``__weakref__``, and ``__repr__`` support for core API classes
+  including :class:`Buffer`, :class:`LaunchConfig`, :class:`Kernel`, :class:`ObjectCode`,
+  :class:`Stream`, and :class:`Event`.
 
-The legacy default stream synchronizes with all blocking streams in the same CUDA context, ensuring strict ordering but potentially limiting concurrency. The per-thread default stream is local to the calling thread and does not synchronize with other streams, enabling concurrent execution in multi-threaded applications.
+- Added NVVM ``extra_sources`` and ``use_libdevice`` options to :class:`ProgramOptions` for
+  multi-module NVVM compilation and automatic libdevice loading.
 
-This replaces the previous undocumented workaround of using ``Stream.from_handle(0)`` to access the legacy default stream.
+- Added CUDA version compatibility check at import time to detect mismatches between
+  ``cuda.core`` and the installed ``cuda-bindings`` version.
 
-Fixes and enhancements
------------------------
+
+New examples
+------------
 
 None.
+
+
+Fixes and enhancements
+----------------------
+
+- Reduced wheel and installed package sizes by excluding Cython source files and build
+  artifacts from distribution packages.
+
+- Improved performance by Cythonizing :class:`Program` and :class:`ObjectCode` internals.
+
+- Reduced :class:`~utils.StridedMemoryView` construction overhead.
+
+- Legacy and per-thread default streams are now singletons, ensuring consistent identity
+  across the application.
+
+- ``__hash__`` and ``__eq__`` on core API classes no longer require a CUDA context.
+
+- Device attribute queries now gracefully handle unsupported attributes on older CUDA
+  drivers, returning sensible defaults instead of raising errors.
+
+- Fixed zero-sized allocations in legacy memory resources, which previously failed on
+  certain platforms.
+
+- Added a warning when :class:`ManagedMemoryResource` is created on platforms without
+  concurrent managed access support.
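
Note on the Breaking Changes entry: the notes raise the source-build floor for ``cuda-bindings`` from 12.8.0 to 12.9.0. The sketch below shows one way a build script might check that floor before compiling. It is an illustration only: ``parse_version`` and ``meets_minimum`` are hypothetical helpers, not part of ``cuda.core`` or ``cuda-bindings``, and the sketch assumes purely numeric dotted version strings.

```python
# Hypothetical helpers for illustration; not part of cuda.core or cuda-bindings.
MIN_CUDA_BINDINGS = (12, 9, 0)  # source-build floor stated in the release notes


def parse_version(text):
    """Turn a dotted version such as '12.9.0' into a 3-tuple of ints.

    Assumes purely numeric components; suffixes such as 'rc1' or '.post1'
    are not handled. Short versions like '12.9' are padded with zeros.
    """
    parts = tuple(int(part) for part in text.split("."))
    return parts + (0,) * (3 - len(parts))


def meets_minimum(installed, minimum=MIN_CUDA_BINDINGS):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= minimum


if __name__ == "__main__":
    for candidate in ("12.8.0", "12.9.0", "13.0.1"):
        status = "ok" if meets_minimum(candidate) else "too old for a source build"
        print(f"cuda-bindings {candidate}: {status}")
```

In a real build script the installed version string could come from ``importlib.metadata.version("cuda-bindings")``. Versions with pre-release or post-release suffixes would need a fuller parser, such as ``packaging.version.Version``.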

cuda_core/docs/source/release/0.6.x-notes.rst

Lines changed: 0 additions & 42 deletions
This file was deleted.
