|
1 | | -.. SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
| 1 | +.. SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
2 | 2 | .. SPDX-License-Identifier: Apache-2.0 |
3 | 3 |
|
4 | 4 | .. currentmodule:: cuda.core |
5 | 5 |
|
6 | 6 | ``cuda.core`` 0.6.0 Release Notes |
7 | 7 | ================================== |
8 | 8 |
|
| 9 | + |
| 10 | +Highlights |
| 11 | +---------- |
| 12 | + |
| 13 | +- Added the ``cuda.core.system`` module for NVML-based system and device queries. |
| 14 | +- Several :class:`~utils.StridedMemoryView` improvements, including bfloat16 DLPack support |
| 15 | + and NumPy array interoperability. |
| 16 | +- Improved support for Python object protocols across core API classes. |
| 17 | +- Performance improvements through Cythonization and reduced Python overhead. |
| 18 | + |
| 19 | + |
| 20 | +Breaking changes |
| 21 | +---------------- |
| 22 | + |
| 23 | +- Building ``cuda.core`` from source now requires ``cuda-bindings`` >= 12.9.0, due to Cython-level |
| 24 | + dependencies on the NVVM bindings (``cynvvm``). Pre-built wheels are unaffected. The previous |
| 25 | + minimum was 12.8.0. |
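The source-build requirement above can be checked before starting a build. The following is a minimal sketch, not part of ``cuda.core``: the helper names ``parse_release`` and ``meets_minimum`` are hypothetical, and the parser assumes a plain dotted version string, ignoring any pre-release or local suffix.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum cuda-bindings version stated in the release notes for source builds.
MINIMUM = (12, 9, 0)


def parse_release(text: str) -> tuple:
    """Return the leading numeric components of a version string, padded to 3."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component (e.g. "post1")
        parts.append(int(piece))
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(installed: str) -> bool:
    """Hypothetical helper: True if `installed` is at least MINIMUM."""
    return parse_release(installed) >= MINIMUM


try:
    installed = version("cuda-bindings")
except PackageNotFoundError:
    print("cuda-bindings is not installed")
else:
    status = "ok" if meets_minimum(installed) else "too old for a source build"
    print(f"cuda-bindings {installed}: {status}")
```

Pre-built wheels do not need this check, since the requirement applies only to building from source.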
| 26 | + |
| 27 | + |
9 | 28 | New features |
10 | 29 | ------------ |
11 | 30 |
|
12 | | -- Added public access to default CUDA streams via module-level constants ``LEGACY_DEFAULT_STREAM`` and ``PER_THREAD_DEFAULT_STREAM`` |
| 31 | +- Added the ``cuda.core.system`` module for NVML-based system and device queries, including |
| 32 | + device attributes, clocks, temperatures, fans, events, and PCI information. |
13 | 33 |
|
14 | | - Users can now access default streams directly from the ``cuda.core`` namespace: |
| 34 | +- :class:`~utils.StridedMemoryView` improvements: |
15 | 35 |
|
16 | | - .. code-block:: python |
| 36 | + - Added ``from_array_interface`` constructor for creating views from NumPy arrays. |
| 37 | + - Improved structured dtype array support. |
| 38 | + - Relaxed the power-of-two itemsize check in ``StridedLayout``. |
| 39 | + - Added bfloat16 DLPack support when the optional ``ml_dtypes`` package is installed. |
17 | 40 |
|
18 | | - from cuda.core import LEGACY_DEFAULT_STREAM, PER_THREAD_DEFAULT_STREAM |
| 41 | +- Added public access to default CUDA streams via module-level constants |
| 42 | + ``LEGACY_DEFAULT_STREAM`` and ``PER_THREAD_DEFAULT_STREAM``, replacing the previous |
| 43 | + workaround of using ``Stream.from_handle(0)``. |
19 | 44 |
|
20 | | - # Use legacy default stream (synchronizes with all blocking streams) |
21 | | - LEGACY_DEFAULT_STREAM.sync() |
| 45 | +- Added :meth:`Kernel.from_handle` for wrapping an existing ``CUfunction`` handle into a |
| 46 | + :class:`Kernel` object, enabling interoperability with foreign CUDA handles. |
22 | 47 |
|
23 | | - # Use per-thread default stream (non-blocking, thread-local) |
24 | | - PER_THREAD_DEFAULT_STREAM.sync() |
| 48 | +- Added ``__eq__``, ``__hash__``, ``__weakref__``, and ``__repr__`` support for core API classes |
| 49 | + including :class:`Buffer`, :class:`LaunchConfig`, :class:`Kernel`, :class:`ObjectCode`, |
| 50 | + :class:`Stream`, and :class:`Event`. |
25 | 51 |
|
26 | | - The legacy default stream synchronizes with all blocking streams in the same CUDA context, ensuring strict ordering but potentially limiting concurrency. The per-thread default stream is local to the calling thread and does not synchronize with other streams, enabling concurrent execution in multi-threaded applications. |
| 52 | +- Added NVVM ``extra_sources`` and ``use_libdevice`` options to :class:`ProgramOptions` for |
| 53 | + multi-module NVVM compilation and automatic libdevice loading. |
27 | 54 |
|
28 | | - This replaces the previous undocumented workaround of using ``Stream.from_handle(0)`` to access the legacy default stream. |
| 55 | +- Added a CUDA version compatibility check at import time to detect mismatches between |
| 56 | + ``cuda.core`` and the installed ``cuda-bindings`` version. |
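The default-stream constants listed above can be imported directly from the ``cuda.core`` namespace. This sketch reuses the import line from the example removed in this diff. The ``default_streams`` helper and the ``ImportError`` guard are assumptions, added only so the snippet also runs where ``cuda.core`` is not installed.

```python
def default_streams():
    """Return (legacy, per_thread) default streams, or None without cuda.core."""
    try:
        from cuda.core import LEGACY_DEFAULT_STREAM, PER_THREAD_DEFAULT_STREAM
    except ImportError:
        return None
    return LEGACY_DEFAULT_STREAM, PER_THREAD_DEFAULT_STREAM


streams = default_streams()
if streams is None:
    print("cuda.core with default-stream constants is not available")
else:
    legacy, per_thread = streams
    # The constants are module-level objects, so a second lookup
    # returns the very same instances.
    assert default_streams()[0] is legacy
    assert default_streams()[1] is per_thread
    print("default streams available")
```

On a machine with a working CUDA device, either stream can then be synchronized with ``.sync()``, as the removed example did.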
29 | 57 |
|
30 | | -Fixes and enhancements |
31 | | ------------------------ |
| 58 | + |
| 59 | +New examples |
| 60 | +------------ |
32 | 61 |
|
33 | 62 | None. |
| 63 | + |
| 64 | + |
| 65 | +Fixes and enhancements |
| 66 | +---------------------- |
| 67 | + |
| 68 | +- Reduced wheel and installed package sizes by excluding Cython source files and build |
| 69 | + artifacts from distribution packages. |
| 70 | + |
| 71 | +- Improved performance by Cythonizing :class:`Program` and :class:`ObjectCode` internals. |
| 72 | + |
| 73 | +- Reduced :class:`~utils.StridedMemoryView` construction overhead. |
| 74 | + |
| 75 | +- Legacy and per-thread default streams are now singletons, ensuring consistent identity |
| 76 | + across the application. |
| 77 | + |
| 78 | +- ``__hash__`` and ``__eq__`` on core API classes no longer require a CUDA context. |
| 79 | + |
| 80 | +- Device attribute queries now gracefully handle unsupported attributes on older CUDA |
| 81 | + drivers, returning sensible defaults instead of raising errors. |
| 82 | + |
| 83 | +- Fixed zero-sized allocations in legacy memory resources, which previously failed on |
| 84 | + certain platforms. |
| 85 | + |
| 86 | +- Added a warning when :class:`ManagedMemoryResource` is created on platforms without |
| 87 | + concurrent managed access support. |
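Equality and hashing that work without a CUDA context can be pictured with a small pure-Python sketch. Nothing here is ``cuda.core`` code: ``HandleWrapper`` is a hypothetical stand-in, and comparing by raw handle value is an assumption about how such classes could behave, not a description of the library's implementation.

```python
class HandleWrapper:
    """Hypothetical stand-in for a core API object that wraps a raw handle."""

    # Listing "__weakref__" keeps instances weak-referenceable despite __slots__.
    __slots__ = ("_handle", "__weakref__")

    def __init__(self, handle: int) -> None:
        self._handle = handle

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, HandleWrapper):
            return NotImplemented
        # Only the stored integer is compared; no driver call is needed.
        return self._handle == other._handle

    def __hash__(self) -> int:
        return hash((type(self), self._handle))

    def __repr__(self) -> str:
        return f"{type(self).__name__}(handle={self._handle:#x})"


# Placeholder handle values, chosen only for illustration.
first = HandleWrapper(0x1)
second = HandleWrapper(0x2)
duplicate = HandleWrapper(0x1)

registry = {first: "first", second: "second"}
print(duplicate == first, duplicate is first)  # True False
print(registry[duplicate])                     # first
print(repr(second))                            # HandleWrapper(handle=0x2)
```

Because equal objects hash equally, such wrappers can serve as dictionary keys or set members even before any context has been created.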