diff --git a/cuda_core/docs/source/release/0.6.0-notes.rst b/cuda_core/docs/source/release/0.6.0-notes.rst
Lines changed: 67 additions & 13 deletions
diff --git a/cuda_core/docs/source/release/0.6.x-notes.rst b/cuda_core/docs/source/release/0.6.x-notes.rst
Lines changed: 0 additions & 42 deletions
@@ -1,33 +1,87 @@
-.. SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+.. SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 .. SPDX-License-Identifier: Apache-2.0
 
 .. currentmodule:: cuda.core
 
 ``cuda.core`` 0.6.0 Release Notes
 ==================================
 
+
+Highlights
+----------
+
+- Added the ``cuda.core.system`` module for NVML-based system and device queries.
+- Several :class:`~utils.StridedMemoryView` improvements, including bfloat16 DLPack support
+  and NumPy array interoperability.
+- Improved support for Python object protocols across core API classes.
+- Performance improvements through Cythonization and reduced Python overhead.
+
+
+Breaking Changes
+----------------
+
+- Building ``cuda.core`` from source now requires ``cuda-bindings`` >= 12.9.0, due to Cython-level
+  dependencies on the NVVM bindings (``cynvvm``). Pre-built wheels are unaffected. The previous
+  minimum was 12.8.0.
+
+
 New features
 ------------
 
-- Added public access to default CUDA streams via module-level constants ``LEGACY_DEFAULT_STREAM`` and ``PER_THREAD_DEFAULT_STREAM``
+- Added the ``cuda.core.system`` module for NVML-based system and device queries, including
+  device attributes, clocks, temperatures, fans, events, and PCI information.
 
-  Users can now access default streams directly from the ``cuda.core`` namespace:
+- :class:`~utils.StridedMemoryView` improvements:
 
-  .. code-block:: python
+  - Added the ``from_array_interface`` constructor for creating views from NumPy arrays.
+  - Improved structured dtype array support.
+  - Relaxed the power-of-two itemsize check in ``StridedLayout``.
+  - Added bfloat16 DLPack support when the optional ``ml_dtypes`` package is installed.
 
-      from cuda.core import LEGACY_DEFAULT_STREAM, PER_THREAD_DEFAULT_STREAM
+- Added public access to default CUDA streams via module-level constants
+  ``LEGACY_DEFAULT_STREAM`` and ``PER_THREAD_DEFAULT_STREAM``, replacing the previous
+  workaround of using ``Stream.from_handle(0)``.
 
-      # Use legacy default stream (synchronizes with all blocking streams)
-      LEGACY_DEFAULT_STREAM.sync()
+- Added :meth:`Kernel.from_handle` for wrapping an existing ``CUfunction`` handle into a
+  :class:`Kernel` object, enabling interoperability with foreign CUDA handles.
 
-      # Use per-thread default stream (non-blocking, thread-local)
-      PER_THREAD_DEFAULT_STREAM.sync()
+- Added ``__eq__``, ``__hash__``, ``__weakref__``, and ``__repr__`` support for core API classes
+  including :class:`Buffer`, :class:`LaunchConfig`, :class:`Kernel`, :class:`ObjectCode`,
+  :class:`Stream`, and :class:`Event`.
 
-  The legacy default stream synchronizes with all blocking streams in the same CUDA context, ensuring strict ordering but potentially limiting concurrency. The per-thread default stream is local to the calling thread and does not synchronize with other streams, enabling concurrent execution in multi-threaded applications.
+- Added NVVM ``extra_sources`` and ``use_libdevice`` options to :class:`ProgramOptions` for
+  multi-module NVVM compilation and automatic libdevice loading.
 
-  This replaces the previous undocumented workaround of using ``Stream.from_handle(0)`` to access the legacy default stream.
+- Added a CUDA version compatibility check at import time to detect mismatches between
+  ``cuda.core`` and the installed ``cuda-bindings`` version.
 
-Fixes and enhancements
------------------------
+
+New examples
+------------
 
 None.
+
+
+Fixes and enhancements
+----------------------
+
+- Reduced wheel and installed package sizes by excluding Cython source files and build
+  artifacts from distribution packages.
+
+- Improved performance by Cythonizing :class:`Program` and :class:`ObjectCode` internals.
+
+- Reduced :class:`~utils.StridedMemoryView` construction overhead.
+
+- Legacy and per-thread default streams are now singletons, ensuring consistent identity
+  across the application.
+
+- ``__hash__`` and ``__eq__`` on core API classes no longer require a CUDA context.
+
+- Device attribute queries now gracefully handle unsupported attributes on older CUDA
+  drivers, returning sensible defaults instead of raising errors.
+
+- Fixed zero-sized allocations in legacy memory resources, which previously failed on
+  certain platforms.
+
+- Added a warning when :class:`ManagedMemoryResource` is created on platforms without
+  concurrent managed access support.