Releases: NVIDIA/cuda-python
CUDA Python 12.9.1
cuda-pathfinder v1.1.0
- CTK 13.0.0 compatibility
- Bug fix: load libnvJitLink.so.12 from conda, not /usr/local/cuda (PR #767)
cuda-pathfinder v1.0.0
First release of cuda-pathfinder as a stand-alone module.
cuda.pathfinder replaces cuda.bindings.path_finder, which was released with cuda-bindings 12.9.0 and is now deprecated.
Note that cuda-pathfinder is a noarch package and has no dependencies (other than a Python 3.9+ interpreter).
Please see cuda/pathfinder/README for more information.
cuda.core v0.3.1
cuda.core v0.3.1 release announcement
Release note
All functionalities are currently hosted under the cuda.core.experimental namespace. Once the features become stable they will be moved out of experimental.
Documentation
Sample codes
What's Changed
- Bump github/codeql-action from 3.28.19 to 3.29.0 by @dependabot in #710
- Fix Windows build CI by @leofang in #713
- Bump pypa/cibuildwheel from 2.23.3 to 3.0.0 by @dependabot in #711
- Ensure correct handling of buffers allocated with LegacyPinnedMemoryResource.allocate as kernel parameters by @shwina in #717
- Fix nvbugpro 5348750 by @oleksandr-pavlyk in #725
- Add a "Getting Started" page to the documentation by @shwina in #720
- Bump korthout/backport-action from 3.2.0 to 3.2.1 by @dependabot in #738
- Bump github/codeql-action from 3.29.0 to 3.29.2 by @dependabot in #737
- cuda_core/tests/test_event.py::test_timing_success WSL compatibility by @rwgk in #740
- Restore option to run testing without cupy installed. by @rwgk in #741
- Cythonize away some perf hot spots by @leofang in #709
- cuda_core forward compatibility changes. by @rwgk in #722
- Update docs for v0.3.1 release by @leofang in #695
Full Changelog: cuda-core-v0.3.0...cuda-core-v0.3.1
cuda.core v0.3.0
cuda.core v0.3.0 release announcement
Release note
All functionalities are currently hosted under the cuda.core.experimental namespace. Once the features become stable they will be moved out of experimental.
Documentation
Sample codes
What's Changed
- cuda.core: CUResult, cudaError explanations by @rwgk in #503
- DOC: Add admonition to docstrings for cuda.core handle properties by @carterbox in #573
- NEW: Make event timing error messages more specific and actionable by @carterbox in #559
- Change cuda.core license to Apache-2.0 & make contributing guides clear by @leofang in #583
- Add lint instructions by @msaroufim in #581
- fix indexing bug in saxpy.py by @msaroufim in #582
- Consolidate shared info between README.md and DESCRIPTION.rst by @vzhurba01 in #590
- PyTorch example by @msaroufim in #579
- Address remaining OSRB requests + document known installation issues by @leofang in #626
- Initial version of pre-commit "Check SPDX-License-Identifier" by @rwgk in #625
- Implement Kernel.num_arguments, and Kernel.arguments_info by @oleksandr-pavlyk in #612
- Move dependencies from requirements.txt to an optional packaging extra by @kkraus14 in #638
- Always build and run Cython tests + other CI improvements by @leofang in #640
- Implement device and context properties for Event by @NaderAlAwar in #618
- Enable serialization/deserialization of ObjectCode instances by @brandon-b-miller in #660
- Add more ObjectCode constructors by @brandon-b-miller in #652
- Unify Common CI Code for Windows and Linux by @cryos in #645
- MNT: Bump DLPack header to 1.1 by @leofang in #667
- Add tests to cover scalar handling in launch() + Fix fp16 bug by @leofang in #669
- Feature/occupancy by @oleksandr-pavlyk in #648
- Repair Windows wheels by @leofang in #673
- Migrate to windows-2022 for Windows CI builds by @cryos in #672
- Bugfix/multiple ptxas options values by @oleksandr-pavlyk in #678
- Clean up cffi resources in file by @oleksandr-pavlyk in #679
- Support cooperative launch by @leofang in #676
- Allow ObjectCode to have a name by @leofang in #682
- Make compute-sanitizer not report API errors as errors by @leofang in #687
- Update the notes on the CCCL and nvmath-python projects by @leofang in #688
- Switch to use CUDA driver APIs in Device constructor by @leofang in #460
- Bump github/codeql-action from 3.28.18 to 3.28.19 by @dependabot in #700
- Bump conda-incubator/setup-miniconda from 3.1.1 to 3.2.0 by @dependabot in #701
- Add phase 1 of CUDA Graphs support by @vzhurba01 in #455
- Make a few memory management objects public + Miscellaneous doc updates by @leofang in #693
- Bump cuda.core to v0.3.0 by @leofang in #703
New Contributors
- @gmarkall made their first contribution in #522
- @kkraus14 made their first contribution in #549
- @pre-commit-ci made their first contribution in #552
- @msaroufim made their first contribution in #581
- @cryos made their first contribution in #555
- @dependabot made their first contribution in #700
Full Changelog: cuda-core-v0.2.0...cuda-core-v0.3.0
CUDA Python 12.9.0
CUDA Python 11.8.7
cuda.core v0.2.0
cuda.core v0.2.0 release announcement
Release note
All functionalities are currently hosted under the cuda.core.experimental namespace. Once the features become stable they will be moved out of experimental.
Key Features and Enhancements
- Add ProgramOptions to facilitate the passing of runtime compile options to Program.
- Add pythonic access to Device and Kernel attributes.
For full details please refer to the release note above.
Breaking Changes
- The stream attribute is removed from LaunchConfig. Instead, the Stream object should now be passed directly to launch as an argument.
- The signature of launch has changed: the positional arguments were swapped, and the new signature is (stream, config, kernel, *kernel_args).
- Change __cuda_stream__ from attribute to method.
- The Program.compile method no longer accepts the options argument. Instead, you can optionally pass an instance of ProgramOptions to the constructor of Program.
- Device.properties now provides attribute getters instead of a dictionary interface.
- The .handle attribute of various cuda.core objects now returns the underlying Python object instead of a (type-erased) Python integer.
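The __cuda_stream__ change listed above can be illustrated with a small pure-Python sketch that needs neither a GPU nor cuda.core. The class names MyStream and LegacyStream, the helper get_stream_handle, and the (version, handle) tuple layout are all assumptions made for illustration; they are not taken from these notes, so check the cuda.core documentation for the actual protocol.

```python
class MyStream:
    """Hypothetical stream wrapper exposing the protocol as a method (new form)."""

    def __init__(self, handle: int):
        self._handle = handle

    def __cuda_stream__(self):
        # Assumed layout: (protocol version, raw stream address).
        return (0, self._handle)


class LegacyStream:
    """Hypothetical wrapper that still uses the old attribute form."""

    def __init__(self, handle: int):
        # Old form: a plain attribute holding the tuple directly.
        self.__cuda_stream__ = (0, handle)


def get_stream_handle(obj) -> int:
    """Return the raw handle from either form of the protocol (illustrative helper)."""
    proto = getattr(obj, "__cuda_stream__", None)
    if proto is None:
        raise TypeError(f"{type(obj).__name__} does not implement __cuda_stream__")
    # New form is callable; old form is the tuple itself.
    version, handle = proto() if callable(proto) else proto
    return handle


print(get_stream_handle(MyStream(1234)))
print(get_stream_handle(LegacyStream(42)))
```

A consumer written this way could accept objects from before and after the change, which may help while third-party libraries migrate.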
New examples
- jit_lto_fractal.py: Demonstrates just-in-time link-time optimization for fractal generation. (Device, LaunchConfig, Linker, LinkerOptions, Program, ProgramOptions) (#475)
- simple_multi_gpu_example.py: Example of using multiple GPUs. (Device, Program, LaunchConfig) (#304)
- show_device_properties.py: Displays detailed device properties. (Device) (#474)
Documentation
Sample codes
Test fixes
- Clean up device initialization in some tests. (#507)
What's Changed
- Add back CuPy as an optional test dependency + Fix an example bug by @leofang in #334
- add warning to the nvjitlink ctor when falling back to cuLink by @ksimpson-work in #315
- Set up a preliminary doc build/publish pipeline by @leofang in #325
- Fix doc ci permissions by @leofang in #338
- Add conda installation instructions by @leofang in #321
- Add a dummy email address to the doc bot by @leofang in #343
- multi gpu example by @amwi04 in #304
- Change __cuda_stream__ from attribute to method by @NaderAlAwar in #389
- Ensure deprecation warnings from cuda.bindings are swallowed by @ksimpson-work in #404
- Add the options data class to program by @ksimpson-work in #237
- Update to use the new NVKS runners by @leofang in #426
- Disable notebook execution for cuda.bindings by @vzhurba01 in #424
- Stop tracking cached files and Jupyter notebook hashes in doc builds by @vzhurba01 in #425
- Documentation remove gpu dependency by @ksimpson-work in #398
- handle CTK version specific options in the linker test by @ksimpson-work in #371
- support ptx code type for program by @ksimpson-work in #317
- Update release checklist to focus on subpackages by @vzhurba01 in #427
- Kernel attributes by @ksimpson-work in #360
- Use cpu runner to build docs by @leofang in #434
- device properties by @ksimpson-work in #409
- Update linker options sequence handling by @ksimpson-work in #436
- Expose ObjectCode as public API + prune unnecessary input arguments by @ksimpson-work in #435
- Pin Sphinx <8.2.0 to fix doc build by @leofang in #456
- CI: Add Windows GPU runner for tests by @leofang in #444
- Improve program checks by @ksimpson-work in #394
- Various handle-related changes and improvements by @leofang in #463
- Improve perf of accessing dev.compute_capability by @leofang in #459
- Device properties example by @samaid in #474
- Add ObjectCode ptx constructor by @brandon-b-miller in #470
- Add a Linker example by @vzhurba01 in #475
- Add error log producing test by @ksimpson-work in #423
- Apply __new__ approach to disabling __init__ by @rwgk in #484
- switch the launch argument order by @ksimpson-work in #316
- NEW: Create an Event without recording to Stream by @carterbox in #487
- Clearer error messages (cuda.core) by @rwgk in #458
- Add event timing by @leofang in #481
- Fix merge error by @leofang in #495
- add public handle to object code by @ksimpson-work in #492
- Bump cuda-core ver to v0.2.0 by @leofang in #494
- Increase tolerance in test_timing() to avoid flaky tests. by @rwgk in #498
- Fix test_timing flakiness under Windows by @rwgk in #508
- NEW: Add Event to public API by @carterbox in #501
- cuda.core: Change selected .decode() calls to .decode("utf-8", errors="backslashreplace") by @rwgk in #510
- clean up device initialization in test by @ksimpson-work in #507
- Add @functools.lru_cache decorator for get_binding_version() by @rwgk in #512
- Fix dangling pointer problem in _linker.py by @rwgk in #516
- cuda.core: release notes update by @rwgk in #519
- Set release date by @vzhurba01 in #523
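The .decode("utf-8", errors="backslashreplace") change in the list above (#510) relies on standard Python behavior, which the following sketch illustrates. The helper name safe_decode and the sample byte strings are hypothetical; how cuda.core actually applies the call is not shown in these notes.

```python
def safe_decode(raw: bytes) -> str:
    """Decode bytes as UTF-8 without raising on invalid sequences.

    Invalid bytes are kept visible as backslash escapes instead of
    raising UnicodeDecodeError, which is handy for log or error text.
    """
    return raw.decode("utf-8", errors="backslashreplace")


# Hypothetical message containing one byte that is not valid UTF-8.
message = b"bad \xff byte"

try:
    message.decode()  # strict decoding raises on the invalid byte
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

print(safe_decode(message))
```

With the strict default, one stray byte in a compiler or driver message would hide the whole message behind an exception; the backslashreplace handler preserves the readable part and shows the offending byte as an escape.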
New Contributors
- @amwi04 made their first contribution in #304
- @NaderAlAwar made their first contribution in #389
- @samaid made their first contribution in #474
- @brandon-b-miller made their first contribution in #470
- @carterbox made their first contribution in #487
Full Changelog: cuda-core-v0.1.1...cuda-core-v0.2.0