Commit 863f898

Cythonize _linker.py (#1604)
* Begin Cythonization of _program.py
  - Rename _program.py to _program.pyx
  - Convert Program to cdef class with _program.pxd declarations
  - Extract _MembersNeededForFinalize to module-level _ProgramMNFF (nested classes not allowed in cdef class)
  - Add __repr__ method to Program
  - Keep ProgramOptions as @dataclass (unchanged)
  - Keep weakref.finalize pattern for handle cleanup

* Extract Program helpers to module-level cdef functions
  - Move _translate_program_options to Program_translate_options (cdef)
  - Move _can_load_generated_ptx to Program_can_load_generated_ptx (cdef)
  - Remove unused TYPE_CHECKING import block
  - Follow _memory/_buffer.pyx helper function patterns

* Complete Cythonization of _program.py
  - Reorganize file structure per developer guide (principal class first)
  - Add module docstring, __all__, type alias section
  - Factor long methods into cdef inline helpers
  - Add proper exception specs to cdef functions
  - Fix docstrings (use :class: refs, public paths)
  - Add type annotations to public methods
  - Inline _nvvm_exception_manager (single use)
  - Remove Union import, use | syntax
  - Add public Program.driver_can_load_nvrtc_ptx_output() API
  - Update tests to use new public API

  Closes #1082

* Extend test_object_protocols.py with Program and ObjectCode variations

  Add fixtures for different Program backends (NVRTC, PTX, NVVM) and ObjectCode code types (cubin, PTX, LTOIR). Split API_TYPES into more precise HASH_TYPES, EQ_TYPES, and WEAKREF_TYPES lists. Derive DICT_KEY_TYPES and WEAK_KEY_TYPES for collection tests.

* Add NVRTC/NVVM resource handles and remove Program MNFF
  - Add NvrtcProgramHandle and NvvmProgramHandle to resource handles module
  - Add function pointer initialization for nvrtcDestroyProgram and nvvmDestroyProgram
  - Forward-declare nvvmProgram to avoid nvvm.h dependency
  - Refactor detail::make_py to accept module name parameter
  - Remove _ProgramMNFF class from _program.pyx
  - Program now uses typed handles directly with RAII cleanup
  - Update handle property to return None when handle is null

* Add HANDLE_RETURN_NVRTC and HANDLE_RETURN_NVVM, simplify HANDLE_RETURN
  - Add NVVMError exception class
  - Add HANDLE_RETURN_NVRTC for nogil NVRTC error handling with program log
  - Add HANDLE_RETURN_NVVM for nogil NVVM error handling with program log
  - Remove vestigial supported_error_type fused type
  - Simplify HANDLE_RETURN to directly take cydriver.CUresult

* Fix build errors, update tests, remove unused imports
  - Change cdef function return types from ObjectCode to object (Cython limitation)
  - Remove unused imports: intptr_t, NvrtcProgramHandle, NvvmProgramHandle, as_intptr
  - Update as_py(NvvmProgramHandle) to return Python int via PyLong_FromSsize_t
  - Update test assertions: remove handle checks after close(), test idempotency instead
  - Update NVVM error message regex to match new unified format

* Address review feedback: keep _can_load_generated_ptx private, update SPDX dates
  - Remove Program.driver_can_load_nvrtc_ptx_output() public static method
  - Make _can_load_generated_ptx a cpdef (callable from Python tests)
  - Update tests to import _can_load_generated_ptx from cuda.core._program
  - Update SPDX copyright years to 2024-2026 for files with 2024-2025
  - Update get_kernel docstring: name parameter is str | bytes

* Address review feedback: NVVMError inherits from nvvmError, clean up error helpers
  - Make NVVMError inherit from cuda.bindings.nvvm.nvvmError for compatibility with code catching nvvmError, while also inheriting from CUDAError
  - Simplify HANDLE_RETURN_NVRTC and HANDLE_RETURN_NVVM to just check success and delegate to helper functions
  - Move all error handling logic into _raise_nvrtc_error and _raise_nvvm_error

* Add 0.6.x release notes with cuda-bindings build requirement change

* Begin Cythonization of _linker.py

  Rename _linker.py to _linker.pyx with cdef class Linker. Create _linker.pxd with typed attribute declarations. Move inner _MembersNeededForFinalize class to module level. No behavior change; all existing tests pass.

* Add NvJitLinkHandle, CuLinkHandle RAII and HANDLE_RETURN_NVJITLINK

  Add resource handle infrastructure for linker cythonization:
  - NvJitLinkHandle and CuLinkHandle in resource_handles.hpp/.cpp with shared_ptr RAII ownership and automatic destroy on release.
  - TaggedHandle<T, int> template to make void*-based handle types (NvvmProgramHandle, NvJitLinkHandle) distinct for C++ overloading.
  - Factory functions, as_cu/as_intptr/as_py accessors, and function pointer initialization for nvJitLinkDestroy and cuLinkDestroy.
  - HANDLE_RETURN_NVJITLINK in cuda_utils for nvJitLink error handling with automatic error log retrieval.

  Pure infrastructure; no changes to _linker.pyx yet.

* Replace MNFF/weakref.finalize with RAII handle ownership in _linker

  The Linker class now holds NvJitLinkHandle and CuLinkHandle shared_ptrs directly as cdef attributes, replacing the _LinkerMembersNeededForFinalize helper class and weakref.finalize destructor pattern. close() simply resets the shared_ptr. Also removes _const_char_keep_alive (unnecessary) and TYPE_CHECKING block.

* Migrate linker to C-level calls with nogil

  Both nvJitLink and driver paths now use cynvjitlink/cydriver C-level calls directly, releasing the GIL around expensive operations. Option arrays built with std::vector (RAII, no manual malloc/free). Removed _exception_manager context manager (was too broad, caught TypeError); driver path errors annotated via targeted except CUDAError. Removed ctypes dependency, _input_type_from_code_type (inlined), and TYPE_CHECKING block.

* Clean up module globals: remove _nvjitlink, use C-level enum ints

  Replace _nvjitlink Python module global with _use_nvjitlink_backend flag. Input type dicts now use C-level enum values (cynvjitlink/cydriver) instead of Python enum objects. The cuda.bindings.nvjitlink module is only imported locally for availability detection.

* Reorganize _linker module per developer guide conventions
  - Add module docstring and __all__ declaration
  - Reorder file: Linker (principal) -> LinkerOptions (supporting) -> cdef inline helpers -> private state
  - Extract cdef inline helpers: Linker_init, Linker_add_code_object, Linker_link, Linker_annotate_error_log
  - Fix import ordering per developer guide (5 groups)
  - Apply c_ prefix to cdef variables for clarity
  - Replace _formatted_options/option_keys with typed log members
  - Cache decoded log strings after link() for efficiency
  - Type Linker _linker member in _program.pxd (object -> Linker)

* Fix cython-lint warnings in _linker.pyx

  Remove unused cimports (CuLinkHandle, NvJitLinkHandle already declared in .pxd) and use __import__ for the availability check to avoid an unused-import lint error.

* Minor fixes for review feedback.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 3e4af8f commit 863f898

10 files changed (+719 -306 lines changed)


cuda_core/cuda/core/_cpp/resource_handles.cpp

Lines changed: 70 additions & 4 deletions
@@ -56,6 +56,9 @@ decltype(&cuLibraryLoadData) p_cuLibraryLoadData = nullptr;
 decltype(&cuLibraryUnload) p_cuLibraryUnload = nullptr;
 decltype(&cuLibraryGetKernel) p_cuLibraryGetKernel = nullptr;

+// Linker
+decltype(&cuLinkDestroy) p_cuLinkDestroy = nullptr;
+
 // GL interop pointers
 decltype(&cuGraphicsUnregisterResource) p_cuGraphicsUnregisterResource = nullptr;

@@ -65,6 +68,8 @@ decltype(&nvrtcDestroyProgram) p_nvrtcDestroyProgram = nullptr;
 // NVVM function pointers (may be null if NVVM is not available)
 NvvmDestroyProgramFn p_nvvmDestroyProgram = nullptr;

+// nvJitLink function pointers (may be null if nvJitLink is not available)
+NvJitLinkDestroyFn p_nvJitLinkDestroy = nullptr;

 // ============================================================================
 // GIL management helpers
@@ -869,19 +874,19 @@ NvrtcProgramHandle create_nvrtc_program_handle_ref(nvrtcProgram prog) {

 namespace {
 struct NvvmProgramBox {
-    nvvmProgram resource;
+    NvvmProgramValue resource;
 };
 } // namespace

 NvvmProgramHandle create_nvvm_program_handle(nvvmProgram prog) {
     auto box = std::shared_ptr<NvvmProgramBox>(
-        new NvvmProgramBox{prog},
+        new NvvmProgramBox{{prog}},
         [](NvvmProgramBox* b) {
             // Note: nvvmDestroyProgram takes nvvmProgram* and nulls it,
             // but we're deleting the box anyway so nulling is harmless.
             // If NVVM is not available, the function pointer is null.
             if (p_nvvmDestroyProgram) {
-                p_nvvmDestroyProgram(&b->resource);
+                p_nvvmDestroyProgram(&b->resource.raw);
             }
             delete b;
         }
@@ -890,8 +895,69 @@ NvvmProgramHandle create_nvvm_program_handle(nvvmProgram prog) {
 }

 NvvmProgramHandle create_nvvm_program_handle_ref(nvvmProgram prog) {
-    auto box = std::make_shared<NvvmProgramBox>(NvvmProgramBox{prog});
+    auto box = std::make_shared<NvvmProgramBox>(NvvmProgramBox{{prog}});
     return NvvmProgramHandle(box, &box->resource);
 }

+// ============================================================================
+// nvJitLink Handles
+// ============================================================================
+
+namespace {
+struct NvJitLinkBox {
+    NvJitLinkValue resource;
+};
+} // namespace
+
+NvJitLinkHandle create_nvjitlink_handle(nvJitLink_t handle) {
+    auto box = std::shared_ptr<NvJitLinkBox>(
+        new NvJitLinkBox{{handle}},
+        [](NvJitLinkBox* b) {
+            // Note: nvJitLinkDestroy takes nvJitLinkHandle* and nulls it,
+            // but we're deleting the box anyway so nulling is harmless.
+            // If nvJitLink is not available, the function pointer is null.
+            if (p_nvJitLinkDestroy) {
+                p_nvJitLinkDestroy(&b->resource.raw);
+            }
+            delete b;
+        }
+    );
+    return NvJitLinkHandle(box, &box->resource);
+}
+
+NvJitLinkHandle create_nvjitlink_handle_ref(nvJitLink_t handle) {
+    auto box = std::make_shared<NvJitLinkBox>(NvJitLinkBox{{handle}});
+    return NvJitLinkHandle(box, &box->resource);
+}
+
+// ============================================================================
+// cuLink Handles
+// ============================================================================
+
+namespace {
+struct CuLinkBox {
+    CUlinkState resource;
+};
+} // namespace
+
+CuLinkHandle create_culink_handle(CUlinkState state) {
+    auto box = std::shared_ptr<CuLinkBox>(
+        new CuLinkBox{state},
+        [](CuLinkBox* b) {
+            // cuLinkDestroy takes CUlinkState by value (not pointer).
+            // Errors are ignored (standard destructor practice).
+            if (p_cuLinkDestroy) {
+                p_cuLinkDestroy(b->resource);
+            }
+            delete b;
+        }
+    );
+    return CuLinkHandle(box, &box->resource);
+}
+
+CuLinkHandle create_culink_handle_ref(CUlinkState state) {
+    auto box = std::make_shared<CuLinkBox>(CuLinkBox{state});
+    return CuLinkHandle(box, &box->resource);
+}
+
 } // namespace cuda_core
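
The owning and non-owning factories in this file share one pattern: a private box struct holds the raw handle, a `std::shared_ptr` to the box carries the optional destroy step in its deleter, and the public handle is an aliasing `shared_ptr` that points at the box's `resource` member. The following is a minimal standalone sketch of that pattern, not the project's code. `FakeHandle`, `fake_destroy`, `p_fake_destroy`, `FakeBox`, `make_owning`, and `make_ref` are hypothetical stand-ins chosen so the example builds without CUDA headers.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for an opaque C handle and its destroy function.
using FakeHandle = void*;
static int g_destroy_calls = 0;

int fake_destroy(FakeHandle* h) {
    ++g_destroy_calls;
    *h = nullptr;  // mimic APIs that null the caller's handle
    return 0;
}

// Destroy function pointer; may be null if the library is unavailable.
using FakeDestroyFn = int (*)(FakeHandle*);
static FakeDestroyFn p_fake_destroy = &fake_destroy;

namespace {
struct FakeBox {
    FakeHandle resource;
};
}  // namespace

using FakeHandleRef = std::shared_ptr<const FakeHandle>;

// Owning: the deleter destroys the resource, then frees the box.
FakeHandleRef make_owning(FakeHandle h) {
    auto box = std::shared_ptr<FakeBox>(new FakeBox{h}, [](FakeBox* b) {
        if (p_fake_destroy) {
            p_fake_destroy(&b->resource);
        }
        delete b;
    });
    // Aliasing constructor: shares ownership of the box, points at the member.
    return FakeHandleRef(box, &box->resource);
}

// Non-owning: the box is freed, but the resource is never destroyed.
FakeHandleRef make_ref(FakeHandle h) {
    auto box = std::make_shared<FakeBox>(FakeBox{h});
    return FakeHandleRef(box, &box->resource);
}

// Usage: destroy runs once, when the last owning copy is released.
void example_ownership() {
    int dummy = 0;
    {
        FakeHandleRef a = make_owning(&dummy);
        FakeHandleRef b = a;  // second reference to the same box
        assert(*a == &dummy);
        assert(g_destroy_calls == 0);
    }
    assert(g_destroy_calls == 1);
    {
        FakeHandleRef r = make_ref(&dummy);
    }
    assert(g_destroy_calls == 1);  // non-owning release does not destroy
}
```

Because callers only ever see `shared_ptr<const FakeHandle>`, the owning and non-owning variants are interchangeable at the call site; the difference lives entirely in the deleter attached to the hidden box.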

cuda_core/cuda/core/_cpp/resource_handles.hpp

Lines changed: 91 additions & 1 deletion
@@ -14,8 +14,28 @@
 // Use void* to match cuda.bindings.cynvvm's typedef
 using nvvmProgram = void*;

+// Forward declaration for nvJitLink - avoids nvJitLink.h dependency
+// Use void* to match cuda.bindings.cynvjitlink's typedef
+using nvJitLink_t = void*;
+
 namespace cuda_core {

+// ============================================================================
+// TaggedHandle - make void*-based handle types distinct for overloading
+//
+// Both nvvmProgram and nvJitLink_t are void*, so shared_ptr<const void*>
+// would be the same C++ type for both. TaggedHandle<T, Tag> wraps the raw
+// value with a unique tag type, making each shared_ptr type distinct.
+// ============================================================================
+
+template<typename T, int Tag>
+struct TaggedHandle {
+    T raw;
+};
+
+using NvvmProgramValue = TaggedHandle<nvvmProgram, 0>;
+using NvJitLinkValue = TaggedHandle<nvJitLink_t, 1>;
+
 // ============================================================================
 // Thread-local error handling
 // ============================================================================
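
The `TaggedHandle` comment in this hunk can be checked with a small standalone sketch. It reuses the aliases from the diff and shows that the untagged `shared_ptr` types coincide while the tagged ones do not, which is what lets an overload set tell them apart. Note that the tag is an integer template parameter. `handle_kind` and `example_tagged_overloads` are hypothetical names added for illustration only.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

using nvvmProgram = void*;  // same aliases as in the header above
using nvJitLink_t = void*;

template <typename T, int Tag>
struct TaggedHandle {
    T raw;
};

using NvvmProgramValue = TaggedHandle<nvvmProgram, 0>;
using NvJitLinkValue = TaggedHandle<nvJitLink_t, 1>;
using NvvmProgramHandle = std::shared_ptr<const NvvmProgramValue>;
using NvJitLinkHandle = std::shared_ptr<const NvJitLinkValue>;

// Without the wrapper, both handle types would collapse into one type.
static_assert(std::is_same<std::shared_ptr<const nvvmProgram>,
                           std::shared_ptr<const nvJitLink_t>>::value,
              "untagged handles are the same type");
// With the wrapper, the two handle types are distinct.
static_assert(!std::is_same<NvvmProgramHandle, NvJitLinkHandle>::value,
              "tagged handles are distinct types");

// Hypothetical overload set that is only legal because the types differ.
inline int handle_kind(const NvvmProgramHandle&) { return 0; }
inline int handle_kind(const NvJitLinkHandle&) { return 1; }

void example_tagged_overloads() {
    NvvmProgramHandle p =
        std::make_shared<NvvmProgramValue>(NvvmProgramValue{nullptr});
    NvJitLinkHandle l =
        std::make_shared<NvJitLinkValue>(NvJitLinkValue{nullptr});
    assert(handle_kind(p) == 0);
    assert(handle_kind(l) == 1);
}
```

Without the wrapper, the two `handle_kind` definitions would be redefinitions of the same function and the translation unit would not compile.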
@@ -72,6 +92,9 @@ extern decltype(&cuLibraryLoadData) p_cuLibraryLoadData;
 extern decltype(&cuLibraryUnload) p_cuLibraryUnload;
 extern decltype(&cuLibraryGetKernel) p_cuLibraryGetKernel;

+// Linker
+extern decltype(&cuLinkDestroy) p_cuLinkDestroy;
+
 // Graphics interop
 extern decltype(&cuGraphicsUnregisterResource) p_cuGraphicsUnregisterResource;

@@ -97,6 +120,19 @@ extern decltype(&nvrtcDestroyProgram) p_nvrtcDestroyProgram;
 using NvvmDestroyProgramFn = int (*)(nvvmProgram*);
 extern NvvmDestroyProgramFn p_nvvmDestroyProgram;

+// ============================================================================
+// nvJitLink function pointers
+//
+// These are populated by _resource_handles.pyx at module import time using
+// function pointers extracted from cuda.bindings.cynvjitlink.__pyx_capi__.
+// Note: May be null if nvJitLink is not available at runtime.
+// ============================================================================
+
+// Function pointer type for nvJitLinkDestroy (avoids nvJitLink.h dependency)
+// Signature: nvJitLinkResult nvJitLinkDestroy(nvJitLinkHandle *handle)
+using NvJitLinkDestroyFn = int (*)(nvJitLink_t*);
+extern NvJitLinkDestroyFn p_nvJitLinkDestroy;
+
 // ============================================================================
 // Handle type aliases - expose only the raw CUDA resource
 // ============================================================================
@@ -109,7 +145,9 @@ using LibraryHandle = std::shared_ptr<const CUlibrary>;
 using KernelHandle = std::shared_ptr<const CUkernel>;
 using GraphicsResourceHandle = std::shared_ptr<const CUgraphicsResource>;
 using NvrtcProgramHandle = std::shared_ptr<const nvrtcProgram>;
-using NvvmProgramHandle = std::shared_ptr<const nvvmProgram>;
+using NvvmProgramHandle = std::shared_ptr<const NvvmProgramValue>;
+using NvJitLinkHandle = std::shared_ptr<const NvJitLinkValue>;
+using CuLinkHandle = std::shared_ptr<const CUlinkState>;


 // ============================================================================
@@ -347,6 +385,33 @@ NvvmProgramHandle create_nvvm_program_handle(nvvmProgram prog);
 // The program will NOT be destroyed when the handle is released.
 NvvmProgramHandle create_nvvm_program_handle_ref(nvvmProgram prog);

+// ============================================================================
+// nvJitLink handle functions
+// ============================================================================
+
+// Create an owning nvJitLink handle.
+// When the last reference is released, nvJitLinkDestroy is called.
+// Use this to wrap a handle created via nvJitLinkCreate.
+// Note: If nvJitLink is not available (p_nvJitLinkDestroy is null), the deleter is a no-op.
+NvJitLinkHandle create_nvjitlink_handle(nvJitLink_t handle);
+
+// Create a non-owning nvJitLink handle (references existing handle).
+// The handle will NOT be destroyed when the last reference is released.
+NvJitLinkHandle create_nvjitlink_handle_ref(nvJitLink_t handle);
+
+// ============================================================================
+// cuLink handle functions
+// ============================================================================
+
+// Create an owning cuLink handle.
+// When the last reference is released, cuLinkDestroy is called.
+// Use this to wrap a CUlinkState created via cuLinkCreate.
+CuLinkHandle create_culink_handle(CUlinkState state);
+
+// Create a non-owning cuLink handle (references existing CUlinkState).
+// The handle will NOT be destroyed when the last reference is released.
+CuLinkHandle create_culink_handle_ref(CUlinkState state);
+
 // ============================================================================
 // Overloaded helper functions to extract raw resources from handles
 // ============================================================================
@@ -389,6 +454,14 @@ inline nvrtcProgram as_cu(const NvrtcProgramHandle& h) noexcept {
 }

 inline nvvmProgram as_cu(const NvvmProgramHandle& h) noexcept {
+    return h ? h->raw : nullptr;
+}
+
+inline nvJitLink_t as_cu(const NvJitLinkHandle& h) noexcept {
+    return h ? h->raw : nullptr;
+}
+
+inline CUlinkState as_cu(const CuLinkHandle& h) noexcept {
     return h ? *h : nullptr;
 }

@@ -434,6 +507,14 @@ inline std::intptr_t as_intptr(const NvvmProgramHandle& h) noexcept {
     return reinterpret_cast<std::intptr_t>(as_cu(h));
 }

+inline std::intptr_t as_intptr(const NvJitLinkHandle& h) noexcept {
+    return reinterpret_cast<std::intptr_t>(as_cu(h));
+}
+
+inline std::intptr_t as_intptr(const CuLinkHandle& h) noexcept {
+    return reinterpret_cast<std::intptr_t>(as_cu(h));
+}
+
 // as_py() - convert handle to Python wrapper object (returns new reference)
 namespace detail {
 // n.b. class lookup is not cached to avoid deadlock hazard, see DESIGN.md
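
The `as_cu` and `as_intptr` overloads added in the two hunks above are null-safe: an empty handle yields a null raw pointer, and therefore an integer value of zero on mainstream platforms. A minimal standalone sketch of the nvJitLink pair, copied from the diff with the supporting aliases repeated so it compiles on its own; `example_accessors` is a hypothetical usage function, not part of the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

using nvJitLink_t = void*;

template <typename T, int Tag>
struct TaggedHandle {
    T raw;
};

using NvJitLinkValue = TaggedHandle<nvJitLink_t, 1>;
using NvJitLinkHandle = std::shared_ptr<const NvJitLinkValue>;

// Extract the raw handle; an empty shared_ptr maps to nullptr.
inline nvJitLink_t as_cu(const NvJitLinkHandle& h) noexcept {
    return h ? h->raw : nullptr;
}

// Same value as an integer, suitable for handing to Python as an int.
inline std::intptr_t as_intptr(const NvJitLinkHandle& h) noexcept {
    return reinterpret_cast<std::intptr_t>(as_cu(h));
}

void example_accessors() {
    NvJitLinkHandle empty;
    assert(as_cu(empty) == nullptr);
    assert(as_intptr(empty) == 0);

    int dummy = 0;  // any address works as a fake raw handle
    NvJitLinkHandle h =
        std::make_shared<NvJitLinkValue>(NvJitLinkValue{&dummy});
    assert(as_cu(h) == &dummy);
    assert(as_intptr(h) == reinterpret_cast<std::intptr_t>(&dummy));
}
```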
@@ -486,6 +567,15 @@ inline PyObject* as_py(const NvvmProgramHandle& h) noexcept {
     return PyLong_FromSsize_t(as_intptr(h));
 }

+inline PyObject* as_py(const NvJitLinkHandle& h) noexcept {
+    // nvJitLink bindings use raw integers, not wrapper classes
+    return PyLong_FromSsize_t(as_intptr(h));
+}
+
+inline PyObject* as_py(const CuLinkHandle& h) noexcept {
+    return detail::make_py("cuda.bindings.driver", "CUlinkState", as_intptr(h));
+}
+
 inline PyObject* as_py(const GraphicsResourceHandle& h) noexcept {
     return detail::make_py("cuda.bindings.driver", "CUgraphicsResource", as_intptr(h));
 }

cuda_core/cuda/core/_linker.pxd

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# SPDX-License-Identifier: Apache-2.0
+
+from ._resource_handles cimport NvJitLinkHandle, CuLinkHandle
+
+
+cdef class Linker:
+    cdef:
+        NvJitLinkHandle _nvjitlink_handle
+        CuLinkHandle _culink_handle
+        bint _use_nvjitlink
+        object _drv_log_bufs  # formatted_options list (driver); None for nvjitlink; cleared in link()
+        str _info_log  # decoded log; None until link() or pre-link get_*_log()
+        str _error_log  # decoded log; None until link() or pre-link get_*_log()
+        object _options  # LinkerOptions
+        object __weakref__
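
The commit message states that, with the handles stored directly on `Linker`, `close()` simply resets the `shared_ptr`, and that the tests now check idempotency. The following is a rough C++ analogue of that ownership model, written as an assumption-laden sketch and not a translation of `_linker.pyx`. `FakeLinkState`, `make_fake_link_handle`, `LinkerSketch`, and the `g_released` counter are all hypothetical.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a link-state handle held by shared_ptr.
using FakeLinkState = void*;
using FakeLinkHandle = std::shared_ptr<const FakeLinkState>;

static int g_released = 0;

FakeLinkHandle make_fake_link_handle(FakeLinkState s) {
    return FakeLinkHandle(new FakeLinkState(s), [](const FakeLinkState* p) {
        ++g_released;  // a real deleter would call the library's destroy function
        delete p;
    });
}

// The wrapper owns the handle as a plain member; no finalizer object is needed.
class LinkerSketch {
public:
    explicit LinkerSketch(FakeLinkState s) : handle_(make_fake_link_handle(s)) {}

    // Releasing is just a reset; calling it again on an empty handle is a no-op.
    void close() noexcept { handle_.reset(); }
    bool closed() const noexcept { return !handle_; }

private:
    FakeLinkHandle handle_;
};

void example_close() {
    int dummy = 0;
    LinkerSketch linker(&dummy);
    assert(!linker.closed());
    linker.close();
    assert(linker.closed());
    assert(g_released == 1);
    linker.close();  // second close does nothing
    assert(g_released == 1);
}
```

If `close()` is never called, the member's destructor releases the handle when the wrapper goes away, which is the behaviour the earlier `weakref.finalize` pattern had to emulate by hand.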
