Summary
Linux wheels for cuda-core have grown steadily across releases (11 MB in 0.4.0 → 21 MB in 0.6.0 → 30 MB in 0.7.0), while Windows wheels have stayed at ~4-5 MB. The root cause is that Linux .so files ship with debug symbols and are not stripped.
Details
Running file on any .so in the 0.7.0 Linux wheel confirms:
ELF 64-bit LSB shared object, x86-64, ... with debug_info, not stripped
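The same check can be run across every .so in a wheel without unpacking it. The sketch below is a heuristic, not what `file` does internally: it assumes an unstripped ELF still contains the section name `.debug_info`, and the helper name `so_debug_report` is hypothetical.

```python
import zipfile


def so_debug_report(wheel_path):
    """Return {member_name: (size_bytes, has_debug_info)} for each .so in a wheel.

    wheel_path may be a filesystem path or a binary file-like object.
    """
    report = {}
    with zipfile.ZipFile(wheel_path) as wheel:
        for info in wheel.infolist():
            if not info.filename.endswith(".so"):
                continue
            data = wheel.read(info.filename)
            # Heuristic (assumption): an unstripped ELF keeps the section
            # name ".debug_info" in its section-header string table.
            report[info.filename] = (info.file_size, b".debug_info" in data)
    return report
```

Any member reported with `has_debug_info` set to `True` is a candidate for stripping; confirming with `file` on the extracted module is still the authoritative check.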
Size comparison for the same module (0.7.0):
| Module | Linux .so | Windows .pyd | Ratio |
|---|---|---|---|
| system/_device | 5.6 MB | 0.4 MB | 12.6x |
| _device | 2.7 MB | 0.2 MB | 10.9x |
| _stream | 1.4 MB | 0.1 MB | 12.0x |
Stripping all .so files reduces the total extracted size from 103.7 MB → 11.2 MB (89% reduction), which would bring the compressed wheel from ~30 MB down to ~4-5 MB, matching Windows.
Growth across releases is compounded by more Cython modules per release (each duplicated for cu12 + cu13):
| Version | .so count | Total .so size (unstripped) | Wheel size |
|---|---|---|---|
| 0.4.0 | 20 | 29.6 MB | 11 MB |
| 0.6.0 | 51 | 73.0 MB | 21 MB |
| 0.7.0 | 71 | 103.7 MB | 30 MB |
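A quick way to see that the growth comes from module count rather than from individual modules getting larger is to divide the total unstripped size by the .so count for each release. The short sketch below uses only the figures from the table above; the function name is illustrative.

```python
# (version, .so count, total unstripped .so size in MB), taken from the table above
releases = [("0.4.0", 20, 29.6), ("0.6.0", 51, 73.0), ("0.7.0", 71, 103.7)]


def mean_so_size_mb(count, total_mb):
    """Average unstripped size of one .so, in MB."""
    return total_mb / count


for version, count, total_mb in releases:
    print(f"{version}: {mean_so_size_mb(count, total_mb):.2f} MB per .so")
```

The average stays between roughly 1.4 and 1.5 MB per module in all three releases, so the total scales almost linearly with the number of modules.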
Root cause
cuda_bindings already strips debug symbols for wheel builds via -Wl,--strip-all in its build_hooks.py (link):
    if strip and sys.platform == "linux":
        extra_link_args += ["-Wl,--strip-all"]
with strip=True passed only for build_wheel().
However, cuda_core/build_hooks.py has no stripping logic at all — its Extension objects have no extra_link_args. The same pattern from cuda_bindings should be applied.
Proposed fix
Add extra_link_args=["-Wl,--strip-all"] to the Extension objects in cuda_core/build_hooks.py when building wheels on Linux, following the same strip=True/False pattern as cuda_bindings.
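As a rough illustration of the proposed change, the platform check could be factored into a small helper. This is a sketch only: `strip_link_args` is a hypothetical name, and the actual structure of cuda_core/build_hooks.py (how its Extension objects are created and how `strip` would be threaded through) is assumed rather than taken from the source.

```python
import sys


def strip_link_args(strip, platform=sys.platform):
    """Return linker flags that strip symbols, only for Linux wheel builds."""
    if strip and platform == "linux":
        return ["-Wl,--strip-all"]
    return []


# Hypothetical wiring inside the build hook (names assumed):
#   Extension(name, sources, extra_link_args=strip_link_args(strip))
# with strip=True passed only from build_wheel(), so that other builds
# keep their symbols for debugging.
```

Keeping the flag behind `strip` mirrors the cuda_bindings behaviour described above, where only wheel builds are stripped.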
-- Leo's bot