Problem
The nvidia-cutlass 4.2.0.0 PyPI wheel contains files with very long paths. When installed under free-threaded Python on Windows, one path reaches 266 characters, exceeding the default Windows MAX_PATH limit of 260:
C:\actions-runner\_work\_tool\Python\3.14.3\x64-freethreaded\Lib\site-packages\cutlass_library\source\examples\68_hopper_fp8_warp_specialized_grouped_gemm_with_blockwise_scaling\68_hopper_fp8_warp_specialized_grouped_gemm_with_blockwise_scaling_with_sparse_groups.cu
The x64-freethreaded directory name is 13 characters longer than the x64 used by regular CPython (16 characters versus 3), which is just enough to push this path over the limit: under x64 the same file lands at 253 characters. Regular (non-free-threaded) Python installs the same wheel without issue.
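The path arithmetic can be reproduced with a short Python sketch. The tool root and file name are copied from the path quoted above; the helper name `installed_length` is made up for illustration, and the comparison assumes that MAX_PATH counts the terminating NUL, so a path of 260 characters or more is treated as too long.

```python
from pathlib import PureWindowsPath

MAX_PATH = 260  # default Windows limit, including the terminating NUL

# Tool root and wheel-relative file, taken from the failing path in this issue.
TOOL_ROOT = PureWindowsPath(r"C:\actions-runner\_work\_tool\Python\3.14.3")
EXAMPLE_DIR = "68_hopper_fp8_warp_specialized_grouped_gemm_with_blockwise_scaling"
RELATIVE = PureWindowsPath(
    "Lib", "site-packages", "cutlass_library", "source", "examples",
    EXAMPLE_DIR, EXAMPLE_DIR + "_with_sparse_groups.cu",
)


def installed_length(arch_dir):
    """Length of the installed path for a given architecture directory name."""
    return len(str(TOOL_ROOT / arch_dir / RELATIVE))


for arch_dir in ("x64", "x64-freethreaded"):
    n = installed_length(arch_dir)
    print(arch_dir, n, "too long" if n >= MAX_PATH else "ok")
```

This reports 253 characters for x64 and 266 for x64-freethreaded, matching the figure quoted above.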
The pip install step fails with:
ERROR: Could not install packages due to an OSError.
HINT: This error might have occurred since this system does not have Windows Long Path support enabled.
...
FileNotFoundError: [Errno 2] No such file or directory: 'C:\actions-runner\_work\_tool\Python\3.14.3\x64-freethreaded\Lib\site-packages\cutlass_library\source\examples\68_hopper_fp8_...'
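To see ahead of time which files in a wheel would overflow for a given install prefix, a check along the following lines could be run against the downloaded wheel. This is a minimal sketch, not part of the CI: the function name `overlong_members` is hypothetical, and it assumes every archive member is installed relative to site-packages, which ignores `.data` directories that pip relocates elsewhere.

```python
import zipfile

MAX_PATH = 260  # default Windows limit, including the terminating NUL


def overlong_members(wheel_path, site_packages, limit=MAX_PATH):
    """Return (length, installed_path) pairs for members at or over the limit.

    Assumption: each member is installed directly under site_packages.
    """
    prefix = site_packages.rstrip("\\/")
    hits = []
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            installed = prefix + "\\" + name.replace("/", "\\")
            if len(installed) >= limit:
                hits.append((len(installed), installed))
    # Longest paths first
    return sorted(hits, reverse=True)
```

For example, `overlong_members("nvidia_cutlass.whl", r"C:\actions-runner\_work\_tool\Python\3.14.3\x64-freethreaded\Lib\site-packages")` would list the offending files, where the wheel file name is a placeholder for whatever `pip download` produced.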
Affected jobs
All win-64 / py3.14t test jobs that run pip install --group test-cu{12,13} (which pulls in nvidia-cutlass):
- Test win-64 / py3.14t, 12.9.1, local, l4 (TCC)
- Test win-64 / py3.14t, 13.0.2, wheels, a100 (MCDM)
- Test win-64 / py3.14t, 13.2.0, wheels, a100 (MCDM)
Current workaround
PR #1816 skips the pip install --group and the subsequent all_must_work pathfinder test run for free-threaded Python on Windows. The see_what_works first pass still runs, so free-threaded builds are not completely untested.
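For illustration only, the condition that the workaround targets (free-threaded Python on Windows) can be detected from inside Python roughly as follows. This is not how PR #1816 implements the skip, which lives in the CI workflow; the function name `is_free_threaded_windows` is hypothetical, and the sketch relies on the `Py_GIL_DISABLED` build variable that CPython documents for identifying free-threaded builds.

```python
import sys
import sysconfig


def is_free_threaded_windows():
    """True only on a free-threaded CPython build running on Windows."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    return sys.platform == "win32" and free_threaded


if is_free_threaded_windows():
    print("free-threaded Python on Windows: long-path workaround applies")
else:
    print("workaround not needed on this interpreter")
```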
Fix
Enable the Windows Long Path registry key on the CI runners:
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\LongPathsEnabled = 1
Once enabled, the workaround in the CI workflow can be removed.
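Before removing the workaround, it may be worth confirming on a runner that the key is actually set. A minimal sketch using the standard-library `winreg` module is shown below; the function name `long_paths_enabled` is made up for illustration, and the function returns None on non-Windows platforms, where the registry does not exist.

```python
import sys

FILESYSTEM_KEY = r"SYSTEM\CurrentControlSet\Control\FileSystem"


def long_paths_enabled():
    """Return True/False for the LongPathsEnabled value, or None off Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard-library module

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, FILESYSTEM_KEY) as key:
        try:
            value, _value_type = winreg.QueryValueEx(key, "LongPathsEnabled")
        except FileNotFoundError:
            return False  # value absent: long paths are not enabled
    return value == 1


print("LongPathsEnabled:", long_paths_enabled())
```

Reading the value does not require elevation; setting it does, so the change itself has to be made by whoever administers the runners.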