
Exclude broken cuda-toolkit wheels on Windows #1884

Open
rwgk wants to merge 1 commit into NVIDIA:main from rwgk:exclude_broken_cuda-toolkit_versions_windows_only


rwgk (Collaborator) commented Apr 9, 2026

Summary

  • exclude cuda-toolkit 12.9.2 and 13.0.3 only on Windows in cuda_pathfinder, cuda_core, and cuda_bindings
  • keep the existing floating 12.* and 13.* behavior on Linux, since the Linux matrix did not show the same breakage
  • preserve flexibility to pick up later good patch releases while avoiding the two known-bad Windows resolutions

Background

During CI for PR #1817, a group of Windows test jobs started failing in cuda.pathfinder strict mode after new cuda-toolkit patch releases became available on the package index.

Compared with the last successful CI run against main, the affected Windows jobs changed from resolving:

  • cuda-toolkit 12.9.1 to 12.9.2
  • cuda-toolkit 13.0.2 to 13.0.3

After that change, the failing Windows jobs no longer installed the full set of CTK DLL-providing packages (such as cublas, cufft, curand, cusolver, cusparse, npp, and nvjpeg), so the cuda.pathfinder strict checks failed.

The strongest current hypothesis is that these two newly published cuda-toolkit patch releases have broken Windows dependency metadata for extras resolution. Linux did not show the same regression.
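One way to probe this hypothesis is to look at the `Requires-Dist` metadata of the installed `cuda-toolkit` distribution and see which requirements a given extra pulls in. The following is a minimal sketch using only the standard library. The helper names and the extra name `cublas` in the usage note are illustrative assumptions, not taken from the actual wheels, and the marker matching is a plain substring check rather than a full PEP 508 evaluation.

```python
from importlib import metadata


def requirements_for_extra(requires, extra):
    """Return the requirement strings whose marker mentions the given extra.

    This is a substring check on the marker text, not a full PEP 508 evaluation.
    """
    selected = []
    for req in requires or []:
        # Everything after the first ';' is the environment marker, if any.
        _, _, marker = req.partition(";")
        normalized = marker.replace("'", '"').replace(" ", "")
        if f'extra=="{extra}"' in normalized:
            selected.append(req)
    return selected


def installed_requirements_for_extra(dist_name, extra):
    """Look up Requires-Dist for an installed distribution; empty list if absent."""
    try:
        requires = metadata.requires(dist_name)
    except metadata.PackageNotFoundError:
        return []
    return requirements_for_extra(requires, extra)
```

On an affected Windows runner, comparing `installed_requirements_for_extra("cuda-toolkit", "cublas")` between a good and a bad patch release might show whether the extras metadata differs; this is a diagnostic idea, not something the PR itself runs.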

What This PR Changes

  • cuda_pathfinder

    • split the cu12 and cu13 toolkit requirements by platform
    • exclude 12.9.2 and 13.0.3 only when sys_platform == 'win32'
    • leave non-Windows resolution on the original 12.* / 13.* ranges
  • cuda_core

    • apply the same Windows-only exclusions to test-cu12, test-cu13, test-cu12-ft, and test-cu13-ft
  • cuda_bindings

    • apply the same Windows-only exclusion to the all extra for the toolkit packages it installs
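The platform split described above can be sketched as a pair of PEP 508 requirement strings per major version. This is only an illustration of the pattern: the helper `split_by_platform` and the `cublas` extra are hypothetical, and the exact strings in the PR's `pyproject.toml` files may differ.

```python
def split_by_platform(name, major, bad_version, extras=()):
    """Build two PEP 508 requirement strings: one for Windows with the bad
    patch release excluded, and one for every other platform left floating."""
    extras_part = f"[{','.join(extras)}]" if extras else ""
    base = f"{name}{extras_part}=={major}.*"
    return [
        # Windows: floating major, minus the known-bad patch release.
        f"{base},!={bad_version}; sys_platform == 'win32'",
        # Everything else: the original floating-major range.
        f"{base}; sys_platform != 'win32'",
    ]


# The two exclusions named in the summary; the extra is an assumed example.
for major, bad in ((12, "12.9.2"), (13, "13.0.3")):
    for line in split_by_platform("cuda-toolkit", major, bad, extras=("cublas",)):
        print(line)
```

Because the two markers are mutually exclusive, an installer evaluates exactly one of the two lines on any given platform.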

Why Windows-Only

The exclusion-based workaround was first applied uniformly (on both Linux and Windows) in #1817. That is a significantly simpler change, but strictly speaking the observed breakage is Windows-specific:

  • the failing cuda.pathfinder jobs were Windows jobs
  • Linux jobs using floating 12.* / 13.* specs continued to pass

Restricting the exclusions to Windows makes the workaround narrower and keeps Linux on the normal floating-major behavior.
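The intended effect can be illustrated with a simplified acceptance check. This sketch is an assumption-laden stand-in for real resolver behavior: `is_allowed` is a hypothetical helper, it compares only the leading version component, and the version `12.9.3` used in the usage note is a hypothetical future release.

```python
import sys

# The two patch releases excluded by this PR, per the summary.
KNOWN_BAD_ON_WINDOWS = {"12.9.2", "13.0.3"}


def is_allowed(version, major, platform=sys.platform):
    """Simplified check: the version must belong to the floating major series,
    and on Windows it must not be one of the known-bad patch releases."""
    if version.split(".")[0] != str(major):
        return False
    if platform == "win32" and version in KNOWN_BAD_ON_WINDOWS:
        return False
    return True
```

Under these assumptions, `is_allowed("12.9.2", 12, "win32")` is rejected while `is_allowed("12.9.2", 12, "linux")` is accepted, and a later release such as `12.9.3` would be accepted on both platforms.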

rwgk added this to the cuda.pathfinder next milestone on Apr 9, 2026
rwgk self-assigned this on Apr 9, 2026
rwgk added the P0 (High priority - Must do!) and CI/CD (CI/CD infrastructure) labels on Apr 9, 2026

github-actions bot commented Apr 9, 2026
