Reduce helix queue fan-out for library tests on PR builds #126168
davidwrighton wants to merge 1 commit into dotnet:main
Reduce the number of helix queues used for library test execution on PR builds, deferring additional OS versions/distros to rolling builds.

Linux x64: 3 queues -> 1 on PRs
  Keep: Ubuntu 26.04 (primary)
  Rolling-only: AzureLinux 3.0, CentOS Stream 10

Windows x64: 4 queues -> 2 on PRs
  Keep: Windows Server 2025 (newest), Nano Server 1809 (distinct env)
  Rolling-only: Windows Server 2022, Windows 11 Client

Browser WASM Windows: 2 queues -> 1 on PRs
  Keep: Windows Server 2025
  Rolling-only: Windows Server 2022

Estimated savings per library PR (based on build 1351553):
  linux_x64: 771 -> 257 work items (~13h saved)
  windows_x64: 1120 -> 560 work items (~9h saved)
  browser_wasm_win: 406 -> 203 work items (~21h saved)

Rolling/CI builds continue to test all OS versions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Reduces Helix queue fan-out for libraries test execution on PR-triggered builds by running a smaller “primary” OS set on PRs and deferring additional OS coverage to non-PR builds.
Changes:
- Linux x64: run only Ubuntu 26.04 on PRs; keep AzureLinux 3.0 and CentOS Stream 10 only for non-PR builds.
- Windows x64: run Server 2025 (+ Nano Server for non-mono) on PRs; keep Server 2022, Windows 11, and (outerloop) ServerRS5 only for non-PR builds.
- Browser WASM Windows: run Server 2025 on PRs; keep Server 2022 only for non-PR builds.
| - (AzureLinux.3.0.Amd64.Open)AzureLinux.3.Amd64.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:azurelinux-3.0-helix-amd64 | ||
| - (Centos.10.Amd64.Open)AzureLinux.3.Amd64.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:centos-stream-10-helix-amd64 | ||
| # Additional distros on rolling builds for broader coverage | ||
| - ${{ if eq(variables['isRollingBuild'], true) }}: |
The added distros are gated only on variables['isRollingBuild']. That breaks the jobParameters.includeAllPlatforms escape hatch: callers that explicitly set includeAllPlatforms: true (e.g. NativeAOT outerloop, which is intended to allow PR runs with full coverage) will still not get these additional Linux queues on PR builds because isRollingBuild is false for PR-triggered runs.
Consider updating the condition to include jobParameters.includeAllPlatforms (e.g. or(eq(variables['isRollingBuild'], true), eq(parameters.jobParameters.includeAllPlatforms, true))) so includeAllPlatforms: true continues to mean “run the full queue set” even on PRs.
Suggested change:
| - ${{ if eq(variables['isRollingBuild'], true) }}: | |
| - ${{ if or(eq(variables['isRollingBuild'], true), eq(parameters.jobParameters.includeAllPlatforms, true)) }}: |
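For context, the suggested condition could sit in the Linux x64 queue list roughly as sketched below. The queue entries are copied from the diff in this PR; the indentation and the placement directly under the Ubuntu entry are assumptions, not the actual contents of helix-queues-setup.yml.

```yaml
# Sketch only: nesting and placement are assumed, queue names come from the PR diff.
# Always run on PR and non-PR builds:
- (Ubuntu.2604.Amd64.Open)AzureLinux.3.Amd64.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-26.04-helix-amd64
# Additional distros on non-PR builds, or when the caller sets includeAllPlatforms: true
- ${{ if or(eq(variables['isRollingBuild'], true), eq(parameters.jobParameters.includeAllPlatforms, true)) }}:
  - (AzureLinux.3.0.Amd64.Open)AzureLinux.3.Amd64.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:azurelinux-3.0-helix-amd64
  - (Centos.10.Amd64.Open)AzureLinux.3.Amd64.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:centos-stream-10-helix-amd64
```

With this shape, a PR run that passes includeAllPlatforms: true would still expand all three queues, while an ordinary PR run would expand only the Ubuntu entry.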
| # Additional Windows versions on rolling builds for broader coverage | ||
| - ${{ if eq(variables['isRollingBuild'], true) }}: | ||
| - Windows.Amd64.Server2022.Open | ||
| - Windows.11.Amd64.Client.Open | ||
| - ${{ if eq(parameters.jobParameters.testScope, 'outerloop') }}: | ||
| - (Windows.10.Amd64.ServerRS5.Open)windows.10.amd64.serverrs5.open@mcr.microsoft.com/dotnet-buildtools/prereqs:windowsservercore-ltsc2019-helix-amd64 |
Same issue here: the extra Windows queues are conditional only on variables['isRollingBuild'], so jobParameters.includeAllPlatforms: true won’t re-enable Server2022/Win11 (or the outerloop RS5 queue) on PR builds. That changes the meaning of includeAllPlatforms for consumers that use it to force full coverage.
Recommend conditioning these additional queues on or(isRollingBuild, includeAllPlatforms) instead of isRollingBuild alone.
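Spelled out in template-expression syntax, the recommendation might look like the sketch below. It reuses the queue names from the diff above; the assumption that the outerloop-only RS5 entry stays nested inside the outer condition is inferred from the review text, not confirmed against the file.

```yaml
# Sketch only: structure is assumed, queue names come from the PR diff.
# Additional Windows versions on non-PR builds, or when all platforms are requested
- ${{ if or(eq(variables['isRollingBuild'], true), eq(parameters.jobParameters.includeAllPlatforms, true)) }}:
  - Windows.Amd64.Server2022.Open
  - Windows.11.Amd64.Client.Open
  # The RS5 container queue is only added for outerloop test runs
  - ${{ if eq(parameters.jobParameters.testScope, 'outerloop') }}:
    - (Windows.10.Amd64.ServerRS5.Open)windows.10.amd64.serverrs5.open@mcr.microsoft.com/dotnet-buildtools/prereqs:windowsservercore-ltsc2019-helix-amd64
```

The queues that always run on PRs (Server 2025 and Nano Server) are omitted here because their exact queue identifiers do not appear in this conversation.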
| # Additional Windows version on rolling builds | ||
| - ${{ if eq(variables['isRollingBuild'], true) }}: |
The additional Browser WASM Windows queue is now gated only on variables['isRollingBuild'], so jobParameters.includeAllPlatforms: true won’t re-enable Server2022 on PR runs.
If includeAllPlatforms is intended to force full coverage regardless of PR vs non-PR, consider using or(isRollingBuild, includeAllPlatforms) here too.
Suggested change:
| # Additional Windows version on rolling builds | |
| - ${{ if eq(variables['isRollingBuild'], true) }}: | |
| # Additional Windows version on rolling builds or when all platforms are requested | |
| - ${{ if or(eq(variables['isRollingBuild'], true), eq(parameters.jobParameters.includeAllPlatforms, true)) }}: |
| - (Ubuntu.2604.Amd64.Open)AzureLinux.3.Amd64.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-26.04-helix-amd64 | ||
| - (AzureLinux.3.0.Amd64.Open)AzureLinux.3.Amd64.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:azurelinux-3.0-helix-amd64 | ||
| - (Centos.10.Amd64.Open)AzureLinux.3.Amd64.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:centos-stream-10-helix-amd64 | ||
| # Additional distros on rolling builds for broader coverage |
The comments say “rolling builds”, but in this repo isRollingBuild (from eng/pipelines/libraries/variables.yml) is defined as “not a PullRequest build” (notIn(Build.Reason, 'PullRequest')). That means these extra queues will run for any non-PR build reason (including manual/CI), not just the scheduled rolling builds.
To avoid confusion for future maintainers, consider rewording these comments to match the actual condition (e.g. “non-PR builds”).
Suggested change:
| # Additional distros on rolling builds for broader coverage | |
| # Additional distros on non-PR builds for broader coverage |
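To make the naming mismatch concrete, a variable defined the way the review describes would look roughly like this. This is a sketch reconstructed from the quoted expression; the exact form in eng/pipelines/libraries/variables.yml may differ.

```yaml
# Sketch only: reconstructed from the review comment, not copied from variables.yml.
variables:
  - name: isRollingBuild
    # True for every build reason except PullRequest (scheduled, manual, CI-triggered, ...)
    value: ${{ notIn(variables['Build.Reason'], 'PullRequest') }}
```

Under that definition, a manually queued build also evaluates isRollingBuild to true, which is why "non-PR builds" describes the condition more accurately than "rolling builds".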
Note
This PR was generated with the assistance of GitHub Copilot.
Summary
Reduce the number of helix queues used for library test execution on PR builds. Currently, each library test job fans out to multiple OS versions/distros, running the entire test suite on each. This multiplies work items significantly. Additional OS versions are deferred to rolling builds (twice daily) which continue to test everything.
Changes (eng/pipelines/libraries/helix-queues-setup.yml)
Linux x64: 3 queues → 1 on PRs
Windows x64: 4 queues → 2 on PRs
Nano Server is kept because it's a fundamentally different environment (minimal OS, no GUI, different API surface).
Browser WASM Windows: 2 queues → 1 on PRs
Impact
Based on build 1351553 (a library-only PR with 200h total helix execution):
This change alone reduces that build from ~200h to ~157h of helix execution, consistent with the per-platform estimates (~13h + ~9h + ~21h = ~43h saved).
What's NOT affected