Skip to content

Prevent silent regression of #17527: pin cascade + if/else ordering, add GA ship-build guard#17557

Draft
radical wants to merge 3 commits into
microsoft:release/13.4from
radical:ankj/fix-init-nuget-staging-feed-followup
Draft

Prevent silent regression of #17527: pin cascade + if/else ordering, add GA ship-build guard#17557
radical wants to merge 3 commits into
microsoft:release/13.4from
radical:ankj/fix-init-nuget-staging-feed-followup

Conversation

@radical
Copy link
Copy Markdown
Member

@radical radical commented May 27, 2026

Three follow-ups to PR #17528 (which fixed #17527). Each addresses an inline review thread on that PR; together they make a silent regression of the staging-feed misroute hard to re-introduce, and add a release-pipeline safety net for the related GA-ship foot-gun.

What was at risk

PR #17528 closed #17527 by reordering the release/*-branch arm of the AspireCliChannel cascade ahead of the versionKind=release arm in eng/pipelines/templates/build_sign_native.yml, and by routing the synthesized staging channel's quality through the CLI build's version shape on the stagingIdentityChannel arm of PackagingService.GetChannelsAsync. The fix is correct, but it leaves three load-bearing invariants with no test coverage and one queue-time foot-gun on the GA ship path:

Source build $buildId has aspire-cli-channel='staging', not 'stable'.

The selected microsoft-aspire build was queued with the default
AspireCliChannel resolution, which bakes 'staging' for release/*
branches. Publishing this build to nuget.org would ship a CLI
binary that identifies as 'staging' and writes the corresponding
nuget.config from `aspire init` for end users — see #17527.

That error block is what the GA ship path now raises if the operator forgets aspireCliChannelOverride=stable at queue time. Before this PR, the same condition silently shipped a staging-baked CLI to nuget.org users.

What this PR adds

Three independent commits, each landable on its own:

1. refactor(cli): extract Aspire CLI channel cascade to compute-cli-channel.ps1

The cascade lived as inline pwsh: in build_sign_native.yml with zero test coverage. A future maintainer reordering the arms back to "obvious" order would re-introduce #17527 and only the next stabilizing build would notice.

Moves the cascade to eng/scripts/compute-cli-channel.ps1 with -Reason, -SourceBranch, -PrNumber, -Override, -VersionKind parameters. The YAML step retains the DotNetFinalVersionKind MSBuild resolution (it needs the in-tree dotnet + Versions.props) and forwards the rest to the script. Write-Host diagnostics and the ##vso[task.setvariable variable=aspireCliChannel] emission are preserved verbatim; a new Write-Output $channel line gives non-AzDO callers (tests, dev runs) a stable stdout contract.

Adds tests/Infrastructure.Tests/PowerShellScripts/ComputeCliChannelTests.cs (13 cases, Theory-based, reuses the existing PowerShellCommand helper that StageNativeCliToolPackagesTests etc. already use). Cases pin every cascade arm including the #17527 fix-pin, the override path (including Stable-mixed-case normalization), and both throw paths (non-digit PR number, invalid override).

2. test(cli): pin PackagingService identity+requested ordering against #17527 regression

The new test added in #17528 (GetChannelsAsync_WhenIdentityChannelIsStagingOnStableShapedCli_DefaultsToStableQuality) calls GetChannelsAsync() with no requestedChannelName, so a future refactor that put stagingChannelRequested first would still pass it — silently re-introducing the misroute for the combinations that actually occur in production:

  • InitCommand.cs:671GetChannelsAsync(ct, identityChannel) on a staging CLI running aspire init
  • IntegrationPackageSearchService.cs:26 and DotNetBasedAppHostServerProject.cs:334 once aspire.config.json pins channel: staging

Adds three tests that exercise both signals together: identity=staging + requested=staging for both stable-shaped (expect Stable, pins ordering) and prerelease-shaped (expect Both, pins the predicate) version shapes, plus identity=staging + configuration["channel"]=staging + stable-shaped (pins the configured-arm path that aspire add hits). No production code change.

3. ci(release): guard GA ship build against AspireCliChannel != stable

After #17528, release/* source builds default to AspireCliChannel=staging. The GA ship path silently depends on the release manager queueing microsoft-aspire (definition 1602) with aspireCliChannelOverride=stable. Without this guard, one missed queue-time parameter ships a staging-baked CLI to nuget.org and aspire init writes a staging-feed nuget.config for every GA user.

Three pieces:

  • eng/scripts/compute-cli-channel.ps1 additionally emits ##vso[build.addbuildtag]aspire-cli-channel - <channel> so the source build's tag set carries the resolved channel. Tag shape mirrors the existing release-version - X.Y.Z and BAR ID - <id> tags so the consumer reuses the same tag-fetch REST call. build.addbuildtag is idempotent across jobs, so the per-RID build_sign_native invocations setting the same tag is safe.

  • eng/pipelines/release-publish-nuget.yml adds a Validate Source Build CLI Channel step right after BAR ID extraction. Fetches the source build's tags, finds aspire-cli-channel - *, and fails unless the resolved channel is stable. Gated by a new AllowNonStableCliChannel advanced parameter (default false) — the documented escape hatch. DryRun also bypasses the guard so the pipeline remains testable against any source build. Error messages name the cause, point at the right pipeline + queue-time override, and tell the operator how to bypass.

  • docs/release-process.md updates: the Signed Build prerequisite highlights aspireCliChannelOverride=stable; the source-build selection step tells the release manager to verify aspire-cli-channel - stable at pick time; the Advanced parameters table documents AllowNonStableCliChannel.

The ComputeCliChannelTests Theory assertions extend to also pin the ##vso[build.addbuildtag] emission, so a future refactor that drops it would break the test suite rather than silently disabling the pipeline guard.

Surprises and call-outs

  • Builds queued before this PR lands will fail the guard because they have no aspire-cli-channel - * tag. That's intentional — the guard's error message tells the operator to queue a fresh source build with the override. For the 13.4 GA, you'll want a fresh source build off this branch.

  • The whole queue-time-override mechanism is intended to be temporary. Issue #17550 is scoped to remove it for 13.5+ by decoupling channel identity from build-time baking via a sidecar manifest written at acquisition time. When that lands, both the aspireCliChannelOverride parameter and the Validate Source Build CLI Channel guard become moot and should be removed together.

  • No source-build behavior change in Items 1 and 2. Item 1 is a pure refactor (verified by smoke-testing every cascade case locally); Item 2 is pure test addition. The only source-build behavior change is in Item 3, where compute-cli-channel.ps1 adds the addbuildtag emission — that's additive on the source build and a no-op for any caller other than the new release-publish-nuget guard.

  • Three logical commits, one PR. Items are independently landable but bundled here for review economy. If reviewers prefer to ship Items 1 + 2 first and Item 3 separately, the commits are clean and can be split.

Validation

./dotnet.sh test --project tests/Aspire.Cli.Tests/Aspire.Cli.Tests.csproj \
  --no-launch-profile -- --filter-class "*.PackagingServiceTests" \
  --filter-not-trait "quarantined=true" --filter-not-trait "outerloop=true"
# → 53/53 pass (50 existing + 3 new)

./dotnet.sh test --project tests/Infrastructure.Tests/Infrastructure.Tests.csproj \
  --no-launch-profile -- --filter-class "*.ComputeCliChannelTests" \
  --filter-not-trait "quarantined=true" --filter-not-trait "outerloop=true"
# → 13/13 pass

Also smoke-tested compute-cli-channel.ps1 locally against every cascade arm including both throw paths and the mixed-case override.

radical and others added 3 commits May 27, 2026 15:35
…nel.ps1

PR microsoft#17528 fixed microsoft#17527 by reordering
the channel-detect cascade in `eng/pipelines/templates/build_sign_native.yml`
so a release-branch build with `DotNetFinalVersionKind=release` (the steady
state during stabilization) resolves to `staging` rather than `stable`. That
reorder is the single most load-bearing line in the fix, and it lived as
inline `pwsh:` with zero test coverage. A future maintainer reordering the
arms back to "obvious" order would re-introduce the staging-misroute and only
the next stabilizing build would notice.

Extract the cascade into `eng/scripts/compute-cli-channel.ps1` with the same
parameters the YAML step previously bound (`-Reason`, `-SourceBranch`,
`-PrNumber`, `-Override`, `-VersionKind`). The YAML step keeps the
`DotNetFinalVersionKind` resolution (which needs the in-tree `dotnet` +
`Versions.props`) and forwards everything else to the script. The script
preserves the existing `Write-Host` diagnostics and the
`##vso[task.setvariable variable=aspireCliChannel]` emission so the rest of
`build_sign_native.yml` is unaffected; the resolved channel additionally
shows up on stdout via `Write-Output` for non-AzDO callers.

Add `tests/Infrastructure.Tests/PowerShellScripts/ComputeCliChannelTests.cs`
with a Theory-based set of cases that reuses the existing `PowerShellCommand`
helper. Cascade cases covered:

- release-branch + versionKind=release → `staging` (the microsoft#17527 fix-pin)
- release-branch + versionKind=prerelease → `staging` (early stabilization)
- internal/release branch → `staging`
- main + versionKind=release → `stable` (unchanged)
- main + versionKind=prerelease → `daily` (unchanged)
- override `stable` beats the release-branch arm (the GA ship-build path)
- override `staging` and `daily` route as requested
- override `auto` falls through to the cascade
- PullRequest with numeric `prNumber` → `pr-<N>`

Negative cases:

- PullRequest with the unresolved `$(System.PullRequest.PullRequestNumber)`
  macro string → throws (defense-in-depth that was previously untested)
- override `garbage` → throws (the `-notin` validation arm)
- override `Stable` (mixed case) → normalized to `stable` (pins the
  `.ToLowerInvariant()` workaround for PowerShell's case-insensitive
  `-notin`; without it, the binary would build but
  `IdentityChannelReader.IsValidChannel` would reject it at CLI startup)

No behavior change for the source build itself: the script's output (both
the `##vso[task.setvariable]` and the human-readable diagnostics) matches
the previous inline block byte-for-byte.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…icrosoft#17527 regression

The fix that landed in PR microsoft#17528 reorders the `if/else` in
`PackagingService.GetChannelsAsync` so the `stagingIdentityChannel` arm runs
before `stagingChannelConfigured || stagingChannelRequested`. The new test
added in that PR — `GetChannelsAsync_WhenIdentityChannelIsStagingOnStableShapedCli_DefaultsToStableQuality`
— calls `GetChannelsAsync()` with no `requestedChannelName`, so a future
refactor that put `stagingChannelRequested` first (and forced `Both`) would
still pass that test, silently re-introducing the microsoft#17527 misroute on the
combinations that actually occur in production:

- `InitCommand.cs:671` — `GetChannelsAsync(ct, identityChannel)` on a staging
  CLI running `aspire init`.
- `IntegrationPackageSearchService.cs:26` (used by `aspire add`) and
  `DotNetBasedAppHostServerProject.cs:334` once a project's
  aspire.config.json pins `channel: staging`.

Add three tests that exercise both signals together:

- identity=staging + requested=staging + stable-shaped → `Stable`. This is
  the explicit ordering pin: if `stagingChannelRequested` ran first the
  result would be `Both` and the test fails.

- identity=staging + requested=staging + prerelease-shaped → `Both`. The
  companion case asserts the identity arm itself consults
  `_isStableShapedCliVersion`; a refactor that dropped the predicate from
  the identity arm would flip this prerelease case from `Both` to `Stable`
  and silently misroute Aspire.* away from the shared daily feed where
  prerelease-shaped staging packages actually live.

- identity=staging + `configuration["channel"]=staging` + stable-shaped →
  `Stable`. Same shape but for the configured-arm path, which is what
  `aspire add` against a staging-pinned project hits.

No production code changes. Verified by running just the new tests:
`dotnet test --project tests/Aspire.Cli.Tests/Aspire.Cli.Tests.csproj --no-launch-profile -- --filter-class "*.PackagingServiceTests" --filter-not-trait "quarantined=true" --filter-not-trait "outerloop=true"`
→ 53/53 pass (50 existing + 3 new).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After PR microsoft#17528 reordered the channel-detect cascade so release-branch
source builds default to `AspireCliChannel=staging`, the GA ship path
silently depends on the release manager remembering to queue
`microsoft-aspire` (definition 1602) with `aspireCliChannelOverride=stable`.
Following the release docs at face value still selects "any successful
signed build", so a single missed override at queue time publishes a
`staging`-baked CLI binary to nuget.org — `aspire init` then writes a
nuget.config that maps Aspire.* to the staging feed for end users, and
`aspire add` 404s once the SHA-derived darc-pub feed for that build
expires. microsoft#17527 is the
underlying issue.

Add a release-pipeline guard that fails fast if the selected source build
is not stable-channel.

Mechanism, three pieces:

- `eng/scripts/compute-cli-channel.ps1` additionally emits
  `##vso[build.addbuildtag]aspire-cli-channel - <channel>` so the
  source build's tag set carries the resolved channel. The tag shape
  mirrors the existing `release-version - X.Y.Z` and `BAR ID - <id>`
  tags so the consumer can reuse the same tag-fetch REST call.
  `build.addbuildtag` is idempotent across jobs in the same build, so
  per-RID `build_sign_native` invocations all setting the same tag is
  safe — AzDO dedupes them on the build.

- `eng/pipelines/release-publish-nuget.yml` adds a
  `Validate Source Build CLI Channel` step right after the BAR ID
  extraction. It fetches the source build's tags, finds the
  `aspire-cli-channel - *` tag, and fails the pipeline unless the
  resolved channel is `stable`. The step is gated by a new
  `AllowNonStableCliChannel` advanced parameter (default `false`) — the
  documented escape hatch for an intentional non-stable ship. `DryRun`
  also bypasses the guard so the pipeline remains testable against any
  source build. Error messages name the cause (missing tag vs.
  non-stable channel), point at the `microsoft-aspire` definition and
  the `aspireCliChannelOverride=stable` queue-time parameter, and tell
  the operator how to bypass.

- `docs/release-process.md` updates: the Signed Build prerequisite
  highlights the override requirement; the source-build selection step
  tells the release manager to verify the `aspire-cli-channel - stable`
  tag at pick time; the Advanced parameters table documents
  `AllowNonStableCliChannel`.

Test coverage in `ComputeCliChannelTests.cs` extends the Theory-based
output assertions to also pin the `##vso[build.addbuildtag]aspire-cli-channel - <channel>`
emission, so a future refactor that drops it would break the test
suite rather than silently disabling the pipeline guard.

13.5+ planning: this guard is a 13.4 safety net for the same risk that
microsoft#17550 is scoped to remove
entirely by decoupling channel identity from build-time baking via a
sidecar manifest written at acquisition time. Once microsoft#17550 lands, the
queue-time override and this guard become moot and can be removed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 17557

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 17557"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant