
Fix SDE outage handling for main CI #256

Merged
Navi Bot (project-navi-bot) merged 1 commit into main from codex/main-sde-outage-gate
Jun 19, 2026

@Fieldnote-Echo

Member

Summary

  • make routine ci/coverage SDE jobs soft-skip on Intel mirror outages for PR and push runs, while keeping manual workflow_dispatch fail-closed
  • add a tag-workflow release-avx512 job that fails closed and reruns the SDE CPUID probe plus AVX-512 tests before release assets can be staged
  • update release invariants/docs/changelog so releases cannot publish on skipped AVX-512 coverage
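
The event-dependent policy in the first bullet can be sketched as a small decision function. This is an illustration only, not the expression used in the workflows: the function name `sde_outage_outcome` and the outcome strings are hypothetical, and only the event names (`pull_request`, `push`, `workflow_dispatch`) are real GitHub Actions events.

```python
def sde_outage_outcome(event_name: str, sde_available: bool) -> str:
    """Hypothetical model of how a routine ci/coverage lane reacts to an SDE outage."""
    if sde_available:
        return "run"        # SDE installed: run the AVX-512 lane normally
    if event_name == "workflow_dispatch":
        return "fail"       # manual runs stay fail-closed
    return "soft-skip"      # PR and push runs skip with a warning


for event in ("pull_request", "push", "workflow_dispatch"):
    print(event, sde_outage_outcome(event, sde_available=False))
```

The release-avx512 job sits outside this model: per the second bullet it fails closed on tag runs.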

Root Cause

The post-merge main push run failed in setup-intel-sde: Intel downloadmirror returned a challenge payload, SHA-256 verification failed, and #255 had made push jobs fail closed. That made routine main CI red for an external mirror outage.
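
To illustrate the failure mode: when a mirror serves a challenge page instead of the archive, the downloaded bytes hash to a different digest and verification rejects them. A minimal sketch, assuming a pinned hex digest; the payloads and the helper name `verify_sha256` are made up for this example, and this is not the code in setup-intel-sde.

```python
import hashlib
import hmac


def verify_sha256(payload: bytes, expected_hex: str) -> bool:
    """Return True only if payload hashes to the pinned SHA-256 digest."""
    actual_hex = hashlib.sha256(payload).hexdigest()
    # compare_digest does a constant-time comparison of the two hex strings
    return hmac.compare_digest(actual_hex, expected_hex.lower())


# Placeholder bytes standing in for the real archive and a challenge page.
archive = b"placeholder archive bytes"
challenge = b"<html>challenge</html>"
pinned = hashlib.sha256(archive).hexdigest()

print(verify_sha256(archive, pinned))    # True
print(verify_sha256(challenge, pinned))  # False
```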

Verification

  • python3 tests/release_publish_invariants.py
  • bash tests/release_signed_release_invariants.sh
  • bash tests/release_publish_invariants.sh
  • git diff --check
  • python3 -m py_compile tests/release_publish_invariants.py
  • go install github.com/rhysd/actionlint/cmd/actionlint@v1.7.12 && "$(go env GOPATH)/bin/actionlint" -color .github/workflows/ci.yml .github/workflows/coverage.yml .github/workflows/release.yml

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@qodo-code-review

PR Summary by Qodo

Soft-skip Intel SDE outages in CI; fail-closed AVX-512 release gate
🐞 Bug fix ✨ Enhancement 🧪 Tests 📝 Documentation ⚙️ Configuration changes 🕐 20-40 Minutes

Description

• Allow CI/coverage AVX-512 lanes to skip on Intel mirror outages for PR/push.
• Add release-avx512 job to re-prove SDE CPUID and AVX-512 tests on tags.
• Update docs and invariants so releases cannot publish on skipped AVX-512 coverage.
Diagram

graph TD
  EV["GitHub event"] --> CI["ci.yml: avx512"] --> ACT["setup-intel-sde"] --> MIR["Intel downloadmirror"]
  EV --> COV["coverage.yml: coverage"] --> ACT
  EV --> REL["release.yml: release-avx512"] --> ACT --> TST["AVX-512 probe+tests"] --> DRAFT["release-assets-draft"]
High-Level Assessment

The following are alternative approaches to this PR:

1. Mirror Intel SDE into a first-party artifact store
  • ➕ Eliminates Intel mirror/WAF as a CI dependency for routine runs and releases
  • ➕ Makes availability and integrity controls fully internal
  • ➖ May be blocked by licensing/redistribution terms for Intel SDE
  • ➖ Requires ongoing artifact management and secure hosting
2. Run AVX-512 lane on self-hosted runners with preinstalled SDE
  • ➕ Avoids repeated downloads and external outage sensitivity
  • ➕ Can tightly control environment for AVX-512 proof
  • ➖ Operational overhead (maintenance, security hardening, capacity)
  • ➖ Potentially harder to reproduce compared to GitHub-hosted runners

Recommendation: The PR’s approach is a good balance: it keeps routine PR/push CI resilient to third-party outages while moving the fail-closed requirement into the tag-triggered release workflow, where it matters for publishing. The added invariant checks reduce the risk of future drift, and making release-assets-draft depend on the new release-avx512 job ensures releases cannot proceed on a soft-skipped AVX-512 lane.

Files changed (7) +178 / -49

Tests (2) +72 / -25
release_publish_invariants.py: Enforce routine soft-skip vs release fail-closed SDE invariants (+70/-23)

• Generalizes SDE cache job checks to parameterize allow-unavailable, outage notice conditions, and whether SDE-dependent steps must be guarded. Adds invariant enforcement for the new release-avx512 job and ensures release-assets-draft needs it.

tests/release_publish_invariants.py
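
The kind of dependency invariant described here can be sketched as a check over a parsed workflow. This is not the contents of tests/release_publish_invariants.py: the dicts stand in for a parsed release.yml, and `job_needs` and `check_release_gate` are hypothetical names. Only the job names release-avx512 and release-assets-draft come from this PR.

```python
def job_needs(workflow: dict, job: str) -> set:
    """Return a job's `needs` as a set; the key may hold a string or a list."""
    needs = workflow["jobs"][job].get("needs", [])
    return {needs} if isinstance(needs, str) else set(needs)


def check_release_gate(workflow: dict) -> list:
    """Collect violations instead of raising, so all are reported together."""
    errors = []
    jobs = workflow.get("jobs", {})
    if "release-avx512" not in jobs:
        errors.append("release-avx512 job is missing")
    if "release-assets-draft" not in jobs:
        errors.append("release-assets-draft job is missing")
    elif "release-avx512" not in job_needs(workflow, "release-assets-draft"):
        errors.append("release-assets-draft must need release-avx512")
    return errors


# Illustrative stand-ins for a parsed workflow file (not the real release.yml).
gated = {"jobs": {"release-avx512": {},
                  "release-assets-draft": {"needs": ["release-avx512"]}}}
ungated = {"jobs": {"release-avx512": {},
                    "release-assets-draft": {"needs": []}}}

print(check_release_gate(gated))    # []
print(check_release_gate(ungated))  # ['release-assets-draft must need release-avx512']
```

Returning a list of messages rather than asserting on the first failure is a design choice for this sketch; it makes drift in several places visible in a single run.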

release_signed_release_invariants.sh: Pin release-assets-draft dependencies to include release-avx512 (+2/-2)

• Updates the signed release graph invariants to require release-avx512 as a dependency of release-assets-draft, preventing drift that would allow staging without AVX-512 proof.

tests/release_signed_release_invariants.sh

Documentation (2) +14 / -11
CHANGELOG.md: Document new release-time AVX-512 fail-closed guarantee (+6/-5)

• Updates the changelog entry to clarify that routine CI may soft-skip, but release publishing is blocked unless the tag workflow reruns and passes the SDE AVX-512 proof.

CHANGELOG.md

RELEASING.md: Update release invariants to rely on tag workflow AVX-512 proof (+8/-6)

• Reframes the operational guidance: routine CI/coverage may skip during mirror challenges, but release.yml’s release-avx512 job must pass before asset staging. Documents what to do if the cache misses and Intel download is unavailable.

RELEASING.md

Other (3) +92 / -13
ci.yml: Soft-skip SDE outages for PR and push runs (+9/-8)

• Broadens Intel SDE outage soft-skip behavior from pull_request-only to all non-workflow_dispatch events. Updates the warning step to reflect that the release workflow has a separate fail-closed AVX-512 proof.

.github/workflows/ci.yml

coverage.yml: Soft-skip SDE outages for routine coverage runs (+4/-4)

• Aligns coverage workflow behavior with CI by allowing SDE unavailability for all non-workflow_dispatch events. Updates the outage warning text to point to the release-time fail-closed proof.

.github/workflows/coverage.yml

release.yml: Add fail-closed release-avx512 job and gate asset staging (+79/-1)

• Introduces a new release-avx512 job that installs Intel SDE (no soft-skip), runs an AVX-512 CPUID probe, and executes AVX-512 test lanes under SDE. Makes release-assets-draft depend on release-avx512 so release assets cannot be staged without this proof.

.github/workflows/release.yml

@qodo-code-review

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Great, no issues found!

Qodo reviewed your code and found no material issues that require review

@project-navi-bot Navi Bot (project-navi-bot) merged commit 3564e43 into main Jun 19, 2026
31 checks passed
@project-navi-bot Navi Bot (project-navi-bot) deleted the codex/main-sde-outage-gate branch June 19, 2026 19:51