Use snapshot-cloudflare.debian.org CDN mirror (origin fallback) to end mirror-halting CI #392

Merged
omkhar merged 3 commits into main from infra/debian-cloudflare-mirror on Jul 5, 2026

omkhar (Owner) commented Jul 5, 2026

Problem

The container build pins snapshot.debian.org as its sole Debian snapshot source. That origin is chronically overloaded and has been halting the Container smoke and Validate repository checks for hours (repeated exit-38 / fetch failures), blocking every PR merge.

Fix

Switch the primary snapshot source to snapshot-cloudflare.debian.org — Debian's Cloudflare-CDN-fronted mirror of the same snapshot service — keeping snapshot.debian.org as a fallback.

Verified to be reproducibility-safe (checked from this host):

  • The pinned openssl_3.5.5-1~deb13u1_arm64.deb fetched from the CDN hashes to the exact committed SHA256 (92dfcdc2…).
  • The CDN serves the apt dists/trixie/Release path (HTTP 200).
  • DEBIAN_SNAPSHOT date and all SHA256 pins are unchanged.

What changed (5 files, 49 lines)

  • runtime/container/Dockerfile + tools/validator/Dockerfile: the TLS bootstrap fetch now alternates snapshot-cloudflare.debian.org / snapshot.debian.org per retry attempt (still SHA256-verified), and apt points at the CDN. Origin remains the automatic bootstrap fallback.
  • scripts/workcell: adds snapshot-cloudflare.debian.org:443 to the egress allowlist (both bootstrap_endpoints and the ephemeral-container allow set) — an addition alongside the existing origin, not a weakening.
  • scripts/verify-invariants.sh: enforces the new allowlist entry.
  • control-plane-manifest.json: regenerated for the scripts/workcell change.
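The bootstrap behavior in the first bullet can be sketched in POSIX sh. This is a minimal illustration, not the PR's Dockerfile code: the function names, the `FETCH` override hook, the default of four attempts, and the curl flags are assumptions. It shows the two properties the bullet describes: odd attempts go to the CDN and even attempts to the origin, and a download is accepted only if it matches the pinned SHA256, so which mirror served the file cannot change the result.

```shell
#!/bin/sh
# Sketch only: names, attempt budget and curl flags are assumptions,
# not the PR's actual Dockerfile code.

SNAPSHOT_CDN="snapshot-cloudflare.debian.org"
SNAPSHOT_ORIGIN="snapshot.debian.org"

# Odd attempts use the Cloudflare CDN, even attempts the origin.
snapshot_host_for_attempt() {
  if [ $(( $1 % 2 )) -eq 1 ]; then
    echo "$SNAPSHOT_CDN"
  else
    echo "$SNAPSHOT_ORIGIN"
  fi
}

# Default downloader: $1 = URL, $2 = output file.
# Tests can point FETCH at a stub function instead.
curl_fetch() {
  curl --fail --silent --show-error --location --max-time 60 -o "$2" "$1"
}
FETCH="${FETCH:-curl_fetch}"

# fetch_pinned REL_PATH SHA256 OUT [ATTEMPTS]
# Retries across both mirrors; succeeds only if OUT matches the pin.
fetch_pinned() {
  fp_rel=$1
  fp_want=$2
  fp_out=$3
  fp_attempts=${4:-4}
  fp_attempt=1
  while [ "$fp_attempt" -le "$fp_attempts" ]; do
    fp_host=$(snapshot_host_for_attempt "$fp_attempt")
    if "$FETCH" "https://${fp_host}${fp_rel}" "$fp_out" &&
       echo "$fp_want  $fp_out" | sha256sum -c --status -; then
      return 0
    fi
    # Discard failed or hash-mismatched downloads before the next attempt.
    rm -f "$fp_out"
    fp_attempt=$((fp_attempt + 1))
  done
  return 1
}
```

Because the hash check runs after every fetch, a mirror that is down only costs its own attempt, and a mirror that serves different bytes is rejected rather than trusted.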

Validation

Local: check-pinned-inputs.sh, hadolint (both Dockerfiles), shellcheck (both scripts), go test ./internal/metadatautil/... ./cmd/workcell-citools/..., control-plane manifest — all green. The container-build lanes in this PR's own CI exercise the CDN end-to-end.

Security note for reviewers: please scrutinize the egress-allowlist addition — it adds one verified Debian CDN hostname; the default-deny posture and all denies are unchanged.

🤖 Generated with Claude Code

…allback. The pinned snapshot.debian.org origin is chronically overloaded and has been halting Container smoke and Validate for hours. snapshot-cloudflare.debian.org is Debian's Cloudflare-fronted mirror of the same service and was verified to serve byte-identical content (the pinned openssl .deb hashes to the exact committed SHA256) as well as the apt Release path. Both Dockerfiles now bootstrap TLS by alternating cloudflare/origin per retry attempt (still SHA256-verified) and point apt at the CDN, with the origin kept as the bootstrap fallback and in the egress allowlist. The DEBIAN_SNAPSHOT date and every pin are unchanged, so the build stays reproducible. pinned-inputs, hadolint, shellcheck, go test and the manifest are green locally; the container build confirms in CI. This build-reliability fix touches the egress allowlist, so reviewers should scrutinize the allowlist addition.
omkhar (Owner, Author) commented Jul 5, 2026

@codex review

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: efaa286111

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread runtime/container/Dockerfile Outdated
Comment thread scripts/workcell

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: efaa286111


Comment thread runtime/container/Dockerfile Outdated
…) P2a: apt sources listed only the CDN, so a transient CDN outage during apt-get update/install would still halt the build despite the bootstrap fallback. Fix: append snapshot.debian.org as a second apt source in both Dockerfiles so apt has true mirror redundancy over the same signed, hash-verified content. P2b: operators denying snapshot.debian.org:443 to block Debian egress would silently regain it via the CDN, because deny_endpoints match exact host:port. Fix: document in injection-policy.md that both mirror endpoints must be denied. hadolint, pinned-inputs, markdownlint and the manifest are green; this supports resilience and adds the egress-policy doc for the mirror PR.
omkhar (Owner, Author) commented Jul 5, 2026

Fixed both: (P2a) both Dockerfiles now list snapshot.debian.org as a second apt source after the CDN, so apt has genuine mirror redundancy — a transient CDN outage during apt-get falls through to the origin (identical signed, hash-verified content). (P2b) documented in injection-policy.md that the automatic Debian snapshot egress now uses two endpoints and, since deny_endpoints match exact host:port, blocking it requires denying both snapshot-cloudflare.debian.org:443 and snapshot.debian.org:443. hadolint/pinned-inputs/markdownlint/manifest green.
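The P2b point follows directly from exact-string matching. A minimal POSIX sh sketch, assuming deny_endpoints is a whitespace-separated list of host:port strings compared verbatim (the function name and list format are hypothetical, not scripts/workcell's implementation): denying only the origin leaves the CDN endpoint allowed.

```shell
#!/bin/sh
# Hypothetical model of exact host:port deny matching; the list format and
# function name are assumptions, not scripts/workcell's implementation.

# endpoint_denied HOST:PORT "DENY_LIST"
# Returns 0 only if HOST:PORT appears verbatim in the deny list. There is no
# wildcard or same-service matching, so each mirror must be listed explicitly.
endpoint_denied() {
  for ed_entry in $2; do
    if [ "$ed_entry" = "$1" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, `endpoint_denied snapshot-cloudflare.debian.org:443 "snapshot.debian.org:443"` returns non-zero (the CDN is still reachable), while adding `snapshot-cloudflare.debian.org:443` to the list makes it return 0.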

omkhar (Owner, Author) commented Jul 5, 2026

@codex review

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 91e6e8a6bd


Comment thread runtime/container/Dockerfile Outdated
… (P2) Listing cloudflare and origin as concurrent apt sources meant every apt-get update contacted both, so a slow or unreachable origin (the exact failure being avoided) would fail the update and stall the retry budget on every attempt. Replace the static dual-source list with a set_snapshot_sources helper and call it per retry attempt (odd=cloudflare, even=origin) in both Dockerfiles, so each attempt uses a single mirror and a down mirror only fails its own attempt before falling through to the other. hadolint, pinned-inputs and the manifest are green, and Container smoke already passed the CDN build; this supports the mirror-resilience fix.
omkhar (Owner, Author) commented Jul 5, 2026

Fixed: replaced the concurrent dual-source apt list with a per-attempt mirror selection. A set_snapshot_sources helper rewrites sources.list to a single host, and the retry loop calls it per attempt (odd=cloudflare CDN, even=origin) in both Dockerfiles. So each apt-get update contacts exactly one mirror — a slow/down origin no longer fails the update or burns the retry budget when the CDN is healthy; it only costs that one attempt before falling through. Container smoke already passed the CDN build on the prior push. hadolint/pinned-inputs/manifest green.
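The per-attempt selection described in this comment can be sketched in POSIX sh. This is an illustration under stated assumptions, not the Dockerfile code: only the set_snapshot_sources name and the odd/even rule come from the PR, while its signature, the sources.list line (suite trixie, component main, and `check-valid-until=no` because archived snapshot Release files are usually past their Valid-Until date), the `APT_UPDATE` and `RETRY_SLEEP` hooks, and the attempt budget are assumptions. Each attempt rewrites the file to exactly one mirror, so an unreachable origin costs only its own attempt.

```shell
#!/bin/sh
# Sketch only: the signature, sources.list contents, hooks and retry budget
# are assumptions; only the helper's name and the odd/even rule come from the PR.

# set_snapshot_sources HOST SNAPSHOT_TIMESTAMP SOURCES_FILE
# Overwrites SOURCES_FILE so apt sees exactly one snapshot mirror.
set_snapshot_sources() {
  printf 'deb [check-valid-until=no] https://%s/archive/debian/%s trixie main\n' \
    "$1" "$2" > "$3"
}

apt_get_update() { apt-get update; }
APT_UPDATE="${APT_UPDATE:-apt_get_update}"
RETRY_SLEEP="${RETRY_SLEEP:-5}"

# apt_update_with_fallback SNAPSHOT_TIMESTAMP SOURCES_FILE ATTEMPTS
# Odd attempts use the Cloudflare CDN, even attempts the origin.
apt_update_with_fallback() {
  au_attempt=1
  while [ "$au_attempt" -le "$3" ]; do
    if [ $((au_attempt % 2)) -eq 1 ]; then
      au_host=snapshot-cloudflare.debian.org
    else
      au_host=snapshot.debian.org
    fi
    set_snapshot_sources "$au_host" "$1" "$2"
    if "$APT_UPDATE"; then
      return 0
    fi
    sleep "$RETRY_SLEEP"
    au_attempt=$((au_attempt + 1))
  done
  return 1
}
```

The design choice mirrors the comment: listing both hosts in one file would make a single apt-get update depend on both mirrors, whereas rewriting the file per attempt makes each attempt depend on only one.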

omkhar (Owner, Author) commented Jul 5, 2026

@codex review

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af6e0c0cde


Comment thread runtime/container/Dockerfile

chatgpt-codex-connector (Bot)

Codex Review: Didn't find any major issues. Delightful!

Reviewed commit: af6e0c0cde


omkhar (Owner, Author) commented Jul 5, 2026

Pushing back on this one, with reasoning. The build-time per-attempt fallback is safe because I control the retry loop. Runtime in-session apt-get is a direct invocation I can't wrap in a retry loop, so the trade-off is different: if the runtime sources.list lists both mirrors concurrently, apt-get update fails whenever either mirror is unreachable (apt errors on any unfetchable source index), which is strictly worse than a single reliable source.

So the right runtime choice is the single CDN mirror. Note that this is an improvement, not a regression: before this PR the runtime image shipped a single snapshot.debian.org source (the flaky origin, with no fallback either); after it, runtime apt uses the single reliable CDN.

Adding true runtime multi-mirror fallback would require wrapping the session's apt in a retry loop, which is a separate enhancement outside this PR's scope (fixing the build halting). The egress allowlist keeps both hosts available, so that future enhancement remains possible.

omkhar merged commit 5fe02a3 into main on Jul 5, 2026
14 checks passed
omkhar deleted the infra/debian-cloudflare-mirror branch on July 5, 2026 at 14:07