fix(install): remove dangling avatars build context from docker-compose #1476
Merged
joelteply added a commit that referenced this pull request on May 30, 2026
Two complementary changes, both architecturally driven by Joel 2026-05-30: "We don't need to rebuild all docker obviously until we go into main. Takes a lot of machines. ... Fix properly. What broke, what is the long term goal."

What broke: PR #1476's avatars-context fix succeeded but install-smoke still failed at 25m45s. The 'pull pr-N image, silently fall back to local build if missing' chain meant that for ANY PR where the dev hadn't run scripts/push-current-arch.sh, install.sh's `compose pull 2>/dev/null || warn ... will build locally` slipped into `compose up` → `docker build` → `cargo build --release` → timeout. That's the wrong default in two dimensions: per-PR docker rebuilds aren't worth it at the canary level (would consume many machines per PR), and the silent downgrade hides the actual issue (image missing) behind a 25-min compute burn.

Long-term goal: the docker build is bloated by Node-legacy chat surface that the Rust-core / thin-Node-client extraction will remove. Once that's done, builds are small enough that per-PR images become viable. Until then, canary PR install-smoke validates the install PATH against canary's binary; the BINARY validation runs at main promotion when fresh images get built.

Two changes:

1. .github/workflows/carl-install-smoke.yml: default to :canary for every PR run (and manual triggers). The previous logic interpolated to pr-${PR_NUMBER} for PRs, which silently required an image that the canary-stage workflow shouldn't depend on. The workflow_dispatch `image_tag` input still works for the rare explicit pr-N case (binary regression debug, historical canary check, etc.).
2. scripts/ci/carl-install-smoke.sh: add a pre-flight check that verifies all 4 required image variants (continuum-core-vulkan, node-server, widget-server, model-init) exist at the resolved tag. If any is missing, fail-LOUD with a concrete diagnostic ("dev push pipeline didn't publish, run scripts/push-current-arch.sh") instead of silently falling through to install.sh's local-build path. The CARL_ALLOW_LOCAL_BUILD=1 escape hatch is preserved for explicit build-path debugging.

Net effect:

- canary PRs (the common case) → tag :canary → images exist → install smoke runs against canary's binary in normal time.
- canary images somehow missing (real bug) → fail-LOUD with actionable message, not silent 25-min timeout.
- main-promotion runs and explicit pr-N tests → still work via workflow_dispatch input.

The avatars-context fix from PR #1476 is NOT included here; it's a separate concern (the docker-compose dangling line), and PR #1476 lands that piece. This commit fixes the CI-side silent-downgrade pattern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
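The tag defaulting described in change 1 can be sketched in shell roughly as follows. This is only an illustration, not the workflow's actual logic: `resolve_image_tag` and its single argument are hypothetical names standing in for however the workflow passes the `image_tag` input along.

```shell
# Hypothetical helper (not from the repo): pick the image tag for an
# install-smoke run. An explicitly requested tag wins (for example the
# workflow_dispatch `image_tag` input); anything else falls back to "canary".
resolve_image_tag() {
  local requested_tag="${1:-}"
  if [ -n "$requested_tag" ]; then
    printf '%s\n' "$requested_tag"
  else
    printf '%s\n' "canary"
  fi
}
```

Under this sketch a PR-triggered run would call `resolve_image_tag ""` and get `canary`, while a manual run with an explicit `pr-N` value would get that value back unchanged.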
joelteply added a commit that referenced this pull request on May 30, 2026
…1480)

* fix(ci): canary tag default for install-smoke + fail-loud precheck

* fix(ci): only gate install-smoke precheck on heavy Rust image

First iteration of the precheck required ALL 4 images (continuum-core-vulkan, node-server, widget-server, model-init). Initial run on this PR (#1480) revealed canary has continuum-core-vulkan published but the lighter TS sidecar images (node-server, widget-server, model-init) aren't always at the canary tag: the dev push pipeline publishes the Rust slice on a different cadence than the TS slices.

Per Joel 2026-05-30: "node-server / model-init / widgets ... build in under a minute on either arch." Those local builds DON'T blow the 25-min timeout that triggered the original failure mode. So gating the smoke on all 4 images is over-strict: it fails the gate for the common case where canary's Rust is fresh but the TS sidecars aren't yet published at that tag.

Refinement: precheck gates only on continuum-core-vulkan (the heavy one whose local build is the 25-min cargo build --release). The lighter TS sidecars are documented as "pulled if present, built locally if not"; install.sh's existing compose-pull-then-build fallback is fine for those because their local build is fast.

This restores the intended semantic: catch the SLOW silent fallback (Rust source build) and fail-loud; let the FAST sidecar fallback through as install.sh always did.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
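The refined gate can be sketched as below. This is a minimal illustration under stated assumptions, not the contents of scripts/ci/carl-install-smoke.sh: the function names, the registry prefix argument, and the use of `docker manifest inspect` as the existence probe are all assumptions. Only the image names, the `CARL_ALLOW_LOCAL_BUILD=1` escape hatch, and the diagnostic wording come from the commit message.

```shell
# Assumed probe: succeeds when the image reference exists in the registry.
# `docker manifest inspect` asks the registry for the manifest without
# pulling any layers. The real script may check existence differently.
image_exists() {
  docker manifest inspect "$1" >/dev/null 2>&1
}

# Hypothetical precheck. $1 = registry prefix (placeholder), $2 = resolved tag.
# Light sidecars are reported but never fail the gate; only the heavy Rust
# image is required, because its local build is the slow one.
precheck_images() {
  local registry="$1" tag="$2"
  local light heavy

  for light in node-server widget-server model-init; do
    if ! image_exists "$registry/$light:$tag"; then
      echo "note: $light:$tag not published; it will be built locally (fast)" >&2
    fi
  done

  heavy="$registry/continuum-core-vulkan:$tag"
  if image_exists "$heavy"; then
    return 0
  fi

  # Escape hatch for explicit build-path debugging.
  if [ "${CARL_ALLOW_LOCAL_BUILD:-0}" = "1" ]; then
    echo "warning: $heavy missing; CARL_ALLOW_LOCAL_BUILD=1 allows the local build" >&2
    return 0
  fi

  echo "ERROR: $heavy missing: dev push pipeline didn't publish, run scripts/push-current-arch.sh" >&2
  return 1
}
```

Keeping the probe in its own function means the gate logic can be exercised without a registry by redefining `image_exists` in a test shell.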
The `avatars: ./src/models/avatars` additional_context was added in 9b1f6ca (April 2026) when the plan was to bake CC0 avatar VRMs into the continuum-core image. That plan never landed end-to-end: docker/continuum-core.Dockerfile lines 131-143 document the rollback. src/models is gitignored, the dir doesn't exist in CI checkouts, and the Dockerfile uses `RUN mkdir -p /app/avatars` as a placeholder instead of COPYing from the avatars context. The compose-side context declaration was left behind, dangling.

No Dockerfile uses `--from=avatars` (verified by grep), so the declaration referenced nothing in build instructions. But docker compose validates that ALL additional_contexts resolve at build time, so a missing local context dir fails the whole build with "stat /tmp/carl-smoke-NNNN/src/models/avatars: no such file or directory".

That's the exact failure mode currently blocking carl-install-smoke on PR #1475 (Mac Intel hardware tier). Any PR that touches install.sh triggers carl-install-smoke, which has been silently broken by this dangling context since the rollback. Other PRs (e.g. #1471, #1473, #1474) didn't touch install.sh so the check never ran on them; the break was invisible until now.

Removing the line restores the carl-install-smoke happy path while keeping the Dockerfile's empty-dir placeholder intact. Restore the build context when the avatar-provisioning story lands (LFS, model-init download, or curl from a CC0 URL in CI before docker build) per the gap noted in docs/infrastructure/PR891-E2E-VALIDATION.md.

An inline comment preserves the context-of-removal in the file so a future contributor doesn't re-add the dangling line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
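The "verified by grep" step plus the missing-directory condition can be combined into one small check, sketched below. `classify_context` and its arguments are hypothetical names for illustration, and the plain substring match on `--from=<name>` is a simplification (it would also match a longer context name that starts with the same text).

```shell
# Hypothetical helper: classify a compose additional_context.
# $1 = context name, $2 = context directory, $3 = directory with Dockerfiles.
# Prints one of: ok, unused, missing, dangling.
classify_context() {
  local name="$1" dir="$2" dockerfiles="$3"
  local used=no present=no

  # Is the context referenced by any build instruction (e.g. COPY --from=NAME)?
  # -F treats the pattern as a fixed string; -e keeps the leading "--" from
  # being parsed as a grep option.
  if grep -rqsF -e "--from=$name" "$dockerfiles"; then
    used=yes
  fi

  # Does the local context directory exist in this checkout?
  if [ -d "$dir" ]; then
    present=yes
  fi

  case "$used/$present" in
    yes/yes) echo ok ;;
    no/yes)  echo unused ;;
    yes/no)  echo missing ;;
    no/no)   echo dangling ;;   # declared, never used, and not on disk
  esac
}
```

For the situation this commit describes, a call such as `classify_context avatars ./src/models/avatars ./docker` in a CI checkout would be expected to report `dangling`, which is the case where deleting the declaration is safe.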
f7cdcfb to 41d3148
Summary
Remove the dangling `avatars: ./src/models/avatars` additional_context from docker-compose.yml that has been silently breaking `carl-install-smoke` for any PR that touches install.sh.

Root cause
April 2026 commit 9b1f6ca added the build context expecting CC0 avatar VRMs to be baked in. That plan was rolled back: docker/continuum-core.Dockerfile lines 131-143 document the rollback (src/models is gitignored; the Dockerfile uses a `RUN mkdir -p /app/avatars` placeholder instead).

The compose-side `additional_contexts` declaration was left behind. No Dockerfile uses `--from=avatars` (verified by grep), so the declaration referenced nothing, but docker compose validates that ALL declared additional_contexts resolve at build time. A missing local context dir fails the build with `stat .../src/models/avatars: no such file or directory`.

Impact
PR #1475 (Mac Intel hardware tier) currently fails `carl-install-smoke` because it touches install.sh → triggers the check → docker build fails on the missing avatars context. PRs that DON'T touch install.sh (#1471, #1473, #1474) didn't run the check, so the break was invisible until now.

Fix
Remove the dangling line. Replace it with an explanatory comment so a future contributor doesn't re-add it without restoring the avatar-provisioning story (LFS, model-init download, or CC0-URL curl) per the gap in docs/infrastructure/PR891-E2E-VALIDATION.md.

Test plan
- `carl-install-smoke` passes on this PR (and on the follow-up rebase of "feat: Mac Intel hardware tier + cognition perf pass" #1475 once this merges)