ci: optimize workflows for Blacksmith + swap to gemma4 models #184
Gajesh2007 wants to merge 1 commit into
Workflow optimizations:
- ci.yml: Add Cargo artifact cache for provider tests (saves 3-8 min/run)
- ci.yml: Replace manual go install with golangci-lint-action v9.2.0 (pre-built binary download, built-in analysis cache, GitHub annotations)
- integration.yml: Cache Homebrew packages, Swift build artifacts, and HuggingFace models via actions/cache (Blacksmith transparent proxy at 400 MB/s, 25 GB free per repo)
- release-swift.yml: Cache HuggingFace test model
- threat-model-review.yml: Cache pip packages

All caches leverage Blacksmith's transparent cache proxy: no Blacksmith-specific actions are needed, because standard actions/cache is automatically intercepted and served from co-located storage.

Model migration (Qwen -> Gemma 4):
- Replace mlx-community/Qwen3.5-0.8B-MLX-4bit with gemma-4-e4b-4bit as the primary test model across CI, testbed, and benchmarks
- Remove gemma-3-270m-4bit secondary model
- Update KnownModelSizes, testbed defaults, benchmark configs, and fullstack integration test constants
- Update RUNNER_DESC to reflect actual Blacksmith runner (M4 macos-26)

Sticky disks (useblacksmith/stickydisk) were evaluated but are Linux-only (ext4 + block devices); all heavy CI jobs run on macOS Apple Silicon runners.
No security-relevant production code changed in this PR; all modifications are CI workflow definitions, test infrastructure, and the threat-model-review automation itself.

Trust boundaries touched
None of the standard TB-xxx boundaries are touched. The changed files are GitHub Actions workflows and test/benchmark helpers that never run in the production coordinator, provider, or console-ui processes.

Threat coverage
No T-xxx threats are affected. The diff contains no changes to authentication, encryption, attestation, billing, or any other security-sensitive path.

New attack surface not covered by an existing threat
Two areas are worth a brief note for the threat model backlog. Neither is an immediate finding, but both represent implicit trust decisions:
1. CI workflow permissions and secret exposure (…
2. …

SEC-* findings resolved
None.
  var KnownModelSizes = map[string]string{
-     "mlx-community/Qwen3.5-0.8B-MLX-4bit": "0.5 GB",
-     "mlx-community/gemma-3-270m-4bit": "0.2 GB",
+     "mlx-community/gemma-4-e4b-4bit": "5.2 GB",
should we extend the list to also support https://huggingface.co/openai/gpt-oss-20b?
heavy to start -- might need 20-30 minutes to download
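One way to keep the size table small while still tolerating models that are not listed is a lookup with a fallback. The sketch below is illustrative only: `modelSize` is a hypothetical helper that does not appear in the PR, and the map holds just the one entry visible in the diff above (the real map may contain more). No size is recorded for `openai/gpt-oss-20b` because the thread does not state one.

```go
package main

import "fmt"

// KnownModelSizes mirrors only the entry visible in the diff above;
// the real map in e2e/testbed/config.go may hold more entries.
var KnownModelSizes = map[string]string{
	"mlx-community/gemma-4-e4b-4bit": "5.2 GB",
}

// modelSize is a hypothetical helper (not part of the PR). It returns the
// recorded size for a model, or "unknown" when the model is not listed.
func modelSize(id string) string {
	if size, ok := KnownModelSizes[id]; ok {
		return size
	}
	return "unknown"
}

func main() {
	ids := []string{"mlx-community/gemma-4-e4b-4bit", "openai/gpt-oss-20b"}
	for _, id := range ids {
		fmt.Printf("%s: %s\n", id, modelSize(id))
	}
}
```

With a fallback like this, a larger model could be tried through an override without first committing a size entry for it.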
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a18374fac8
  func DefaultModelConfig() ModelConfig {
      return ModelConfig{
-         ModelID: "mlx-community/gemma-3-270m",
+         ModelID: "mlx-community/gemma-4-e4b-4bit",
Keep CI defaults on a Swift-loadable model
When TESTBED_MODEL_ID is not set, this default is what the e2e/profile jobs pass to provider-swift. The Swift provider loads the cache entry with LLMModelFactory.shared.loadContainer, but the pinned libs/mlx-swift-lm submodule at 8d1cbcd does not register the Gemma 4 model type (gemma4). A fresh CI run that downloads mlx-community/gemma-4-e4b-4bit will therefore fail while loading the model rather than exercise inference. Keep the default on a supported text model, or update the Swift MLX submodule/model loader in the same change.
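The fallback behaviour described in this comment can be sketched as follows. This is a minimal illustration, not the testbed's actual code: `resolveModelID`, `checkLoadable`, and the `swiftLoadable` allowlist are hypothetical names, and the allowlist entry is an assumption based on the Qwen model that the release live tests are reported to load.

```go
package main

import (
	"fmt"
	"os"
)

// defaultModelID mirrors the new default introduced by this PR.
const defaultModelID = "mlx-community/gemma-4-e4b-4bit"

// swiftLoadable is a hypothetical allowlist of models the pinned Swift MLX
// loader is assumed to support; its contents are an assumption.
var swiftLoadable = map[string]bool{
	"mlx-community/Qwen3-0.6B-8bit": true,
}

// resolveModelID returns the override when it is non-empty,
// and the default model ID otherwise.
func resolveModelID(override string) string {
	if override != "" {
		return override
	}
	return defaultModelID
}

// checkLoadable returns an error when the model is not in the allowlist,
// so a bad default fails fast instead of after a multi-GB download.
func checkLoadable(id string) error {
	if !swiftLoadable[id] {
		return fmt.Errorf("model %q is not known to be loadable by the pinned Swift MLX loader", id)
	}
	return nil
}

func main() {
	id := resolveModelID(os.Getenv("TESTBED_MODEL_ID"))
	if err := checkLoadable(id); err != nil {
		fmt.Println("warning:", err)
		return
	}
	fmt.Println("using model", id)
}
```

A guard of this kind would surface the unsupported default at configuration time, before the download and the model load are attempted.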
  set -euo pipefail
  /tmp/mlxvenv/bin/pip install --quiet --no-cache-dir 'huggingface_hub[cli]'
- /tmp/mlxvenv/bin/hf download mlx-community/Qwen3-0.6B-8bit \
+ /tmp/mlxvenv/bin/hf download mlx-community/gemma-4-e4b-4bit \
Download the model the release tests actually load
The release job still runs provider-swift live tests with DARKBLOOM_LIVE_MLX_TESTS=1, and those fixtures load LiveInferenceFixtures.tinyModelID (mlx-community/Qwen3-0.6B-8bit) rather than this Gemma model. On a fresh runner/cache this step no longer seeds the Qwen snapshot, so the live tests record the model as missing (or only pass because of stale cache state) instead of validating the release binary. Keep downloading the tiny Qwen model, or change the test fixture and supported loader together.
Found 21 test failures on Blacksmith runners.
Summary
Optimize all CI workflows to leverage Blacksmith's infrastructure and migrate test models from Qwen to Gemma 4.

Workflow optimizations
- ci.yml: Cargo artifact cache for provider tests
- ci.yml: golangci-lint-action v9.2.0 (pre-built binary, analysis cache)
- integration.yml: Homebrew, Swift build, and HuggingFace model caches
- release-swift.yml: HuggingFace test model cache
- threat-model-review.yml: pip package cache

All caches use standard actions/cache. Blacksmith's transparent proxy intercepts them automatically at ~400 MB/s (vs GitHub's ~90 MB/s) with 25 GB free storage per repo.

Sticky disks (useblacksmith/stickydisk) were evaluated but are Linux-only (ext4 + block devices). All heavy CI jobs run on macOS Apple Silicon runners, so actions/cache with the transparent proxy is the best available option.

Model migration
- mlx-community/Qwen3.5-0.8B-MLX-4bit → mlx-community/gemma-4-e4b-4bit (5.2 GB)
- Removed the mlx-community/gemma-3-270m-4bit secondary model
- mlx-community/Qwen3-0.6B-8bit → mlx-community/gemma-4-e4b-4bit
- Updated KnownModelSizes
- Updated RUNNER_DESC to reflect actual Blacksmith runner (macos-26 (M4 Blacksmith))

Files changed (9)
- .github/workflows/ci.yml: Cargo cache + golangci-lint action
- .github/workflows/integration.yml: Homebrew/Swift/HF caches + model swap
- .github/workflows/release-swift.yml: HF cache + model swap
- .github/workflows/threat-model-review.yml: pip cache
- e2e/testbed/config.go: model defaults + known sizes
- e2e/testbed/suite.go: fallback model ID
- e2e/testbed/testbed_test.go: assertion updates
- e2e/benchmark_test.go: benchmark model configs
- coordinator/api/fullstack_integration_test.go: testModel constant
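As a rough check on what the quoted cache throughput means for the 5.2 GB model, the arithmetic can be sketched in a few lines. The sketch assumes decimal units (1 GB = 1000 MB) and treats the ~400 MB/s and ~90 MB/s figures from the summary as sustained rates, which real runs may not reach; `transferSeconds` is a hypothetical helper, not code from the PR.

```go
package main

import "fmt"

// transferSeconds estimates how long it takes to move sizeGB gigabytes at
// mbPerSec megabytes per second, assuming 1 GB = 1000 MB.
func transferSeconds(sizeGB, mbPerSec float64) float64 {
	return sizeGB * 1000 / mbPerSec
}

func main() {
	const modelGB = 5.2 // size recorded in KnownModelSizes for gemma-4-e4b-4bit

	fmt.Printf("Blacksmith proxy (~400 MB/s): %.0f s\n", transferSeconds(modelGB, 400))
	fmt.Printf("GitHub cache (~90 MB/s): %.0f s\n", transferSeconds(modelGB, 90))
}
```

Under these assumptions a cache restore of the model takes about 13 s through the proxy versus about 58 s at the GitHub rate.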