ci: optimize workflows for Blacksmith + swap to gemma4 models by Gajesh2007 · Pull Request #184 · Layr-Labs/d-inference

Gajesh2007 · 2026-05-18T22:54:29Z

Summary

Optimize all CI workflows to leverage Blacksmith's infrastructure and migrate test models from Qwen to Gemma 4.

Workflow optimizations

Workflow	Optimization	Expected impact
`ci.yml`	Cargo artifact cache for provider tests	Save 3-8 min/run
`ci.yml`	`golangci-lint-action` v9.2.0 (pre-built binary, analysis cache)	Save ~30s + better annotations
`integration.yml`	Cache Homebrew, Swift .build, HuggingFace models	Save 5-10 min/run on warm cache
`release-swift.yml`	Cache HuggingFace test model	Save ~2 min/release
`threat-model-review.yml`	pip cache	Save ~15s

All caches use standard actions/cache — Blacksmith's transparent proxy intercepts them automatically at ~400 MB/s (vs GitHub's ~90 MB/s) with 25 GB free storage per repo.

Sticky disks (useblacksmith/stickydisk) were evaluated but are Linux-only (ext4 + block devices). All heavy CI jobs run on macOS Apple Silicon runners, so actions/cache with the transparent proxy is the best available option.

Model migration

Primary test model: mlx-community/Qwen3.5-0.8B-MLX-4bit → mlx-community/gemma-4-e4b-4bit (5.2 GB)
Removed: mlx-community/gemma-3-270m-4bit secondary model
Release tests: mlx-community/Qwen3-0.6B-8bit → mlx-community/gemma-4-e4b-4bit
Updated testbed defaults, benchmark configs, fullstack integration test constants, and KnownModelSizes
Updated RUNNER_DESC to reflect actual Blacksmith runner (macos-26 (M4 Blacksmith))

Files changed (9)

.github/workflows/ci.yml — Cargo cache + golangci-lint action
.github/workflows/integration.yml — Homebrew/Swift/HF caches + model swap
.github/workflows/release-swift.yml — HF cache + model swap
.github/workflows/threat-model-review.yml — pip cache
e2e/testbed/config.go — model defaults + known sizes
e2e/testbed/suite.go — fallback model ID
e2e/testbed/testbed_test.go — assertion updates
e2e/benchmark_test.go — benchmark model configs
coordinator/api/fullstack_integration_test.go — testModel constant

Workflow optimizations:
- ci.yml: Add Cargo artifact cache for provider tests (saves 3-8 min/run)
- ci.yml: Replace manual go install with golangci-lint-action v9.2.0 (pre-built binary download, built-in analysis cache, GitHub annotations)
- integration.yml: Cache Homebrew packages, Swift build artifacts, and HuggingFace models via actions/cache (Blacksmith transparent proxy at 400 MB/s, 25 GB free per repo)
- release-swift.yml: Cache HuggingFace test model
- threat-model-review.yml: Cache pip packages

All caches leverage Blacksmith's transparent cache proxy: zero Blacksmith-specific actions needed, standard actions/cache is automatically intercepted and served from co-located storage.

Model migration (Qwen -> Gemma 4):
- Replace mlx-community/Qwen3.5-0.8B-MLX-4bit with gemma-4-e4b-4bit as the primary test model across CI, testbed, and benchmarks
- Remove gemma-3-270m-4bit secondary model
- Update KnownModelSizes, testbed defaults, benchmark configs, and fullstack integration test constants
- Update RUNNER_DESC to reflect actual Blacksmith runner (M4 macos-26)

Sticky disks (useblacksmith/stickydisk) were evaluated but are Linux-only (ext4 + block devices); all heavy CI jobs run on macOS Apple Silicon runners.

vercel · 2026-05-18T22:54:34Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
d-inference	Ready	Preview	May 18, 2026 10:55pm
d-inference-console-ui-dev	Ready	Preview	May 18, 2026 10:55pm
d-inference-landing	Ready	Preview	May 18, 2026 10:55pm

github-actions · 2026-05-18T22:55:03Z

No security-relevant production code changed in this PR; all modifications are CI workflow definitions, test infrastructure, and the threat-model-review automation itself.

Trust boundaries touched

None of the standard TB-xxx boundaries are touched. The changed files are GitHub Actions workflows and test/benchmark helpers that never run in the production coordinator, provider, or console-ui processes.

Threat coverage

No T-xxx threats are affected. The diff contains no changes to authentication, encryption, attestation, billing, or any other security-sensitive path.

New attack surface not covered by an existing threat

Two areas are worth a brief note for the threat model backlog — neither is an immediate finding, but both represent implicit trust decisions:

1. CI workflow permissions and secret exposure (.github/workflows/)
The threat model explicitly marks the release supply chain out of scope, so this is noted purely for completeness. release-swift.yml presumably has access to signing credentials, and the coordinator/api/fullstack_integration_test.go and e2e/ tests run against real coordinator code. If any workflow GITHUB_TOKEN permission or externally supplied secret (e.g. the release API key, A-005) is broader than needed, it could be abused in a compromised-runner scenario. Worth auditing the workflows' `permissions:` blocks against least privilege, but not in scope for this threat model.

2. e2e/testbed/ — testbed config reachability
e2e/testbed/config.go and suite.go configure coordinator endpoints and likely inject test API keys or mock attestation material. If any testbed fixture (fake attestation blob, mock MDM callback, permissive CORS origin) were accidentally reachable in a staging environment that shares production secrets, it could touch TB-005 (unauthenticated MDM webhook) or TB-009 (attestation chain). This is low-risk as written — the files live under e2e/ and are not compiled into production binaries — but the threat model's existing note that api.dev.darkbloom.xyz is out of scope should be confirmed to cover the testbed target URL if one is hardcoded in config.go.

SEC-* findings resolved

None.

🔐 Threat model: docs/threat-model.yaml · Updates on each push to this PR

ethenotethan · 2026-05-18T22:56:51Z

 var KnownModelSizes = map[string]string{
-	"mlx-community/Qwen3.5-0.8B-MLX-4bit": "0.5 GB",
-	"mlx-community/gemma-3-270m-4bit":     "0.2 GB",
+	"mlx-community/gemma-4-e4b-4bit": "5.2 GB",


should we extend the list to also support https://huggingface.co/openai/gpt-oss-20b?

heavy to start -- might need 20-30 minutes to download

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a18374fac8

chatgpt-codex-connector · 2026-05-18T23:00:20Z

 func DefaultModelConfig() ModelConfig {
 	return ModelConfig{
-		ModelID:     "mlx-community/gemma-3-270m",
+		ModelID:     "mlx-community/gemma-4-e4b-4bit",


Keep CI defaults on a Swift-loadable model

When TESTBED_MODEL_ID is not set, this default is what the e2e/profile jobs pass to provider-swift. The Swift provider loads the cache entry with LLMModelFactory.shared.loadContainer, but the pinned libs/mlx-swift-lm submodule at 8d1cbcd does not register the Gemma 4 gemma4 model type, so a fresh CI run that downloads mlx-community/gemma-4-e4b-4bit will fail while loading the model rather than exercising inference. Keep the default on a supported text model, or update the Swift MLX submodule/model loader in the same change.
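The fallback behaviour described in this comment can be sketched roughly as follows. This is a minimal illustration, not the testbed's actual code: `resolveModelID` and `defaultModelID` are hypothetical names, and only the `TESTBED_MODEL_ID` variable and the model IDs come from the PR.

```go
package main

import (
	"fmt"
	"os"
)

// defaultModelID mirrors the default this PR sets in DefaultModelConfig.
const defaultModelID = "mlx-community/gemma-4-e4b-4bit"

// resolveModelID (hypothetical) returns the model ID a testbed run would use:
// the TESTBED_MODEL_ID override when it is set and non-empty, otherwise the
// compiled-in default. The lookup function is injected so the logic can be
// exercised without touching the real environment.
func resolveModelID(lookup func(string) (string, bool)) string {
	if id, ok := lookup("TESTBED_MODEL_ID"); ok && id != "" {
		return id
	}
	return defaultModelID
}

func main() {
	// os.LookupEnv has exactly the signature resolveModelID expects.
	fmt.Println("model:", resolveModelID(os.LookupEnv))
}
```

With a shape like this, a job that does not export `TESTBED_MODEL_ID` always gets the default, which is why the comment argues the default itself has to be a model the pinned Swift loader can load.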

chatgpt-codex-connector · 2026-05-18T23:00:21Z

          set -euo pipefail
          /tmp/mlxvenv/bin/pip install --quiet --no-cache-dir 'huggingface_hub[cli]'
-          /tmp/mlxvenv/bin/hf download mlx-community/Qwen3-0.6B-8bit \
+          /tmp/mlxvenv/bin/hf download mlx-community/gemma-4-e4b-4bit \


Download the model the release tests actually load

The release job still runs provider-swift live tests with DARKBLOOM_LIVE_MLX_TESTS=1, and those fixtures load LiveInferenceFixtures.tinyModelID (mlx-community/Qwen3-0.6B-8bit) rather than this Gemma model. On a fresh runner/cache this step no longer seeds the Qwen snapshot, so the live tests record the model as missing (or only pass because of stale cache state) instead of validating the release binary. Keep downloading the tiny Qwen model, or change the test fixture and supported loader together.

blacksmith-sh · 2026-05-18T23:34:33Z

Found 21 test failures on Blacksmith runners:

Failures

Test	View Logs
`github.com/eigeninference/d-inference/e2e/TestBenchmark_HeavyLoad_100Concurrent_10KB`	View Logs
`github.com/eigeninference/d-inference/e2e/TestBenchmark_HighConcurrency`	View Logs
`github.com/eigeninference/d-inference/e2e/TestBenchmark_ManyUsers`	View Logs
`github.com/eigeninference/d-inference/e2e/TestBenchmark_MultiModelMultiProvider`	View Logs
`github.com/eigeninference/d-inference/e2e/TestBenchmark_QueueSaturation`	View Logs
`github.com/eigeninference/d-inference/e2e/TestBenchmark_SingleModelScaling`	View Logs
`github.com/eigeninference/d-inference/e2e/TestBenchmark_SingleModelScaling/5-providers`	View Logs
`github.com/eigeninference/d-inference/e2e/TestBenchmark_SingleProviderNonStreaming`	View Logs
`github.com/eigeninference/d-inference/e2e/TestBenchmark_SingleProviderStreaming`	View Logs
`github.com/eigeninference/d-inference/e2e/TestIntegration_AttestationHeaders`	View Logs
`github.com/eigeninference/d-inference/e2e/TestIntegration_BillingBalanceDeduction`	View Logs
`github.com/eigeninference/d-inference/e2e/TestIntegration_ConcurrentRequests`	View Logs
`github.com/eigeninference/d-inference/e2e/TestIntegration_E2EEncryptionCorrectness`	View Logs
`github.com/eigeninference/d-inference/e2e/TestIntegration_MultipleRequestsAccounting`	View Logs
`github.com/eigeninference/d-inference/e2e/TestIntegration_NonStreamingInference`	View Logs
`github.com/eigeninference/d-inference/e2e/TestIntegration_ProviderPayoutSplit`	View Logs
`github.com/eigeninference/d-inference/e2e/TestIntegration_ReferralRewardDistribution`	View Logs
`github.com/eigeninference/d-inference/e2e/TestIntegration_StreamingContentValidation`	View Logs
`github.com/eigeninference/d-inference/e2e/TestIntegration_StreamingInference`	View Logs
`github.com/eigeninference/d-inference/e2e/TestIntegration_SwiftProviderRealRoutingGates`	View Logs
`github.com/eigeninference/d-inference/e2e/TestProfile_SingleProviderNonStreaming`	View Logs

Gajesh2007 requested a review from ethenotethan May 18, 2026 22:56

ethenotethan reviewed May 18, 2026

chatgpt-codex-connector Bot reviewed May 18, 2026

Gajesh2007 wants to merge 1 commit into master from ci-optimizations

2 participants
