
ci: split integration tests into parallel LLVM and Haskell jobs #993

Draft
Stevengre wants to merge 3 commits into master from ci/split-llvm-tests-parallel

Conversation


@Stevengre (Contributor) commented Mar 18, 2026

Change Proof

Behavior Delta

Before: One integration-tests CI job ran all integration tests serially in a single Docker container, rebuilding stable-mir-json + kdist definitions each time. Every test_exec_smir parametrization triggered an independent llvm-kompile due to function-scoped tmp_path.

After:

  • CI splits into build → integration-tests-llvm | integration-tests-haskell (parallel).
  • build job compiles once, commits the container to GHCR (ghcr.io/.../ci:<sha>). Test jobs pull the pre-built image — no duplicate make stable-mir-json build.
  • test_exec_smir shares kompile output across parametrizations via session-scoped fixtures + KompileDigest cache, eliminating redundant llvm-kompile calls for the same (smir_json, symbolic) pair.
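The sharing scheme can be sketched roughly as follows. This is a minimal illustration, assuming a session-scoped cache root and a digest-derived directory name per (smir_json, symbolic) pair; apart from kompile_cache_dir, the fixture and helper names are hypothetical and the real conftest.py differs in detail:

```python
# Sketch: one session-wide cache root replaces per-test tmp_path, and every
# parametrization with the same (smir_json, symbolic) pair maps to the same
# directory, so the digest check can skip a repeated llvm-kompile.
import hashlib
from pathlib import Path

import pytest


@pytest.fixture(scope='session')
def kompile_cache_dir(tmp_path_factory: pytest.TempPathFactory) -> Path:
    # One cache root for the whole test session.
    return tmp_path_factory.mktemp('kompile-cache')


def exec_smir_target_dir(cache_root: Path, smir_json: Path, symbolic: bool) -> Path:
    # Derive a stable directory name from the cache key so repeated
    # parametrizations land in the same place.
    digest = hashlib.sha256(str(smir_json).encode()).hexdigest()[:12]
    mode = 'symbolic' if symbolic else 'concrete'
    return cache_root / f'{digest}-{mode}'
```

With function-scoped tmp_path, each parametrization got a fresh directory and therefore a fresh kompile; keying the directory on the pair instead is what makes the cache hit.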

Invariants / Non-goals

  • make test-integration still runs all 341 tests locally (unchanged target).
  • The two -k expressions partition tests with 0 overlap, 0 missing (115 LLVM + 226 Haskell = 341).
  • stable-mir-json-integration-tests, test-kmir-image, nix jobs are unchanged.
  • No test logic changed — only how/where tests are scheduled and how kompile directories are shared.

Validation

  • -k LLVM expression: 115/341 tests collected
  • -k Haskell expression: 226/341 tests collected
  • Union coverage: 115 + 226 = 341, 0 overlap, 0 missing
  • make check: all passed (flake8, mypy, black, isort, autoflake)
  • make test-integration-llvm (local): 55 passed, 7 skipped, 4 failed (pre-existing on master)
  • make test-integration-haskell (local): 62 passed, 4 failed (pre-existing on master)
  • All 8 failures on master: reproduced identically on origin/master
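The union-coverage check above can be reproduced in a few lines; the sets below are placeholders standing in for the test ids printed by pytest --collect-only -q under each -k expression:

```python
# Partition check sketch: the two -k selections must cover every collected
# test exactly once. Placeholder sets stand in for real collected test ids.
all_tests = {f'test_{i}' for i in range(341)}
llvm_tests = {t for t in all_tests if int(t.split('_')[1]) < 115}  # 115 tests
haskell_tests = all_tests - llvm_tests                            # 226 tests

assert llvm_tests | haskell_tests == all_tests, 'missing tests'
assert not (llvm_tests & haskell_tests), 'overlapping tests'
assert len(llvm_tests) + len(haskell_tests) == len(all_tests) == 341
```

Running the same three assertions against real collected ids is a cheap guard against the -k drift risk noted below.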

Risk / Blast Radius / Rollback

  • GHCR packages:write permission denied on PR workflows: CI fails at the docker push step with a clear error message. Fallback: revert to per-job independent builds (commit 2e30d107).
  • -k expression drift when new tests are added: tests matching neither expression are silently skipped in CI. Mitigation: add a CI check or document the convention.
  • Session-scoped kompile cache race under pytest-xdist: file-lock guard (open(lock_file, "x")) with a 5-minute timeout.
  • GHCR image accumulation: ephemeral CI images are tagged by SHA and can be cleaned up via a retention policy or scheduled workflow.

Rollback: Revert this PR. No persistent state changes outside CI workflow.

Review Focus

  1. GHCR permissions (permissions: packages: write at workflow level + job level) — does the org allow this for PR workflows?
  2. -k expression completeness — verified programmatically (0 missing), but reviewers should confirm the convention is clear for future test additions.
  3. conftest.py file-lock mechanism — simple open(lock_file, "x") + poll loop. Adequate for pytest-xdist on a single machine; not designed for distributed execution.

Architecture Trace

Context (C4-L1)

CI pipeline for the mir-semantics project. Tests run on self-hosted GitHub Actions runners inside Docker containers.

Container (C4-L2)

graph LR
    B[build job<br/>self-hosted runner] -->|docker commit + push| GHCR[ghcr.io/.../ci:SHA]
    GHCR -->|docker pull| L[integration-tests-llvm<br/>self-hosted runner]
    GHCR -->|docker pull| H[integration-tests-haskell<br/>self-hosted runner]
    B -.->|needs| CQ[code-quality-checks<br/>ubuntu-latest]
    L -.->|needs| B
    H -.->|needs| B

Component (C4-L3)

graph TD
    subgraph "test_exec_smir parametrizations"
        T1["(smir_A, llvm)"] --> CD1["kompile_cache_dir / hash-A-concrete"]
        T2["(smir_A, haskell)"] --> CD2["kompile_cache_dir / hash-A-symbolic"]
        T3["(smir_B, llvm)"] --> CD3["kompile_cache_dir / hash-B-concrete"]
    end
    subgraph "KompileDigest cache"
        CD1 -->|"digest match → skip"| SKIP1[no llvm-kompile]
        CD1 -->|"digest miss → build"| BUILD1[llvm-kompile]
    end

Code Trace (C4-L4)

  • .github/workflows/test.yml: replace integration-tests with build + integration-tests-llvm + integration-tests-haskell; add GHCR push/pull.
  • Makefile: add test-integration-llvm and test-integration-haskell targets with -k filters.
  • kmir/src/tests/integration/conftest.py: new file with session-scoped kompile_cache_dir and exec_smir_kompile_dirs fixtures, plus get_exec_smir_target_dir() guarded by a file-lock.
  • kmir/src/tests/integration/test_integration.py: test_exec_smir replaces tmp_path with the shared kompile cache fixtures.

Decision Record

Decision: Use GHCR to share built Docker images across CI jobs instead of per-job independent rebuilds.

Alternatives considered:

  1. Per-job independent builds (evm-semantics/kontrol pattern) — simpler but duplicates expensive make stable-mir-json build across 2 jobs.
  2. docker save + GitHub Actions artifacts — no registry needed but Docker images are large (multi-GB), slow upload/download.
  3. Single job with parallel containers — avoids cross-runner sharing but loses independent job visibility in GitHub UI.

Trade-off: GHCR approach requires packages:write permission (may need org-level approval) but eliminates all duplicate build work and keeps jobs independently visible.

🤖 Generated with Claude Code

Stevengre and others added 3 commits March 18, 2026 01:53
- Phase 1: Add `test-integration-llvm` and `test-integration-haskell`
  Makefile targets that run the respective test subsets in parallel CI jobs.

- Phase 2: Fix redundant llvm-kompile calls in test_exec_smir by replacing
  function-scoped `tmp_path` with session-scoped fixtures (`kompile_cache_dir`,
  `exec_smir_kompile_dirs`) so KompileDigest caching hits for repeated
  (smir_json, symbolic) combinations within a test session.  A file-lock
  guards against concurrent pytest-xdist workers racing on the same dir.

- Phase 3: Add a shared `build` CI job that runs `make stable-mir-json` +
  `make build` once and commits the result as a Docker image; downstream
  `integration-tests-llvm` and `integration-tests-haskell` jobs start
  containers from that image, avoiding a redundant full rebuild per job.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous approach used `docker commit` to save a built image and
share it across jobs. This fails because self-hosted runners are
separate machines and local Docker images are not shared between them.

Switch to having each test job independently run `with-docker` + build,
matching the pattern used by other jobs in this workflow. The tests
themselves still run in parallel across two separate jobs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build once in a dedicated job, commit the container image with all
kompile artifacts to GHCR, then pull it in the LLVM and Haskell test
jobs. This avoids rebuilding stable-mir-json and kdist definitions
twice.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Stevengre

@codex review


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5cba7069be


Comment on lines +79 to +81
docker commit mir-semantics-build-${GITHUB_SHA} ${IMAGE}
echo ${{ secrets.GITHUB_TOKEN }} | docker login ghcr.io -u ${{ github.actor }} --password-stdin
docker push ${IMAGE}


P1: Avoid pushing images that contain checkout credentials

This job checks out with actions/checkout@v4 using secrets.JENKINS_GITHUB_PAT and then commits/pushes the running container, which already has the copied workspace under /home/github-user/workspace; in v4, checkout persists auth credentials for git commands, so the committed layer can include those credentials (for example via repo git config) and expose them to anyone who can pull ghcr.io/.../ci:<sha>. Avoid committing that workspace as-is (or set persist-credentials: false and scrub .git before docker commit).


Comment on lines +62 to +65
with open(lock_file, 'x'):
    target.mkdir(parents=True, exist_ok=True)
    exec_smir_kompile_dirs[key] = target
lock_file.unlink(missing_ok=True)


P1: Keep the lock held until kompilation has finished

The lock file is removed immediately after creating the target directory, but the expensive work (KMIR.from_kompiled_kore(...)) happens later in the test, so concurrent xdist workers that hit the same (smir_json, symbolic) key can still enter kompilation at the same time and write into the same target_dir, causing flaky or corrupted integration outputs. The critical section needs to include the actual kompile step, not just directory creation.


Comment on lines +70 to +73
while lock_file.exists() and secs < 300:
    time.sleep(1)
    secs += 1
exec_smir_kompile_dirs[key] = target


P2: Raise on lock timeout instead of using a possibly partial dir

When another worker holds the lock longer than 300 seconds, this code still records and returns target even if lock_file still exists, so callers may proceed against a partially built kompile directory and fail nondeterministically. If the timeout is reached, this should error out (or continue waiting) rather than pretending the cache entry is ready.

