Commit fa122c7

feat(chapel): Wave 2 — chapel-multilocale gate (-nl 2 via gasnet+smp, #87 option A) (#99)
Closes the Wave 1 → Wave 2 transition for the optional Chapel mass-panic harness. Adds a seventh strict gate to `chapel-ci.yml` that exercises real multilocale execution: it builds Chapel 2.8.0 from source with `CHPL_COMM=gasnet` + `CHPL_LAUNCHER=smp` and runs `mass-panic --numLocales=2` against the synthetic 2-repo corpus.

## What this closes

- `chapel-multilocale` job defined and wired into the `chapel-ci-gate` aggregator (now 7-of-7 instead of 6-of-6)
- Source-build path documented and cached (cold ~30-40 min, warm ~30s)
- `smp` launcher + `smp` GASNet substrate run two locales as oversubscribed local processes on a single GH runner

## Toolchain choice — option A from #87

Owner picked the build-from-source path. Not option B (there is no upstream `chapel-multilocale-2.8.0.deb`); not option C (there is no self-hosted runner infra).

## Cache strategy

- `$CHPL_HOME = /opt/chapel-multilocale` cached via `actions/cache@v4`
- Key: `${{ runner.os }}-chapel-multilocale-2.8.0-gasnet-smp-v1`
- Invalidate by bumping the `CHAPEL_MULTILOCALE_CACHE_GEN` env var
- 7-day idle eviction (GitHub policy); normal `chapel/**` activity keeps it warm

## What this does NOT close (filed elsewhere or deferred)

- Ruleset `required_status_checks` bump: optional and effectively a no-op. The aggregator is the only required check in the ruleset, so adding a seventh job behind it changes nothing at the ruleset level. Filed as a doc note instead of a separate PR.
- ~50-repo benchmark for the "~5-15% slower" README claim: needs a beefier or self-hosted runner to be meaningful (`-nl 2` on a 2-core runner doesn't produce credible numbers). Tracked at #87 acceptance bullet 3.

## Test plan

- [ ] First PR run will be cold-cache (~30-40 min build), an acceptable one-time tax
- [ ] Subsequent runs hit the cache; total job time ~3-5 min
- [ ] Two-locale e2e verifies that `system-image-*.json` mentions both `repo-alpha` and `repo-beta` (cross-locale aggregation)
- [ ] Aggregator gate reports 7 underlying job results; the SKIP path still passes immediately for non-chapel changes
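The cache key is a plain string built from the runner OS, the Chapel version, the literal `gasnet-smp` tag and the generation counter, so invalidation only requires changing one input. A minimal sketch under that reading of the workflow's `key:` expression; `multilocale_cache_key` is a hypothetical helper, not something `chapel-ci.yml` defines:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebuilds the chapel-multilocale cache key the same way
# the workflow's `key:` expression does. Not part of chapel-ci.yml.
set -euo pipefail

multilocale_cache_key() {
  local os="$1"       # value of runner.os, e.g. "Linux" on ubuntu-22.04
  local version="$2"  # CHAPEL_VERSION
  local gen="$3"      # CHAPEL_MULTILOCALE_CACHE_GEN
  # Conduit and launcher are literal in the key: switching away from
  # gasnet+smp means editing the workflow, which also changes the key.
  printf '%s-chapel-multilocale-%s-gasnet-smp-%s\n' "$os" "$version" "$gen"
}

multilocale_cache_key Linux 2.8.0 v1   # Linux-chapel-multilocale-2.8.0-gasnet-smp-v1
multilocale_cache_key Linux 2.8.0 v2   # Linux-chapel-multilocale-2.8.0-gasnet-smp-v2
```

Because no `restore-keys` are configured, a bumped generation finds no exact match, `cache-hit` is not `'true'`, and both guarded build steps run again; if the job succeeds, the post-job cache step saves the fresh tree under the new key.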
1 parent 4c8b800 commit fa122c7

3 files changed

Lines changed: 162 additions & 12 deletions

.github/workflows/chapel-ci.yml

Lines changed: 142 additions & 11 deletions
@@ -16,20 +16,25 @@
 # - SUCCESS if all 6 underlying jobs succeeded on a relevant change.
 # - FAILURE if any underlying job failed.
 #
-# Six strict jobs (no continue-on-error anywhere):
+# Seven strict jobs (no continue-on-error anywhere):
 # 1. chapel-parse-check — chpl --parse-only on every module
 # 2. chapel-build — chpl build of mass-panic + smoke (no toolbox)
 # 3. chapel-smoke — chapel/smoke/two_repo_smoke (Chapel data flow)
 # 4. chapel-e2e — mass-panic -nl 1 on a synthetic 2-repo manifest
-# True -nl 2 requires CHPL_COMM=gasnet which the
-# stock .deb doesn't ship; tracked for Wave 2.
 # 5. chapel-cli-contract — panic-attack describe-contract vs expected fixture
 # 6. chapel-rust-diff — rayon assemblyline vs Chapel single-locale parity
+# 7. chapel-multilocale — mass-panic -nl 2 on the same synthetic 2-repo
+# corpus, against a Chapel built from source with
+# CHPL_COMM=gasnet + CHPL_LAUNCHER=smp (single-host
+# oversubscription). The source build is cached on
+# $CHPL_HOME; cold ~30-40 min, warm ~30s restore.
+# Closes the gap left by Wave 1 (issue #87).
 #
 # Plus the always-on aggregator: `chapel-ci-gate`.
 #
-# Wave 2 hardening tracker: SHA-pin the Chapel 2.8.0 .deb download. Today the
-# workflow trusts the HTTPS endpoint at chapel-lang/chapel releases.
+# Wave 2 hardening tracker: SHA-pin the Chapel 2.8.0 .deb + source tarball
+# downloads. Today the workflow trusts the HTTPS endpoints at chapel-lang/chapel
+# releases.

 name: chapel-ci

@@ -48,6 +53,11 @@ concurrency:
 env:
   CHAPEL_VERSION: "2.8.0"
   CHAPEL_DEB_URL: "https://github.com/chapel-lang/chapel/releases/download/2.8.0/chapel-2.8.0-1.ubuntu22.amd64.deb"
+  # Source tarball used by chapel-multilocale to build with CHPL_COMM=gasnet.
+  CHAPEL_SRC_URL: "https://github.com/chapel-lang/chapel/releases/download/2.8.0/chapel-2.8.0.tar.gz"
+  # $CHPL_HOME for the multilocale build. Cache key bumps via CHAPEL_MULTILOCALE_CACHE_GEN.
+  CHAPEL_MULTILOCALE_HOME: /opt/chapel-multilocale
+  CHAPEL_MULTILOCALE_CACHE_GEN: "v1"

 jobs:
   detect-relevant-changes:
@@ -248,12 +258,131 @@ jobs:
       - name: rayon vs Chapel single-locale aggregate parity
         run: ./chapel/tests/rayon_vs_chapel_diff.sh

+  chapel-multilocale:
+    name: chapel-multilocale
+    needs: detect-relevant-changes
+    if: needs.detect-relevant-changes.outputs.relevant == 'true'
+    runs-on: ubuntu-22.04
+    timeout-minutes: 75
+    env:
+      CHPL_HOME: /opt/chapel-multilocale
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      # Cache the entire built-from-source Chapel tree. Key is stable across
+      # PRs as long as the version, conduit, launcher and cache-gen marker
+      # don't change. Cold build is ~30-40 min on a 2-core runner; warm
+      # restore is ~30s.
+      - name: Cache multilocale Chapel ($CHPL_HOME)
+        id: chapel-cache
+        uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4
+        with:
+          path: ${{ env.CHAPEL_MULTILOCALE_HOME }}
+          key: ${{ runner.os }}-chapel-multilocale-${{ env.CHAPEL_VERSION }}-gasnet-smp-${{ env.CHAPEL_MULTILOCALE_CACHE_GEN }}
+
+      - name: Install Chapel build dependencies
+        if: steps.chapel-cache.outputs.cache-hit != 'true'
+        run: |
+          set -euo pipefail
+          sudo apt-get update -qq
+          sudo apt-get install -y --no-install-recommends \
+            build-essential gcc g++ make perl python3 \
+            m4 autoconf automake libtool libunwind-dev pkg-config
+
+      - name: Build Chapel from source with CHPL_COMM=gasnet
+        if: steps.chapel-cache.outputs.cache-hit != 'true'
+        run: |
+          set -euo pipefail
+          curl -fsSL --retry 3 -o /tmp/chapel-src.tar.gz "${{ env.CHAPEL_SRC_URL }}"
+          sudo mkdir -p /opt
+          sudo tar -xzf /tmp/chapel-src.tar.gz -C /opt
+          sudo mv "/opt/chapel-${{ env.CHAPEL_VERSION }}" "${{ env.CHAPEL_MULTILOCALE_HOME }}"
+          sudo chown -R "$(id -u):$(id -g)" "${{ env.CHAPEL_MULTILOCALE_HOME }}"
+          cd "${{ env.CHAPEL_MULTILOCALE_HOME }}"
+          # Configure for single-host oversubscribed multilocale:
+          # CHPL_COMM=gasnet — multilocale communication layer
+          # CHPL_COMM_SUBSTRATE=smp — shared-memory substrate (no NIC needed)
+          # CHPL_LAUNCHER=smp — spawn locales as local processes
+          export CHPL_HOME="${{ env.CHAPEL_MULTILOCALE_HOME }}"
+          export CHPL_COMM=gasnet
+          export CHPL_COMM_SUBSTRATE=smp
+          export CHPL_LAUNCHER=smp
+          export CHPL_TARGET_COMPILER=gnu
+          # CHPL_LLVM=none disables the LLVM backend (we only need the gnu C
+          # backend for the multilocale smoke test). Without this, the build
+          # tries to verify LLVM headers via clang/Basic/Version.h and aborts
+          # with a CHPL_LLVM "unset" make target.
+          export CHPL_LLVM=none
+          # setchplenv.bash references ${MANPATH} unconditionally; GH runners
+          # don't export MANPATH by default, so seed it before sourcing.
+          export MANPATH="${MANPATH:-}"
+          source util/setchplenv.bash
+          # Build chpl + runtime + GASNet+smp substrate
+          make -j"$(nproc)"
+          # Sanity: Chapel's runtime layout creates `comm-gasnet` directories
+          # somewhere under $CHPL_HOME/lib for each (comm, launcher, tasks, ...)
+          # variant. find -print -quit is format-independent and survives
+          # path-component renames between Chapel minor versions.
+          find "$CHPL_HOME/lib" -type d -name comm-gasnet -print -quit | grep -q comm-gasnet
+
+      - name: Activate multilocale Chapel
+        id: activate
+        run: |
+          set -euo pipefail
+          export CHPL_HOME="${{ env.CHAPEL_MULTILOCALE_HOME }}"
+          export CHPL_LLVM=none
+          export MANPATH="${MANPATH:-}"
+          source "$CHPL_HOME/util/setchplenv.bash"
+          # Persist env to subsequent steps via GITHUB_ENV
+          {
+            echo "CHPL_HOME=$CHPL_HOME"
+            echo "CHPL_COMM=gasnet"
+            echo "CHPL_COMM_SUBSTRATE=smp"
+            echo "CHPL_LAUNCHER=smp"
+            echo "CHPL_TARGET_COMPILER=gnu"
+            echo "PATH=$CHPL_HOME/bin/$(uname -s)-$(uname -m):$PATH"
+          } >> "$GITHUB_ENV"
+          chpl --version
+          find "$CHPL_HOME/lib" -type d -name comm-gasnet -print -quit | grep -q comm-gasnet
+
+      - name: Build mass-panic against multilocale Chapel
+        working-directory: chapel
+        run: |
+          set -euo pipefail
+          chpl src/MassPanic.chpl src/Protocol.chpl src/Imaging.chpl src/Temporal.chpl -o mass-panic
+
+      - name: End-to-end -nl 2 exercise (oversubscribed locales on single runner)
+        run: |
+          set -euo pipefail
+          WORK=$(mktemp -d /tmp/chapel-multilocale-XXXXXX)
+          trap 'rm -rf "$WORK"' EXIT
+          mkdir -p "$WORK/corpus/repo-alpha/src" "$WORK/corpus/repo-beta/src"
+          echo 'pub unsafe fn a() {}' > "$WORK/corpus/repo-alpha/src/lib.rs"
+          echo 'pub unsafe fn b() {}' > "$WORK/corpus/repo-beta/src/lib.rs"
+          for d in repo-alpha repo-beta; do
+            (cd "$WORK/corpus/$d" && git init -q && git add -A && git -c user.email=ci@example.com -c user.name=ci commit -q -m init)
+          done
+          # The smp launcher spawns N processes on the local host. -nl 2 is
+          # the minimum non-trivial multilocale exercise; oversubscription
+          # is fine for verification (latency, not throughput, matters here).
+          ./chapel/mass-panic \
+            --repoDirectory="$WORK/corpus" \
+            --numLocales=2 \
+            --quiet \
+            --outputDir="$WORK/out"
+          # Two-locale run produced a system image
+          ls "$WORK/out"/system-image-*.json >/dev/null
+          # And that image references both repos (cross-locale aggregation)
+          grep -q 'repo-alpha' "$WORK/out"/system-image-*.json
+          grep -q 'repo-beta' "$WORK/out"/system-image-*.json
+          echo "chapel-multilocale: PASS (-nl 2, gasnet+smp)"
+
   # Always-on aggregator. This is the ONLY job listed in the Base ruleset's
   # required_status_checks rule. If detect-relevant-changes determined nothing
   # in this PR touches Chapel-relevant paths, the gate passes immediately
-  # (the six per-task jobs above skip via their `if:` guard). If a relevant
+  # (the seven per-task jobs above skip via their `if:` guard). If a relevant
   # change is present, the gate inspects each job's result and only passes
-  # when ALL six succeeded.
+  # when ALL seven succeeded.
   chapel-ci-gate:
     name: chapel-ci-gate
     needs:
@@ -264,6 +393,7 @@ jobs:
       - chapel-e2e
       - chapel-cli-contract
       - chapel-rust-diff
+      - chapel-multilocale
     if: always()
     runs-on: ubuntu-22.04
     steps:
@@ -276,11 +406,12 @@ jobs:
           R_E2E: ${{ needs.chapel-e2e.result }}
           R_CLI: ${{ needs.chapel-cli-contract.result }}
           R_DIFF: ${{ needs.chapel-rust-diff.result }}
+          R_MULTILOCALE: ${{ needs.chapel-multilocale.result }}
         run: |
           set -euo pipefail
           echo "detect-relevant-changes.outputs.relevant=$RELEVANT"
-          printf 'parse-check=%s\nbuild=%s\nsmoke=%s\ne2e=%s\ncli-contract=%s\nrust-diff=%s\n' \
-            "$R_PARSE" "$R_BUILD" "$R_SMOKE" "$R_E2E" "$R_CLI" "$R_DIFF"
+          printf 'parse-check=%s\nbuild=%s\nsmoke=%s\ne2e=%s\ncli-contract=%s\nrust-diff=%s\nmultilocale=%s\n' \
+            "$R_PARSE" "$R_BUILD" "$R_SMOKE" "$R_E2E" "$R_CLI" "$R_DIFF" "$R_MULTILOCALE"
           if [[ "$RELEVANT" != "true" ]]; then
             echo "chapel-ci-gate: SKIP (no chapel-relevant paths changed) → PASS"
             exit 0
@@ -291,7 +422,7 @@ jobs:
             exit 1
           fi
           fail=0
-          for r in "$R_PARSE" "$R_BUILD" "$R_SMOKE" "$R_E2E" "$R_CLI" "$R_DIFF"; do
+          for r in "$R_PARSE" "$R_BUILD" "$R_SMOKE" "$R_E2E" "$R_CLI" "$R_DIFF" "$R_MULTILOCALE"; do
             case "$r" in
               success) ;;
               *) fail=$((fail + 1)) ;;
@@ -301,4 +432,4 @@ jobs:
             echo "chapel-ci-gate: $fail dependent job(s) did not succeed → FAIL"
             exit 1
           fi
-          echo "chapel-ci-gate: all six gates green → PASS"
+          echo "chapel-ci-gate: all seven gates green → PASS"
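The aggregator's decision rule reduces to a small pure function of the relevance flag and the job results. The sketch below is an illustration only, assuming just the behaviour visible in this diff (the SKIP branch, the success-only loop, the failure count); `gate_verdict` is a hypothetical name, and the real step also contains a check in the lines elided between hunks.

```shell
#!/usr/bin/env bash
# Hypothetical restatement of the chapel-ci-gate rule, for illustration only.
# Arg 1: detect-relevant-changes.outputs.relevant; remaining args: job results.
set -euo pipefail

gate_verdict() {
  local relevant="$1"
  shift
  if [[ "$relevant" != "true" ]]; then
    # SKIP path: per-task jobs were skipped by their `if:` guard.
    echo "PASS"
    return 0
  fi
  local fail=0 r
  for r in "$@"; do
    case "$r" in
      success) ;;
      # failure, cancelled and skipped all count against the gate
      *) fail=$((fail + 1)) ;;
    esac
  done
  if (( fail > 0 )); then
    echo "FAIL ($fail)"
  else
    echo "PASS"
  fi
}

ok=(success success success success success success success)
gate_verdict false skipped skipped skipped skipped skipped skipped skipped  # PASS
gate_verdict true "${ok[@]}"                                                # PASS
gate_verdict true "${ok[@]:0:6}" skipped                                    # FAIL (1)
```

The rule fails closed: on a relevant change, a job that was skipped or cancelled is not `success`, so it cannot slip through as a pass.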

CHANGELOG.md

Lines changed: 17 additions & 0 deletions
@@ -2,6 +2,23 @@

 ## [Unreleased]

+### Added (2026-06-01) — Chapel Wave 2: single-host multilocale gate
+- **`chapel-multilocale` CI gate** (#99, closes #87 option A): adds a 7th
+  strict chapel-ci job that builds Chapel 2.8.0 from source with
+  `CHPL_COMM=gasnet` + `CHPL_COMM_SUBSTRATE=smp` + `CHPL_LAUNCHER=smp`,
+  caches `$CHPL_HOME` (`actions/cache@v4`, stable key with manual
+  `CHAPEL_MULTILOCALE_CACHE_GEN` invalidation counter; cold build
+  ~30-40 min, warm restore ~30s for 7 days), runs
+  `mass-panic --numLocales=2` against a synthetic 2-repo corpus, and
+  greps the emitted `system-image-*.json` for both repo names to prove
+  cross-locale aggregation actually executed. The Wave 1 binary `.deb`
+  install path is single-locale only; this gate closes the gap.
+- Aggregator `chapel-ci-gate` updated to wait on the 7th job and to
+  surface it as `multilocale=<result>` in the gate summary.
+- Wave 3 (`gasnet/ofi` over a real NIC across cluster nodes) and the
+  ~50-repo "~5-15% slower" benchmark from `chapel/README.md` remain
+  parked — both need a beefier or self-hosted runner to be meaningful.
+
 ### Fixed (2026-06-01) — baseline-red corrective maintenance
 - **Dogfood Gate A2ML validation** restored (#94, #97): bumped
   `hyperpolymath/a2ml-validate-action` from `59145c7d` to `6bff6ec` to
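The cross-locale check described in this entry amounts to one `grep` per expected repo name over the emitted image. A minimal sketch that generalises it to any list of names, assuming only that the run writes `system-image-*.json` into an output directory; `check_image_mentions` is a hypothetical helper, and the JSON written in the demo is made-up sample data, since the real image format is not shown in this commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the workflow's post-run assertions:
# at least one system-image-*.json exists, and every expected repo name
# appears in at least one of them.
set -euo pipefail

check_image_mentions() {
  local outdir="$1"
  shift
  local images=("$outdir"/system-image-*.json)
  # Without nullglob an unmatched glob stays literal, so test the first entry.
  if [[ ! -e "${images[0]}" ]]; then
    echo "no system-image-*.json under $outdir" >&2
    return 1
  fi
  local repo
  for repo in "$@"; do
    # -F: literal match; -q: status only (0 if any image contains the name)
    if ! grep -qF -- "$repo" "${images[@]}"; then
      echo "missing from system image: $repo" >&2
      return 1
    fi
  done
  echo "ok: $# repo(s) found"
}

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
# Made-up image content, only so the demo has something to grep.
printf '{"repos":["repo-alpha","repo-beta"]}\n' > "$demo/system-image-1.json"
check_image_mentions "$demo" repo-alpha repo-beta   # ok: 2 repo(s) found
```

Like the workflow's own `grep -q` lines, this is a presence check: it confirms each name reached the merged image, not which locale produced it.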

ROADMAP.adoc

Lines changed: 3 additions & 1 deletion
@@ -183,7 +183,9 @@ but panic-attack flags these as generic UnsafeCode findings.
 * [x] Temporal diff subcommand: `--subcommand=diff` with global health/risk/weak-point deltas
 * [x] Single-locale scan validated against 303-repo estate (2026-04-12)
 * [ ] Per-node temporal diff: load full SystemImage JSON for per-repo health breakdown
-* [ ] Multi-machine orchestration: gasnet/ofi multi-locale Chapel run across cluster nodes
+* [~] Multi-locale Chapel orchestration:
+** [x] Single-host oversubscribed `gasnet+smp` (Wave 2, panic-attack#99 / #87 option A): `chapel-multilocale` CI gate exercises `mass-panic --numLocales=2` on a 2-repo synthetic corpus. Source-built Chapel cached on `$CHPL_HOME` (cold ~30-40 min, warm ~30s restore).
+** [ ] Cross-node `gasnet/ofi` over a real NIC (Wave 3): needs cluster runner — not exercisable on default GH runners.
 * [ ] VeriSimDB HTTP push from Chapel metalayer (currently file-only)
 * [x] `--scheduler=queue` — resumable dynamic work-pull scheduler for mass-panic. Atomic fetch-add work index shared across locales; per-run JSONL journal shards (`locale-<id>-<runId>.jsonl`) recording `{claim, done}` state per repo with full RepoResult payload on `done`; `--resume` replays every shard in the journal directory, reconstructs RepoResult records from prior runs, and skips those repos on the new run. ~5–15% slower than static on clean runs; a crash or Ctrl+C loses only the in-flight repo per locale. See `chapel/README.md` §Scheduling modes for the full spec.
