
feat(chapel): Wave 2 — chapel-multilocale gate (-nl 2 via gasnet+smp, #87 option A)#99

Merged
hyperpolymath merged 7 commits into main from chapel/wave2-multilocale-from-source on Jun 2, 2026


hyperpolymath (Owner)

Closes the Wave 1 → Wave 2 transition for the optional Chapel mass-panic harness. Adds a seventh strict gate to chapel-ci.yml that exercises real multilocale execution: it builds Chapel 2.8.0 from source with CHPL_COMM=gasnet + CHPL_LAUNCHER=smp and runs mass-panic --numLocales=2 against the synthetic 2-repo corpus.

What this closes

  • chapel-multilocale job defined + wired into chapel-ci-gate aggregator (now 7-of-7 instead of 6-of-6)
  • Source-build path documented + cached (cold ~30-40 min, warm ~30s)
  • smp launcher + smp GASNet substrate run two locales as oversubscribed local processes on a single GH runner

Toolchain choice — option A from #87

The owner picked the build-from-source path, not option B (no upstream chapel-multilocale-2.8.0.deb exists) or option C (there is no self-hosted runner infrastructure).

Cache strategy

  • $CHPL_HOME = /opt/chapel-multilocale cached via actions/cache@v4
  • Key: ${runner.os}-chapel-multilocale-2.8.0-gasnet-smp-v1
  • Invalidate by bumping CHAPEL_MULTILOCALE_CACHE_GEN env var
  • 7-day idle eviction (GitHub policy) — normal chapel/** activity keeps it warm
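A minimal sketch of how the key and the generation counter could fit together. It assumes (this is not shown in the PR) that the trailing "v1" in the key is the value of CHAPEL_MULTILOCALE_CACHE_GEN; in the workflow the key would be written inline in the cache step's `key:` field using a `${{ runner.os }}` expression, and `chapel_ml_cache_key` is a hypothetical name used only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: compose the actions/cache key for the source-built
# $CHPL_HOME. Assumes the trailing generation tag ("v1") comes from
# CHAPEL_MULTILOCALE_CACHE_GEN, so bumping that variable changes the key
# and forces one cold rebuild.
chapel_ml_cache_key() {
  local runner_os="$1"                             # value of runner.os, e.g. "Linux"
  local gen="${CHAPEL_MULTILOCALE_CACHE_GEN:-v1}"  # cache generation tag
  printf '%s-chapel-multilocale-2.8.0-gasnet-smp-%s\n' "$runner_os" "$gen"
}

chapel_ml_cache_key Linux                                   # current generation
CHAPEL_MULTILOCALE_CACHE_GEN=v2 chapel_ml_cache_key Linux   # after a bump
```

Because the generation tag is part of the key, an old entry is never restored after a bump; it simply ages out under the 7-day idle policy.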

What this does NOT close (filed elsewhere or deferred)

  • Ruleset required_status_checks bump: optional. The aggregator is the only gate in the ruleset, so adding a seventh job behind it leaves the ruleset unchanged. Filed as a doc note instead of a separate PR.
  • ~50-repo benchmark for the "~5-15% slower" README claim: needs a beefier or self-hosted runner to be meaningful (-nl 2 on a 2-core runner doesn't produce credible numbers). Tracked in #87 ("chapel: Wave 2 — real multi-locale cluster validation (-nl 16+) on a non-trivial corpus"), acceptance bullet 3.

Test plan

  • First PR run will be cold-cache (~30-40 min build) — acceptable one-time tax
  • Subsequent runs hit cache; total job time ~3-5 min
  • Two-locale e2e verifies system-image-*.json mentions both repo-alpha and repo-beta (cross-locale aggregation)
  • Aggregator gate reports 7 underlying job results; SKIP path still passes immediately for non-chapel changes
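The cross-locale check in the test plan can be pictured as the sketch below. It assumes the run writes one or more system-image-*.json files into a single output directory; the directory argument and the function name `verify_cross_locale` are hypothetical. Like a grep-based check, it only confirms that each repo name appears somewhere in those files and does not parse the JSON.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical check: succeed only if the system-image-*.json file(s) in
# OUT_DIR mention every repo passed in, i.e. results from both locales
# made it into the aggregated output.
verify_cross_locale() {
  local out_dir="$1"; shift
  local files=("$out_dir"/system-image-*.json)
  # Without nullglob an unmatched glob stays literal; treat that as failure.
  if [[ ! -e "${files[0]}" ]]; then
    echo "no system-image-*.json in $out_dir" >&2
    return 1
  fi
  local repo
  for repo in "$@"; do
    # Fixed-string search across all matching files.
    if ! grep -qF -- "$repo" "${files[@]}"; then
      echo "missing $repo in system image" >&2
      return 1
    fi
  done
  echo "ok: all repos present"
}

# Usage (hypothetical output directory):
#   verify_cross_locale ./out repo-alpha repo-beta
```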

Adds a 7th strict gate to chapel-ci.yml that exercises real multilocale
execution by building Chapel 2.8.0 from source with `CHPL_COMM=gasnet`
and `CHPL_LAUNCHER=smp`, then running `mass-panic --numLocales=2`
against the same synthetic 2-repo corpus as `chapel-e2e`.

Closes the Wave 1 gap: the stock `.deb` ships `CHPL_COMM=none` and
rejects `-nl >1`, so until now the multi-locale code path had no
CI coverage. The `smp` launcher and `smp` GASNet substrate let two
locales run as oversubscribed local processes on a single ubuntu-22.04
runner — verification, not performance.

Implementation choice — owner picked option A from issue #87:
- Build from source with `CHPL_COMM=gasnet`, aggressive caching.
- Not option B (chapel-multilocale .deb — none published upstream).
- Not option C (self-hosted runner — no infrastructure to maintain).

Cache strategy:
- `$CHPL_HOME = /opt/chapel-multilocale` cached on `actions/cache@v4`.
- Key (stable across runs): `${runner.os}-chapel-multilocale-2.8.0-gasnet-smp-v1`.
- Bump `CHAPEL_MULTILOCALE_CACHE_GEN` env var to invalidate.
- Cold build: ~30-40 min on 2-core runner. Warm restore: ~30s.
- Cache eviction after 7 days idle (GitHub policy); chapel/** touches
  in normal repo activity keep it warm.

Aggregator gate updated:
- `chapel-ci-gate` now waits on 7 jobs (added `chapel-multilocale`).
- `R_MULTILOCALE` env var added to the success-aggregation loop.
- Doc comments updated from "six gates" → "seven gates".

Acceptance criteria from #87 (partial closure):
- [x] `chapel-multilocale` job defined and wired into aggregator
- [ ] Job green on PR + main ≥1 merge cycle (this PR)
- [ ] Added to Base ruleset `required_status_checks` (separate ruleset edit)
- [ ] ~50-repo benchmark for README perf claim (separate PR; needs
      either a beefier runner or self-hosted CI to be meaningful)

The aggregator job is the only check in the ruleset, so the ruleset
itself does not change: once this PR is green and merged, the aggregator
simply covers one more underlying job. The ruleset bump is therefore
optional unless the owner wants per-gate visibility.

Refs: #87
hyperpolymath enabled auto-merge (squash) on June 1, 2026 20:49
util/setchplenv.bash references ${MANPATH} unconditionally. Under
set -euo pipefail on a clean GH runner (MANPATH not exported), this
trips 'MANPATH: unbound variable' and aborts before chpl --make
even starts.

Fix: export MANPATH=${MANPATH:-} in both 'Build Chapel from source'
and 'Activate multilocale Chapel' steps.
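The fix relies on bash's `${VAR:-}` expansion, which yields an empty string instead of failing when VAR is unset under `set -u`. A minimal, self-contained illustration in plain bash (no Chapel needed; the child shells only stand in for a clean runner):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Under `set -u`, expanding an unset variable aborts the shell. That is what
# tripped util/setchplenv.bash on a clean runner with no MANPATH exported.
# "${MANPATH:-}" expands to "" when MANPATH is unset, so defaulting it first
# makes every later "${MANPATH}" reference safe.
export MANPATH="${MANPATH:-}"

# Demonstrate both behaviours in child shells (env -u removes MANPATH from
# their environment), so the expected failure does not abort this script.
if env -u MANPATH bash -uc 'echo "$MANPATH"' 2>/dev/null; then
  echo "unguarded: did not fail (unexpected)"
else
  echo "unguarded: unbound variable error"
fi
env -u MANPATH bash -uc 'export MANPATH="${MANPATH:-}"; echo "guarded: MANPATH=[$MANPATH]"'
```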
CHANGELOG.md gets an Added-2026-06-01 entry for #99 (option A closure
of #87): 7th strict chapel-ci gate, gasnet+smp single-host
oversubscribed, source-built + $CHPL_HOME-cached, cross-locale
verification via system-image-*.json grep.

ROADMAP.adoc v3.0.0 'Multi-machine orchestration' bullet split into
two: [x] single-host oversubscribed (Wave 2 landed) and
[ ] cross-node gasnet/ofi over a real NIC (Wave 3, needs cluster
runner).
Second cold-build attempt failed at:

  Error: Please set the environment variable CHPL_LLVM to a supported value.
    1) 'none' to build with minimal LLVM support
    2) 'bundled' ...
    3) 'system' ...

Chapel's compiler-builds tries to verify LLVM headers via
clang/Basic/Version.h before building; on Ubuntu 22.04 we don't have
LLVM dev headers installed, and we don't need them — CHPL_TARGET_COMPILER=gnu
already targets the C backend.

Setting CHPL_LLVM=none disables the LLVM backend entirely. Multilocale
GASNet+smp comms don't depend on LLVM, only on the runtime layer.
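Taken together with the CHPL_COMM and CHPL_LAUNCHER settings above, the build environment might look like the following sketch. The function name is hypothetical, and CHPL_COMM_SUBSTRATE=smp is an assumption: it is Chapel's standard variable for choosing the GASNet substrate, but these commits don't show how the workflow spells "smp GASNet substrate". The actual build command is left as a comment.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: environment for a CHPL_COMM=gasnet build using the
# smp substrate and smp launcher, with the LLVM backend disabled.
chapel_ml_env() {
  export CHPL_HOME=/opt/chapel-multilocale
  export CHPL_COMM=gasnet
  export CHPL_COMM_SUBSTRATE=smp     # assumed: all locales on one host
  export CHPL_LAUNCHER=smp           # launcher paired with the smp substrate
  export CHPL_TARGET_COMPILER=gnu    # C backend via GCC
  export CHPL_LLVM=none              # no LLVM backend, so no LLVM dev headers needed
}

chapel_ml_env
# The build itself would then run from $CHPL_HOME, e.g.: (cd "$CHPL_HOME" && make)
printenv | grep '^CHPL_' | sort
```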
github-actions bot commented Jun 1, 2026

🔍 Hypatia Security Scan

Findings: 96 issues detected

Severity Count
🔴 Critical 5
🟠 High 10
🟡 Medium 81

⚠️ Action Required: Critical security issues found!

[
  {
    "reason": "Action uses: dtolnay/rust-toolchain@4be9e76fd7c4901c61fb841f5599 needs attention",
    "type": "unpinned_action",
    "file": "e2e.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Action es: Swatinem/rust-cache@779680da715d629ac1d338a641029a2f4372abb needs attention",
    "type": "unpinned_action",
    "file": "e2e.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Action perpolymath/standards/.github/workflows/governance-reusable.yml@main\n needs attention",
    "type": "unpinned_action",
    "file": "governance.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in boj-build.yml",
    "type": "missing_timeout_minutes",
    "file": "boj-build.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in cargo-audit.yml",
    "type": "missing_timeout_minutes",
    "file": "cargo-audit.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in casket-pages.yml",
    "type": "missing_timeout_minutes",
    "file": "casket-pages.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in casket-pages.yml",
    "type": "missing_timeout_minutes",
    "file": "casket-pages.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in chapel-ci.yml",
    "type": "missing_timeout_minutes",
    "file": "chapel-ci.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in chapel-ci.yml",
    "type": "missing_timeout_minutes",
    "file": "chapel-ci.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in chapel-ci.yml",
    "type": "missing_timeout_minutes",
    "file": "chapel-ci.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  }
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence

Run #3 cold build completed successfully (~12 min, libchpllaunch.a +
modules + cmake module files all generated), then the post-build
sanity check failed with:

  Unrecognized flag: '--about' (use '-h' for help)

Chapel 2.8.0 dropped 'chpl --about'. Use the canonical
$CHPL_HOME/util/printchplenv --simple invocation instead — it prints
KEY=value lines (so the regex anchor changes from ':\s+' to '=').

Applied in both 'Build Chapel from source' and 'Activate multilocale
Chapel' steps.
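A sketch of a KEY=value check that tells apart the two ways such a step can fail: no output at all, or the wrong value. `check_chpl_value` is a hypothetical helper, and the KEY=value format is the one this commit expects from printchplenv --simple; the invocation against the real tool is left as a comment.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read KEY=value lines on stdin and check one key.
# Return codes: 0 = value matches, 1 = key missing or value differs,
# 2 = no input at all (the producing command printed nothing).
check_chpl_value() {
  local key="$1" expected="$2" input line actual=""
  input="$(cat)"
  if [[ -z "$input" ]]; then
    echo "no KEY=value output to inspect" >&2
    return 2
  fi
  # Take everything after the first '=' on the first line starting with KEY=.
  while IFS= read -r line; do
    if [[ "$line" == "$key="* ]]; then
      actual="${line#*=}"
      break
    fi
  done <<< "$input"
  if [[ "$actual" != "$expected" ]]; then
    echo "$key is '${actual:-<unset>}', expected '$expected'" >&2
    return 1
  fi
  echo "$key=$actual"
}

# Against the real tool (not run here):
#   "$CHPL_HOME/util/printchplenv" --simple | check_chpl_value CHPL_COMM gasnet
printf 'CHPL_COMM=gasnet\nCHPL_LAUNCHER=smp\n' | check_chpl_value CHPL_COMM gasnet
```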
github-actions bot commented Jun 2, 2026 with a repeat of the Hypatia Security Scan above (same 96 findings: 5 critical, 10 high, 81 medium).

Run #4 progressed past Build (Chapel 2.8.0 + bundled LLVM support
compiled in ~12 min, libchpllaunch.a + modules + cmake module files
all generated). Failed in Activate when:

  "$CHPL_HOME/util/printchplenv" --simple | grep -E '^CHPL_COMM=gasnet$'

emitted no output and exited 1 (presumably printchplenv --simple uses
a different output format than KEY=value, or it ran into a chplenv
caching issue).

Switch to a format-independent check: Chapel's runtime layout names
$CHPL_HOME/lib/<plat>/gnu/<arch>/loc-flat/comm-gasnet/smp/<tasks>/launch-smp/
per (comm, launcher, ...) variant. Existence of comm-gasnet/smp/.../launch-smp
proves the runtime was built with our CHPL_COMM+CHPL_LAUNCHER settings —
no chpl-flag-version dependence, no parsing format risk.
github-actions bot commented Jun 2, 2026 with a repeat of the Hypatia Security Scan above (same 96 findings: 5 critical, 10 high, 81 medium).

Run #5: build COMPLETED (libchpllaunch.a + libchplmalloc.a + modules
all generated successfully), then the sanity glob missed the actual
runtime path:

  No such file or directory:
  /opt/chapel-multilocale/lib/*/gnu/*/loc-flat/comm-gasnet/smp/*/launch-smp

The real path includes an extra tasks-* component:
  lib/linux64/gnu/x86_64/loc-flat/comm-gasnet/smp/fast/tasks-qthreads/launch-smp/...

Glob was missing the tasks-* slot between perf-flavor and launch-smp.

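Switch to find -name comm-gasnet -print -quit: hierarchy-independent,
survives minor-version path renames, and prints a path iff the gasnet
runtime variant was built. (find itself exits 0 even when nothing
matches, so the check has to test that a path was actually printed.)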
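The find-based check, and why the earlier glob missed, can be reproduced on a throwaway directory tree shaped like the real path reported above. `has_gasnet_runtime` is a hypothetical name and this is a sketch of the approach, not the workflow's exact step; it assumes a find that supports -quit (GNU findutils, as on ubuntu-22.04 runners).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed iff a comm-gasnet runtime directory exists
# anywhere under $1/lib. `-print -quit` stops at the first hit, but find
# exits 0 whether or not anything matched, so we test for printed output.
has_gasnet_runtime() {
  local chpl_home="$1" hit
  hit="$(find "$chpl_home/lib" -type d -name comm-gasnet -print -quit 2>/dev/null || true)"
  [[ -n "$hit" ]]
}

# Throwaway tree shaped like the runtime path from run #5.
demo="$(mktemp -d)"
mkdir -p "$demo/lib/linux64/gnu/x86_64/loc-flat/comm-gasnet/smp/fast/tasks-qthreads/launch-smp"

# The earlier glob has no slot for tasks-*, so it matches nothing here.
shopt -s nullglob
old_glob=("$demo"/lib/*/gnu/*/loc-flat/comm-gasnet/smp/*/launch-smp)
shopt -u nullglob
echo "old glob matches: ${#old_glob[@]}"

if has_gasnet_runtime "$demo"; then echo "gasnet runtime found"; fi
rm -rf "$demo"
```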
github-actions bot commented Jun 2, 2026 with a repeat of the Hypatia Security Scan above (same 96 findings: 5 critical, 10 high, 81 medium).

hyperpolymath merged commit fa122c7 into main on Jun 2, 2026 (34 checks passed)
hyperpolymath deleted the chapel/wave2-multilocale-from-source branch on June 2, 2026 04:27
hyperpolymath added a commit that referenced this pull request on Jun 2, 2026
…ump cache gen v1→v2

PR #100 surfaced a 5th Chapel-2.8.0 sharp edge from the #99 Wave 2 series:

  error: The runtime has not been built for this configuration.
  There is no runtime for 'CHPL_UNWIND=bundled'
  Valid options: system

Root cause: cached $CHPL_HOME from #99 was built when libunwind-dev was
installed (gated on cache-miss), so chpl auto-inferred CHPL_UNWIND=system.
On PR #100 the cache hit, libunwind-dev install was SKIPPED, so the
consumer chpl invocation auto-inferred CHPL_UNWIND=bundled — mismatch
against the cached runtime, mass-panic build aborts.

Three-part fix at source:
1. Pin CHPL_UNWIND=system explicitly in both Build and Activate steps
   (no more auto-inference, no more cache-hit/miss drift).
2. Promote libunwind-dev to always-run (split out of the cache-miss-gated
   Install step). Cheap to install (~1-2s apt) on every run; matches
   the cached configuration.
3. Bump CHAPEL_MULTILOCALE_CACHE_GEN v1→v2 to discard the inconsistent
   cache and force a fresh build with the explicit CHPL_UNWIND.
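One way to make this class of drift harder to reintroduce, sketched here as an alternative rather than what the commit does: derive part of the cache key from the pinned CHPL_* values, so changing any of them (for example CHPL_UNWIND) yields a new key instead of restoring a runtime built under different settings. The function names are hypothetical, and the sketch assumes `sha256sum` from GNU coreutils (present on ubuntu runners); in a workflow the digest would be computed in a step and passed to actions/cache.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical: the pinned settings the Build and Activate steps would share.
# CHPL_UNWIND defaults to the pinned "system"; it is overridable here only so
# the demo can show the digest changing.
pinned_chpl_config() {
  printf '%s\n' \
    "CHPL_COMM=gasnet" \
    "CHPL_LAUNCHER=smp" \
    "CHPL_LLVM=none" \
    "CHPL_TARGET_COMPILER=gnu" \
    "CHPL_UNWIND=${CHPL_UNWIND:-system}"
}

# First 12 hex chars of the SHA-256 of that config: a short cache-key suffix
# that changes whenever any pinned value changes.
config_digest() {
  pinned_chpl_config | sha256sum | cut -c1-12
}

echo "suffix with CHPL_UNWIND=system:  $(CHPL_UNWIND=system config_digest)"
echo "suffix with CHPL_UNWIND=bundled: $(CHPL_UNWIND=bundled config_digest)"
```

Compared with bumping CHAPEL_MULTILOCALE_CACHE_GEN by hand, this trades an explicit, reviewable counter for automatic invalidation; a manual counter is still useful for changes the digest cannot see, such as a different system package version.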

Also includes the docs(truthfulness) audit from this branch:
- README badge corrected: 402 → 782 runnable tests (cargo authoritative)
- ROADMAP fly.toml [x] → [~] (file doesn't exist in this repo)
- ROADMAP/Wiki '500+ repositories' → '303-repo estate (2026-04-12)'
- chapel/README ~5-15%× soften to UNMEASURED ESTIMATE
- README + Wiki note PA001/PA001b SARIF collapse

Resolves blocking chapel-multilocale failure on PR #100.