ci: cap parallel-ctest-containers concurrency to fix self-hosted NP/LR test failures#2
Conversation
The action launched one docker container per test ALL AT ONCE (Promise.all over every discovered test). For the nonparallelizable_tests / long_running_tests suites those are heavy multi-nodeos integration tests, so on a modest self-hosted runner they starve CPU/RAM/ports and fail en masse (even trivial tests like get_account_test fail — starvation, not real bugs; they pass on ENF's sized runners). Replace the unbounded fan-out with a bounded worker pool: at most N containers run concurrently (N = CTEST_CONTAINER_CONCURRENCY env, default 4). results[i] stays aligned with tests[i] so failure-log extraction is unchanged. Rebuilt dist/index.mjs via ncc; verified it loads + runs.
There was a problem hiding this comment.
Code Review
This pull request introduces bounded concurrency for running test containers in parallel, replacing the previous behavior of launching all containers at once. This prevents resource starvation on self-hosted runners by capping the concurrent executions to a default of 4 or a value specified by the CTEST_CONTAINER_CONCURRENCY environment variable. The review feedback suggests refactoring the unconventional for loop in the worker function to a more idiomatic while loop to improve code readability.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| for(let i = next_test++; i < tests.length; i = next_test++) { | ||
| const t = tests[i]; | ||
| // Clear any orphaned container of this name before reusing it (see note above). | ||
| child_process.spawnSync("docker", ["rm", "-f", t.name], {stdio:"ignore"}); | ||
| results[i] = await new Promise(resolve => { | ||
| child_process.spawn("docker", ["run", "--security-opt", "seccomp=unconfined", "-e", "GITHUB_ACTIONS=True", "--name", t.name, "--init", "baseimage", "bash", "-c", `cd build; ctest --output-on-failure -R '^${t.name}$' --timeout ${test_timeout}`], {stdio:"inherit"}).on('close', code => resolve(code)); | ||
| }); | ||
| } |
There was a problem hiding this comment.
The for loop with next_test++ in both the initialization and the increment expression is highly unconventional and can be difficult to read and reason about. Using a standard while loop is much more idiomatic and improves code readability.
while (next_test < tests.length) {
const i = next_test++;
const t = tests[i];
// Clear any orphaned container of this name before reusing it (see note above).
child_process.spawnSync("docker", ["rm", "-f", t.name], {stdio:"ignore"});
results[i] = await new Promise(resolve => {
child_process.spawn("docker", ["run", "--security-opt", "seccomp=unconfined", "-e", "GITHUB_ACTIONS=True", "--name", t.name, "--init", "baseimage", "bash", "-c", `cd build; ctest --output-on-failure -R '^${t.name}$' --timeout ${test_timeout}`], {stdio:"inherit"}).on('close', code => resolve(code));
});
}There was a problem hiding this comment.
Pull request overview
Caps the parallel-ctest-containers GitHub Action’s test-container execution fan-out by introducing a bounded worker pool, preventing self-hosted runners from being overwhelmed when running heavy NP/LR integration suites.
Changes:
- Replace unbounded
Promise.allcontainer launches with a bounded worker-pool runner. - Add
CTEST_CONTAINER_CONCURRENCY(default 4) to control maximum concurrent test containers.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const max_concurrency = Math.max(1, parseInt(process.env.CTEST_CONTAINER_CONCURRENCY, 10) || 4); | ||
| console.log(`Running ${tests.length} '${tests_label}' test(s), up to ${max_concurrency} container(s) at a time`); |
| results[i] = await new Promise(resolve => { | ||
| child_process.spawn("docker", ["run", "--security-opt", "seccomp=unconfined", "-e", "GITHUB_ACTIONS=True", "--name", t.name, "--init", "baseimage", "bash", "-c", `cd build; ctest --output-on-failure -R '^${t.name}$' --timeout ${test_timeout}`], {stdio:"inherit"}).on('close', code => resolve(code)); | ||
| }); |
- use a plain while loop instead of the unconventional for(;;) with next_test++ in both clauses (readability; Gemini) - parse CTEST_CONTAINER_CONCURRENCY with an explicit NaN check so a configured 0 clamps to 1 instead of silently falling back to 4 (Copilot) - add an 'error' handler to the docker spawn so a spawn failure resolves the worker as failed instead of hanging the pool (Copilot) Rebuilt dist/index.mjs; behaviour on the happy path is unchanged.
|
Thanks for the review — all three points addressed in 70aad08:
Happy-path behaviour is unchanged (the validated run had concurrency 4, NP/LR 12/12). Rebuilt |
Problem
parallel-ctest-containerslaunched a docker container for every discovered test at once (Promise.allover all of them). Thenonparallelizable_tests/long_running_testssuites are heavy multi-nodeosintegration tests, so on a self-hosted runner this starved CPU/RAM/ports and made them fail en masse — even trivial tests likeget_account_testfailed (starvation, not real bugs; they pass on ENF's sized runners and the code is the live-Jungle4-validated 1.2.x harvest).Fix
Replace the unbounded fan-out with a bounded worker pool: at most N test containers run concurrently (N =
CTEST_CONTAINER_CONCURRENCYenv, default 4).results[i]stays aligned withtests[i], so the failure-log extraction is unchanged. Rebuiltdist/index.mjsvia ncc; verified it loads + runs.Validation (dispatched run on this branch, 12c/32G self-hosted runner)
wasm_config_part1_unit_test_eos-vm-oc(Tests (asserton)) — a pre-existing, isolated OC unit-test issue consistent across every run, unrelated to this change (tracked separately).Trade-off: capping concurrency lengthens the NP/LR wall-clock (heavy suites run 4-at-a-time). Tunable per-tier via the env var without a rebuild if needed.