ci: make CI runnable on self-hosted runners (orphan-container fix, distutils, drop ubuntu20) #1
Ubuntu 20.04 left standard support in April 2025 and is the tail of the legacy llvm-11 OC toolchain path that AntelopeIO#578 migrates away from. Remove it from the CI matrices to cut the heaviest redundant jobs on the self-hosted runner. The reproducible .deb still installs/runs on 20.04 via glibc forward-compat, so node operators on Focal are unaffected (best-effort, no longer gated by CI).
- .cicd/platforms/ubuntu20.Dockerfile: removed (platforms auto-discover)
- build.yaml: drop ubuntu20 from package matrix; drop ubuntu20 + ubuntu20repro from tests / np-tests / lr-tests matrices
- release.yaml: drop ubuntu20 dev-deb from experimental-binaries image
- performance_harness_run.yaml: drop ubuntu20 dispatch choice
…ners The action runs each test in a docker container with a fixed name (--name <testname>) and no --rm, and never removes them (it needs them for docker export on failure). On ENF's ephemeral runners the daemon is fresh per job so this is fine, but on a persistent self-hosted runner the daemon survives between jobs and the next job collides: 'docker: Error response from daemon: Conflict. The container name "/<test>" is already in use'. That wiped out every NP/LR Tests job on the eosrio self-hosted runner. Defensively 'docker rm -f' the base container and each test container name before reusing it, so a leftover from a prior/interrupted run can't block it. Patched both main.mjs (source) and the built dist/index.mjs bundle.
python3-distutils was removed in Python 3.12 / Ubuntu 24.04+, making antelope-spring-dev uninstallable there (apt: 'python3-distutils but it is not installable', exit 100) — which broke the ubuntu24/ubuntu26 package CI jobs. Nothing in spring or TestHarness imports distutils, so just drop it.
Code Review
This pull request removes an unused Dockerfile, adds defensive cleanup of leftover Docker containers in the parallel-ctest-containers action to prevent naming conflicts on persistent runners, and drops the python3-distutils dependency from the Debian package configuration. The reviewer identified two important issues: first, dropping python3-distutils may break installations on older supported systems because scripts/postinst and scripts/prerm still import distutils (a migration to sysconfig is suggested); second, calling spawnSync sequentially inside a forEach loop to remove test containers blocks the event loop and should be batched into a single command before the loop.
# NOTE: python3-distutils was dropped here — it was removed from Python 3.12 / Ubuntu 24.04+
# (so the dev package became uninstallable there) and nothing in spring or TestHarness uses it.
set(CPACK_DEBIAN_DEV_PACKAGE_DEPENDS "libgmp-dev, python3-numpy, zlib1g-dev")
While removing python3-distutils from the package dependencies solves the installation issue on Ubuntu 24.04+, scripts/postinst and scripts/prerm still import distutils when the Python version is less than 3.10.0:
if sys.version_info < (3, 10, 0):
    from distutils import sysconfig
    ...

If this .deb package is installed on an older system (such as Ubuntu 20.04, which you noted is still supported via glibc forward-compat) where python3-distutils is not installed by default, the installation or removal will fail with ModuleNotFoundError: No module named 'distutils'.
To safely drop the python3-distutils dependency, you should update scripts/postinst and scripts/prerm to use the standard sysconfig module (available since Python 3.2) instead of distutils:
import sysconfig
try:
    print(sysconfig.get_path('platlib', 'deb_system'))
except KeyError:
    print(sysconfig.get_path('platlib'))

This is compatible with all Python 3 versions and avoids any dependency on distutils.
let subprocesses = [];
tests.forEach(t => {
   // Clear any orphaned container of this name before reusing it (see note above).
   child_process.spawnSync("docker", ["rm", "-f", t.name], {stdio:"ignore"});
Spawning a new process synchronously inside a forEach loop blocks the Node.js event loop. If there are many tests, executing docker rm -f sequentially for each test will introduce significant synchronous overhead and delay the start of all test containers.
Since docker rm supports removing multiple containers in a single command, we can batch this operation into a single spawnSync call before the loop. This is much more efficient.
// Clear any orphaned containers of these names before reusing them (see note above).
const test_names = tests.map(t => t.name);
if (test_names.length > 0) {
   child_process.spawnSync("docker", ["rm", "-f", ...test_names], {stdio:"ignore"});
}
let subprocesses = [];
tests.forEach(t => {
Pull request overview
This PR updates CI and packaging to run cleanly on persistent self-hosted runners by making the parallel container test action resilient to orphaned containers, removing a deprecated Python dependency from the dev package, and dropping Ubuntu 20.04 as a CI target.
Changes:
- Make `parallel-ctest-containers` idempotent by force-removing leftover `base` and per-test containers before reuse (source + built `dist`).
- Remove `python3-distutils` from the Debian dev package dependency list to restore installability on Ubuntu 24.04+.
- Remove Ubuntu 20.04 from CI matrices, workflows, and delete the Ubuntu 20.04 platform Dockerfile.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| package.cmake | Drops python3-distutils from antelope-spring-dev deps to support newer Ubuntu/Python. |
| .github/workflows/release.yaml | Stops downloading the Ubuntu 20.04 dev package artifact for releases. |
| .github/workflows/performance_harness_run.yaml | Removes ubuntu20 as a selectable perf-harness platform. |
| .github/workflows/build.yaml | Removes ubuntu20 from build/package/test matrices so CI no longer targets 20.04. |
| .github/actions/parallel-ctest-containers/main.mjs | Pre-cleans orphan containers (docker rm -f) to avoid name collisions on persistent daemons. |
| .github/actions/parallel-ctest-containers/dist/index.mjs | Regenerated bundled action output reflecting the same orphan-container cleanup. |
| .cicd/platforms/ubuntu20.Dockerfile | Removes the Ubuntu 20.04 CI platform image definition. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 26993e6fde
let subprocesses = [];
tests.forEach(t => {
   // Clear any orphaned container of this name before reusing it (see note above).
   child_process.spawnSync("docker", ["rm", "-f", t.name], {stdio:"ignore"});
Avoid force-removing another job's test container
When two matrix jobs using this action land on runners that share the same Docker daemon, the fixed ctest-derived container names overlap across platforms, and docker rm -f force-removes a running container (Docker documents --force as using SIGKILL). In that scenario a later NP/LR job can kill an in-progress test container from an earlier job instead of just clearing an orphan, so the self-hosted workflow can become flaky or lose logs; use job-unique container names/labels and only clean containers owned by the current run.
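One way to read the suggestion above is sketched here, under stated assumptions: the helper names, the `ctest.scope` label key, and the `platform` argument are all hypothetical and do not come from the action. The idea is to prefix each container name with a value unique to the run, attempt, and matrix entry, attach the same value as a label, and later clean up only containers that carry that label.

```javascript
// Sketch only: helper names and the "ctest.scope" label key are hypothetical.

// Build a scope string unique to this workflow run, attempt, and matrix entry.
// GITHUB_RUN_ID and GITHUB_RUN_ATTEMPT are set by GitHub Actions; `platform`
// stands in for whatever distinguishes matrix entries (assumed to be passed in).
function runScope(platform, env = process.env) {
  const raw = [
    env.GITHUB_RUN_ID ?? "local",
    env.GITHUB_RUN_ATTEMPT ?? "1",
    platform,
  ].join("-");
  // Keep only characters Docker accepts in container names.
  return raw.replace(/[^a-zA-Z0-9_.-]/g, "_");
}

// Container name for one test, prefixed with the run scope.
function containerName(scope, testName) {
  return `${scope}-${testName}`;
}

// argv for `docker run`: scoped name plus an ownership label.
function runArgs(scope, testName, image, command) {
  return [
    "run",
    "--name", containerName(scope, testName),
    "--label", `ctest.scope=${scope}`,
    image,
    ...command,
  ];
}

// argv for listing only the containers this scope created (running or not).
function listOwnedArgs(scope) {
  return ["ps", "-aq", "--filter", `label=ctest.scope=${scope}`];
}
```

The IDs printed by the `docker ps` invocation built by `listOwnedArgs` could then be handed to `docker rm -f`, so a job never touches containers from another job on the same daemon. Any later step that refers to a container by name, such as the `docker export` on failure, would need to use the same scoped name.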
- set(CPACK_DEBIAN_DEV_PACKAGE_DEPENDS "libgmp-dev, python3-distutils, python3-numpy, zlib1g-dev")
+ # NOTE: python3-distutils was dropped here — it was removed from Python 3.12 / Ubuntu 24.04+
+ # (so the dev package became uninstallable there) and nothing in spring or TestHarness uses it.
+ set(CPACK_DEBIAN_DEV_PACKAGE_DEPENDS "libgmp-dev, python3-numpy, zlib1g-dev")
Keep distutils available for maintainer scripts
The dev package still installs scripts/postinst and scripts/prerm, and those scripts import distutils.sysconfig whenever the target machine's Python is older than 3.10. Removing python3-distutils from the package dependencies means installs or removals on Python 3.8/3.9 Debian/Ubuntu systems can fail in the maintainer script if that module is not already present; either update the scripts to avoid distutils on those versions or keep a conditional dependency for those packages.
This workflow runs on self-hosted runners. Auto-running PR code there (and especially fork PRs) is a security risk (arbitrary code on our infra + shared Docker daemon) and burns the single serial runner for hours. Make CI on a PR a deliberate human action: maintainers dispatch it via the Run workflow button or 'gh workflow run build.yaml --ref <branch>'. push on main/release still runs.
First run of the Build & Test workflow on the eosrio self-hosted runner surfaced three issues. This bundles the fixes; once green it proves CI runs end-to-end off ENF infra.
Fixes
- `parallel-ctest-containers`: idempotent against orphaned containers. The action runs each test in `docker run --name <test>` with no `--rm` and never removes them. On a persistent self-hosted runner the daemon survives between jobs, so the next job collides (container name "/<test>" already in use); this wiped out all 16 NP/LR Tests jobs. Now `docker rm -f` the base + each test container name before reuse. Patched `main.mjs` (source) and the built `dist/index.mjs`.
- Drop `python3-distutils` from the dev package deps. Removed in Python 3.12 / Ubuntu 24.04+, so `antelope-spring-dev` was uninstallable there (apt exit 100), which broke the ubuntu24/26 package jobs. Nothing in spring/TestHarness uses it.
- Drop ubuntu20 from the CI matrices. The reproducible `.deb` still runs on 20.04 via glibc forward-compat.

Context
First run (pre-fix): 50 jobs, 28 ✅ / 21 ❌. Of the 21: 16 = the orphan-container infra bug, 2 = distutils, 2 = real/uncertain tests (`wasm_config_part1_unit_test_eos-vm-oc` on `asserton`; an ubuntu26 timeout), 1 = the aggregate gate. All 8 builds passed clean on the 12c/32G runner.

A matching job-started hook on the runner (chown `_work` + prune stopped containers) handles the operational side; this PR is the durable in-repo half.