
fix: update installer upgrade test to use valid config schema #505

Merged
MichielDean merged 5 commits into main from fix/installer-upgrade-test-config
May 22, 2026

@MichielDean
Owner

Summary

The installer upgrade test used a stale config with workflow_path (deprecated key) instead of aqueduct and was missing the aqueducts: section. Since the config validator requires both aqueducts: and repo-level aqueduct: refs, the castellarius service failed to start with this config.

This fixes the installer-tests CI failure on PR #502 (and main).

Changes

  • Updated the stale config in the upgrade test scenario to include valid current-schema fields (aqueducts: section + aqueduct: refs on repos)
  • Preserved the stale unknown keys (old_binary_path, legacy_agent_timeout) that the YAML parser silently ignores — this is the actual upgrade scenario being tested
  • Removed workflow_path (no longer a valid key) and replaced with aqueduct: default
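The shape of the updated stale config can be sketched roughly as below. The key names `aqueducts`, `aqueduct`, `old_binary_path`, `legacy_agent_timeout` and `workflow_path` come from this PR; the nesting, the `repos` key, the repo name, all values, and the temp-file path are hypothetical illustrations and may not match the project's real schema. The `config_looks_valid` helper is likewise an assumption, a rough stand-in for what the validator is described as requiring, not the validator itself.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch location for the stale config used by the upgrade scenario.
STALE_CONFIG="$(mktemp)"

# Current-schema fields plus leftover unknown keys.
# Layout and values are assumptions made for illustration only.
printf '%s\n' \
  'old_binary_path: /usr/local/bin/old-binary' \
  'legacy_agent_timeout: 30' \
  'aqueducts:' \
  '  default: {}' \
  'repos:' \
  '  - name: example-repo' \
  '    aqueduct: default' \
  > "$STALE_CONFIG"

# Rough pre-flight check: a top-level aqueducts: section, at least one
# indented aqueduct: ref, and no workflow_path key anywhere.
config_looks_valid() {
  grep -q '^aqueducts:' "$1" || return 1
  grep -Eq '^[[:space:]]+aqueduct:' "$1" || return 1
  ! grep -q 'workflow_path' "$1"
}

config_looks_valid "$STALE_CONFIG" && echo "config has the required keys"
```

A check like this only catches the specific omissions described in this PR; the service's own validator remains the authority on what is accepted.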

Testing

  • All unit tests pass locally
  • The installer-tests CI should now pass since the upgrade scenario uses a valid config

Fixes the failing installer-tests check on #502.

The upgrade test's stale config used workflow_path (deprecated key)
instead of aqueduct and was missing the aqueducts: section. Since the
config validator requires both aqueducts and repo-level aqueduct refs,
the castellarius service failed to start with this config.

Updated the stale config to include valid current-schema fields
(aqueducts + aqueduct refs) while keeping the stale unknown keys
(old_binary_path, legacy_agent_timeout) that the YAML parser silently
ignores. This properly tests that unknown keys don't break startup.
Lobsterdog Contributors added 3 commits May 10, 2026 23:06
The upgrade tests in both run-installer-tests.sh (CI harness) and
tests/installer/run-tests.sh (container script) used stale configs
with workflow_path instead of aqueduct and missing aqueducts sections.
Since the config validator requires both, the castellarius service
failed to start.

Updated both scripts to use valid current-schema configs while
keeping the stale unknown keys that the YAML parser silently ignores.

The heredoc inside bash -c caused a shell syntax error (unmatched quote).
Switch back to printf for the upgrade test's stale config.
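For illustration, writing a multi-line file with printf inside `bash -c` might look like the sketch below. The scratch directory, file name, and config lines are hypothetical, and the exact heredoc that failed is not shown in this PR. The idea is that the inner script is wrapped in single quotes and uses only double quotes internally, so no heredoc terminator or nested quote has to survive the outer quoting.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch directory standing in for the container's config dir.
OUT_DIR="$(mktemp -d)"

# One printf argument per output line; the target directory is passed as $1
# so the single-quoted inner script needs no interpolation from the outer shell.
bash -c '
  printf "%s\n" \
    "aqueducts:" \
    "  default: {}" \
    "old_binary_path: /usr/local/bin/old-binary" \
    > "$1/config.yaml"
' _ "$OUT_DIR"

cat "$OUT_DIR/config.yaml"
```

Passing the path as a positional argument (the `_` fills `$0`) avoids having to splice outer-shell variables into the quoted script.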
@MichielDean
Owner Author

All checks relevant to this PR are passing. installer-integration-tests still fails, but that is a known flaky container startup issue unrelated to this PR:

  • build: PASS
  • installer-tests: PASS
  • installer-integration-tests: FAIL (container crash before tests, unrelated)

The root cause was that both test scripts had stale configs with workflow_path and no aqueducts: section. The config validator requires aqueducts: and repo-level aqueduct: refs, causing the service to fail on startup.

Fixed in both run-installer-tests.sh (CI harness) and tests/installer/run-tests.sh (container script).

The jrei/systemd-ubuntu base image exits with code 255 immediately on
modern Docker/kernel combinations, both locally and on GitHub-hosted
runners. The container produced no logs — systemd failed before producing
any output.

Root causes:
1. The jrei/systemd-ubuntu image (pinned SHA) does not work with
   current Docker (28+) / kernel (6.17+) combinations on CI runners.
   The container exits 255 before systemd initializes.
2. The CI workflow used --security-opt instead of --privileged, which
   prevented systemd from managing cgroups properly.
3. Local Docker builds fail with apt 404 errors for Ubuntu noble repos
   through the Docker bridge network.

Fixes:
- Replace jrei/systemd-ubuntu with ubuntu:24.04 + self-installed systemd,
  stripping unnecessary systemd units for container use.
- Add --network=host to docker build for reliable apt-get on Docker bridge
  networks where the CDN returns 404 for noble repos.
- Switch CI docker run to --privileged --cgroupns=host with mounted cgroups,
  matching the proven local test configuration.
- Add VOLUME /sys/fs/cgroup and STOPSIGNAL SIGRTMIN+3 to Dockerfile for
  proper systemd container lifecycle.
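The build and run invocations described in the fixes above could look roughly like this dry-run sketch. The image tag, the build context, the `-d` flag, and the exact cgroup mount (`/sys/fs/cgroup:/sys/fs/cgroup:rw`) are assumptions; only `--network=host`, `--privileged`, and `--cgroupns=host` are taken from the commit message. The functions print the commands instead of executing them, so the sketch can be read and checked without Docker installed.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical image tag for the installer test container.
IMAGE_TAG="installer-test:local"

# Build with host networking so apt-get does not go through the Docker bridge.
build_cmd() {
  echo "docker build --network=host -t $IMAGE_TAG ."
}

# Run privileged with the host cgroup namespace and cgroups mounted,
# which is what systemd inside the container needs to manage cgroups.
run_cmd() {
  echo "docker run -d --privileged --cgroupns=host -v /sys/fs/cgroup:/sys/fs/cgroup:rw $IMAGE_TAG"
}

build_cmd
run_cmd
```

Running with `--privileged` grants the container broad access to the host, so it is usually reserved for test environments like this one.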
@MichielDean MichielDean merged commit 98fde8d into main May 22, 2026
3 checks passed
@MichielDean MichielDean deleted the fix/installer-upgrade-test-config branch May 22, 2026 01:13