Skip to content

Fix pre-existing test failures: align docker PG versions to source pkgvars; retarget EOL build images#408

Closed
ihalatci wants to merge 1 commit into
developfrom
ihalatci/fix-preexisting-tools-tests
Closed

Fix pre-existing test failures: align docker PG versions to source pkgvars; retarget EOL build images#408
ihalatci wants to merge 1 commit into
developfrom
ihalatci/fix-preexisting-tools-tests

Conversation

@ihalatci

Copy link
Copy Markdown
Contributor

Summary

Fixes two pre-existing, token-independent failures in the packaging_automation test suite. This is INDEPENDENT of the GH_TOKEN→GitHub App migration happening elsewhere — no .github/workflows/* files and no GH_TOKEN/app-token code were touched. Scope is strictly the Python tests + the production code they exercise.

Root causes & fixes

Failure #1 — docker pkgvars schema drift (PG14/15 removed upstream)

test_update_docker.py::test_pkgvar_postgres_version_existence failed with KeyError: 'postgres_15_version'.

citusdata/docker's pkgvars is the source of truth and now ships only postgres_16/17/18 (16.10 / 17.6 / 18.1); postgres_14_version and postgres_15_version were removed upstream. This was not just a test bug — production update_docker.py was also out of sync:

  • read_postgres_versions() returned config['postgres_14_version'] / ['postgres_15_version'] → would KeyError at runtime against the current pkgvars.
  • update_all_docker_files() unpacked 5 values and called update_docker_file_for_postgres14(...) / ..._postgres15(...).

Changes (update_docker.py): aligned automation to PG16/17/18 — trimmed the SupportedDockerImages enum and the docker_templates / docker_outputs dicts; removed the now-dead update_docker_file_for_postgres14 / _postgres15 helpers and their invocations; updated read_postgres_versions() and update_all_docker_files() to 18/17/16. Confirmed via repo-wide grep that nothing else imports/calls the removed pieces.

Changes (test_update_docker.py): swapped imports to update_docker_file_for_postgres16/17/18; updated constants to POSTGRES_16/17/18_VERSION (16.10/17.6/18.1, read from the current docker pkgvars); retargeted/renamed the per-version tests and the test_pkgvar_postgres_version_existence assertions to 16/17/18.

Failure #2 — EOL / unavailable Docker build images

test_citus_package_utils.py::test_build_package_debian targeted debian-stretch (long EOL; archived apt repos make the build fail). Retargeted to debian-bookworm (confirmed-available citus/packaging-test:debian-bookworm-all), and updated test_sign_packages' sub_folder consistently so the suite stays coherent.

test_build_package_rpm targets almalinux-9-pg17citus/packaging-test:almalinux-9-pg17. That tag exists and builds successfully; the earlier failure was transient (Docker-unavailable), so it is kept unchanged.

Validation (local: WSL + Docker Desktop)

  • test_update_docker.py: 9 passed.
  • prospector: 0 messages; black --check: clean.
  • test_build_package_rpm (almalinux-9-pg17): 1 passed (~62s).
  • test_build_package_debian (debian-bookworm): deb build completes greendh_installdocs OK, .deb artifacts built for postgresql-15/16/17-citus, lintian clean, build exit 0 — when run on a native-Linux filesystem (verified via a permission-correct Docker volume populated with git archive, which preserves git's 0644/0755 modes).

Note for maintainers / CI

The test_build_package_debian pytest cannot pass when run directly from a Windows-hosted /mnt/c mount: DrvFs reports every file as 0777, so debhelper's "executable config files" feature treats debian/docs as an executable script and dh_installdocs fails. Git records debian/docs as 100644, so on CI's native Linux filesystem this passes — and the underlying build is proven green locally via the perms-correct volume. Please confirm test_build_package_debian (bookworm) via CI.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

…gvars; retarget EOL build images

Two pre-existing, token-independent failures in the packaging_automation test suite. Independent of the GH_TOKEN->GitHub App migration; no .github/workflows or token code touched.

1) Docker pkgvars schema drift (PG14/15 removed upstream). citusdata/docker's pkgvars (source of truth) now ships only postgres_16/17/18 (16.10/17.6/18.1); postgres_14/15 were removed. update_docker.py.read_postgres_versions() still read postgres_14/15_version (KeyError at runtime) and update_all_docker_files() unpacked 5 values and invoked the PG14/15 helpers. Aligned automation to PG16/17/18: trimmed the SupportedDockerImages enum and template/output dicts, removed the now-dead update_docker_file_for_postgres14/15 helpers and their invocations, and updated read_postgres_versions()/update_all_docker_files() to 18/17/16. test_update_docker.py retargeted to 16/17/18 (constants, imports, test bodies, pkgvar existence assertions).

2) EOL/unavailable build images. test_build_package_debian targeted debian-stretch (long EOL; archived apt repos make the build fail). Retargeted to debian-bookworm (confirmed-available citus/packaging-test:debian-bookworm-all) and updated test_sign_packages sub_folder consistently. test_build_package_rpm (almalinux-9-pg17) tag exists and builds; its earlier failure was transient, so it is kept unchanged.

Validated locally (WSL + Docker): test_update_docker.py 9 passed; prospector 0 messages; black clean; almalinux-9-pg17 rpm build passed; debian-bookworm deb build completed green (dh_installdocs OK, .deb artifacts produced, lintian clean) when run on a native-Linux filesystem.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ihalatci

Copy link
Copy Markdown
Contributor Author

CI note: rpm test_execution failures are pre-existing external infra, unrelated to this PR

All legs relevant to this PR are green:

  • All unit_test_execution legs pass (the test_update_docker.py fixes).
  • All Build package jobs pass, including debian/bookworm (~3m49s) — confirming the retarget from EOL debian-stretch works and that the local /mnt/c DrvFs 0777 quirk was a local-only artifact (CI's native Linux fs is fine).
  • All debian/ubuntu test_execution legs pass.

The only red is the rpm test_execution matrix (centos/8, el/9, ol/9 across PG 14/15/16), which fails at:

curl https://install.citusdata.com/community/rpm.sh | yum install citus...
Error: GPG check FAILED

This is an external GPG/mirror issue with the citusdata community rpm repo (install.citusdata.com), not a regression from this PR:

  • It also fails on the PG14 legs, which this PR does not touch (PG14 was dropped only from the docker-image automation, and these legs install citus from the community repo, not from anything this PR changed).
  • centos/8 is itself EOL.
  • Our own test_build_package_rpm (almalinux-9-pg17), which builds the rpm rather than installing from the community repo, passes both locally and in CI.

No code change is warranted here. This PR is ready for maintainer review/merge.

@ihalatci

Copy link
Copy Markdown
Contributor Author

Follow-up: definitive root cause of the rpm test_execution GPG failures

I reproduced the rpm test_execution failure locally (oraclelinux:9 container, same curl …/rpm.sh | bash && yum install citus… path the matrix uses) and nailed the exact cause. It is not a transient mirror glitch — the published citus community RPMs are currently unsigned while the generated repo config enforces gpgcheck=1.

Evidence

yum install output:

Package citus121_15-12.1.5.citus-1.el9.x86_64.rpm is not signed
Package hll_15-2.18.citus-1.el9.x86_64.rpm is not signed
Package topn_15-2.6.0.citus-1.el9.x86_64.rpm is not signed
Error: GPG check FAILED

rpm -K on packages pulled straight from the community repo (note: yumdownloader preserves embedded signatures, confirmed via the PGDG control below):

Package rpm -K Meaning
citus121_16-12.1.5.citus-1.el9 digests OK unsigned (no PGP sig)
citus132_16-13.2.2.citus-1.el9 digests OK unsigned (no PGP sig)
postgresql16-server (PGDG, control) digests SIGNATURES NOT OK signed (sig present)

So PGDG dependencies are signed and pass; the citus/hll/topn packages are unsigned and trip gpgcheck=1GPG check FAILED. This is why it fails identically across all rpm platforms (centos/8, el/9, ol/9) and all PG majors (14/15/16) — it is about package signing, not version or platform. Deb legs pass because of apt's different trust handling.

This means a version bump would NOT fix it13.2.2 is equally unsigned. The real fix lives in the citusdata/packaging signing pipeline / repo host (repos.citusdata.com / Cloudsmith), outside citusdata/tools, and is unrelated to this PR and to the GH_TOKEN→GitHub App migration.

Options for this repo

  • (A) Document only (recommended): leave the rpm legs red; the fix belongs in citusdata/packaging (sign the published RPMs). This comment serves as that documentation.
  • (B) Test-harness workaround: add --nogpgcheck to the yum install lines in the rpm test-images/*/Dockerfiles (el-7/8/9, ol-7/8/9, centos-7/8, almalinux-9). These are not workflow files, so it's within the PR's constraints. Trade-off: it makes the rpm legs green but stops the test from verifying citus package signatures, masking the upstream signing gap — a maintainer policy decision.
  • (C) Bump PROJECT_VERSION=12.1.5 in package-tests.yml: ruled out — doesn't fix signing, and edits a workflow file (out of scope).

Happy to implement (B) if maintainers want the rpm legs green and accept the reduced assertion; otherwise (A) stands and this PR is ready for review/merge on its own (all unit tests + all Build package jobs + deb/ubuntu legs green).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants