Skip to content

Fix pre-existing tools test failures: RPM signing no-op, docker PG-version drift, EOL images, and PG16/17/18 package-test retarget#409

Open
ihalatci wants to merge 6 commits into
developfrom
ihalatci/sign-rpm-packages
Open

Fix pre-existing tools test failures: RPM signing no-op, docker PG-version drift, EOL images, and PG16/17/18 package-test retarget#409
ihalatci wants to merge 6 commits into
developfrom
ihalatci/sign-rpm-packages

Conversation

@ihalatci

@ihalatci ihalatci commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR consolidates three independent pre-existing fixes into a single branch so CI exercises them together. It supersedes / folds in #408 (whose commit is cherry-picked here). All of this is independent of the GH_TOKEN → GitHub App migration — no app-token/secret logic is touched.

Five commits (on top of develop 9322601):

  1. 4c862ceRPM signing was silently skipped (path-doubling) + the verifier was blind.
  2. df93bb1test_sign_packages now signs the real built rpm (in rpm_build) instead of an empty folder.
  3. 8ac87b4Retarget citus-package-tests to PG16/17/18 (citus 14.1.0) + bump test-image HLL/TOPN.
  4. 4695e3aDrop EOL centos/8; add published ubuntu/noble to the package-test matrix.
  5. a6d04f8(Fix pre-existing test failures: align docker PG versions to source pkgvars; retarget EOL build images #408) align update_docker.py to the docker pkgvars source-of-truth (PG14/15 → 16/17/18) + retarget EOL debian-stretch build image → debian-bookworm.

Root causes

A. RPM signing silently skipped (citus_package.py) + blind verifier (common_tool_methods.py)

build_packages() pre-mutates output_dir to {base}/{sub_folder}, but sign_packages() appended {sub_folder} again → globbed {base}/{sub_folder}/{sub_folder}0 files → signing skipped with no error. The verifier is_rpm_file_signed() used rpm -K (returncode), which returns 0 for unsigned rpms too, so the regression was invisible — published community RPMs have been unsigned since this no-op landed (#175, 2021), surfacing in CI only once packagecloud turned on per-package gpgcheck enforcement (a new signing subkey 723C4197ED5519E8 was provisioned 2026-02-12).

Fix: restore the base output_dir before sign_packages() (call-site fix, leaves the sub_folder-append contract intact so test_sign_packages stays valid); make the empty-package case loud; replace the verifier with a real 3-tag PGP check (%{SIGPGP}|%{SIGGPG}|%{RSAHEADER} — unsigned iff all three (none), back-compatible with legacy V3 sigs and future RSAHEADER-only sigs); add a regression assertion that produced rpms must be signed.

B. docker pkgvars schema drift (#408)

citusdata/docker pkgvars ships only postgres_16/17/18 now (14/15 removed). read_postgres_versions() still returned 5 keys → KeyError at runtime, and test_pkgvar_postgres_version_existence asserted 14/15. Fix: align both production code and tests to PG16/17/18.

C. EOL / unavailable build & package-test images

  • test_build_package_debian used debian-stretch (long EOL) → image gone. Retargeted to debian-bookworm (Fix pre-existing test failures: align docker PG versions to source pkgvars; retarget EOL build images #408).
  • citus-package-tests matrix tested PG14/15/16 via the stale PROJECT_VERSION=12.1.5 default and pinned HLL 2.18 / TOPN 2.6.0. Retargeted the default to 14.1.0 → PG16/17/18, and bumped the test-image HLL_VERSION/TOPN_VERSION ARGs to 2.19.citus-1 / 2.7.0.citus-1 (the versions actually published for the 14.1.0 era; pg17/18 don't ship 2.18/2.6.0).
  • Dropped centos/8 (no longer in citusdata/packaging's published matrix; base image is EOL → vault) and added ubuntu/noble (published, was missing).

What changed

  • .github/workflows/package-tests.ymlPROJECT_VERSION default 12.1.5 → 14.1.0 (metadata + test_execution); matrix: - centos/8 removed, - ubuntu/noble added.
  • packaging_automation/citus_package.py — restore base output_dir before signing; loud-warn on 0 packages.
  • packaging_automation/common_tool_methods.py — real 3-tag is_rpm_file_signed.
  • packaging_automation/tests/test_citus_package.py — regression assert: built rpms must be signed.
  • packaging_automation/tests/test_citus_package_utils.pydebian-stretch → debian-bookworm; rpm sign target centos-8 → rpm_build.
  • packaging_automation/tests/test_update_docker.py, packaging_automation/update_docker.py — PG14/15 → 16/17/18 alignment.
  • test-images/{almalinux-9,ol-9,debian-bullseye,debian-bookworm,ubuntu-jammy,ubuntu-noble,centos-8}/Dockerfile — HLL 2.18 → 2.19, TOPN 2.6.0 → 2.7.0.

Validated locally (WSL + Docker Desktop)

  • test_update_docker.py9 passed (Failure Remove socket directory permissions fix #1 fixed).
  • Full tests collect cleanly (76 tests; only test_publish_docker.py errors, due to no docker-socket access for the python SDK in this env — out of scope, unchanged).
  • black --check and prospector clean on all touched files (the 2 reported findings — pypi_stats_collector.py black drift, upload_to_package_cloud.py too-many-args — are pre-existing on develop in files this PR does not touch).
  • Real Docker installs against the actual test-image Dockerfiles: debian/bookworm pg17 & pg18, ubuntu/noble pg16/17/18 all install citus 14.1.0 + hll 2.19 + topn 2.7.0 cleanly → deb/ubuntu install legs green.
  • RPM signing verifier proven by negative control: signed → True, --delsign unsigned → False, re-signed (RSAHEADER) → True.

Known-red, by design (not a merge gate)

The rpm test_execution install legs (el/9, ol/9 × PG16/17/18) stay RED: the already-published community RPMs are unsigned per-package while the repo enforces gpgcheck=1. This PR fixes go-forward signing in the pipeline; turning those legs green additionally requires a one-time packaging-side re-sign/re-upload backfill of the published RPMs — a separate track, not fixable in this repo. Deb legs authenticate via signed repo metadata and are green.

Note

This folds in #408 (cherry-picked as a6d04f8); #408 can be closed in favor of this PR.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

ihalatci-msft and others added 5 commits June 17, 2026 21:27
…al verifier

build_packages mutates input_output_parameters.output_dir to
"{base}/{sub_folder}" for the per-postgres build loop, then passed that
mutated object straight into sign_packages, which re-appends the sub_folder.
The resulting "{base}/{sub_folder}/{sub_folder}" path matches no built
packages, so sign_packages silently signs nothing and rpms are published
unsigned (latent since the 2021 InputOutputParameters refactor, #175).

This went undetected because is_rpm_file_signed used `rpm -K`, which returns
success ("digests OK") for unsigned packages too, so verify_rpm_signature_in_dir
passed on unsigned rpms.

Fixes:
- citus_package.build_packages: save the base output_dir, mutate only for the
  build loop, and restore it before calling sign_packages so the sign path
  matches where packages were built. sign_packages' own append (untouched)
  keeps the direct test_sign_packages caller correct.
- citus_package.sign_packages: warn loudly when no .rpm/.deb files are found to
  sign, so an empty glob can never again pass unnoticed.
- common_tool_methods.is_rpm_file_signed: query the signature header tags
  directly (SIGPGP/SIGGPG/RSAHEADER) and treat the package as unsigned only when
  all render "(none)". Covers legacy V3 (rpm 4.11/centos:7) and header-only
  (rpm >= 4.16) signatures.
- test_citus_package.test_build_packages: assert every produced rpm is actually
  signed, guarding against the path-doubling regression.

Independent of the GH_TOKEN -> GitHub App migration; no workflow files touched.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The real PGP verifier added in this PR exposed that test_sign_packages
signed sub_folder 'centos-8', where nothing is ever built, so the rpm
signing leg never signed anything. verify_rpm_signature_in_dir then
walked the whole packages/ tree and caught the unsigned rpm left in
rpm_build/ by test_build_package_rpm, failing the test.

Point the rpm signing leg at 'rpm_build' (where test_build_package_rpm
actually writes the el9 rpm) so the test genuinely signs and verifies a
real rpm end-to-end, making it a meaningful regression guard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The package-tests.yml install legs defaulted PROJECT_VERSION=12.1.5,
which resolves to PostgreSQL 14/15/16 via postgres-matrix.yml. Bump the
default to 14.1.0 so the matrix resolves to the current PG16/17/18 set.

The runner (test_citus_package.py) passes only CITUS_VERSION, PG_MAJOR
and CITUS_MAJOR_VERSION to the test-images Dockerfiles, so the hardcoded
HLL_VERSION/TOPN_VERSION ARG defaults are used. The 12.1.5-era defaults
(hll 2.18.citus-1 / topn 2.6.0.citus-1) are not published for PG17/18 in
the 14.1.0 era; bump the six matrix Dockerfiles to the published
hll 2.19.citus-1 / topn 2.7.0.citus-1.

Verified against the live community repo (repoquery) that
citus141_{16,17,18}=14.1.0, hll 2.19 and topn 2.7.0 exist for el8, el9,
ol9, bullseye, bookworm and jammy. Validated the deb legs end-to-end in
Docker (bookworm pg17 and pg18) against the real test-images Dockerfile:
citus 14.1.0 + hll 2.19 + topn 2.7.0 install cleanly with no GPG error.
RPM legs remain blocked on the separate unsigned-published-RPM issue.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
centos/8 is no longer in citusdata/packaging's published platform matrix
(build-package.yml all-citus); its base image is EOL (centos:8 -> vault).
ubuntu/noble is published and missing from the test matrix.

Verified via real Docker installs (WSL) that ubuntu/noble installs
citus 14.1.0 + hll 2.19 + topn 2.7.0 cleanly for pg16/17/18.
Bumps the ubuntu-noble image HLL/TOPN ARGs to the 14.1.0-era versions
(2.19.citus-1 / 2.7.0.citus-1) to match the retarget.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…gvars; retarget EOL build images

Two pre-existing, token-independent failures in the packaging_automation test suite. Independent of the GH_TOKEN->GitHub App migration; no .github/workflows or token code touched.

1) Docker pkgvars schema drift (PG14/15 removed upstream). citusdata/docker's pkgvars (source of truth) now ships only postgres_16/17/18 (16.10/17.6/18.1); postgres_14/15 were removed. update_docker.py.read_postgres_versions() still read postgres_14/15_version (KeyError at runtime) and update_all_docker_files() unpacked 5 values and invoked the PG14/15 helpers. Aligned automation to PG16/17/18: trimmed the SupportedDockerImages enum and template/output dicts, removed the now-dead update_docker_file_for_postgres14/15 helpers and their invocations, and updated read_postgres_versions()/update_all_docker_files() to 18/17/16. test_update_docker.py retargeted to 16/17/18 (constants, imports, test bodies, pkgvar existence assertions).

2) EOL/unavailable build images. test_build_package_debian targeted debian-stretch (long EOL; archived apt repos make the build fail). Retargeted to debian-bookworm (confirmed-available citus/packaging-test:debian-bookworm-all) and updated test_sign_packages sub_folder consistently. test_build_package_rpm (almalinux-9-pg17) tag exists and builds; its earlier failure was transient, so it is kept unchanged.

Validated locally (WSL + Docker): test_update_docker.py 9 passed; prospector 0 messages; black clean; almalinux-9-pg17 rpm build passed; debian-bookworm deb build completed green (dh_installdocs OK, .deb artifacts produced, lintian clean) when run on a native-Linux filesystem.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ihalatci ihalatci changed the title Fix silently-skipped RPM signing: restore output_dir before sign + real signature verifier Fix pre-existing tools test failures: RPM signing no-op, docker PG-version drift, EOL images, and PG16/17/18 package-test retarget Jun 18, 2026
test_build_packages generates a throwaway GPG key from
tests/files/gpg/packaging_with_passphrase.gpg, which bakes
`Passphrase: Citus123`. The test then exported + signed using
TEST_GPG_KEY_PASSPHRASE = os.getenv("PACKAGING_PASSPHRASE"), i.e. the
PRODUCTION org-secret passphrase. These only matched while the prod
passphrase happened to equal "Citus123". When the maintainer rotated
PACKAGING_PASSPHRASE for the new signing key (59EDA3DCFD440E45), the
prod passphrase no longer matched the test key, so
gpg.export_keys(secret=True, passphrase=...) returned empty and the
test raised "packaging key is stored with passphrase...".

Pin TEST_GPG_KEY_PASSPHRASE to the literal "Citus123" so the unit test
is self-contained and immune to production signing-key/passphrase
rotations. This matches the convention already used in
test_citus_package_utils.py. The prod sign path is unaffected (it reads
PACKAGING_PASSPHRASE directly via citus_package.py).

Verified: gpg round-trip shows export with "Citus123" succeeds (4059
bytes) and with a different passphrase returns empty; black --check and
prospector both clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants