Consume Docker Hub upstream SBOMs and merge with Syft #225
Open
vpetersson wants to merge 2 commits into
Docker Official Images (library/*) and Docker Hardened Images (dhi.io/*)
ship publisher-signed SBOMs. Unlike Chainguard (where we bypass local
scanning), Docker Hub images are routinely extended by users, so this
fetches upstream's authoritative SBOM and *merges* it with a Syft scan
of the same image: upstream wins for base-layer packages, Syft fills
gaps and overlays anything the Dockerfile added on top.
Detection mirrors the Chainguard flow:
- Direct: DOCKER_IMAGE is itself an Official Image or DHI.
- Provenance: DOCKER_IMAGE carries BuildKit SLSA provenance whose
resolvedDependencies name a Docker Hub base.
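
The direct path boils down to classifying an image reference. Below is a minimal sketch of that step, not the code in dockerhub.py: the function name `classify_image_ref` and the host alias list are assumptions made for illustration, and the provenance path (walking resolvedDependencies) is left out.

```python
from typing import Literal, Optional

Tier = Literal["official", "dhi"]

# Assumed set of host aliases that all point at Docker Hub.
_HUB_HOSTS = ("docker.io/", "index.docker.io/", "registry-1.docker.io/")


def classify_image_ref(ref: str) -> Optional[Tier]:
    """Return "official", "dhi", or None for an image reference (hypothetical helper)."""
    # Drop the digest, then the tag; only the repository path matters here.
    name = ref.split("@", 1)[0]
    if name.rfind(":") > name.rfind("/"):
        name = name[: name.rfind(":")]

    if name.startswith("dhi.io/"):
        return "dhi"

    for host in _HUB_HOSTS:
        if name.startswith(host):
            name = name[len(host):]
            break
    else:
        # A first path segment that looks like a hostname means another registry.
        first = name.split("/", 1)[0]
        if "/" in name and ("." in first or ":" in first or first == "localhost"):
            return None

    # Official Images live under library/; a bare name implies library/.
    if "/" not in name:
        return "official"
    if name.startswith("library/") and name.count("/") == 1:
        return "official"
    return None
```

With this sketch, `python:3.11-slim` and `docker.io/library/alpine:3.20` classify as official, `dhi.io/python:3.13` as DHI, and `ghcr.io/acme/app:1` as neither.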
Merge policy (upstream wins; Syft fills empty upstream fields):
- Strict dedup: (type, namespace, name, version), qualifiers ignored.
Handles different qualifier conventions (Docker's
os_distro= vs Syft's distro=).
- Loose dedup: (type, name, version) fallback. Handles namespace
disagreement such as pkg:rpm/amazonlinux vs
pkg:rpm/amzn on the same Amazon Linux image.
- Component tagging: sbomify:source = docker-hub-upstream | syft-overlay.
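
The merge policy above can be sketched as follows. This is an illustration under simplifying assumptions, not the code in sbom_merge.py: components are flat dicts, the PURL parser skips percent-decoding, and the source tag is stored as a plain key rather than a CycloneDX property. The names `purl_keys` and `merge_components` are hypothetical.

```python
def purl_keys(purl: str):
    """Return (strict_key, loose_key) for a PURL, ignoring qualifiers and subpath."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    body = purl[4:].split("#", 1)[0].split("?", 1)[0]
    if "@" in body:
        path, version = body.rsplit("@", 1)
    else:
        path, version = body, ""
    type_, _, rest = path.partition("/")
    namespace, _, name = rest.rpartition("/")
    strict = (type_.lower(), namespace, name, version)
    loose = (type_.lower(), name, version)
    return strict, loose


def merge_components(upstream, syft):
    """Upstream wins; Syft fills empty upstream fields and adds what upstream lacks."""
    merged, strict_index, loose_index = [], {}, {}
    for comp in upstream:
        item = {**comp, "sbomify:source": "docker-hub-upstream"}
        strict, loose = purl_keys(comp["purl"])
        strict_index[strict] = item
        loose_index.setdefault(loose, item)
        merged.append(item)

    for comp in syft:
        strict, loose = purl_keys(comp["purl"])
        match = strict_index.get(strict) or loose_index.get(loose)
        if match is None:
            # Not known upstream: something the Dockerfile added on top.
            merged.append({**comp, "sbomify:source": "syft-overlay"})
            continue
        for field, value in comp.items():
            if not match.get(field):  # fill only empty upstream fields
                match[field] = value
    return merged
```

The loose index is consulted only after the strict lookup misses, which keeps the namespace-insensitive match as a fallback rather than the default.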
Ecosystems verified end-to-end: pkg:deb (Debian, Ubuntu), pkg:apk
(Alpine), pkg:rpm (Rocky, AlmaLinux, Fedora, Amazon Linux), pkg:alpm
(Arch), pkg:generic (BusyBox). Both CycloneDX and SPDX output; both
direct and provenance detection paths. Full pipeline verified with
--augment (local sbomify.json) and --enrich.
Factored the crane/cosign attestation-walking plumbing out of
chainguard.py into a new buildkit_provenance module shared by both
detectors. Added a platform-aware variant that picks the right
attestation sibling on multi-arch indexes (the original code walked
the wrong manifest when given a per-platform digest and silently
missed Docker Hub's SBOMs).
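
To make the sibling selection concrete, here is a small sketch that assumes the image index has already been fetched and parsed as JSON. It relies on the annotations BuildKit attaches to attestation manifests (`vnd.docker.reference.type` and `vnd.docker.reference.digest`); the function name and the sample digests are made up for illustration and are not taken from buildkit_provenance.py.

```python
from typing import Optional


def find_attestation_digest(index: dict, platform_digest: str) -> Optional[str]:
    """Return the digest of the attestation manifest that describes platform_digest."""
    for entry in index.get("manifests", []):
        annotations = entry.get("annotations") or {}
        if (
            annotations.get("vnd.docker.reference.type") == "attestation-manifest"
            and annotations.get("vnd.docker.reference.digest") == platform_digest
        ):
            return entry.get("digest")
    return None


# Illustrative multi-arch index: two platforms, one attestation sibling each.
example_index = {
    "manifests": [
        {"digest": "sha256:aaa", "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": "sha256:bbb", "platform": {"os": "linux", "architecture": "arm64"}},
        {
            "digest": "sha256:att-amd64",
            "platform": {"os": "unknown", "architecture": "unknown"},
            "annotations": {
                "vnd.docker.reference.type": "attestation-manifest",
                "vnd.docker.reference.digest": "sha256:aaa",
            },
        },
        {
            "digest": "sha256:att-arm64",
            "platform": {"os": "unknown", "architecture": "unknown"},
            "annotations": {
                "vnd.docker.reference.type": "attestation-manifest",
                "vnd.docker.reference.digest": "sha256:bbb",
            },
        },
    ]
}
```

Matching on the referenced digest rather than taking the first attestation entry is what keeps a per-platform lookup from landing on another platform's sibling.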
Failure modes surface clearly: rate-limit (Docker Hub anonymous pulls
are capped at 100/6h), 401, and 404 from crane or cosign are classified
and logged at WARNING with concrete remediation. Any fetch/merge error
falls back to the existing plain Syft scan path.
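
A rough sketch of the classification step follows. The matched substrings (HTTP status codes and the registry error codes TOOMANYREQUESTS, UNAUTHORIZED, MANIFEST_UNKNOWN, NAME_UNKNOWN) are assumptions about what crane or cosign print on stderr, and `classify_registry_error` with its hint table is a hypothetical stand-in for the real classifier.

```python
import re
from typing import Literal

RegistryError = Literal["rate_limit", "unauthorized", "not_found", "unknown"]

# Assumed stderr patterns; checked in order, first match wins.
_PATTERNS = (
    ("rate_limit", re.compile(r"\b429\b|toomanyrequests|too many requests")),
    ("unauthorized", re.compile(r"\b401\b|unauthorized")),
    ("not_found", re.compile(r"\b404\b|manifest_unknown|name_unknown|not found")),
)

# Remediation text is illustrative, not the wording used by the action.
HINTS = {
    "rate_limit": "Pull rate limit hit; run `docker login <registry>` to authenticate.",
    "unauthorized": "Registry rejected the credentials; run `docker login <registry>`.",
    "not_found": "Image or attestation not found; check the reference and tag.",
    "unknown": "Unclassified registry error; falling back to a plain Syft scan.",
}


def classify_registry_error(stderr: str) -> RegistryError:
    """Map tool stderr to a coarse error class (hypothetical helper)."""
    text = stderr.lower()
    for kind, pattern in _PATTERNS:
        if pattern.search(text):
            return kind
    return "unknown"
```

A caller would log `HINTS[kind]` at WARNING and then continue with the plain Syft scan, so a classified failure never aborts generation.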
New files:
sbomify_action/_generation/buildkit_provenance.py
sbomify_action/_generation/dockerhub.py
sbomify_action/_generation/sbom_merge.py
tests/test_buildkit_provenance.py
tests/test_dockerhub.py
tests/test_sbom_merge.py
examples/docker-hub/ (runnable E2E: 7 scenarios across distros and formats)
README: new "Docker Hub Images" section, updated tools table, and
"Docker Hub SBOM reuse" entry in the top-level feature list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Adds support for consuming Docker Hub–published upstream SPDX SBOMs (Docker Official Images and DHI), then merging them with a local Syft scan so base-layer packages prefer publisher metadata while user-installed packages are overlaid.
Changes:
- Introduces shared BuildKit attestation/provenance helpers (crane/cosign + in-toto walking) and rewires Chainguard detection to use them.
- Implements Docker Hub image detection (direct + provenance) and upstream SBOM retrieval (crane for Official Images, cosign for DHI).
- Adds CycloneDX + SPDX merge logic (upstream-wins with Syft gap-filling) and integrates it into the CLI pipeline, with extensive unit + E2E examples.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `sbomify_action/cli/main.py` | Integrates Docker Hub detection/fetch + Syft overlay merge into the main generation pipeline. |
| `sbomify_action/_generation/buildkit_provenance.py` | New shared primitives for resolving platform digests and fetching BuildKit/cosign attestations. |
| `sbomify_action/_generation/dockerhub.py` | New Docker Hub Official/DHI detection and upstream SPDX SBOM fetch. |
| `sbomify_action/_generation/sbom_merge.py` | New upstream-wins merge logic for CycloneDX and SPDX documents. |
| `sbomify_action/_generation/chainguard.py` | Refactors Chainguard detector to use shared BuildKit provenance helpers. |
| `tests/test_buildkit_provenance.py` | Unit tests for shared BuildKit/crane/cosign helpers and registry error classification. |
| `tests/test_dockerhub.py` | Unit tests for Docker Hub detection paths and upstream SBOM retrieval behavior. |
| `tests/test_sbom_merge.py` | Unit tests for merge semantics (dedup, fill-empty, collision rewrites, relationship/licensing carry-over). |
| `tests/test_chainguard.py` | Updates mocks/patches to point at the extracted shared helper module. |
| `examples/docker-hub/run-e2e.sh` | Adds an end-to-end runner script covering official/provenance/DHI scenarios. |
| `examples/docker-hub/README.md` | Documents the E2E scenarios and how to run them locally. |
| `examples/docker-hub/Dockerfile.official` | Example derivative image to exercise provenance-based detection + overlay behavior. |
| `examples/docker-hub/Dockerfile.dhi` | Example derivative image for DHI scenarios (best-effort). |
| `examples/docker-hub/hello.py` | Minimal app used by the Dockerfiles/E2E runner. |
| `README.md` | Documents Docker Hub SBOM reuse/merge feature and updates required-tools notes. |
Security

DHI SBOM fetch now uses `cosign verify-attestation --type spdxjson` instead of `cosign download attestation`. The download-only command fetched envelopes without checking Docker's signature, so a tampered attestation would have been silently consumed. verify-attestation enforces the signature against the --key keyring and still short-circuits the Rekor check (DHI isn't Rekor-logged). Chainguard's pre-existing download-only behavior is unchanged (separate scope).

Correctness

SPDX merge now deep-copies the upstream doc before mutation. The shallow `dict(upstream_spdx)` copy worked only because upstream was never reused; a future cache layer would have silently leaked nested-list mutations.

External-references dedup keys on (type, url) instead of url alone, so CycloneDX refs with the same URL but different type (e.g., vcs vs website) no longer collapse.

Error surfacing (Copilot)

`_classify_registry_error` is shared by Chainguard and Docker Hub paths, so the 429/401 hints are now registry-agnostic, pointing at `docker login <registry>` with the placeholder left for the user to fill in. A new branch recognises crane's "No matching credentials" message and steers users toward `docker login dhi.io`, the common cause of DHI failures after a plain `docker login`.

Docs (Copilot)

README clarified: `sbomify:source` property is emitted on CycloneDX components only; SPDX output is merged but not per-package source-annotated.

Types / polish

DockerHubBaseImage.tier and related helpers typed as Literal["official", "dhi"] instead of plain str. convert_spdx_to_cyclonedx docstring notes its generic SPDX 2.x → CDX contract and that it's reused by the Docker Hub merge path.

Tests

+4 new tests covering the registry-agnostic classifier, the "No matching credentials" → dhi.io hint, externalReferences (type, url) dedup, and a regression guard that pins the DHI cosign call to `verify-attestation` (not `download`). Suite now 2261 passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
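
The two Correctness fixes in the commit message above can be illustrated with a short sketch. The helper names `with_extra_package` and `merge_external_references` are hypothetical and the documents are reduced to the fields needed to show the behavior; only `copy.deepcopy` and the `(type, url)` key come from the description itself.

```python
import copy


def with_extra_package(doc: dict, package: dict, *, deep: bool) -> dict:
    """Append a package to a copy of doc; deep=False reproduces the old shallow copy."""
    working = copy.deepcopy(doc) if deep else dict(doc)
    # With a shallow copy, working["packages"] is the same list object as doc["packages"].
    working["packages"].append(package)
    return working


def merge_external_references(upstream_refs: list, syft_refs: list) -> list:
    """Keep upstream refs, then add Syft refs whose (type, url) pair is new."""
    seen = {(ref.get("type"), ref.get("url")) for ref in upstream_refs}
    merged = list(upstream_refs)
    for ref in syft_refs:
        key = (ref.get("type"), ref.get("url"))
        if key not in seen:
            seen.add(key)
            merged.append(ref)
    return merged
```

Keying on the URL alone would drop a `vcs` reference that shares its URL with an existing `website` reference; the pair key keeps both.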
Summary
- Detect Docker Official Images (`library/*`) and Docker Hardened Images (`dhi.io/*`), either directly via `DOCKER_IMAGE` or via BuildKit SLSA provenance on a user-built image, and fetch the publisher's SPDX SBOM.
- Merge the upstream SBOM with a Syft scan of the same image: upstream wins for base-layer packages, and Syft fills gaps and overlays anything the Dockerfile added on top (`apt`, `pip`, `COPY`, …). Components are tagged with a `sbomify:source` property (`docker-hub-upstream` or `syft-overlay`) for auditability.
- Factor the crane/cosign attestation-walking plumbing out of `chainguard.py` into a shared `buildkit_provenance` module so both detectors use the same primitives. Fixes a latent bug along the way: multi-arch indexes have one `attestation-manifest` sibling per platform, and the previous code sometimes pulled the wrong one (or nothing) when given a per-platform digest.
- Rate-limit, 401, and 404 failures from crane or cosign are classified and logged at WARNING with concrete remediation (`docker login`, `docker login dhi.io`, etc.) instead of silently falling back.

Full pipeline verified end-to-end with `--augment` (local `sbomify.json`) and `--enrich`: the merge output passes through both stages unmodified in shape, source tags preserved.

Dedup keys on the PURL's core identity, ignoring qualifiers (Docker's `os_distro=trixie&os_name=debian` vs Syft's `arch=amd64&distro=debian-13` both collapse to the same package). A second-pass loose match (type + name + version, namespace ignored) catches the cases where the two generators disagree on namespace: for example, Amazon Linux's upstream SBOM emits `pkg:rpm/amazonlinux/bash` while Syft emits `pkg:rpm/amzn/bash`. This is safe because a merge always describes a single image, so the distro is fixed.

Test plan

Unit coverage: 79 new tests, full suite 2257 passing, ruff clean:

- `tests/test_buildkit_provenance.py`: crane/cosign helpers, rate-limit/401/404 classifier
- `tests/test_dockerhub.py`: direct + provenance detection for `library/*`, `dhi.io/*`, ref classification, multi-arch sibling matching
- `tests/test_sbom_merge.py`: strict dedup, qualifier-ignore fallback, namespace-mismatch fallback, relationship/extracted-license carry-over
- `tests/test_chainguard.py`: existing tests rewired to the shared module; no behavioral change

Runnable E2E (`examples/docker-hub/run-e2e.sh`, 7 scenarios):

- `python:3.11-slim` direct, CycloneDX: 142 upstream + 2721 overlay
- `python:3.11-slim` direct, SPDX: 157 packages, 72 extracted licenses, 0 validation errors
- `alpine:3.20`, `ubuntu:24.04`, `rockylinux:9`, `amazonlinux:2`, `archlinux:latest`, `busybox:latest`: all detected, merged, deduped across `pkg:deb` / `pkg:apk` / `pkg:rpm` / `pkg:alpm` / `pkg:generic`
- `FROM python:3.11-slim` + `apt`/`pip`, CycloneDX: 142 upstream + 2852 overlay, `requests`/`click`/`curl`/`jq` all present as `syft-overlay`
- `--augment` (local `sbomify.json`) + `--enrich`: supplier, authors, lifecycle applied on top of merged SBOM
- DHI scenario, best-effort (requires `docker login dhi.io` and account entitlement). Unit-tested cosign shape includes `--key https://registry.scout.docker.com/keyring/dhi/latest.pub` and `--insecure-ignore-tlog=true`; the keyring URL is verified publicly reachable.

🤖 Generated with Claude Code