docs(adr): add ADR for configurable SBOM duplicate handling #2188
Open: dejanb wants to merge 1 commit into `main` from `adr/duplicate-sbom`
`docs/adrs/00013-configurable-sbom-duplicate-handling.md`: new file, 118 additions, 0 deletions

# 00013. Configurable SBOM Duplicate Handling

## Status

PROPOSED

## Context

### Problem Statement

Trustify currently uses hash-based deduplication (SHA-256, SHA-384, SHA-512) to detect duplicate SBOMs. However, SBOM documents also carry stable identifiers (`documentNamespace` for SPDX, `serialNumber` for CycloneDX) that identify a document independently of minor content changes.

**Current Limitation**: When an SBOM is regenerated with the same identifier but different content (e.g., updated timestamps), its hash changes, so it is ingested as a new document.

### Use Cases

Different scenarios require different duplicate-handling behaviors:

1. **Audit/Compliance**: Keep all versions for historical tracking
2. **Latest-only**: Replace old versions to save storage and show the current state
3. **Deduplication**: Ignore re-ingestion of documents with the same identifier

## Decision

Add configurable duplicate handling with three modes based on SBOM document identifiers:

### Duplicate Handling Modes

**`onDuplicate=ingest`** (default)
- Ingest as a new document (current behavior)
- Hash-based deduplication still applies
- Backward compatible

**`onDuplicate=ignore`**
- Skip ingestion if an SBOM with the same `document_id` already exists
- Return the existing SBOM's information
- Useful for preventing re-ingestion of unchanged documents

**`onDuplicate=replace`**
- Delete the existing SBOM with the same `document_id`
- Ingest the new version
- Maintains a latest-only view

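Here is a minimal Rust sketch of how the three modes might be modeled and parsed from the `onDuplicate` query parameter or importer field. The `OnDuplicate` enum, the `UnknownMode` error type and the `resolve_mode()` helper are hypothetical names, not Trustify's actual API. The sketch shows that an absent parameter resolves to `ingest`, which is what keeps the change backward compatible.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical model of the `onDuplicate` setting (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum OnDuplicate {
    /// Ingest as a new document (current behavior); hash-based dedup still applies.
    #[default]
    Ingest,
    /// Skip ingestion and return the existing SBOM.
    Ignore,
    /// Delete the existing SBOM, then ingest the new version.
    Replace,
}

/// Error for any value other than `ingest`, `ignore` or `replace`.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownMode(pub String);

impl fmt::Display for UnknownMode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown onDuplicate mode {:?}; expected ingest, ignore or replace", self.0)
    }
}

impl std::error::Error for UnknownMode {}

impl FromStr for OnDuplicate {
    type Err = UnknownMode;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Exact, case-sensitive match on the documented values (an assumption).
        match s {
            "ingest" => Ok(Self::Ingest),
            "ignore" => Ok(Self::Ignore),
            "replace" => Ok(Self::Replace),
            other => Err(UnknownMode(other.to_string())),
        }
    }
}

/// A missing parameter falls back to the default (`Ingest`), keeping existing
/// clients and importer configurations on the current behavior.
pub fn resolve_mode(param: Option<&str>) -> Result<OnDuplicate, UnknownMode> {
    match param {
        None => Ok(OnDuplicate::default()),
        Some(value) => value.parse(),
    }
}
```

This sketch rejects unknown values instead of silently falling back to `ingest`. A typo such as `replce` then fails loudly rather than quietly keeping every version.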
## Configuration

### 1. API Upload (Per-Request)

Add an optional `onDuplicate` query parameter to the SBOM upload endpoint:

```bash
# `onDuplicate==<mode>` is HTTPie's syntax for a URL query parameter

# Ignore duplicates - skip if it already exists
cat sbom.json | http POST localhost:8080/api/v2/sbom onDuplicate==ignore

# Replace existing - delete old, ingest new
cat sbom-v2.json | http POST localhost:8080/api/v2/sbom onDuplicate==replace

# Ingest as new (default) - current behavior
cat sbom.json | http POST localhost:8080/api/v2/sbom
```

**Review comment** (nitpick, typo) on lines +48 to +50: Minor grammar tweak: add "it" to read more naturally.

### 2. Importer Configuration (Per-Importer)

Add an `onDuplicate` field to the SBOM importer configuration:

```bash
# `sbom[field]=value` uses HTTPie's nested-JSON syntax to build a {"sbom": {...}} body

# Ignore duplicates during scheduled imports
http POST localhost:8080/api/v2/importer/my-sbom-source \
  sbom[source]=https://example.com/sboms/ \
  sbom[onDuplicate]=ignore \
  sbom[period]=1d

# Replace with the latest version
http POST localhost:8080/api/v2/importer/internal-builds \
  sbom[source]=https://builds.internal/sboms/ \
  sbom[onDuplicate]=replace \
  sbom[period]=1h
```

## How It Works

### Duplicate Detection

1. Extract the document identifier from the SBOM:
   - **SPDX**: `documentNamespace` field
   - **CycloneDX**: `serialNumber` field

2. Check the database for an existing SBOM with the same identifier

3. Apply the configured behavior:
   - **`ingest`**: Continue normal ingestion (hash-based deduplication still applies)
   - **`ignore`**: Skip ingestion and return the existing SBOM's information
   - **`replace`**: Delete the old SBOM and its stored document, then ingest the new version

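The three steps above can be sketched in Rust against a toy in-memory store. Everything here is illustrative. `Sbom` is reduced to the identifier fields. `Store` stands in for the graph layer, and its `get_sbom_by_document_id()` only mirrors the lookup name proposed in this ADR. Hash-based deduplication in `ingest` mode is not modeled. One assumption goes beyond the ADR: CycloneDX's `serialNumber` is optional, and a document without an identifier has nothing to match on, so this sketch always ingests it as new.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum OnDuplicate {
    #[default]
    Ingest,
    Ignore,
    Replace,
}

/// Simplified, already-parsed SBOM: only the fields used for duplicate detection.
#[derive(Debug, Clone)]
pub enum Sbom {
    Spdx { document_namespace: String },
    CycloneDx { serial_number: Option<String> },
}

impl Sbom {
    /// Step 1: extract the document identifier.
    pub fn document_id(&self) -> Option<&str> {
        match self {
            Sbom::Spdx { document_namespace } => Some(document_namespace.as_str()),
            Sbom::CycloneDx { serial_number } => serial_number.as_deref(),
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Ingested { id: u64 },
    Ignored { existing_id: u64 },
    Replaced { old_id: u64, new_id: u64 },
}

/// Toy in-memory stand-in for the database / graph layer.
#[derive(Default)]
pub struct Store {
    next_id: u64,
    by_document_id: HashMap<String, u64>,
    sboms: HashMap<u64, Sbom>,
}

impl Store {
    /// Step 2: look up an existing SBOM by document identifier.
    pub fn get_sbom_by_document_id(&self, document_id: &str) -> Option<u64> {
        self.by_document_id.get(document_id).copied()
    }

    pub fn count(&self) -> usize {
        self.sboms.len()
    }

    fn insert(&mut self, sbom: Sbom) -> u64 {
        self.next_id += 1;
        let id = self.next_id;
        // The newest SBOM with a given identifier becomes the lookup target.
        if let Some(doc_id) = sbom.document_id() {
            self.by_document_id.insert(doc_id.to_string(), id);
        }
        self.sboms.insert(id, sbom);
        id
    }

    /// Step 3: apply the configured behavior.
    pub fn ingest(&mut self, sbom: Sbom, mode: OnDuplicate) -> Outcome {
        let existing = sbom.document_id().and_then(|d| self.get_sbom_by_document_id(d));
        match (mode, existing) {
            (OnDuplicate::Ignore, Some(existing_id)) => Outcome::Ignored { existing_id },
            (OnDuplicate::Replace, Some(old_id)) => {
                self.sboms.remove(&old_id);
                let new_id = self.insert(sbom);
                Outcome::Replaced { old_id, new_id }
            }
            // `ingest` mode, or no existing SBOM with this identifier.
            _ => Outcome::Ingested { id: self.insert(sbom) },
        }
    }
}
```

With `ingest`, a re-submitted document receives a new ID and both copies are kept. With `ignore` and `replace`, the store holds a single copy per identifier.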
### Implementation Scope

**Core Components**:
- IngestorService: Add an `onDuplicate` parameter to the `ingest()` method
- Graph layer: Add a `get_sbom_by_document_id()` lookup function
- API endpoints: Add the `onDuplicate` query parameter
- Importer config: Add an `onDuplicate` field to `SbomImporter`

## Benefits

- ✓ Flexible handling for different use cases (audit, latest-only, deduplication)
- ✓ Backward compatible (defaults to current behavior)
- ✓ Configurable per importer and per upload
- ✓ Works for both SPDX and CycloneDX formats
- ✓ Avoids storing redundant copies of the same document (in `ignore` and `replace` modes)

## Considerations

**Logging**: All duplicate-handling actions are logged to provide an audit trail

**Atomicity**: `replace` should delete the old SBOM and ingest the new one atomically (e.g., within a single database transaction), so that a failed ingestion does not leave the document missing

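The atomicity requirement can be read as follows, sketched in Rust. The delete and the new ingestion are both applied to a staged copy, and the copy is published only if both steps succeed. A failed ingestion therefore leaves the old version in place. Cloning a `HashMap` is only a stand-in for a database transaction. `ingest_into()` and `replace_atomically()` are hypothetical helpers, and the empty-content check merely simulates an ingestion error.

```rust
use std::collections::HashMap;

/// Toy store: document identifier -> SBOM content (stand-in for the database).
pub type Store = HashMap<String, String>;

/// Hypothetical ingestion step that can fail; empty content simulates a parse error.
fn ingest_into(tx: &mut Store, document_id: &str, content: &str) -> Result<(), String> {
    if content.trim().is_empty() {
        return Err(format!("cannot ingest empty SBOM for {document_id}"));
    }
    tx.insert(document_id.to_string(), content.to_string());
    Ok(())
}

/// `replace` as an all-or-nothing operation.
pub fn replace_atomically(store: &mut Store, document_id: &str, content: &str) -> Result<(), String> {
    // "Begin": stage all changes on a copy instead of the live store.
    let mut tx = store.clone();
    // Delete the old version inside the staged copy only.
    tx.remove(document_id);
    // On error, `?` returns early and `tx` is dropped, so nothing changes ("rollback").
    ingest_into(&mut tx, document_id, content)?;
    // "Commit": publish the delete and the insert together.
    *store = tx;
    Ok(())
}
```

With a database, the clone/commit pair corresponds to starting and committing a transaction around the delete and the insert.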
## Open Questions

1. Should `replace` mode preserve user-added labels from the old SBOM?

**Review comment** (issue, typo): ADR number in the title does not match the filename and may be confusing. The file is named `00013-...` but the ADR title starts with `00011.` Please update the heading number to match the filename, or add a brief note if the mismatch is intentional, to avoid confusion when referencing this ADR.