SBOM merge performance#327

Open
rpkelly wants to merge 3 commits intobottlerocket-os:developfrom
rpkelly:sbom-perf-squashed

@rpkelly rpkelly commented Feb 3, 2026

Issue number: 326

Description of changes:
Made SBOM generation slightly faster (an estimated 1-3 minutes shaved off core kit generation). Replaced the O(n^2) merge algorithm with a faster one and introduced additional parallelization.

Also found that some relationships were being dropped during merge and fixed that, so the merged SBOM is now less flat.

Testing done:

Extracted all SBOMs from the core kit and ran the merge over them as a benchmark. Results below:

Before:

Building sbomtool...
Extracting SBOMs from RPMs...
Found 262 SBOM RPMs
Extracted: 125 SPDX, 125 CycloneDX (71M)

=== Benchmarks ===

--- SPDX only (125 files) ---
  Wall time: 254.95s

--- CycloneDX only (125 files) ---
  Wall time: 246.39s

After:

Building sbomtool...
Extracting SBOMs from RPMs...
Found 262 SBOM RPMs
Extracted: 125 SPDX, 125 CycloneDX (71M)

=== Benchmarks ===

--- SPDX only (125 files) ---
  Wall time: 3.05s
  Output: 4009 packages, 8.4M

--- CycloneDX only (125 files) ---
  Wall time: 1.77s
  Output: 3883 packages, 8.6M

--- Mixed SPDX+CycloneDX (250 files) ---
  Wall time: 4.23s
  Output: 4021 packages, 9.5M

--- Dual output (both formats) ---
  Wall time: 4.25s
  Outputs:
    /tmp/sbomtool-benchmark-991667/both-cyclonedx.json: 7.6M
    /tmp/sbomtool-benchmark-991667/both-spdx.json: 9.5M

=== Validation ===

Package counts:
  SPDX only:      4009
  CycloneDX only: 3883
  Mixed:          4021
  Both (SPDX):    4021
  Both (CDX):     4020

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

… dual format output

- Add Union-Find for O(α(n)) package deduplication (replaces O(n²))
- Parallelize SBOM loading with errgroup
- Add dual SPDX/CycloneDX output support
- Fix relationship preservation after deduplication

Signed-off-by: Richard Kelly <rpkelly@amazon.com>
…encoding

Signed-off-by: Richard Kelly <rpkelly@amazon.com>
Signed-off-by: Richard Kelly <rpkelly@amazon.com>
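
For readers unfamiliar with the Union-Find deduplication named in the first commit message, here is a minimal sketch of the general technique. It is not the code from this PR: the `pkg` type, the identity keys (name@version and PURL), and the sample packages are all hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// unionFind is a disjoint-set forest with path halving and union by size.
type unionFind struct {
	parent []int
	size   []int
}

func newUnionFind(n int) *unionFind {
	uf := &unionFind{parent: make([]int, n), size: make([]int, n)}
	for i := range uf.parent {
		uf.parent[i] = i
		uf.size[i] = 1
	}
	return uf
}

// find returns the representative of x's set, shortening the path as it goes.
func (uf *unionFind) find(x int) int {
	for uf.parent[x] != x {
		uf.parent[x] = uf.parent[uf.parent[x]] // path halving
		x = uf.parent[x]
	}
	return x
}

// union merges the sets containing a and b, attaching the smaller to the larger.
func (uf *unionFind) union(a, b int) {
	ra, rb := uf.find(a), uf.find(b)
	if ra == rb {
		return
	}
	if uf.size[ra] < uf.size[rb] {
		ra, rb = rb, ra
	}
	uf.parent[rb] = ra
	uf.size[ra] += uf.size[rb]
}

// pkg is a hypothetical, simplified package record (an assumption for this
// sketch; the real sbomtool types are not shown in the PR page).
type pkg struct {
	Name, Version, PURL string
}

// dedupe groups packages that share any identity key and returns, for each
// package, the index of its group's representative. Each package is touched
// once, so there is no pairwise comparison of all packages.
func dedupe(pkgs []pkg) []int {
	uf := newUnionFind(len(pkgs))
	firstSeen := make(map[string]int) // identity key -> first index with that key
	link := func(key string, i int) {
		if j, ok := firstSeen[key]; ok {
			uf.union(i, j)
		} else {
			firstSeen[key] = i
		}
	}
	for i, p := range pkgs {
		link("nv:"+p.Name+"@"+p.Version, i)
		if p.PURL != "" {
			link("purl:"+p.PURL, i)
		}
	}
	reps := make([]int, len(pkgs))
	for i := range pkgs {
		reps[i] = uf.find(i)
	}
	return reps
}

func main() {
	// Made-up sample data.
	pkgs := []pkg{
		{Name: "glibc", Version: "2.40", PURL: "pkg:rpm/glibc@2.40"},
		{Name: "glibc", Version: "2.40"},
		{Name: "libc", Version: "2.40", PURL: "pkg:rpm/glibc@2.40"},
		{Name: "openssl", Version: "3.3"},
	}
	fmt.Println(dedupe(pkgs))
}
```

In a scheme like this, relationships would presumably survive deduplication by rewriting each endpoint to its representative index rather than dropping edges that point at a merged-away package.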
@rpkelly rpkelly changed the title from "SBOM perf squashed" to "SBOM merge performace" Feb 3, 2026
@rpkelly rpkelly changed the title from "SBOM merge performace" to "SBOM merge performance" Feb 3, 2026
// Start worker pool for parallel file processing
numWorkers := runtime.NumCPU()
// Buffer 100 items per worker to reduce contention while limiting memory usage
workCh := make(chan fileWork, numWorkers*100)

Is 100 an arbitrary number here, or were there some heuristics that led to it being chosen?

Contributor Author

Arbitrary; it is only there to keep the buffer bounded.
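
To illustrate why the exact buffer size is not critical, here is a minimal sketch of a bounded worker pool. It is not the PR's implementation: the `fileWork` fields and the `processAll` helper are assumptions made for this example. The buffer only limits how far the producer can run ahead of the workers; the result is the same for any size, including an unbuffered channel.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// fileWork is a hypothetical unit of work; the real struct in the PR is not shown.
type fileWork struct {
	path string
}

// processAll feeds paths through a fixed pool of workers and returns how many
// items were handled. numWorkers must be at least 1.
func processAll(paths []string, numWorkers, perWorkerBuffer int) int {
	workCh := make(chan fileWork, numWorkers*perWorkerBuffer)
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		handled int
	)
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range workCh { // stand-in for real per-file processing
				mu.Lock()
				handled++
				mu.Unlock()
			}
		}()
	}
	for _, p := range paths {
		workCh <- fileWork{path: p} // blocks once the buffer is full (backpressure)
	}
	close(workCh) // lets the workers' range loops terminate
	wg.Wait()
	return handled
}

func main() {
	paths := make([]string, 1000)
	for i := range paths {
		paths[i] = fmt.Sprintf("file-%d", i)
	}
	workers := runtime.GOMAXPROCS(0)
	for _, buf := range []int{0, 1, 100} {
		fmt.Printf("buffer per worker %d: handled %d\n", buf, processAll(paths, workers, buf))
	}
}
```

A larger buffer mainly reduces how often the producer stalls while workers are busy, at the cost of holding more queued items in memory.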

}

mergeCmd.Flags().String("output", "", "Output file path for merged SBOM (required)")
mergeCmd.Flags().String("output-format", "", "Output format: spdx-json, cyclonedx-json, or both (default: first input's format)")
Contributor

The help text doesn't match the switch case that parses this flag: per the switch case below, the accepted values are spdx|cyclonedx|both.
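
One way to keep help text and parsing from drifting apart is to derive both from a single list of accepted values. The sketch below assumes the accepted values are the three named in this comment; `validFormats`, `outputFormatHelp`, and `parseOutputFormat` are hypothetical names, not functions from sbomtool.

```go
package main

import (
	"fmt"
	"strings"
)

// validFormats is the single source of truth for accepted flag values
// (assumed here to be the values the reviewer lists).
var validFormats = []string{"spdx", "cyclonedx", "both"}

// outputFormatHelp builds the help string from validFormats so the two
// cannot disagree.
func outputFormatHelp() string {
	return "Output format: " + strings.Join(validFormats, ", ") +
		" (default: first input's format)"
}

// parseOutputFormat validates the flag value. An empty value falls back to
// the format of the first input, matching the documented default.
func parseOutputFormat(value, firstInputFormat string) (string, error) {
	if value == "" {
		return firstInputFormat, nil
	}
	for _, f := range validFormats {
		if value == f {
			return f, nil
		}
	}
	return "", fmt.Errorf("invalid output format %q: expected one of %s",
		value, strings.Join(validFormats, ", "))
}

func main() {
	fmt.Println(outputFormatHelp())
	for _, v := range []string{"spdx", "", "spdx-json"} {
		format, err := parseOutputFormat(v, "cyclonedx")
		fmt.Printf("%q -> %q, err=%v\n", v, format, err)
	}
}
```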

if err != nil {
return nil, fmt.Errorf("failed to scan buildroot: %w", err)
// Start worker pool for parallel file processing
numWorkers := runtime.NumCPU()
Contributor

You might want runtime.GOMAXPROCS(0) here. Since Go 1.25 the default GOMAXPROCS is container-aware and respects cgroup CPU limits on Linux, which runtime.NumCPU() does not. Since sbomtool might be running in buildsys, it's probably the more appropriate choice here: https://go.dev/blog/container-aware-gomaxprocs

I'm not too concerned about this, but I'm curious whether you see any performance difference after making the change. If buildsys containers are given limited access to CPUs/threads, the current code could run into resource contention.

}

- installedPath := filepath.Clean(relativePath)
+ installedPath := "/" + rel
Contributor

You should still use filepath.Clean() here to avoid any edge cases.
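
As an illustration of the kind of edge cases meant here, this sketch compares plain concatenation with a cleaned path. The `installedPath` helper and the sample inputs are assumptions for the example, and the results assume a Unix-style path separator.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// installedPath roots rel at "/" and normalizes it: duplicate separators and
// "." elements are removed, and ".." elements are resolved lexically.
func installedPath(rel string) string {
	return filepath.Clean("/" + rel)
}

func main() {
	inputs := []string{
		"usr/bin/tool",
		"usr//lib/./x.so",
		"usr/share/../bin/tool",
		"../etc/passwd",
		"",
	}
	for _, rel := range inputs {
		fmt.Printf("naive %q  cleaned %q\n", "/"+rel, installedPath(rel))
	}
}
```

Because the path is rooted before cleaning, any leading ".." elements are dropped rather than escaping above "/".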

func loadSBOMs(inputFiles []string) ([]*sbom.SBOM, string, error) {
var sboms []*sbom.SBOM
var commonFormat string
numWorkers := runtime.NumCPU()
Contributor

Consider runtime.GOMAXPROCS(0) here as well: https://go.dev/blog/container-aware-gomaxprocs
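
For context, a bounded parallel loader that keeps results in input order (which matters if the default output format comes from the first input) might look like the sketch below. It uses only the standard library as a stand-in for errgroup, so unlike errgroup with a context it does not cancel remaining work on the first error. `loadAll`, `fakeLoad`, and `errEmptyPath` are hypothetical names, and the loader returns strings rather than parsed SBOMs.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"sync"
)

var errEmptyPath = errors.New("empty path")

// fakeLoad stands in for parsing one SBOM file.
func fakeLoad(path string) (string, error) {
	if path == "" {
		return "", errEmptyPath
	}
	return "parsed:" + path, nil
}

// loadAll calls load on every input with at most limit calls in flight and
// returns the results in input order. If any load fails, it returns the
// error belonging to the earliest failing input.
func loadAll(inputs []string, limit int, load func(string) (string, error)) ([]string, error) {
	if limit < 1 {
		limit = 1 // an unbuffered semaphore would deadlock
	}
	results := make([]string, len(inputs))
	errs := make([]error, len(inputs))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		sem <- struct{}{} // acquire; blocks while limit loads are running
		go func(i int, in string) {
			defer wg.Done()
			defer func() { <-sem }() // release
			// Each goroutine writes only its own slot, so no lock is needed.
			results[i], errs[i] = load(in)
		}(i, in)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}

func main() {
	out, err := loadAll([]string{"a.spdx.json", "b.cdx.json"}, runtime.GOMAXPROCS(0), fakeLoad)
	fmt.Println(out, err)
}
```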
