Conversation
… dual format output - Add Union-Find for O(α(n)) package deduplication (replaces O(n²)) - Parallelize SBOM loading with errgroup - Add dual SPDX/CycloneDX output support - Fix relationship preservation after deduplication Signed-off-by: Richard Kelly <rpkelly@amazon.com>
…encoding Signed-off-by: Richard Kelly <rpkelly@amazon.com>
Signed-off-by: Richard Kelly <rpkelly@amazon.com>
| // Start worker pool for parallel file processing | ||
| numWorkers := runtime.NumCPU() | ||
| // Buffer 100 items per worker to reduce contention while limiting memory usage | ||
| workCh := make(chan fileWork, numWorkers*100) |
There was a problem hiding this comment.
Is 100 an arbitrary number here or there are some heuristics that led to 100 being chosen?
There was a problem hiding this comment.
Arbitrary to stop it from being infinite
| } | ||
|
|
||
| mergeCmd.Flags().String("output", "", "Output file path for merged SBOM (required)") | ||
| mergeCmd.Flags().String("output-format", "", "Output format: spdx-json, cyclonedx-json, or both (default: first input's format)") |
There was a problem hiding this comment.
help text doesn't match your switch case in parsing this flag - per the switch case below, accepted values are spdx|cyclonedx|both
| if err != nil { | ||
| return nil, fmt.Errorf("failed to scan buildroot: %w", err) | ||
| // Start worker pool for parallel file processing | ||
| numWorkers := runtime.NumCPU() |
There was a problem hiding this comment.
you might want runtime.GOMAXPROCS(0) here. This is a container-aware update to this method that will respect any resource limits - since the sbomtool might be running in buildsys, it's probably more appropriate to use here https://go.dev/blog/container-aware-gomaxprocs
I'm not super concerned about this, but curious if you do see any performance differences in making the change - if buildsys containers are given some limited access to cpus/threads, the current code could run itself into resource contention.
| } | ||
|
|
||
| installedPath := filepath.Clean(relativePath) | ||
| installedPath := "/" + rel |
There was a problem hiding this comment.
You should still use filepath.Clean() here to avoid any edge cases
| func loadSBOMs(inputFiles []string) ([]*sbom.SBOM, string, error) { | ||
| var sboms []*sbom.SBOM | ||
| var commonFormat string | ||
| numWorkers := runtime.NumCPU() |
There was a problem hiding this comment.
consider runtime.GOMAXPROCS(0): https://go.dev/blog/container-aware-gomaxprocs
Issue number: 326
Description of changes:
Modified SBOM generation to be slightly faster (estimated 1-3 minutes shaved off core kit generation) and changed the merge algorithm from an O(n^2) one while introducing additional parallelization.
Also, found that some relationships were being dropped during merge so fixed that, making the merged SBOM less flat.
Testing done:
Extracted all SBOMs from the core kit and ran as a benchmark. See below:
Before:
After:
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.