feat(ci-scripts): utility scripts for license updates and SBOM #1548
Conversation
Signed-off-by: Will Killian <wkillian@nvidia.com>
Walkthrough
This pull request introduces new CI scripts for license tracking, expands project metadata with example packages and UV sources, updates documentation and guidelines, and bumps Docker image versions for the uv tool across multiple configurations and Dockerfiles.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed checks.
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@ci/scripts/license_diff.py`:
- Around line 86-89: The added_packages and removed_packages comprehensions currently include internal packages despite the intent to skip "nvidia-nat*". Update the comprehensions that build added_packages and removed_packages to filter out packages whose names start with "nvidia-nat" the same way changed_packages does (i.e., use the same pkg.startswith("nvidia-nat") check), ensuring all three dictionaries consistently exclude internal packages; locate the comprehensions that set added_packages, removed_packages, and changed_packages and apply the filter to the first two. A sketch of this change appears after this list.
- Around line 111-124: The change lines currently render "head_version -> base_version" and "(head_license -> base_license)", which inverts the direction. Update the two formatted strings in the list_of_changes append calls to show base first, then head (i.e., use base_version -> head_version and base_license -> head_license) so the diff reads "base -> head"; locate the code around the loop using changed_packages, head_packages, base_packages, and pypi_license and swap the order of the version and license interpolations in both appended strings. A before/after sketch appears after this list.
- Around line 42-45: The urllib.request.urlopen calls in pypi_license() (the PyPI metadata fetch using the variable url) and in main (the GitHub uv.lock fetch) lack timeouts. Either add timeout=10 to both urlopen(...) calls or, preferably, replace these requests with an httpx.Client() (created with the default verify=True) and perform client.get(url, timeout=10) to fetch and json-decode the response content; update pypi_license() and the main fetch logic to use the httpx client and ensure responses are checked for a successful status before parsing. An httpx sketch appears after this list.
- Around line 134-137: Validate and sanitize the CLI input for --base-branch (args.base_branch) after parsing to prevent malformed GitHub URLs. Specifically, restrict it to a safe character set (e.g., allow letters, digits, dot, underscore, hyphen, and slash via a regex like r'^[A-Za-z0-9._/-]+$') and call parser.error(...) or exit with a clear message when the value fails validation; alternatively, percent-encode the branch name before using it in the GitHub API URL construction that interpolates args.base_branch. A validation sketch appears after this list.
In `@ci/scripts/sbom_list.py`:
- Around line 62-68: Rename the unused parameter base_name in the function
process_uvlock to _base_name to signal it's intentionally unused (update the
function signature accordingly), and update the docstring parameter section to
document _base_name instead of base_name while keeping the compatibility note;
verify there are no internal references to base_name that need changing and run
tests/linting to ensure no unused-parameter warnings remain.
- Around line 42-44: Replace the blocking urllib.request.urlopen call with the project's preferred httpx synchronous client: create/reuse an httpx.Client in main() (or the calling scope), fetch the PyPI URL via client.get(url, timeout=10), parse the JSON via response.json(), and ensure the client is closed (use with httpx.Client() as client, or store and close it). In short, the sbom_list.py fetch should replace urllib.request.urlopen(url) and json.load(r) with client.get(...).json(), with a timeout and proper lifecycle management; the httpx sketch after this list shows the same pattern.
In `@pyproject.toml`:
- Around line 109-154: The examples list contains "text_file_ingest" which lacks
the required nat_ prefix; update the examples array entry to
"nat_text_file_ingest" and also update the corresponding entry in
tool.uv.sources (where "text_file_ingest" is referenced) to
"nat_text_file_ingest" so both the examples list and tool.uv.sources use the
required nat_ prefix.
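The following is a minimal sketch of the nvidia-nat filter suggested for license_diff.py above. The dictionary shapes and surrounding code are assumptions; only the variable names and the pkg.startswith("nvidia-nat") check come from the review comment.

```python
# Sketch only: head_packages / base_packages are assumed to map package name -> metadata.
added_packages = {
    pkg: info
    for pkg, info in head_packages.items()
    if pkg not in base_packages and not pkg.startswith("nvidia-nat")
}
removed_packages = {
    pkg: info
    for pkg, info in base_packages.items()
    if pkg not in head_packages and not pkg.startswith("nvidia-nat")
}
# changed_packages already applies the same startswith("nvidia-nat") filter.
```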
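A before/after sketch of the base -> head ordering fix; the exact format string in license_diff.py may differ, the point is only the order of the interpolations.

```python
# Before (reads head -> base, inverting the diff direction):
list_of_changes.append(f"{pkg}: {head_version} -> {base_version} ({head_license} -> {base_license})")

# After (reads base -> head, as the diff intends):
list_of_changes.append(f"{pkg}: {base_version} -> {head_version} ({base_license} -> {head_license})")
```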
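A sketch of the httpx-based fetch with a timeout, covering the pypi_license() suggestion for license_diff.py and the equivalent change in sbom_list.py. The function signature and the handling of the returned metadata are assumptions; only the client usage, the 10-second timeout, and the status check follow the review comments.

```python
import httpx


def pypi_license(name: str, version: str) -> str:
    """Fetch a package's license from PyPI (sketch; the real signature may differ)."""
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    # httpx verifies TLS certificates by default; the timeout keeps the script from hanging.
    with httpx.Client() as client:
        response = client.get(url, timeout=10)
        response.raise_for_status()  # check for a successful status before parsing
        data = response.json()
    return data["info"].get("license") or "(License not found)"
```

The same with httpx.Client() as client pattern would also cover the uv.lock fetch in main().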
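A sketch of the --base-branch validation; the default value of develop comes from the PR description below, and the rest of the argument parsing is assumed.

```python
import argparse
import re

parser = argparse.ArgumentParser()
parser.add_argument("--base-branch", default="develop")
args = parser.parse_args()

# Reject branch names outside a conservative character set before they are
# interpolated into the GitHub API URL.
if not re.fullmatch(r"[A-Za-z0-9._/-]+", args.base_branch):
    parser.error(f"invalid --base-branch value: {args.base_branch!r}")
```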
Salonijain27
left a comment
Approved from a dependency point of view
Signed-off-by: Will Killian <wkillian@nvidia.com>
ericevans-nv
left a comment
Approved conditionally, pending implementation of the CodeRabbit suggestions. Adding timeouts to the requests would be helpful to prevent CI from hanging indefinitely.
Signed-off-by: Will Killian <wkillian@nvidia.com>
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@ci/scripts/license_diff.py`:
- Around line 43-48: The current broad except Exception around the network/JSON fetch hides programming errors. Replace it by catching only the expected failures from urllib.request.urlopen and json.load: catch urllib.error.URLError and urllib.error.HTTPError (or just the umbrella urllib.error.URLError, which covers HTTPError) and json.JSONDecodeError (and optionally socket.timeout if timeouts are used), return "(License not found)" for those cases, and let other exceptions propagate. Ensure urllib.error is imported and use an except (urllib.error.URLError, json.JSONDecodeError) block around the urlopen/json.load sequence (referencing url, urllib.request.urlopen, and json.load) so only network/JSON errors are swallowed. A sketch appears after this list.
In `@ci/scripts/sbom_list.py`:
- Around line 42-47: The try/except around the PyPI fetch is too broad; narrow it to handle only expected network and JSON errors by catching urllib.error.HTTPError and urllib.error.URLError from urllib.request.urlopen and json.JSONDecodeError (and optionally ValueError) from json.load, return "(License not found)" for those cases, and re-raise any other unexpected exceptions so programming errors aren't swallowed. Reference the existing url construction, urllib.request.urlopen, and json.load when locating where to replace the broad "except Exception" with these specific exception types; the sketch after this list applies here as well.
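If the scripts keep urllib rather than switching to httpx, the narrowed exception handling could look like the sketch below. The function signature and JSON fields are assumptions; the same try/except shape applies to both license_diff.py and sbom_list.py.

```python
import json
import urllib.error
import urllib.request


def pypi_license(name: str, version: str) -> str:
    """Sketch: only network/JSON failures fall back to the placeholder string."""
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as r:
            data = json.load(r)
    except (urllib.error.URLError, json.JSONDecodeError, TimeoutError):
        # HTTPError subclasses URLError, so both network failures are caught;
        # TimeoutError covers socket.timeout on modern Python. Anything else propagates.
        return "(License not found)"
    return data["info"].get("license") or "(License not found)"
```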
🧹 Nitpick comments (2)
ci/scripts/sbom_list.py (2)
80-88: Open the TSV with `newline=""` and UTF-8 to avoid CSV quirks.
`csv.writer` recommends `newline=""` to prevent extra blank lines on Windows, and explicit UTF-8 avoids locale issues in license text.
♻️ Suggested change
- with open("licenses.tsv", "w") as f: + with open("licenses.tsv", "w", newline="", encoding="utf-8") as f: writer = csv.writer(f, delimiter="\t")
104-111: Sort package names for a deterministic `sbom_list.tsv`.
`licenses.tsv` is sorted; do the same here to make diffs stable across runs.
♻️ Suggested change
```diff
-for pkg in tqdm(pkgs.keys(), desc="Processing packages", unit="packages"):
+for pkg in tqdm(sorted(pkgs.keys()), desc="Processing packages", unit="packages"):
     try:
         sbom_list.append({
```
This is a script that a user invokes manually; it does not run in CI at all. I placed it here because it is maintainer-oriented rather than developer-oriented, and I didn't want to confuse consumers of the library.
mnajafian-nv
left a comment
Great work! I made some suggestions for your review on Exception Handling, Missing Timeouts, and Dead Code.
/merge
Description
Adds two new scripts:
- `./ci/scripts/license_diff.py` - shows the license/package changes from HEAD to a base branch (default is `develop`). Very useful for finding new/updated/removed packages for PRs. Output written to standard output.
- `./ci/scripts/sbom_list.py` - shows the full package SBOM with package name, version, and license. Exported as `sbom_list.tsv`.

This also re-adds all examples into the top-level so we can track all example dependencies in the top-level `uv.lock`. And we also now pin a specific version of the uv containers to err on the side of caution.

Example output of `license_diff.py` is below (for this very PR). Notable output considerations:
(source)

Closes
By Submitting this PR I confirm:
Summary by CodeRabbit
Release Notes
Documentation
New Features
Chores