5.4.26#113
Conversation
|
Warning Review limit reached
More reviews will be available in 49 minutes and 49 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR updates the release version, changes extension handling, adds new purl/component/snippet scan modes, and enforces a maximum decompressed file-content size. ChangesExtension, scan entry points, and limits
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
* add purl scan mode * add componentes and snippets scanning mode
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
src/purl_scan.c (2)
118-123: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low valueGuard against
reallocfailure / size overflow.If
reallocreturns NULL the originalurl_hashespointer is leaked and the next line dereferences NULL. Capturing into a temp (and optionally using a checked growth) makes this robust. Low priority since cohorts are small, but trivial to harden.♻️ Suggested hardening
if (e->n_url_hashes >= e->url_hashes_cap) { e->url_hashes_cap = e->url_hashes_cap ? e->url_hashes_cap * 2 : 8; - e->url_hashes = realloc(e->url_hashes, e->url_hashes_cap * sizeof(char *)); + char **tmp = realloc(e->url_hashes, e->url_hashes_cap * sizeof(char *)); + if (!tmp) + return; + e->url_hashes = tmp; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/purl_scan.c` around lines 118 - 123, The URL hashes growth logic in purl_scan.c is not safe against allocation failure because the result of realloc is written directly back to e->url_hashes and then used immediately. Update the append path around the url_hashes resizing in the relevant function to store realloc in a temporary pointer, verify it is non-NULL before assigning back, and only increment n_url_hashes after a successful growth. Also add a checked capacity calculation for the e->url_hashes_cap * sizeof(char *) allocation to avoid size overflow when growing the buffer.
537-545: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low valueCheck
strdupresult before use.
rec = strdup(...)is passed tofield_nwithout a NULL check; on allocation failurefield_nwould operate on NULL. Minor given the context, but a quick guard avoids an undefined path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/purl_scan.c` around lines 537 - 545, The file= parsing path in purl_scan.c should guard against strdup allocation failure before calling field_n on rec. Update the snippet around the strdup(line + tagln + MD5_LEN * 2 + 1) assignment so it checks whether rec is NULL, and if so emit the same malformed/allocation-style error handling used nearby, free any needed resources, and return EXIT_FAILURE before target_path is derived. Keep the fix localized to the field_n/target_path parsing flow so NULL never reaches field_n.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@inc/purl_scan.h`:
- Around line 39-49: The `component_scan` contract in the header is outdated: it
describes a single `url_hash_hex`, but the implementation in `component_scan`
actually accepts a comma-separated list and returns a `{"results": [...]}`
payload. Update the Doxygen `@brief` and `@param` text in `purl_scan.h` to
describe list-based input and multi-result output, keeping the wording aligned
with the behavior in `src/purl_scan.c` so callers see the correct API contract.
In `@src/purl_scan.c`:
- Around line 567-569: The copy into scan->file_size in purl_scan is unbounded
and can overflow because size_field may exceed the 32-byte allocation from scan
initialization. Update the assignment in the scan parsing flow to use a bounded
copy or validated length for scan->file_size, and keep the existing
scan->source_md5 handling unchanged since that buffer is already sized
correctly. Use the relevant purl_scan logic around size_field and the scan
struct fields to locate and fix this safely.
In `@src/url.c`:
- Around line 360-362: The logging in get_oldest_url can dereference comp or
comp->purls when allocation or initialization failed, so add null guards before
the scanlog call. Update the get_oldest_url path to verify comp is non-null
after calloc and to check comp->purls before accessing comp->purls[0], falling
back to safe placeholder text in the log message when either is missing.
---
Nitpick comments:
In `@src/purl_scan.c`:
- Around line 118-123: The URL hashes growth logic in purl_scan.c is not safe
against allocation failure because the result of realloc is written directly
back to e->url_hashes and then used immediately. Update the append path around
the url_hashes resizing in the relevant function to store realloc in a temporary
pointer, verify it is non-NULL before assigning back, and only increment
n_url_hashes after a successful growth. Also add a checked capacity calculation
for the e->url_hashes_cap * sizeof(char *) allocation to avoid size overflow
when growing the buffer.
- Around line 537-545: The file= parsing path in purl_scan.c should guard
against strdup allocation failure before calling field_n on rec. Update the
snippet around the strdup(line + tagln + MD5_LEN * 2 + 1) assignment so it
checks whether rec is NULL, and if so emit the same malformed/allocation-style
error handling used nearby, free any needed resources, and return EXIT_FAILURE
before target_path is derived. Keep the fix localized to the field_n/target_path
parsing flow so NULL never reaches field_n.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bbb8d979-36d6-4ad1-8ec4-dfdca4578232
📒 Files selected for processing (9)
inc/limits.hinc/match_list.hinc/purl_scan.hinc/report.hsrc/help.csrc/main.csrc/match_list.csrc/purl_scan.csrc/url.c
✅ Files skipped from review due to trivial changes (2)
- inc/limits.h
- src/help.c
| scanlog("get_oldest_url iter=%d purl[0]=%s identified=%d rank=%d release=%s\n", | ||
| iteration, comp->purls[0] ? comp->purls[0] : "(null)", comp->identified, comp->rank, | ||
| comp->release_date ? comp->release_date : "(null)"); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Guard comp/comp->purls before logging to avoid null dereference.
This log dereferences comp->purls[0] without checking comp->purls, and comp is also not checked after calloc. A low-memory path can crash here.
Suggested fix
component_data_t * comp = calloc(1, sizeof(*comp));
+ if (!comp)
+ {
+ scanlog("get_oldest_url: out of memory allocating component_data_t\n");
+ free(url);
+ return false;
+ }
bool result = fill_component(comp, key, NULL, (uint8_t *)url);
@@
- scanlog("get_oldest_url iter=%d purl[0]=%s identified=%d rank=%d release=%s\n",
- iteration, comp->purls[0] ? comp->purls[0] : "(null)", comp->identified, comp->rank,
+ const char *purl0 = (comp->purls && comp->purls[0]) ? comp->purls[0] : "(null)";
+ scanlog("get_oldest_url iter=%d purl[0]=%s identified=%d rank=%d release=%s\n",
+ iteration, purl0, comp->identified, comp->rank,
comp->release_date ? comp->release_date : "(null)");📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| scanlog("get_oldest_url iter=%d purl[0]=%s identified=%d rank=%d release=%s\n", | |
| iteration, comp->purls[0] ? comp->purls[0] : "(null)", comp->identified, comp->rank, | |
| comp->release_date ? comp->release_date : "(null)"); | |
| component_data_t * comp = calloc(1, sizeof(*comp)); | |
| if (!comp) | |
| { | |
| scanlog("get_oldest_url: out of memory allocating component_data_t\n"); | |
| free(url); | |
| return false; | |
| } | |
| bool result = fill_component(comp, key, NULL, (uint8_t *)url); | |
| const char *purl0 = (comp->purls && comp->purls[0]) ? comp->purls[0] : "(null)"; | |
| scanlog("get_oldest_url iter=%d purl[0]=%s identified=%d rank=%d release=%s\n", | |
| iteration, purl0, comp->identified, comp->rank, | |
| comp->release_date ? comp->release_date : "(null)"); |
🧰 Tools
🪛 Cppcheck (2.21.0)
[warning] 361-361: If memory allocation fails, then there is a possible null pointer dereference
(nullPointerOutOfMemory)
[warning] 361-361: If memory allocation fails, then there is a possible null pointer dereference
(nullPointerOutOfMemory)
[warning] 361-361: If memory allocation fails, then there is a possible null pointer dereference
(nullPointerOutOfMemory)
[warning] 361-361: If memory allocation fails, then there is a possible null pointer dereference
(nullPointerOutOfMemory)
[warning] 362-362: If memory allocation fails, then there is a possible null pointer dereference
(nullPointerOutOfMemory)
[warning] 362-362: If memory allocation fails, then there is a possible null pointer dereference
(nullPointerOutOfMemory)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/url.c` around lines 360 - 362, The logging in get_oldest_url can
dereference comp or comp->purls when allocation or initialization failed, so add
null guards before the scanlog call. Update the get_oldest_url path to verify
comp is non-null after calloc and to check comp->purls before accessing
comp->purls[0], falling back to safe placeholder text in the log message when
either is missing.
Source: Linters/SAST tools
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/copyright.c (1)
39-58: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueHelper returns string literals as
char *.
copyright_id_to_source_namereturns string literals through a non-const char *. Returning aconst char *would better signal that callers must not mutate the result. Minor and optional.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/copyright.c` around lines 39 - 58, The helper copyright_id_to_source_name currently returns string literals through a non-const char *; update its return type to const char * and keep the switch cases returning the same literals. Also check any callers in copyright.c that consume this helper so they accept a const-qualified result without casting away const.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/main.c`:
- Around line 494-497: Validate the `--max-file-content-size` option in the
`case 262` handler before assigning `max_file_content_size`, because `strtoull`
currently accepts non-numeric input and zero, which can unintentionally disable
all file scanning in `mz_get_key`. Add input checking in the argument parsing
path in `main` so invalid or zero values are either rejected with an error or
explicitly treated as unlimited, and keep the existing `scanlog` message
consistent with the chosen behavior.
In `@src/mz.c`:
- Around line 104-113: The size guard in mz.c is fine, but a zero value for
max_file_content_size will cause every file to be rejected here. Update the
--max-file-content-size handling in main.c (case 262 / the option parsing that
sets max_file_content_size) to validate the parsed value and reject or default
invalid/empty input so this variable is never left at 0. Make the validation
happen before mz.c uses the limit, and keep the existing rejection path
unchanged.
---
Nitpick comments:
In `@src/copyright.c`:
- Around line 39-58: The helper copyright_id_to_source_name currently returns
string literals through a non-const char *; update its return type to const char
* and keep the switch cases returning the same literals. Also check any callers
in copyright.c that consume this helper so they accept a const-qualified result
without casting away const.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 033abae6-ed9d-4a84-9b51-933a529b0365
📒 Files selected for processing (6)
inc/limits.hsrc/copyright.csrc/help.csrc/limits.csrc/main.csrc/mz.c
✅ Files skipped from review due to trivial changes (1)
- src/help.c
| case 262: /* --max-file-content-size (value in MB) */ | ||
| max_file_content_size = strtoull(optarg, NULL, 10) * 1024 * 1024; | ||
| scanlog("Max file content size set to %lu MB\n", (unsigned long) (max_file_content_size / (1024 * 1024))); | ||
| break; |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Validate --max-file-content-size input.
strtoull returns 0 for non-numeric input, and an explicit 0 is accepted as-is. Either case sets max_file_content_size = 0, which makes mz_get_key() (src/mz.c Line 105) reject every file with the "exceeds the maximum allowed" error. Consider rejecting 0/unparseable values, or treating 0 as "unlimited".
🛡️ Suggested guard
case 262: /* --max-file-content-size (value in MB) */
- max_file_content_size = strtoull(optarg, NULL, 10) * 1024 * 1024;
+ {
+ char *endptr = NULL;
+ unsigned long long mb = strtoull(optarg, &endptr, 10);
+ if (endptr == optarg || *endptr != '\0' || mb == 0)
+ {
+ fprintf(stderr, "Invalid --max-file-content-size value: %s\n", optarg);
+ exit(EXIT_FAILURE);
+ }
+ max_file_content_size = mb * 1024 * 1024;
+ }
scanlog("Max file content size set to %lu MB\n", (unsigned long) (max_file_content_size / (1024 * 1024)));
break;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| case 262: /* --max-file-content-size (value in MB) */ | |
| max_file_content_size = strtoull(optarg, NULL, 10) * 1024 * 1024; | |
| scanlog("Max file content size set to %lu MB\n", (unsigned long) (max_file_content_size / (1024 * 1024))); | |
| break; | |
| case 262: /* --max-file-content-size (value in MB) */ | |
| { | |
| char *endptr = NULL; | |
| unsigned long long mb = strtoull(optarg, &endptr, 10); | |
| if (endptr == optarg || *endptr != '\0' || mb == 0) | |
| { | |
| fprintf(stderr, "Invalid --max-file-content-size value: %s\n", optarg); | |
| exit(EXIT_FAILURE); | |
| } | |
| max_file_content_size = mb * 1024 * 1024; | |
| } | |
| scanlog("Max file content size set to %lu MB\n", (unsigned long) (max_file_content_size / (1024 * 1024))); | |
| break; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/main.c` around lines 494 - 497, Validate the `--max-file-content-size`
option in the `case 262` handler before assigning `max_file_content_size`,
because `strtoull` currently accepts non-numeric input and zero, which can
unintentionally disable all file scanning in `mz_get_key`. Add input checking in
the argument parsing path in `main` so invalid or zero values are either
rejected with an error or explicitly treated as unlimited, and keep the existing
`scanlog` message consistent with the chosen behavior.
| /* Reject files whose content exceeds the configured maximum size */ | ||
| if (job.data_ln > max_file_content_size) | ||
| { | ||
| fprintf(stderr, "File content size (%.2f MB) exceeds the maximum allowed (%lu MB). Use --max-file-content-size to change the limit.\n", (double) job.data_ln / (1024 * 1024), (unsigned long) (max_file_content_size / (1024 * 1024))); | ||
| free(job.data); | ||
| free(job.key); | ||
| free(job.mz); | ||
| exit(EXIT_FAILURE); | ||
| } | ||
|
|
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Limit check looks correct; note max_file_content_size == 0 rejects all files.
The decompressed-size guard and cleanup ordering are correct. Be aware the effective behavior depends on input validation for --max-file-content-size; a 0 value (from invalid/empty input) would reject every file here. See the related comment in src/main.c (case 262).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/mz.c` around lines 104 - 113, The size guard in mz.c is fine, but a zero
value for max_file_content_size will cause every file to be rejected here.
Update the --max-file-content-size handling in main.c (case 262 / the option
parsing that sets max_file_content_size) to validate the parsed value and reject
or default invalid/empty input so this variable is never left at 0. Make the
validation happen before mz.c uses the limit, and keep the existing rejection
path unchanged.
Summary by CodeRabbit
-P/--purl,-C/--url-hash, and-S/--snippet-scan(with JSON output).