Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughBumps Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI / Main
participant Flags as Engine Flags
participant License as License Processor
participant Reporter as Report Generator
CLI->>Flags: parse -F / --flags
Flags-->>CLI: flags bitmask (includes ENABLE_LICENSE_FULL_REPORT?)
CLI->>License: start scan (full_license_report set/cleared)
License->>License: detect "LicenseRef" via substring
License->>License: tokenize compound expressions -> build "urls" map
License-->>Reporter: emit license entries (apply caps if full_report=false)
Reporter-->>CLI: write report/output
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/license.c (1)
316-358: Consider edge cases in license expression parsing.The AND/OR parsing logic assumes well-formed SPDX license expressions. If the input contains malformed expressions (e.g., trailing operators, consecutive spaces, or parentheses from complex expressions like
(MIT OR Apache-2.0)), the parsing may produce unexpected results.For example,
"(MIT OR Apache-2.0)"would emit"(MIT"and"Apache-2.0)"with parentheses included.💡 Optional: Strip parentheses from tokens
while (token) { /* Skip AND/OR operators */ if (strcmp(token, "AND") == 0 || strcmp(token, "OR") == 0) { token = strtok_r(NULL, " ", &saveptr); continue; } + /* Strip leading/trailing parentheses */ + while (*token == '(') token++; + char *end = token + strlen(token) - 1; + while (end > token && *end == ')') *end-- = '\0'; + if (!*token) { + token = strtok_r(NULL, " ", &saveptr); + continue; + } if (!first_entry) len += sprintf(buffer + len, ",");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/license.c` around lines 316 - 358, The parsing loop in src/license.c that tokenizes lic_copy with strtok_r(…, " ") (variables: lic_copy, token, saveptr, first_license, first_entry, buffer) doesn't handle parentheses, extra spaces or malformed expressions; update the tokenizer to split on spaces and parentheses (e.g., use delimiters " ()" or post-process each token to trim leading/trailing '(' and ')'), skip empty tokens and skip stray operators (AND/OR) or trailing operators, and ensure first_license is set only from a valid license token (fall back to a cleaned single-license value if none found) so URLs emit clean SPDX IDs without surrounding parentheses or empty entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/license.c`:
- Around line 316-358: The parsing loop in src/license.c that tokenizes lic_copy
with strtok_r(…, " ") (variables: lic_copy, token, saveptr, first_license,
first_entry, buffer) doesn't handle parentheses, extra spaces or malformed
expressions; update the tokenizer to split on spaces and parentheses (e.g., use
delimiters " ()" or post-process each token to trim leading/trailing '(' and
')'), skip empty tokens and skip stray operators (AND/OR) or trailing operators,
and ensure first_license is set only from a valid license token (fall back to a
cleaned single-license value if none found) so URLs emit clean SPDX IDs without
surrounding parentheses or empty entries.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cb657bb6-bd09-4541-8e6b-21c71150584f
📒 Files selected for processing (5)
inc/scanoss.hsrc/help.csrc/license.csrc/main.csrc/match.c
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/license.c`:
- Around line 587-588: The early-exit condition inside the license iteration
incorrectly requires i > 0 so a component license at index 0 won't trigger the
break; remove the unnecessary i > 0 check in the loop that evaluates "if (i > 0
&& component_license && !full_license_report) break;" and change it to simply
check "if (component_license && !full_license_report) break;" so that when
component_license is present (even at i==0) the loop exits immediately and only
the component license is reported.
- Around line 326-352: The code can emit an invalid URL when first_license stays
empty; update the block that appends the URL (referencing variables
first_license, first_entry, lic_copy, buffer, len) to only append the ,"url":
"https://spdx.org/licenses/%s.html" fragment when first_license contains a
non-empty string (e.g., check first_license[0] != '\0'), otherwise skip adding
the URL (or add a safe fallback like omitting the url field or using a
placeholder). Ensure this check comes after the token parsing loop and before
the sprintf that writes the URL to buffer.
- Around line 561-574: The off-by-one stems from incrementing file_header_filter
and scancode_file_filter before checking; currently the code uses ">=3" which
skips the 3rd license. Update the conditions in the blocks referencing
licenses_by_type.licenses[i].id (the file_header_filter and scancode_file_filter
checks that also use full_license_report) to use ">3" instead of ">=3" so the
first three licenses are processed and only the 4th+ are skipped when
full_license_report is false.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
src/license.c
Outdated
| //file header license and scancode_file liceses are limited to a maximum of 3. | ||
| if (licenses_by_type.licenses[i].id == 2) | ||
| { | ||
| file_header_filter++; | ||
| if (file_header_filter >=3 && !full_license_report) | ||
| continue; | ||
| } | ||
|
|
||
| if (licenses_by_type.licenses[i].id == 4) | ||
| { | ||
| scancode_file_filter++; | ||
| if (scancode_file_filter >=3 && !full_license_report) | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Off-by-one: allows 2 licenses instead of 3.
The comment states "limited to a maximum of 3", but the current logic skips when filter >= 3, which triggers on the 3rd item (filter values: 1→process, 2→process, 3→skip). This allows only 2.
🐛 Proposed fix
if (licenses_by_type.licenses[i].id == 2)
{
file_header_filter++;
- if (file_header_filter >=3 && !full_license_report)
+ if (file_header_filter > 3 && !full_license_report)
continue;
}
if (licenses_by_type.licenses[i].id == 4)
{
scancode_file_filter++;
- if (scancode_file_filter >=3 && !full_license_report)
+ if (scancode_file_filter > 3 && !full_license_report)
continue;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| //file header license and scancode_file liceses are limited to a maximum of 3. | |
| if (licenses_by_type.licenses[i].id == 2) | |
| { | |
| file_header_filter++; | |
| if (file_header_filter >=3 && !full_license_report) | |
| continue; | |
| } | |
| if (licenses_by_type.licenses[i].id == 4) | |
| { | |
| scancode_file_filter++; | |
| if (scancode_file_filter >=3 && !full_license_report) | |
| continue; | |
| } | |
| //file header license and scancode_file liceses are limited to a maximum of 3. | |
| if (licenses_by_type.licenses[i].id == 2) | |
| { | |
| file_header_filter++; | |
| if (file_header_filter > 3 && !full_license_report) | |
| continue; | |
| } | |
| if (licenses_by_type.licenses[i].id == 4) | |
| { | |
| scancode_file_filter++; | |
| if (scancode_file_filter > 3 && !full_license_report) | |
| continue; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/license.c` around lines 561 - 574, The off-by-one stems from incrementing
file_header_filter and scancode_file_filter before checking; currently the code
uses ">=3" which skips the 3rd license. Update the conditions in the blocks
referencing licenses_by_type.licenses[i].id (the file_header_filter and
scancode_file_filter checks that also use full_license_report) to use ">3"
instead of ">=3" so the first three licenses are processed and only the 4th+ are
skipped when full_license_report is false.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/license.c`:
- Around line 294-295: The current check in src/license.c using
license_source_id and strstr(license, "LicenseRef") drops entire Scancode
records when any LicenseRef token appears; change it so only pure LicenseRef
entries are skipped and compound expressions keep their SPDX parts: in the block
that uses license_source_id, replace the broad strstr test with a stricter test
that returns buffer only when the whole license string is a single LicenseRef-*
token (e.g., matches start with "LicenseRef" and contains no other SPDX
operators or tokens), and for compound expressions (strings containing SPDX
operators like AND/OR/() or multiple tokens) strip/ignore only the LicenseRef-*
tokens when expanding/normalizing the expression instead of returning early;
update the logic around the current strncmp/strstr usage and any normalization
code that consumes license so it removes LicenseRef-* tokens before further
parsing.
- Around line 561-587: The loop updates file_header_filter,
scancode_file_filter, scanoss_license, and component_license regardless of
whether license_to_json actually emitted output, so enforce caps only on emitted
licenses by making emission observable and using that to drive continues and
counters: modify license_to_json (or add a helper like license_would_emit) to
return/indicate whether it will emit (a bool "emitted") without side effects,
then call that before applying the cap checks (or call license_to_json and use
its emitted return) and only increment file_header_filter/scancode_file_filter
or set scanoss_license/component_license when emitted is true; update the logic
around the id checks (licenses_by_type.licenses[i].id) and the early continue
conditions to reference emitted instead of assuming every iteration produced
output, using the existing variables file_header_filter, scancode_file_filter,
scanoss_license, component_license, full_license_report and the license_to_json
symbol to locate where to change behavior.
- Around line 333-346: The code currently treats the token after a "WITH"
operator as a license and generates a licenses URL; change the logic so that
when you encounter "WITH" (in the block referencing token, saveptr) you set a
flag (e.g., next_is_exception) and advance to the next token, then when emitting
the JSON entry (the code using token, buffer, len, first_entry, first_license)
use that flag to choose "https://spdx.org/exceptions/%s.html" instead of the
licenses path and reset the flag; ensure first_license is only set for actual
licenses (not exceptions) and clear the flag after use.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| if (!strncmp(license_source_id, "scancode", 8) && strstr(license, "LicenseRef")) | ||
| return buffer; |
There was a problem hiding this comment.
Don't drop mixed SPDX expressions just because they contain LicenseRef.
This now skips the whole Scancode record for values like MIT AND LicenseRef-scancode-foo, so the valid SPDX part is lost too. Restrict the skip to pure LicenseRef-* entries, or ignore only the LicenseRef-* token when expanding compound expressions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/license.c` around lines 294 - 295, The current check in src/license.c
using license_source_id and strstr(license, "LicenseRef") drops entire Scancode
records when any LicenseRef token appears; change it so only pure LicenseRef
entries are skipped and compound expressions keep their SPDX parts: in the block
that uses license_source_id, replace the broad strstr test with a stricter test
that returns buffer only when the whole license string is a single LicenseRef-*
token (e.g., matches start with "LicenseRef" and contains no other SPDX
operators or tokens), and for compound expressions (strings containing SPDX
operators like AND/OR/() or multiple tokens) strip/ignore only the LicenseRef-*
tokens when expanding/normalizing the expression instead of returning early;
update the logic around the current strncmp/strstr usage and any normalization
code that consumes license so it removes LicenseRef-* tokens before further
parsing.
| /* Skip AND/OR/WITH operators */ | ||
| if (strcmp(token, "AND") == 0 || strcmp(token, "OR") == 0 || strcmp(token, "WITH") == 0) | ||
| { | ||
| token = strtok_r(NULL, " ()", &saveptr); | ||
| continue; | ||
| } | ||
| if (!first_entry) | ||
| len += sprintf(buffer + len, ","); | ||
| else | ||
| { | ||
| strncpy(first_license, token, MAX_FIELD_LN - 1); | ||
| first_entry = false; | ||
| } | ||
| len += sprintf(buffer + len, "\"%s\": \"https://spdx.org/licenses/%s.html\"", token, token); |
There was a problem hiding this comment.
WITH exceptions need SPDX /exceptions/ URLs.
In expressions like GPL-2.0-only WITH Classpath-exception-2.0, the token after WITH is an SPDX exception, not a license. This code currently emits spdx.org/licenses/... for it, which produces an invalid URL.
💡 Suggested fix
- char *token = strtok_r(lic_copy, " ()", &saveptr);
+ char *token = strtok_r(lic_copy, " ()", &saveptr);
+ const char *spdx_kind = "licenses";
while (token)
{
/* Skip AND/OR/WITH operators */
- if (strcmp(token, "AND") == 0 || strcmp(token, "OR") == 0 || strcmp(token, "WITH") == 0)
+ if (strcmp(token, "AND") == 0 || strcmp(token, "OR") == 0 || strcmp(token, "WITH") == 0)
{
+ spdx_kind = (strcmp(token, "WITH") == 0) ? "exceptions" : "licenses";
token = strtok_r(NULL, " ()", &saveptr);
continue;
}
if (!first_entry)
len += sprintf(buffer + len, ",");
@@
- len += sprintf(buffer + len, "\"%s\": \"https://spdx.org/licenses/%s.html\"", token, token);
+ len += sprintf(buffer + len, "\"%s\": \"https://spdx.org/%s/%s.html\"", token, spdx_kind, token);
+ spdx_kind = "licenses";
token = strtok_r(NULL, " ()", &saveptr);
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/license.c` around lines 333 - 346, The code currently treats the token
after a "WITH" operator as a license and generates a licenses URL; change the
logic so that when you encounter "WITH" (in the block referencing token,
saveptr) you set a flag (e.g., next_is_exception) and advance to the next token,
then when emitting the JSON entry (the code using token, buffer, len,
first_entry, first_license) use that flag to choose
"https://spdx.org/exceptions/%s.html" instead of the licenses path and reset the
flag; ensure first_license is only set for actual licenses (not exceptions) and
clear the flag after use.
src/license.c
Outdated
| //file header license and scancode_file liceses are limited to a maximum of 3. | ||
| if (licenses_by_type.licenses[i].id == 2) | ||
| { | ||
| file_header_filter++; | ||
| if (file_header_filter >=3 && !full_license_report) | ||
| continue; | ||
| } | ||
|
|
||
| if (licenses_by_type.licenses[i].id == 4) | ||
| { | ||
| scancode_file_filter++; | ||
| if (scancode_file_filter >=3 && !full_license_report) | ||
| continue; | ||
| } | ||
|
|
||
| if (licenses_by_type.licenses[i].id == 5 && scanoss_license && !full_license_report) | ||
| continue; | ||
|
|
||
| buffer = license_to_json(crclist, buffer, licenses_by_type.licenses[i].text, licenses_by_type.licenses[i].id, &first); | ||
| //just report component license if available | ||
| if (licenses_by_type.licenses[i].id == 0 && !first) | ||
| component_license = true; | ||
|
|
||
| else if (licenses_by_type.licenses[i].id > 0 && !first) | ||
| scanoss_license = true; | ||
|
|
||
| if (i > 0 && component_license && !full_license_report) |
There was a problem hiding this comment.
Count only emitted licenses when enforcing caps and break conditions.
file_header_filter, scancode_file_filter, scanoss_license, and component_license are updated per iteration, but license_to_json() can no-op for duplicates, short strings, or skipped Scancode LicenseRef values. That means non-emitted entries still consume the quota and can suppress later valid licenses.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/license.c` around lines 561 - 587, The loop updates file_header_filter,
scancode_file_filter, scanoss_license, and component_license regardless of
whether license_to_json actually emitted output, so enforce caps only on emitted
licenses by making emission observable and using that to drive continues and
counters: modify license_to_json (or add a helper like license_would_emit) to
return/indicate whether it will emit (a bool "emitted") without side effects,
then call that before applying the cap checks (or call license_to_json and use
its emitted return) and only increment file_header_filter/scancode_file_filter
or set scanoss_license/component_license when emitted is true; update the logic
around the id checks (licenses_by_type.licenses[i].id) and the early continue
conditions to reference emitted instead of assuming every iteration produced
output, using the existing variables file_header_filter, scancode_file_filter,
scanoss_license, component_license, full_license_report and the license_to_json
symbol to locate where to change behavior.
There was a problem hiding this comment.
♻️ Duplicate comments (3)
src/license.c (3)
567-570:⚠️ Potential issue | 🟠 MajorCheck the
idboundary before emitting, and only advancelast_idon actual output.As written, the first record from the next
idgroup is already appended before the break runs. Also, a skipped/non-emitted record can still changelast_idand prematurely terminate the loop.💡 Suggested fix
for (int i = 0; i < licenses_by_type.count; i++) { - buffer = license_to_json(crclist, buffer, licenses_by_type.licenses[i].text, licenses_by_type.licenses[i].id, &first); - if (last_id >= 0 && last_id != licenses_by_type.licenses[i].id && !first && !full_license_report) + if (last_id >= 0 && last_id != licenses_by_type.licenses[i].id && !first && !full_license_report) break; - last_id = licenses_by_type.licenses[i].id; + + char *next = license_to_json(crclist, buffer, licenses_by_type.licenses[i].text, licenses_by_type.licenses[i].id, &first); + if (next != buffer) + last_id = licenses_by_type.licenses[i].id; + buffer = next; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/license.c` around lines 567 - 570, The loop currently calls license_to_json(...) before checking the id boundary and always updates last_id, which lets a non-emitted record change last_id and can append the first record of the next id group before breaking; fix by testing the boundary (compare licenses_by_type.licenses[i].id against last_id and respect full_license_report) before calling license_to_json so you never append from the next id group, and only assign last_id = licenses_by_type.licenses[i].id after license_to_json actually emitted output (i.e., after observing the first flag/return indicating data was written).
342-355:⚠️ Potential issue | 🟠 MajorUse SPDX
/exceptions/URLs afterWITH.The token after
WITHis an SPDX exception, so emitting it under/licenses/produces an invalid URL. TrackWITHand switch the next token tohttps://spdx.org/exceptions/...; only actual licenses should go under/licenses/.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/license.c` around lines 342 - 355, The code currently treats "WITH" like a skip and misclassifies the following token as a license (producing /licenses/ URLs); change this by tracking when you see the "WITH" operator (e.g., bool saw_with = false), and when token == "WITH" set saw_with = true and advance to the next token via strtok_r; for that next token use the exceptions path ("https://spdx.org/exceptions/%s.html") instead of the licenses path when building the buffer (then reset saw_with = false); keep existing handling of "AND"/"OR" as skips and preserve first_entry/first_license logic when adding entries to buffer.
303-304:⚠️ Potential issue | 🟠 MajorDon't drop mixed SPDX expressions just because they contain
LicenseRef.This still skips the whole Scancode record for values like
MIT AND LicenseRef-scancode-foo, so the valid SPDX part is lost too. Restrict the fast-path skip to pureLicenseRef-*entries, or strip only theLicenseRef-*token when expanding compound expressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/license.c`:
- Around line 567-570: The loop currently calls license_to_json(...) before
checking the id boundary and always updates last_id, which lets a non-emitted
record change last_id and can append the first record of the next id group
before breaking; fix by testing the boundary (compare
licenses_by_type.licenses[i].id against last_id and respect full_license_report)
before calling license_to_json so you never append from the next id group, and
only assign last_id = licenses_by_type.licenses[i].id after license_to_json
actually emitted output (i.e., after observing the first flag/return indicating
data was written).
- Around line 342-355: The code currently treats "WITH" like a skip and
misclassifies the following token as a license (producing /licenses/ URLs);
change this by tracking when you see the "WITH" operator (e.g., bool saw_with =
false), and when token == "WITH" set saw_with = true and advance to the next
token via strtok_r; for that next token use the exceptions path
("https://spdx.org/exceptions/%s.html") instead of the licenses path when
building the buffer (then reset saw_with = false); keep existing handling of
"AND"/"OR" as skips and preserve first_entry/first_license logic when adding
entries to buffer.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Version: 5.4.24