5.4.24 #111

Open
mscasso-scanoss wants to merge 5 commits into main from 5.4.24

Conversation

@mscasso-scanoss
Contributor

@mscasso-scanoss mscasso-scanoss commented Apr 9, 2026

Summary by CodeRabbit

  • New Features

    • Added "Enable full license report" reporting option.
    • Compound license expressions now emit multiple SPDX URLs plus a primary URL for clearer attribution.
  • Bug Fixes

    • Reduced duplicate and excessive license entries in compact reports; improved license ordering and selection logic.
  • Documentation

    • Updated built-in help text to document the new reporting flag.

Version: 5.4.24

@mscasso-scanoss mscasso-scanoss self-assigned this Apr 9, 2026
@coderabbitai

coderabbitai bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Bumps SCANOSS_VERSION to 5.4.24 and adds ENABLE_LICENSE_FULL_REPORT flag; CLI/help expose the flag. License parsing and reporting logic updated (LicenseRef detection, compound-license URLs, output capping/early-exit when full report is disabled). Match selection ordering adjusted.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Version & Flags (inc/scanoss.h) | Bumped SCANOSS_VERSION to "5.4.24" and added ENABLE_LICENSE_FULL_REPORT = 32768. |
| CLI / Help (src/main.c, src/help.c) | -F/--flags parsing now detects ENABLE_LICENSE_FULL_REPORT and sets full_license_report; help text documents the new flag (default: disabled). |
| License handling (src/license.c) | Changed LicenseRef detection to substring match; tokenize compound expressions (AND, OR, WITH) to emit a "urls" map (and set "url" to first token); added truncation/early-exit when full_license_report is false; replaced special-case ID ordering with priority table. |
| Match selection (src/match.c) | Reordered candidate-selection checks so identification-based skip happens before path_is_third_party() preference comparison. |
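
The flag value 32768 is a single bit (1 << 15), so detecting it in the -F/--flags bitmask comes down to one bitwise AND. A minimal sketch, assuming the flags arrive as an unsigned integer; the helper name `wants_full_license_report` and the `engine_flags` parameter are hypothetical and not taken from src/main.c:

```c
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the change summary above (1 << 15). */
#define ENABLE_LICENSE_FULL_REPORT 32768

/* Hypothetical helper: true when the full-report bit is set in the bitmask. */
static bool wants_full_license_report(uint64_t engine_flags)
{
	return (engine_flags & ENABLE_LICENSE_FULL_REPORT) != 0;
}
```

With this shape, a flags value that combines several options still enables the full report as long as bit 15 is set.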

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI / Main
    participant Flags as Engine Flags
    participant License as License Processor
    participant Reporter as Report Generator

    CLI->>Flags: parse -F / --flags
    Flags-->>CLI: flags bitmask (includes ENABLE_LICENSE_FULL_REPORT?)
    CLI->>License: start scan (full_license_report set/cleared)
    License->>License: detect "LicenseRef" via substring
    License->>License: tokenize compound expressions -> build "urls" map
    License-->>Reporter: emit license entries (apply caps if full_report=false)
    Reporter-->>CLI: write report/output

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • 5.4.20 #103: Overlaps on license handling changes and CLI flag behavior (strong code-level overlap with src/license.c and flags).
  • 5.4.23 #110: Also bumps SCANOSS_VERSION in inc/scanoss.h; related to the version macro change.

Suggested reviewers

  • scanoss-qg

Poem

🐇 A flag hops in, the version climbs high,
I sniff out LicenseRef where it hides nearby,
I split the strings and map each URL,
Short reports hush when the brief bell will knell,
The rabbit twitches — the engine sighs.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title '5.4.24' only indicates a version number and does not meaningfully describe the actual changes in the pull request. | Provide a descriptive title that summarizes the main changes, such as 'Add full license report feature' or 'Version 5.4.24: Add license report flag and improve license sorting'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
src/license.c (1)

316-358: Consider edge cases in license expression parsing.

The AND/OR parsing logic assumes well-formed SPDX license expressions. If the input contains malformed expressions (e.g., trailing operators, consecutive spaces, or parentheses from complex expressions like (MIT OR Apache-2.0)), the parsing may produce unexpected results.

For example, "(MIT OR Apache-2.0)" would emit "(MIT" and "Apache-2.0)" with parentheses included.

💡 Optional: Strip parentheses from tokens
 		while (token)
 		{
 			/* Skip AND/OR operators */
 			if (strcmp(token, "AND") == 0 || strcmp(token, "OR") == 0)
 			{
 				token = strtok_r(NULL, " ", &saveptr);
 				continue;
 			}
+			/* Strip leading/trailing parentheses */
+			while (*token == '(') token++;
+			char *end = token + strlen(token) - 1;
+			while (end > token && *end == ')') *end-- = '\0';
+			if (!*token) {
+				token = strtok_r(NULL, " ", &saveptr);
+				continue;
+			}
 			if (!first_entry)
 				len += sprintf(buffer + len, ",");
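
As an alternative to trimming each token after the fact, the expression can be split on spaces and parentheses in one pass. The sketch below is illustrative only: it uses `strspn`/`strcspn` rather than the `strtok_r` loop in the PR so the input string is left unmodified, and the names `split_license_ids`, `is_spdx_operator` and `MAX_ID_LEN` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ID_LEN 64 /* hypothetical cap for this sketch */

/* Returns 1 if the n-character token is an SPDX operator (AND, OR, WITH). */
static int is_spdx_operator(const char *tok, size_t n)
{
	return (n == 3 && strncmp(tok, "AND", 3) == 0) ||
	       (n == 2 && strncmp(tok, "OR", 2) == 0) ||
	       (n == 4 && strncmp(tok, "WITH", 4) == 0);
}

/* Copies up to max identifiers from expr into ids; returns how many were stored. */
static int split_license_ids(const char *expr, char ids[][MAX_ID_LEN], int max)
{
	const char *delims = " ()";
	const char *p = expr;
	int count = 0;

	while (*p && count < max)
	{
		p += strspn(p, delims);        /* skip spaces and parentheses */
		size_t n = strcspn(p, delims); /* length of the next token */
		if (n == 0)
			break;                     /* only delimiters were left */
		if (!is_spdx_operator(p, n) && n < MAX_ID_LEN)
		{
			memcpy(ids[count], p, n);
			ids[count][n] = '\0';
			count++;
		}
		p += n;
	}
	return count;
}
```

Because delimiters are skipped in runs, consecutive spaces, stray parentheses and a trailing operator produce no empty entries.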
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/license.c` around lines 316 - 358, The parsing loop in src/license.c that
tokenizes lic_copy with strtok_r(…, " ") (variables: lic_copy, token, saveptr,
first_license, first_entry, buffer) doesn't handle parentheses, extra spaces or
malformed expressions; update the tokenizer to split on spaces and parentheses
(e.g., use delimiters " ()" or post-process each token to trim leading/trailing
'(' and ')'), skip empty tokens and skip stray operators (AND/OR) or trailing
operators, and ensure first_license is set only from a valid license token (fall
back to a cleaned single-license value if none found) so URLs emit clean SPDX
IDs without surrounding parentheses or empty entries.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cb657bb6-bd09-4541-8e6b-21c71150584f

📥 Commits

Reviewing files that changed from the base of the PR and between c99222b and b0276d9.

📒 Files selected for processing (5)
  • inc/scanoss.h
  • src/help.c
  • src/license.c
  • src/main.c
  • src/match.c

Contributor

@scanoss-qg scanoss-qg left a comment

LGTM

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/license.c`:
- Around line 587-588: The early-exit condition inside the license iteration
incorrectly requires i > 0 so a component license at index 0 won't trigger the
break; remove the unnecessary i > 0 check in the loop that evaluates "if (i > 0
&& component_license && !full_license_report) break;" and change it to simply
check "if (component_license && !full_license_report) break;" so that when
component_license is present (even at i==0) the loop exits immediately and only
the component license is reported.
- Around line 326-352: The code can emit an invalid URL when first_license stays
empty; update the block that appends the URL (referencing variables
first_license, first_entry, lic_copy, buffer, len) to only append the ,"url":
"https://spdx.org/licenses/%s.html" fragment when first_license contains a
non-empty string (e.g., check first_license[0] != '\0'), otherwise skip adding
the URL (or add a safe fallback like omitting the url field or using a
placeholder). Ensure this check comes after the token parsing loop and before
the sprintf that writes the URL to buffer.
- Around line 561-574: The off-by-one stems from incrementing file_header_filter
and scancode_file_filter before checking; currently the code uses ">=3" which
skips the 3rd license. Update the conditions in the blocks referencing
licenses_by_type.licenses[i].id (the file_header_filter and scancode_file_filter
checks that also use full_license_report) to use ">3" instead of ">=3" so the
first three licenses are processed and only the 4th+ are skipped when
full_license_report is false.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9dc51fdc-e2da-43a7-9442-b6422e954587

📥 Commits

Reviewing files that changed from the base of the PR and between 4c1fab5 and 276f76a.

📒 Files selected for processing (1)
  • src/license.c

src/license.c Outdated
Comment on lines +561 to +574
//file header license and scancode_file liceses are limited to a maximum of 3.
if (licenses_by_type.licenses[i].id == 2)
{
	file_header_filter++;
	if (file_header_filter >=3 && !full_license_report)
		continue;
}

if (licenses_by_type.licenses[i].id == 4)
{
	scancode_file_filter++;
	if (scancode_file_filter >=3 && !full_license_report)
		continue;
}

⚠️ Potential issue | 🟠 Major

Off-by-one: allows 2 licenses instead of 3.

The comment states "limited to a maximum of 3", but the current logic skips when filter >= 3, which triggers on the 3rd item (filter values: 1→process, 2→process, 3→skip). This allows only 2.

🐛 Proposed fix
 		if (licenses_by_type.licenses[i].id == 2)
 		{
 			file_header_filter++;
-			if (file_header_filter >=3 && !full_license_report)
+			if (file_header_filter > 3 && !full_license_report)
 				continue;
 		}
 		
 		if (licenses_by_type.licenses[i].id == 4)
 		{
 			scancode_file_filter++;
-			if (scancode_file_filter >=3 && !full_license_report)
+			if (scancode_file_filter > 3 && !full_license_report)
 				continue;
 		}
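
The effect of the comparison operator can be checked with a small model of the loop. This is a sketch only: `entries_kept` is a hypothetical function that mimics the increment-then-compare pattern and is not part of src/license.c.

```c
#include <stdbool.h>

/* Hypothetical model of the capped loop: counts how many of `total` entries
 * survive when the counter is incremented before it is compared to `cap`. */
static int entries_kept(int total, int cap, bool strict_greater)
{
	int filter = 0;
	int kept = 0;

	for (int i = 0; i < total; i++)
	{
		filter++;
		/* strict_greater selects "> cap"; otherwise ">= cap" as in the PR. */
		bool skip = strict_greater ? (filter > cap) : (filter >= cap);
		if (skip)
			continue;
		kept++;
	}
	return kept;
}
```

With five candidate entries and a cap of 3, the `>=` form keeps two entries and the `>` form keeps three.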

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/license.c`:
- Around line 294-295: The current check in src/license.c using
license_source_id and strstr(license, "LicenseRef") drops entire Scancode
records when any LicenseRef token appears; change it so only pure LicenseRef
entries are skipped and compound expressions keep their SPDX parts: in the block
that uses license_source_id, replace the broad strstr test with a stricter test
that returns buffer only when the whole license string is a single LicenseRef-*
token (e.g., matches start with "LicenseRef" and contains no other SPDX
operators or tokens), and for compound expressions (strings containing SPDX
operators like AND/OR/() or multiple tokens) strip/ignore only the LicenseRef-*
tokens when expanding/normalizing the expression instead of returning early;
update the logic around the current strncmp/strstr usage and any normalization
code that consumes license so it removes LicenseRef-* tokens before further
parsing.
- Around line 561-587: The loop updates file_header_filter,
scancode_file_filter, scanoss_license, and component_license regardless of
whether license_to_json actually emitted output, so enforce caps only on emitted
licenses by making emission observable and using that to drive continues and
counters: modify license_to_json (or add a helper like license_would_emit) to
return/indicate whether it will emit (a bool "emitted") without side effects,
then call that before applying the cap checks (or call license_to_json and use
its emitted return) and only increment file_header_filter/scancode_file_filter
or set scanoss_license/component_license when emitted is true; update the logic
around the id checks (licenses_by_type.licenses[i].id) and the early continue
conditions to reference emitted instead of assuming every iteration produced
output, using the existing variables file_header_filter, scancode_file_filter,
scanoss_license, component_license, full_license_report and the license_to_json
symbol to locate where to change behavior.
- Around line 333-346: The code currently treats the token after a "WITH"
operator as a license and generates a licenses URL; change the logic so that
when you encounter "WITH" (in the block referencing token, saveptr) you set a
flag (e.g., next_is_exception) and advance to the next token, then when emitting
the JSON entry (the code using token, buffer, len, first_entry, first_license)
use that flag to choose "https://spdx.org/exceptions/%s.html" instead of the
licenses path and reset the flag; ensure first_license is only set for actual
licenses (not exceptions) and clear the flag after use.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 043b8f7a-902d-45bc-a6ba-d473ec854e99

📥 Commits

Reviewing files that changed from the base of the PR and between 276f76a and 6cf3fe3.

📒 Files selected for processing (1)
  • src/license.c

Comment on lines +294 to 295
if (!strncmp(license_source_id, "scancode", 8) && strstr(license, "LicenseRef"))
	return buffer;

⚠️ Potential issue | 🟠 Major

Don't drop mixed SPDX expressions just because they contain LicenseRef.

This now skips the whole Scancode record for values like MIT AND LicenseRef-scancode-foo, so the valid SPDX part is lost too. Restrict the skip to pure LicenseRef-* entries, or ignore only the LicenseRef-* token when expanding compound expressions.
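
One way to restrict the skip to pure LicenseRef-* entries is a stricter predicate that also rejects anything containing a space or parenthesis. A minimal sketch, assuming expressions are separated by plain spaces; `is_pure_licenseref` is a hypothetical helper, not an existing function in src/license.c:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true only when the whole string is one LicenseRef-* token. */
static bool is_pure_licenseref(const char *license)
{
	if (strncmp(license, "LicenseRef", 10) != 0)
		return false;
	/* A space or parenthesis means the string is a compound expression. */
	return strpbrk(license, " ()") == NULL;
}
```

A value such as MIT AND LicenseRef-scancode-foo fails the prefix test, so it would continue to the compound-expression path instead of being dropped.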

Comment on lines +333 to +346
/* Skip AND/OR/WITH operators */
if (strcmp(token, "AND") == 0 || strcmp(token, "OR") == 0 || strcmp(token, "WITH") == 0)
{
	token = strtok_r(NULL, " ()", &saveptr);
	continue;
}
if (!first_entry)
	len += sprintf(buffer + len, ",");
else
{
	strncpy(first_license, token, MAX_FIELD_LN - 1);
	first_entry = false;
}
len += sprintf(buffer + len, "\"%s\": \"https://spdx.org/licenses/%s.html\"", token, token);

⚠️ Potential issue | 🟠 Major

WITH exceptions need SPDX /exceptions/ URLs.

In expressions like GPL-2.0-only WITH Classpath-exception-2.0, the token after WITH is an SPDX exception, not a license. This code currently emits spdx.org/licenses/... for it, which produces an invalid URL.

💡 Suggested fix
-		char *token = strtok_r(lic_copy, " ()", &saveptr);
+		char *token = strtok_r(lic_copy, " ()", &saveptr);
+		const char *spdx_kind = "licenses";
 
 		while (token)
 		{
 			/* Skip AND/OR/WITH operators */
-			if (strcmp(token, "AND") == 0 || strcmp(token, "OR") == 0 || strcmp(token, "WITH") == 0)
+			if (strcmp(token, "AND") == 0 || strcmp(token, "OR") == 0 || strcmp(token, "WITH") == 0)
 			{
+				spdx_kind = (strcmp(token, "WITH") == 0) ? "exceptions" : "licenses";
 				token = strtok_r(NULL, " ()", &saveptr);
 				continue;
 			}
 			if (!first_entry)
 				len += sprintf(buffer + len, ",");
@@
-			len += sprintf(buffer + len, "\"%s\": \"https://spdx.org/licenses/%s.html\"", token, token);
+			len += sprintf(buffer + len, "\"%s\": \"https://spdx.org/%s/%s.html\"", token, spdx_kind, token);
+			spdx_kind = "licenses";
 			token = strtok_r(NULL, " ()", &saveptr);
 		}
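
The same idea can be written as a standalone function with bounded writes. This is a sketch under stated assumptions, not the PR's code: `spdx_urls` is a hypothetical name, it scans with `strspn`/`strcspn` and writes with `snprintf` instead of the `strtok_r`/`sprintf` pair shown above, and the `exceptions` path segment simply follows the suggestion in this comment, so it should be checked against the SPDX site before use.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Writes comma-separated "id": "url" pairs for expr into out (size out_sz).
 * Returns the number of characters written, excluding the terminator. */
static size_t spdx_urls(const char *expr, char *out, size_t out_sz)
{
	const char *delims = " ()";
	const char *p = expr;
	size_t len = 0;
	bool next_is_exception = false;
	bool first_entry = true;

	if (out_sz == 0)
		return 0;
	out[0] = '\0';
	for (;;)
	{
		p += strspn(p, delims);
		size_t n = strcspn(p, delims);
		if (n == 0)
			break;
		if ((n == 3 && !strncmp(p, "AND", 3)) || (n == 2 && !strncmp(p, "OR", 2)))
			next_is_exception = false;
		else if (n == 4 && !strncmp(p, "WITH", 4))
			next_is_exception = true; /* the next token names an exception */
		else
		{
			/* Path segment per the review suggestion; unverified assumption. */
			const char *kind = next_is_exception ? "exceptions" : "licenses";
			int w = snprintf(out + len, out_sz - len,
			                 "%s\"%.*s\": \"https://spdx.org/%s/%.*s.html\"",
			                 first_entry ? "" : ",", (int)n, p, kind, (int)n, p);
			if (w < 0 || (size_t)w >= out_sz - len)
			{
				out[len] = '\0'; /* drop the entry that did not fit */
				break;
			}
			len += (size_t)w;
			first_entry = false;
			next_is_exception = false;
		}
		p += n;
	}
	return len;
}
```

Resetting the flag after each emitted token keeps a later license in the same expression from being treated as an exception.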

src/license.c Outdated
Comment on lines +561 to +587
//file header license and scancode_file liceses are limited to a maximum of 3.
if (licenses_by_type.licenses[i].id == 2)
{
	file_header_filter++;
	if (file_header_filter >=3 && !full_license_report)
		continue;
}

if (licenses_by_type.licenses[i].id == 4)
{
	scancode_file_filter++;
	if (scancode_file_filter >=3 && !full_license_report)
		continue;
}

if (licenses_by_type.licenses[i].id == 5 && scanoss_license && !full_license_report)
	continue;

buffer = license_to_json(crclist, buffer, licenses_by_type.licenses[i].text, licenses_by_type.licenses[i].id, &first);
//just report component license if available
if (licenses_by_type.licenses[i].id == 0 && !first)
	component_license = true;

else if (licenses_by_type.licenses[i].id > 0 && !first)
	scanoss_license = true;

if (i > 0 && component_license && !full_license_report)

⚠️ Potential issue | 🟠 Major

Count only emitted licenses when enforcing caps and break conditions.

file_header_filter, scancode_file_filter, scanoss_license, and component_license are updated per iteration, but license_to_json() can no-op for duplicates, short strings, or skipped Scancode LicenseRef values. That means non-emitted entries still consume the quota and can suppress later valid licenses.
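
The counting rule can be illustrated with a reduced model in which the emitter reports whether it wrote anything. Everything here is hypothetical: `emit_license` stands in for license_to_json() and only treats strings shorter than three characters as no-ops, and `emit_capped` is not a function in src/license.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for license_to_json(): reports whether it emitted.
 * In this model an entry is emitted only if its text has 3+ characters. */
static bool emit_license(const char *text)
{
	return text != NULL && strlen(text) >= 3;
}

/* Returns how many entries are emitted when at most `cap` emitted entries
 * are allowed; entries that produce no output do not consume the quota. */
static int emit_capped(const char *const *texts, int count, int cap)
{
	int emitted_total = 0;

	for (int i = 0; i < count; i++)
	{
		if (emitted_total >= cap)
			break; /* quota reached by real output only */
		if (emit_license(texts[i]))
			emitted_total++;
	}
	return emitted_total;
}
```

In this model, leading empty or too-short entries are passed over without reducing the number of valid licenses that can still be reported.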

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (3)
src/license.c (3)

567-570: ⚠️ Potential issue | 🟠 Major

Check the id boundary before emitting, and only advance last_id on actual output.

As written, the first record from the next id group is already appended before the break runs. Also, a skipped/non-emitted record can still change last_id and prematurely terminate the loop.

💡 Suggested fix
 	for (int i = 0; i < licenses_by_type.count; i++)
 	{
-		buffer = license_to_json(crclist, buffer, licenses_by_type.licenses[i].text, licenses_by_type.licenses[i].id, &first);
-		if (last_id >= 0 && last_id != licenses_by_type.licenses[i].id && !first && !full_license_report)
+		if (last_id >= 0 && last_id != licenses_by_type.licenses[i].id && !first && !full_license_report)
 			break;
-		last_id = licenses_by_type.licenses[i].id;
+
+		char *next = license_to_json(crclist, buffer, licenses_by_type.licenses[i].text, licenses_by_type.licenses[i].id, &first);
+		if (next != buffer)
+			last_id = licenses_by_type.licenses[i].id;
+		buffer = next;
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/license.c` around lines 567 - 570, The loop currently calls
license_to_json(...) before checking the id boundary and always updates last_id,
which lets a non-emitted record change last_id and can append the first record
of the next id group before breaking; fix by testing the boundary (compare
licenses_by_type.licenses[i].id against last_id and respect full_license_report)
before calling license_to_json so you never append from the next id group, and
only assign last_id = licenses_by_type.licenses[i].id after license_to_json
actually emitted output (i.e., after observing the first flag/return indicating
data was written).

342-355: ⚠️ Potential issue | 🟠 Major

Use SPDX /exceptions/ URLs after WITH.

The token after WITH is an SPDX exception, so emitting it under /licenses/ produces an invalid URL. Track WITH and switch the next token to https://spdx.org/exceptions/...; only actual licenses should go under /licenses/.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/license.c` around lines 342 - 355, The code currently treats "WITH" like
a skip and misclassifies the following token as a license (producing /licenses/
URLs); change this by tracking when you see the "WITH" operator (e.g., bool
saw_with = false), and when token == "WITH" set saw_with = true and advance to
the next token via strtok_r; for that next token use the exceptions path
("https://spdx.org/exceptions/%s.html") instead of the licenses path when
building the buffer (then reset saw_with = false); keep existing handling of
"AND"/"OR" as skips and preserve first_entry/first_license logic when adding
entries to buffer.

303-304: ⚠️ Potential issue | 🟠 Major

Don't drop mixed SPDX expressions just because they contain LicenseRef.

This still skips the whole Scancode record for values like MIT AND LicenseRef-scancode-foo, so the valid SPDX part is lost too. Restrict the fast-path skip to pure LicenseRef-* entries, or strip only the LicenseRef-* token when expanding compound expressions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6ce1e145-0e5a-4b1a-b220-879e183108bd

📥 Commits

Reviewing files that changed from the base of the PR and between 6cf3fe3 and 692cae7.

📒 Files selected for processing (1)
  • src/license.c
