add FileContentsLimit feature to file contents service #78

Merged
mscasso-scanoss merged 7 commits into main from mscasso/feat/sp-4187-add-file-contents-limit
Apr 8, 2026

Conversation

@mscasso-scanoss (Contributor) commented Mar 31, 2026

Summary by CodeRabbit

  • New Features

    • Added a configurable file-contents size limit (default 50 MB). Requests exceeding the configured limit now return HTTP 413 with a JSON error describing the size violation.
  • Tests

    • Added automated tests covering allowed, disabled (0), and exceeded limit behaviors.
  • Documentation

    • Updated changelog to document the new configurable file-contents limit.

@mscasso-scanoss self-assigned this Mar 31, 2026

@coderabbitai (bot) commented Mar 31, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a configurable file-contents size limit: config files updated, server config gains Scanning.FileContentsLimit (env: SCANOSS_FILE_CONTENTS_LIMIT, default 50 MB), the file-contents API enforces the limit (returns HTTP 413 JSON on exceed), and tests + fixture were added to validate behavior.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration: `config/app-config-dev.json`, `config/app-config-prod.json` | Added `Scanning.FileContentsLimit: 50` to dev and prod configs. |
| Server configuration: `pkg/config/server_config.go` | Added `FileContentsLimit int64` (struct tag `env:"SCANOSS_FILE_CONTENTS_LIMIT"`) to `ServerConfig.Scanning`; default initialized to `50`. |
| Service implementation: `pkg/service/filecontents_service.go` | After generating scan output, compute `limitBytes` from config (MB→bytes); if `limitBytes > 0` and `len(output) > limitBytes`, log a warning and return HTTP 413 with JSON `{"error":...}` (skipping charset detection and the normal `text/plain` response); otherwise continue the existing flow. |
| Service tests: `pkg/service/filecontents_service_test.go` | Added `fileContentsLimit` table cases and `TestFileContentsLimitExceeded` asserting 413 + JSON error payload; tests set `myConfig.Scanning.FileContentsLimit` per case. |
| Test support script: `test-support/scanoss.sh` | Added special MD5 case (`aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa`) that streams ~1,100,000 bytes to simulate oversized file contents for testing. |
| Lint suppression: `pkg/service/utils_service_test.go` | Added `//nolint:unparam` to the `newReq` docblock. |
| Changelog: `CHANGELOG.md` | Replaced `Unreleased` with `## [1.6.6] - 2026-04-07` and added an `### Added` entry documenting the configurable `SCANOSS_FILE_CONTENTS_LIMIT` (default 50 MB) and the behavior for exceeded requests. |
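
As a rough illustration of the configuration row above, the following sketch shows one way a default of 50 MB could be overridden from `SCANOSS_FILE_CONTENTS_LIMIT`. The `scanningConfig` type and the `loadFileContentsLimit` helper are hypothetical names for this example only; the PR itself relies on the `env` struct tag, and its actual loading mechanism is not shown in this conversation.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// scanningConfig mirrors only the field discussed in this PR (assumption:
// the real ServerConfig.Scanning struct has more fields).
type scanningConfig struct {
	FileContentsLimit int64 // limit in MB; 0 or negative disables the check
}

const defaultFileContentsLimitMB = 50

// loadFileContentsLimit is a hypothetical loader: it starts from the default
// and applies SCANOSS_FILE_CONTENTS_LIMIT when it parses as an integer.
// getenv is injected so the function can be exercised without touching the
// process environment.
func loadFileContentsLimit(getenv func(string) string) (scanningConfig, error) {
	cfg := scanningConfig{FileContentsLimit: defaultFileContentsLimitMB}
	raw := getenv("SCANOSS_FILE_CONTENTS_LIMIT")
	if raw == "" {
		return cfg, nil
	}
	v, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return cfg, fmt.Errorf("invalid SCANOSS_FILE_CONTENTS_LIMIT %q: %w", raw, err)
	}
	cfg.FileContentsLimit = v
	return cfg, nil
}

func main() {
	cfg, err := loadFileContentsLimit(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("file contents limit: %d MB\n", cfg.FileContentsLimit)
}
```

Keeping the value in MB in the config and converting to bytes only at comparison time matches the approach the author describes later in this thread.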

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant API as APIService
  participant Config as Config
  participant Script as TestSupport

  Client->>API: GET /file-contents?md5=<md5>
  API->>Config: read Scanning.FileContentsLimit (MB)
  API->>Script: request file contents for <md5>
  Script-->>API: stream file contents (bytes)
  API->>API: compute len(output) and limitBytes (MB->bytes)
  alt output > limitBytes and limitBytes > 0
    API-->>Client: 413 Request Entity Too Large (application/json) {"error":"...exceeds the maximum allowed limit..."}
  else
    API-->>Client: 200 OK (original content response with detected charset/headers)
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I counted bytes beneath the moonlit code,
Fifty-meg carrots keep our payloads owed,
If bytes balloon past the gentle brim,
I hop back, send JSON, tidy and trim,
A happy rabbit guards the transfer road.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately and concisely describes the main change: adding a FileContentsLimit feature to the file contents service, which is reflected throughout all modified files. |

@coderabbitai (bot) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/service/filecontents_service_test.go (1)

121-121: Avoid unconditional zero-value override of FileContentsLimit in table cases.

Cases that omit fileContentsLimit currently force it to 0, which changes semantics from config-default behavior and narrows coverage unintentionally.

♻️ Suggested refactor
 	tests := []struct {
 		name              string
 		input             map[string]string
 		binary            string
 		telemetry         bool
-		fileContentsLimit int64
+		fileContentsLimit *int64
 		want              int
 	}{
@@
-			myConfig.Scanning.FileContentsLimit = test.fileContentsLimit
+			if test.fileContentsLimit != nil {
+				myConfig.Scanning.FileContentsLimit = *test.fileContentsLimit
+			}
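
As a standalone sketch of the pointer-based variant suggested in the diff above: a `nil` override keeps the configured default, while an explicit pointer (including a pointer to 0) replaces it. The names `effectiveLimit` and `int64Ptr` are illustrative and do not come from the PR.

```go
package main

import "fmt"

// effectiveLimit returns the configured default unless the test case
// supplies an explicit override.
func effectiveLimit(defaultMB int64, override *int64) int64 {
	if override != nil {
		return *override
	}
	return defaultMB
}

// int64Ptr is a small helper so table entries can write int64Ptr(0) inline.
func int64Ptr(v int64) *int64 { return &v }

func main() {
	cases := []struct {
		name     string
		override *int64
	}{
		{"default kept", nil},
		{"explicitly disabled", int64Ptr(0)},
		{"explicit 1 MB", int64Ptr(1)},
	}
	for _, c := range cases {
		fmt.Printf("%s -> %d MB\n", c.name, effectiveLimit(50, c.override))
	}
}
```

The pointer distinguishes "not specified" from "explicitly zero", which a plain `int64` field cannot do because its zero value is also the "disabled" setting.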
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/service/filecontents_service_test.go` at line 121, the test
unconditionally sets myConfig.Scanning.FileContentsLimit =
test.fileContentsLimit, which forces a zero-value override for cases that
intended to rely on the default; change the test table and assignment so the
override is applied only when the case explicitly specifies a value (e.g. make
test.fileContentsLimit a *int64 or add a boolean like fileContentsLimitSet) and
set myConfig.Scanning.FileContentsLimit only when that pointer/flag is
non-nil/true, leaving the default otherwise; update all table entries
accordingly to preserve semantics and test coverage for default behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/service/filecontents_service.go`:
- Around line 85-96: The current guard checks limitBytes after using
exec.CommandContext(...).Output(), which buffers full stdout and can cause
memory pressure; change the implementation that uses exec.CommandContext to call
cmd.StdoutPipe(), start the command with cmd.Start(), then read the stdout
through an io.LimitedReader (or read and count bytes while copying into a buffer
up to limitBytes+1) so you can detect overflow before allocating the full
response; if the read exceeds s.config.Scanning.FileContentsLimit (limitBytes)
return the 413 response (as in the existing block that references limitBytes,
md5, ContentTypeKey, ApplicationJSON) and ensure you call cmd.Wait() and handle
command errors properly.



📥 Commits

Reviewing files that changed from the base of the PR and between 7588ae0 and f887629.

📒 Files selected for processing (6)
  • config/app-config-dev.json
  • config/app-config-prod.json
  • pkg/config/server_config.go
  • pkg/service/filecontents_service.go
  • pkg/service/filecontents_service_test.go
  • test-support/scanoss.sh

@agustingroh (Contributor) left a comment

LGTM. Please update the CHANGELOG.md file with the latest changes.

@coderabbitai (bot) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/service/filecontents_service.go`:
- Line 86: The comment on the line describing FileContentsLimit is missing a
space after the '//' and breaks gocritic commentFormatting; update the comment
to start with "// " (for example "// unlimited for FileContentsLimit <= 0") so
the linter passes and the comment is properly formatted near the
FileContentsLimit reference in filecontents_service.go.

📥 Commits

Reviewing files that changed from the base of the PR and between 950c69b and cb16812.

📒 Files selected for processing (2)
  • CHANGELOG.md
  • pkg/service/filecontents_service.go
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

http.Error(w, "ERROR recovering file contents", http.StatusInternalServerError)
return
}
limitBytes := s.config.Scanning.FileContentsLimit * 1024 * 1024
Contributor

Should this be computed once at startup?

Contributor Author

To simplify the user's life, the limit is expressed in MB in the configuration file. Here I'm just adjusting the units for the comparison. I think it is clearer from a code-reading perspective to do it here.

}
limitBytes := s.config.Scanning.FileContentsLimit * 1024 * 1024
// unlimited for FileContentsLimit <= 0
if limitBytes > 0 && int64(len(output)) > limitBytes {
Contributor

len(output) should only be calculated once

Contributor Author

done

zs.Warnf("File contents size %d bytes exceeds limit %d MB for md5 %s", len(output), s.config.Scanning.FileContentsLimit, md5)
w.Header().Set(ContentTypeKey, ApplicationJSON)
w.WriteHeader(http.StatusRequestEntityTooLarge)
resp := map[string]string{
Contributor

It's probably ok to just return this in plain text?

Contributor Author

This is the log entry. The user will receive the error in json format.

CHANGELOG.md Outdated
### Added
- Added configurable file contents size limit (`SCANOSS_FILE_CONTENTS_LIMIT`).
- Limits the maximum file size returned by the `file_contents` endpoint (default: 50 MB).
- Returns HTTP 400 when the file exceeds the configured limit.
Contributor

Add in the exact error code

@coderabbitai (bot) left a comment

♻️ Duplicate comments (1)
pkg/service/filecontents_service.go (1)

85-85: ⚠️ Potential issue | 🟠 Major

Fix gocritic comment formatting at Line 85 (CI is failing).

The comment must include a space after // to satisfy commentFormatting.

Suggested fix
-	//Convert the configured limit from MB to bytes.
+	// Convert the configured limit from MB to bytes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/service/filecontents_service.go` at line 85, There is a gocritic
commentFormatting failure for the inline comment "//Convert the configured limit
from MB to bytes." — update that comment to include a space after the slashes
(e.g. "// Convert the configured limit from MB to bytes.") wherever it appears
in pkg/service/filecontents_service.go (near the code handling file size limit
logic) so CI's commentFormatting check passes.

📥 Commits

Reviewing files that changed from the base of the PR and between 896f09d and 8815788.

📒 Files selected for processing (2)
  • CHANGELOG.md
  • pkg/service/filecontents_service.go
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

@mscasso-scanoss force-pushed the mscasso/feat/sp-4187-add-file-contents-limit branch from 8815788 to 95a74c5 on April 7, 2026 10:40
@eeisegn self-requested a review on April 7, 2026 17:22
@eeisegn (Contributor) left a comment

Please update the error code in the changelog to 413 before merging.

@mscasso-scanoss merged commit 4810902 into main on Apr 8, 2026
3 checks passed
@mscasso-scanoss deleted the mscasso/feat/sp-4187-add-file-contents-limit branch on April 8, 2026 14:46
3 participants