feat: add dependency scanning by matiasdaloia · Pull Request #24 · scanoss/crypto-finder

matiasdaloia · 2026-03-30T10:15:17Z

Summary by CodeRabbit

New Features
- Dependency scanning across Go/Java/Python/Rust with parallel scans, dependency attribution, Java runtime selection, and finding-centric call‑graph export; CLI flags to enable/export these scans.
Documentation
- Comprehensive dependency‑scanning guide, updated usage and Docker image docs, README and CONTRIBUTING updates with lint/install guidance.
Chores
- New dedicated dependency‑scanning Docker image, pinned tooling versions, CI/lint installer changes, and updated container base images.
Output
- Interim results schema bumped to v1.3 and new call‑graph JSON export format; stable short finding IDs added.

feat: execute dependency scans in parallel, fix rule filtering feat: improve dependency scanning output, support multiple call chains for the same finding chore: add testdata for dependency scanning feat: add java parser feat: add --export-callgraph flag for scanning dependencies fix: lint errors feat(deps-scan): add multi-ecosystem dependency scanning, callgraph support, and deps Docker images - add dependency scanning for Python and Rust, plus Cargo resolver support - new pip-based resolver and tests - new cargo metadata resolver and tests - extend resolver model with workspace members for monorepo/workspace projects - expand callgraph capabilities - add Python and Rust tree-sitter parsers and tests - register Python/Rust parsers in parser registry - support richer callchain tracing and report merging for dependency findings - refactor scan command internals - extract scan-specific utilities into internal/scan: - validation, duration parsing, ecosystem detection - callgraph export, report counting, summary rendering - centralize flag validation rules (including export-callgraph dependency check) - simplify internal/cli/scan.go orchestration paths - harden IO/error handling and lint compliance across engine/dependency/callgraph - handle close/remove/write errors consistently - tighten file/dir permissions where needed - reduce copy-heavy loops, preallocate slices, and address complexity/style findings - ensure make lint runs with local cache paths reliably - add/adjust container and release support for dependency scanning - add Dockerfile.deps and Dockerfile.goreleaser.deps - update Dockerfile/Dockerfile.slim and goreleaser config for deps variants - update Make targets for deps image workflows - update docs and release notes - README and docs/DOCKER_USAGE.md for latest-deps usage and dependency scanning guidance - CHANGELOG entries for 0.3.0 dependency-scanning additions/fixes chore: update scanoss.json chore: fix lint errors, increase test coverage chore(tests): add more coverage fix(tests): unit tests and lint errors

github-actions · 2026-03-30T10:15:54Z

🔍 SCANOSS Code Similarity Detected

📄 1 snippet matches found

🔗 View detailed findings on commit 5bd9ced

Files with similarities:

internal/callgraph/text_helpers.go

💡 Click the commit link above to see detailed annotations for each match.

github-actions · 2026-03-30T10:15:58Z

SCANOSS SCAN Completed 🚀

Detected components: 5
Undeclared components: 1
Declared components: 4
Detected files: 61
Detected files undeclared: 1
Detected files declared: 60
Licenses detected: 5
Licenses detected with copyleft: 3
Policies: ❌ 1 fail (1 total)

View more details on SCANOSS Action Summary

coderabbitai · 2026-03-30T10:16:41Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a dependency-scanning subsystem: multi-ecosystem resolvers, tree-sitter language parsers, a call‑graph builder/tracer with Java bytecode enrichment and caches, CLI/config and Docker support for dependency images, interim output schema updates, extensive tests and documentation, plus CI/Makefile adjustments.

Changes

Cohort / File(s)	Summary
CI / Lint / Repo metadata `/.github/workflows/lint.yml`, `/.golangci-lint-version`, `/.gitignore`, `CONTRIBUTING.md`, `go.mod`	Pin golangci-lint via `.golangci-lint-version` and install-mode=binary; ignore `bin/` and `demo/`; add `make lint-install` docs; promote/add tree-sitter & YAML deps.
Build & Release / Docker `Dockerfile`, `Dockerfile.slim`, `Dockerfile.test`, `Dockerfile.goreleaser`, `Dockerfile.goreleaser.deps`, `Dockerfile.deps`, `.goreleaser.yml`, `Makefile`	Upgrade Go base to 1.26, add native toolchain packages, pin OpenGrep installer v1.12.1, add multi-arch `-deps` images and goreleaser manifest entries; add Make targets for lint-install, docker-build-deps, docker-scan-deps and push of deps tags.
Docs, Changelog & README `CHANGELOG.md`, `README.md`, `docs/*`, `docs/DEPENDENCY_SCANNING.md`, `docs/DOCKER_USAGE.md`, `docs/CONFIGURATION.md`, `docs/OUTPUT_FORMATS.md`	Add comprehensive dependency-scanning docs, configuration/JDK guidance, Docker usage for `latest-deps`, CLI flags docs, and interim output schema v1.3 documentation.
Callgraph Core Types & Builder `internal/callgraph/types.go`, `internal/callgraph/builder.go`, `internal/callgraph/tracer.go`, `internal/callgraph/text_helpers.go`, `internal/callgraph/parser_options.go`, `internal/callgraph/parser_registry.go`	Introduce callgraph package types, Parser/TypeResolver interfaces, Builder with multi-phase parse/type resolution (incl. fluent-chain resolution), Tracer for BFS backtraces, argument parsing helpers, parser options and factory registry.
Language Parsers (tree-sitter) `internal/callgraph/go_parser.go`, `internal/callgraph/java_parser.go`, `internal/callgraph/python_parser.go`, `internal/callgraph/rust_parser.go`	Add Go/Java/Python/Rust parsers extracting imports, functions/methods, calls, arguments and provenance. Java parser includes constructors, reflection heuristics and overload disambiguation.
Java Bytecode Resolver & Cache `internal/callgraph/java_type_resolver.go`, `internal/callgraph/bytecode_cache.go`, `internal/callgraph/*_test.go`	Add Java bytecode indexing/resolution (JAR/.class parsing), type-index merging, platform signature handling, and a disk-backed bytecode index cache with schema promotion and tests.
Dependency Framework & Resolvers `internal/dependency/resolver.go`, `internal/dependency/registry.go`, `internal/dependency/go_resolver.go`, `internal/dependency/maven_resolver.go`, `internal/dependency/cargo_resolver.go`, `internal/dependency/pip_resolver.go`	New dependency package, registry, and resolvers for Go/Maven/Cargo/Pip that shell out to ecosystem tools, build graphs, fetch/extract sources and support multi-tier fallbacks.
Source Cache & Findings Cache `internal/dependency/source_cache.go`, `internal/engine/findings_cache.go`, `internal/dependency/source_cache_test.go`, `internal/engine/findings_cache_test.go`	Filesystem-backed SourceCache for ZIP/JAR extraction (zip‑slip protection, size limits) and DiskFindingsCache for interim-report caching with atomic writes, version envelope, corruption handling, and rules-hash utility.
Dependency Scanner & Engine Integration `internal/engine/dependency_scanner.go`, `internal/engine/orchestrator.go`, `internal/engine/*`	Add DependencyScanner to resolve and scan dependencies in parallel, merge/attribute dependency findings, assign stable FindingIDs, and wire into Orchestrator via extended ScanOptions (RulePaths, JavaRuntime, cache token).
Scan Utilities & Export `internal/scan/doc.go`, `internal/scan/export.go`, `internal/scan/duration.go`, `internal/scan/ecosystem.go`, `internal/scan/root_module.go`, `internal/scan/report.go`	Helpers for ecosystem detection, duration parsing (supports d/w suffixes), root-module detection, finding counting, and a finding-centric call‑graph exporter (`ExportCallGraph`) that builds per-finding call-chains.
CLI, Config & Java runtime helpers `internal/cli/scan.go`, `internal/cli/root.go`, `internal/config/config.go`, `internal/javaruntime/config.go`	Add CLI flags/state and orchestration for dependency scanning and call-graph export; config fields and env bindings for Java JDK major/homes; new `javaruntime` package to normalize/resolve Java runtimes and produce cache tokens.
Entities & Output `internal/entities/interim.go`, `internal/output/json.go`, `internal/engine/finding_ids_test.go`	Bump InterimFormatVersion 1.1→1.3; remove `match_type`; add `source`, `dependency_info`, and `finding_id` to CryptographicAsset; switch JSON writer to Encoder with HTML escaping disabled.
Tests & Coverage many `internal/_test.go`, `internal/callgraph/_test.go`	Large set of new and updated tests covering parsers, resolvers, builder, caches, scanner, orchestrator, javaruntime, rule filtering, and utilities.
Minor Fixes & Misc `internal/api/client.go`, `internal/deduplicator/deduplicator.go`, `internal/errors/formatter.go`	Add `#nosec` suppression for HTTP Do, fix range iteration to use element pointer in deduplicator grouping, and micro-optimizations in multi-error formatting.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI
    participant Orch as Orchestrator
    participant Resolver as DepResolver
    participant Cache as FindingsCache
    participant Scanner as DependencyScanner
    participant Parser as Parser (per-ecosystem)
    participant CGBuilder as CallGraphBuilder
    participant Tracer as Tracer
    participant Exporter as Exporter

    CLI->>Orch: start scan (with opts incl. JavaRuntime, RulePaths, export)
    Orch->>Resolver: DetectEcosystem + Resolve(target)
    Resolver-->>Orch: ResolveResult (deps, graph)
    Orch->>Cache: ComputeRulesHash & Get(cacheKey)
    alt cache hit
        Cache-->>Orch: cached DepScanResult
    else cache miss
        Orch->>Scanner: ScanWithDependencies(userReport, opts)
        Scanner->>Resolver: resolve per-dependency sources
        Scanner->>Parser: parse sources (per-ecosystem)
        Parser-->>CGBuilder: FileAnalyses
        CGBuilder-->>Scanner: CallGraph (optionally enriched by TypeResolver)
        Scanner->>Cache: Put(cacheKey, DepScanResult)
    end
    alt export requested
        Orch->>Exporter: ExportCallGraph(path, format, DepScanResult)
        Exporter->>Tracer: TraceBack(crypto calls -> user entries)
        Tracer-->>Exporter: Call chains
        Exporter-->>Orch: write export
    end
    Orch-->>CLI: interim report / artifacts

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

feat: add OID mapping #18 — Modifies interim report schema and related asset handling; strong code-level overlap with internal/entities/interim.go.
feat: add deduplication logic #16 — Touches deduplication and interim-report processing; related to internal/deduplicator changes.

Poem

🐇
I hop through code and parse each tree,
chase jars and imports, one, two, three,
I cache my finds and stitch each chain,
hop, test, and build the deps terrain,
nibbling bytes till traces flee.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feature/mdaloia/add-parallel-jar-processing

coderabbitai

Actionable comments posted: 20

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

CONTRIBUTING.md (1)
127-136: ⚠️ Potential issue | 🟡 Minor

Mirror make lint-install in the other workflow snippets.

These lines add the new bootstrap step, but the earlier setup and PR checklist sections still tell contributors to run make lint directly. The guide stays internally inconsistent, so first-time contributors will keep missing the pinned binary step.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CONTRIBUTING.md` around lines 127 - 136, Update the CONTRIBUTING.md sections
that still instruct contributors to run only "make lint" (specifically the
earlier "setup" and "PR checklist" passages) to mirror the new bootstrap step by
adding "make lint-install" before "make lint"; ensure both snippets and any
checklist items mention running "make lint-install" (to install the pinned
golangci-lint) followed by "make lint" so the process matches the bootstrap step
already shown.
internal/output/json.go (1)
71-93: ⚠️ Potential issue | 🟡 Minor

Double newline on stdout output.

json.Encoder.Encode() automatically appends a trailing newline to its output. Lines 91-93 add another newline, resulting in double newlines when writing to stdout. The file output path doesn't have this issue since it writes data directly.
Proposed fix
 	if destination == "" || destination == "-" {
 		// Write to stdout
 		if _, err := os.Stdout.Write(data); err != nil {
 			return fmt.Errorf("failed to write to stdout: %w", err)
 		}
-		// Add newline for better terminal output
-		if _, err := os.Stdout.WriteString("\n"); err != nil {
-			return fmt.Errorf("failed to write newline to stdout: %w", err)
-		}
 	} else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/output/json.go` around lines 71 - 93, The code calls
enc.Encode(report) which already appends a trailing newline, then writes
buf.Bytes() to stdout and calls os.Stdout.WriteString("\n") causing a double
blank line; remove the extra newline write by deleting the
os.Stdout.WriteString("\n") call (or alternatively trim a single trailing
newline from data before the second write), updating the stdout branch where
enc.Encode, data := buf.Bytes(), os.Stdout.Write(data) and os.Stdout.WriteString
are used so only one newline is emitted.

🟡 Minor comments (13)

internal/callgraph/go_parser.go-86-89 (1)

86-89: ⚠️ Potential issue | 🟡 Minor

Don't hide Go parse failures.

If ParseFile errors here, the file is skipped with no warning. Unsupported syntax or read failures then look like "no calls found", which is very hard to distinguish from a real clean scan.

💡 Proposed fix

 import (
 	"context"
 	"fmt"
 	"os"
 	"path/filepath"
 	"strings"
 
+	"github.com/rs/zerolog/log"
 	sitter "github.com/smacker/go-tree-sitter"
 	"github.com/smacker/go-tree-sitter/golang"
 )
@@
 		fullPath := filepath.Join(dir, name)
 		analysis, err := p.ParseFile(fullPath, packagePath)
 		if err != nil {
-			// Log and skip files that can't be parsed
+			log.Warn().Err(err).Str("path", fullPath).Msg("Skipping Go file during callgraph parsing")
 			continue
 		}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/go_parser.go` around lines 86 - 89, ParseFile failures are
silently ignored; change the p.ParseFile error branch (the call to
p.ParseFile(fullPath, packagePath)) to log the error (including fullPath,
packagePath and err) before continuing so parse issues aren't hidden; update the
handler to use the existing logger (e.g., p.logger or the package log) to emit a
warning/error with context and the err value, then continue as before.

CHANGELOG.md-116-117 (1)

116-117: ⚠️ Potential issue | 🟡 Minor

Version link references are out of order.

The [0.3.0] link reference should come after [0.2.4] to maintain chronological ordering consistent with lines 106-115.

Proposed fix

-[0.3.0]: https://github.com/scanoss/crypto-finder/compare/v0.2.4...v0.3.0
 [0.2.4]: https://github.com/scanoss/crypto-finder/compare/v0.2.3...v0.2.4
+[0.3.0]: https://github.com/scanoss/crypto-finder/compare/v0.2.4...v0.3.0

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` around lines 116 - 117, The two release link references in
CHANGELOG.md are reversed; move the `[0.3.0]:
https://github.com/scanoss/crypto-finder/compare/v0.2.4...v0.3.0` line so it
appears after the `[0.2.4]:
https://github.com/scanoss/crypto-finder/compare/v0.2.3...v0.2.4` line to
restore chronological ordering (ensure the `[0.2.4]` line precedes the `[0.3.0]`
line).

internal/callgraph/rust_parser_test.go-125-143 (1)

125-143: ⚠️ Potential issue | 🟡 Minor

Handle os.WriteFile errors to avoid silent test failures.

The os.WriteFile calls on lines 129, 131, and 133 ignore errors. If file creation fails, the test will pass incorrectly since no files would exist.

🛡️ Proposed fix

 	// Regular file
-	os.WriteFile(filepath.Join(dir, "lib.rs"), []byte("fn foo() {}"), 0o644)
+	if err := os.WriteFile(filepath.Join(dir, "lib.rs"), []byte("fn foo() {}"), 0o644); err != nil {
+		t.Fatal(err)
+	}
 	// Test file — should be skipped
-	os.WriteFile(filepath.Join(dir, "lib_test.rs"), []byte("fn test_foo() {}"), 0o644)
+	if err := os.WriteFile(filepath.Join(dir, "lib_test.rs"), []byte("fn test_foo() {}"), 0o644); err != nil {
+		t.Fatal(err)
+	}
 	// Another test file
-	os.WriteFile(filepath.Join(dir, "tests.rs"), []byte("fn test_bar() {}"), 0o644)
+	if err := os.WriteFile(filepath.Join(dir, "tests.rs"), []byte("fn test_bar() {}"), 0o644); err != nil {
+		t.Fatal(err)
+	}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/rust_parser_test.go` around lines 125 - 143, The test
TestRustParser_SkipTestFiles ignores errors from os.WriteFile which can hide
setup failures; update each os.WriteFile call (for "lib.rs", "lib_test.rs",
"tests.rs") to capture its returned error and call t.Fatalf (or t.Fatalf-like
failure) if non-nil so the test fails immediately on file creation errors;
locate these calls in TestRustParser_SkipTestFiles and add error handling for
each write.

docs/DEPENDENCY_SCANNING.md-596-607 (1)

596-607: ⚠️ Potential issue | 🟡 Minor

Use unique finding_id values in this example.

The sample reuses a1b2c3d4 for two different assets, which conflicts with the earlier “join key back to the call graph export” explanation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/DEPENDENCY_SCANNING.md` around lines 596 - 607, The example JSON reuses
the same finding_id ("a1b2c3d4") for multiple assets which breaks the join key
semantics; update the snippet so each asset object has a unique finding_id
(e.g., different UUID-like strings) to match the “join key back to the call
graph export” behavior, ensuring every object with fields like "start_line",
"match", "rules", "status" and "metadata" uses a distinct finding_id value.

docs/DEPENDENCY_SCANNING.md-188-188 (1)

188-188: ⚠️ Potential issue | 🟡 Minor

Add languages to the unlabeled fenced blocks.

markdownlint is already flagging these openings. Tag them as text, json, go, etc. so the doc stays lint-clean.

Also applies to: 328-328, 389-389, 415-415, 426-426, 435-435, 452-452, 482-482, 520-520, 546-546, 565-565, 629-629, 710-710, 719-719, 730-730, 820-820, 953-953, 968-968, 1005-1005
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/DEPENDENCY_SCANNING.md` at line 188, The markdown contains unlabeled
fenced code blocks (plain ``` openings) that lint rules flag; update each
unlabeled triple-backtick block (e.g., the occurrences indicated around the
document) to include an appropriate language tag such as text, json, go, or bash
so linting passes (replace ``` with ```text or ```json/```go/```bash as
appropriate), and ensure all other unlabeled blocks listed in the comment are
similarly annotated.

docs/DEPENDENCY_SCANNING.md-66-73 (1)

66-73: ⚠️ Potential issue | 🟡 Minor

The worker-pool default here is stale.

The diagram hard-codes default: 4 goroutines, but the CLI flag/help text describes a dynamic default of half the CPU cores capped at 8. This example will be wrong on most machines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/DEPENDENCY_SCANNING.md` around lines 66 - 73, The Mermaid diagram's
Worker Pool node currently hard-codes "default: 4 goroutines" which is stale;
update the Pool node (Pool["Worker Pool<br/><i>default: 4 goroutines</i>"]) to
reflect the real dynamic default used by the CLI (half the CPU cores, capped at
8) — e.g. replace the hard-coded text with "default: half of CPU cores (capped
at 8)" or the equivalent expression "default: min(8, numCPU/2)" so the diagram
stays accurate.

internal/callgraph/python_parser_test.go-166-169 (1)

166-169: ⚠️ Potential issue | 🟡 Minor

Check the fixture write errors in test setup.

If one of these files fails to write, the test will fail later with a misleading parser assertion instead of the actual setup error.

Suggested fix

-	os.WriteFile(filepath.Join(dir, "crypto.py"), []byte("def foo(): pass"), 0o644)
+	if err := os.WriteFile(filepath.Join(dir, "crypto.py"), []byte("def foo(): pass"), 0o644); err != nil {
+		t.Fatal(err)
+	}
 	// Test files — should be skipped
-	os.WriteFile(filepath.Join(dir, "test_crypto.py"), []byte("def test_foo(): pass"), 0o644)
-	os.WriteFile(filepath.Join(dir, "crypto_test.py"), []byte("def test_bar(): pass"), 0o644)
+	if err := os.WriteFile(filepath.Join(dir, "test_crypto.py"), []byte("def test_foo(): pass"), 0o644); err != nil {
+		t.Fatal(err)
+	}
+	if err := os.WriteFile(filepath.Join(dir, "crypto_test.py"), []byte("def test_bar(): pass"), 0o644); err != nil {
+		t.Fatal(err)
+	}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/python_parser_test.go` around lines 166 - 169, The
os.WriteFile calls that create the fixture files ("crypto.py", "test_crypto.py",
"crypto_test.py") should check their returned errors and fail the test
immediately instead of letting later parser assertions surface; update each
os.WriteFile invocation used in the test setup to capture its error and call
t.Fatalf or t.Fatalf-like helper with a descriptive message (including the
filename and error) when non-nil, so setup write failures are reported clearly
(update the WriteFile calls that create "crypto.py", "test_crypto.py", and
"crypto_test.py").

internal/dependency/source_cache.go-22-28 (1)

22-28: ⚠️ Potential issue | 🟡 Minor

Comment doesn't match implementation.

The comment on line 22 states ~/.crypto-finder/cache/sources/ but the actual path created on line 28 is ~/.scanoss/crypto-finder/cache/sources/. Please update the comment to match the implementation.
📝 Suggested fix
-// NewSourceCache creates a new source cache under ~/.crypto-finder/cache/sources/.
+// NewSourceCache creates a new source cache under ~/.scanoss/crypto-finder/cache/sources/.
 func NewSourceCache() (*SourceCache, error) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/dependency/source_cache.go` around lines 22 - 28, Update the
function comment for NewSourceCache to reflect the actual directory it creates
(i.e., ~/.scanoss/crypto-finder/cache/sources/) instead of
~/.crypto-finder/cache/sources/ so the docstring matches the implementation;
locate the NewSourceCache function comment and change the path text accordingly.

internal/callgraph/java_type_resolver.go-869-897 (1)

869-897: ⚠️ Potential issue | 🟡 Minor

Propagate archive close errors instead of dropping them.

reader.Close() at line 873 and rc.Close() at line 897 are both unchecked. This causes errcheck failures and can mask I/O errors—truncated ZIP/JAR reads may be reported as success when close operations surface the real error. The function always returns nil for the error even if close operations fail.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/java_type_resolver.go` around lines 869 - 897, The
function currently ignores errors from reader.Close() and rc.Close(); change the
signature to use a named error return and replace defer reader.Close() with a
deferred closure that captures reader.Close() and sets/returns the function
error if close fails, and after reading each file (data, err := io.ReadAll(rc))
check rc.Close() (capture its error into a variable like cerr) and if cerr !=
nil && err == nil set err = cerr (or return nil, err) so close failures are
propagated instead of dropped; update the logic around f.Open/ReadAll to prefer
returning the close error when present (refer to the variables reader and rc and
ensure the function's error return is used to surface close errors).

internal/callgraph/java_type_resolver.go-39-45 (1)

39-45: ⚠️ Potential issue | 🟡 Minor

Handle the error from os.UserHomeDir() instead of silently falling back.

If os.UserHomeDir() fails, home becomes "" and filepath.Join("", ".m2", "repository") returns the relative path .m2/repository. This resolves to the current working directory rather than the user's Maven cache, causing JAR discovery to read from the wrong repository. Explicitly handle the error or provide a safer fallback.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/java_type_resolver.go` around lines 39 - 45,
NewJavaBytecodeTypeResolver currently ignores the error from os.UserHomeDir(),
causing mavenRepoPath to become a relative path; change
NewJavaBytecodeTypeResolver to handle the error and avoid silently falling back:
update the signature to return (*JavaBytecodeTypeResolver, error), call
os.UserHomeDir() and if it returns an error or empty string fall back to
checking the HOME env (using the existing getenv) and if still empty return a
clear error; only construct and return the JavaBytecodeTypeResolver (setting
mavenRepoPath, runtimeConfig, getenv) when you have a valid home directory,
otherwise propagate the error to callers so JAR discovery uses the correct
repository.

internal/javaruntime/config.go-37-50 (1)

37-50: ⚠️ Potential issue | 🟡 Minor

Test coverage is missing for this new module.

Pipeline failures indicate 0% test coverage for this file. The functions here handle critical configuration parsing that should be validated:

NewConfig validates major and homes

NormalizeMajor handles edge cases

NormalizeHomes validates path entries

Would you like me to generate unit tests covering the key functions in this module? I can create tests for:

NormalizeMajor with valid/invalid/empty inputs

NormalizeHomes with various edge cases

ParseHomeEntries and ParseHomeEnv parsing

ResolveExplicitSelection with mock release files
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/javaruntime/config.go` around lines 37 - 50, Add unit tests for the
javaruntime config module: create tests that cover NewConfig (valid major +
homes and error paths when NormalizeMajor or NormalizeHomes return errors),
exhaustive cases for NormalizeMajor (valid majors, empty, invalid strings),
NormalizeHomes (nil/empty map, invalid paths, mixed valid entries),
ParseHomeEntries and ParseHomeEnv (different env formats, missing keys,
duplicate keys), and ResolveExplicitSelection by creating temporary mock release
files and directories to simulate explicit selection and error conditions; use
table-driven tests, temp dirs/files and mock data to assert returned Config
fields and error values for functions NormalizeMajor, NormalizeHomes,
ParseHomeEntries, ParseHomeEnv, NewConfig, and ResolveExplicitSelection.

internal/javaruntime/config.go-273-275 (1)

273-275: ⚠️ Potential issue | 🟡 Minor

PATH may be set without existing value in base environment.

Line 274 sets PATH to only the JDK bin directory when no existing PATH was found. This could break commands that need other paths. Consider preserving a reasonable default or warning.

🛡️ Proposed fix to preserve existing PATH

 	if !pathUpdated {
-		env = append(env, "PATH="+filepath.Join(javaHome, "bin"))
+		// Prepend JDK bin to existing PATH from environment, if any
+		existingPath := os.Getenv("PATH")
+		if existingPath != "" {
+			env = append(env, "PATH="+filepath.Join(javaHome, "bin")+string(os.PathListSeparator)+existingPath)
+		} else {
+			env = append(env, "PATH="+filepath.Join(javaHome, "bin"))
+		}
 	}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/javaruntime/config.go` around lines 273 - 275, When pathUpdated is
false you currently set PATH to only the JDK bin which drops any existing system
paths; change the append logic so you preserve the process or a sensible default
PATH. Specifically, when pathUpdated is false build the value as
filepath.Join(javaHome, "bin")+":"+os.Getenv("PATH") and if os.Getenv("PATH") is
empty fall back to a hard-coded minimal default (e.g.
"/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"), then append
that string to env; update references around pathUpdated, env, javaHome and use
os.Getenv for the fallback.

internal/dependency/maven_resolver.go-743-769 (1)

743-769: ⚠️ Potential issue | 🟡 Minor

Multiple unchecked errors in listDependenciesPerModule.

Static analysis flags lines 748, 759, 768, and 769. The temp file handling has several unchecked error returns that could mask issues:

Line 748: tmpFile.Close() error ignored
Line 759: os.Remove(tmpPath) error ignored on early return
Line 768: os.ReadFile(tmpPath) error ignored
Line 769: os.Remove(tmpPath) error ignored

While these are mostly non-fatal, the pattern is inconsistent with listDependencies which uses defer for cleanup.

🛡️ Proposed fix for consistent temp file handling

 func (r *MavenResolver) listDependenciesPerModule(ctx context.Context, targetDir string, modules []string) (*listDepsResult, error) {
 	seen := make(map[string]bool)
 	var allDeps []Dependency
 	anyPartial := false
 
 	for _, module := range modules {
 		tmpFile, err := os.CreateTemp("", "mvn-deps-*.txt")
 		if err != nil {
 			continue
 		}
 		tmpPath := tmpFile.Name()
-		_ = tmpFile.Close()
+		if err := tmpFile.Close(); err != nil {
+			log.Debug().Err(err).Str("module", module).Msg("Failed to close temp file")
+		}
+		defer func(path string) {
+			if removeErr := os.Remove(path); removeErr != nil && !os.IsNotExist(removeErr) {
+				log.Debug().Err(removeErr).Str("path", path).Msg("Failed to remove temp file")
+			}
+		}(tmpPath)
 
 		// ... rest of the loop ...
 
-		data, _ := os.ReadFile(tmpPath)
-		_ = os.Remove(tmpPath)
+		data, readErr := os.ReadFile(tmpPath)
+		if readErr != nil {
+			log.Debug().Err(readErr).Str("module", module).Msg("Failed to read dependency output")
+			continue
+		}

Note: Using defer inside a loop requires a closure capturing tmpPath to avoid all defers using the last value.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/dependency/maven_resolver.go` around lines 743 - 769, In
listDependenciesPerModule, fix the unchecked temp-file errors by properly
handling tmpFile.Close(), os.ReadFile(tmpPath) and os.Remove(tmpPath) rather
than ignoring their returns: after creating tmpFile capture tmpPath and use a
deferred cleanup closure (e.g., defer func(path string){ if err :=
os.Remove(path); err != nil { /* log or aggregate */ } }(tmpPath)) to ensure
removal even on early returns, check and handle tmpFile.Close() error (return or
log) immediately after Close, and check the error from os.ReadFile(tmpPath) and
propagate it instead of discarding data; also ensure the early-return branch
that calls r.configureMavenCommand handles tmpPath removal consistently (use the
defer), and include cmd.Run() stderr in any returned error where appropriate so
failures are visible.

🧹 Nitpick comments (21)

internal/scan/ecosystem.go (1)

23-26: Consider adding Gradle support for Java ecosystem detection.

The Java detection only checks for pom.xml (Maven). Gradle is widely used and could be detected via build.gradle or build.gradle.kts.

Proposed enhancement

 	// Check for Java (Maven)
 	if _, err := os.Stat(filepath.Join(target, "pom.xml")); err == nil {
 		return ecosystemJava
 	}
+	// Check for Java (Gradle)
+	for _, manifest := range []string{"build.gradle", "build.gradle.kts"} {
+		if _, err := os.Stat(filepath.Join(target, manifest)); err == nil {
+			return ecosystemJava
+		}
+	}
 	// Check for Rust (Cargo)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/scan/ecosystem.go` around lines 23 - 26, The Java ecosystem
detection currently only checks for a Maven pom.xml; update the check in
internal/scan/ecosystem.go to also detect Gradle by testing for build.gradle and
build.gradle.kts in the target directory (using os.Stat with filepath.Join as
done for pom.xml) and return ecosystemJava if any of those files exist; ensure
you keep the existing pom.xml check and add additional checks (or a loop) for
"build.gradle" and "build.gradle.kts" so Gradle projects are recognized.

internal/scan/root_module_test.go (1)

12-30: Consider adding t.Parallel() to subtests.

For consistency with parallel testing patterns and to maximize test concurrency, consider adding t.Parallel() to each subtest.

♻️ Proposed fix

 	t.Run("go", func(t *testing.T) {
+		t.Parallel()
 		dir := t.TempDir()
 		if err := os.WriteFile(filepath.Join(dir, "go.mod"), []byte("module example.com/app\n"), 0o600); err != nil {
 			t.Fatalf("write go.mod: %v", err)
@@ -22,6 +23,7 @@
 	})

 	t.Run("fallback", func(t *testing.T) {
+		t.Parallel()
 		dir := filepath.Join(t.TempDir(), "repo-name")
 		if err := os.MkdirAll(dir, 0o755); err != nil {
 			t.Fatalf("mkdir repo: %v", err)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/scan/root_module_test.go` around lines 12 - 30, Add parallelization
to the subtests in root_module_test.go by calling t.Parallel() at the start of
each t.Run handler; specifically, inside the "go" and "fallback" anonymous test
functions (the closures that call DetectRootModule) invoke t.Parallel() as the
first statement so the subtests run concurrently without changing the test logic
or expectations.

Dockerfile (1)

46-47: Comment says "minimum" but installer pins to exact version.

The comment on line 46 states "minimum version 1.12.1" but the installer URL pins to exactly v1.12.1. Consider updating the comment to reflect the actual behavior (pinned to v1.12.1 for reproducibility).

📝 Proposed fix

-# Install OpenGrep (minimum version 1.12.1)
+# Install OpenGrep (pinned to v1.12.1 for reproducibility)
 RUN curl -fsSL https://raw.githubusercontent.com/opengrep/opengrep/v1.12.1/install.sh | bash

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 46 - 47, The comment says "minimum version 1.12.1"
but the installer RUN line uses a pinned URL
"https://raw.githubusercontent.com/opengrep/opengrep/v1.12.1/install.sh"; update
the comment to accurately describe the behavior (e.g., "Install OpenGrep v1.12.1
(pinned for reproducibility)") or change the installer invocation to a truly
minimum-version installer if you intend to allow newer versions; locate the RUN
curl line and the preceding comment and make the text consistent with the chosen
behavior.

docs/DOCKER_USAGE.md (1)

224-237: Consider clarifying the Python pip install behavior.

The note on line 237 is helpful, but users might still be confused about why Python requires this extra step. Consider adding a brief explanation that Python dependencies need to be installed in the container because pip resolution happens at runtime (unlike Go/Java/Rust which can resolve from project manifest files alone).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/DOCKER_USAGE.md` around lines 224 - 237, Update the "Python Projects"
section (the "Python Projects" header and the docker run example that runs "pip
install -r /workspace/code/requirements.txt" followed by "crypto-finder scan
--scan-dependencies /workspace/code") to add one brief clarifying sentence after
the existing note about the deps image and virtualenv explaining that Python
dependencies must be installed inside the container because pip performs
dependency resolution at runtime from installed packages (unlike Go/Java/Rust
which can rely solely on manifest files), so users must run "pip install" in the
container before scanning.

internal/callgraph/parser_registry.go (1)

7-20: Consider using constants for ecosystem names.

The ecosystem strings ("go", "java", "python", "rust") are used as magic strings. If these are referenced elsewhere in the codebase, consider defining constants to prevent typos and enable refactoring.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/parser_registry.go` around lines 7 - 20, The switch in
NewParserForEcosystem uses magic strings ("go","java","python","rust"); define
named constants (e.g., EcosystemGo, EcosystemJava, EcosystemPython,
EcosystemRust) and replace the string literals in NewParserForEcosystem with
those constants, updating any other usages across the codebase to use the
constants as well so typos are avoided and refactoring is easier; keep the
return behavior (calling NewGoParser, NewJavaParser, NewPythonParser,
NewRustParser and defaulting to nil) unchanged.

internal/dependency/go_resolver.go (1)

124-138: Consider deduplicating graph edges.

The go mod graph output can contain duplicate edges (same parent→child relationship from different version constraints). The current implementation appends all occurrences, which may result in duplicate entries in graph[parent].

♻️ Optional: deduplicate edges

 	graph := make(map[string][]string)
+	seen := make(map[string]map[string]bool)
 	for _, line := range strings.Split(stdout.String(), "\n") {
 		line = strings.TrimSpace(line)
 		if line == "" {
 			continue
 		}
 		parts := strings.Fields(line)
 		if len(parts) != 2 {
 			continue
 		}
 		// Strip version from module paths for cleaner lookup
 		parent := stripVersion(parts[0])
 		child := stripVersion(parts[1])
+		if seen[parent] == nil {
+			seen[parent] = make(map[string]bool)
+		}
+		if seen[parent][child] {
+			continue
+		}
+		seen[parent][child] = true
 		graph[parent] = append(graph[parent], child)
 	}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/dependency/go_resolver.go` around lines 124 - 138, The graph
building loop in go_resolver.go can append duplicate child entries for the same
parent; update the logic around variables graph, parent, and child to
deduplicate edges by either (a) changing graph to map[string]map[string]struct{}
(a set per parent) and insert child into that set after
stripVersion(parts[0])/stripVersion(parts[1]), or (b) keep graph as
map[string][]string but check for existence (loop or helper contains) before
appending child to graph[parent]; ensure the final graph representation used
elsewhere is adapted if you switch to the set form or converted back to slices
without duplicates.

Dockerfile.goreleaser.deps (2)

56-56: PATH references /root/ directories.

The PATH includes /root/.cargo/bin and /root/.opengrep/cli/latest, which assumes running as root. This is fine for the current setup, but if a non-root user is added later (common security practice), these paths won't be accessible.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.goreleaser.deps` at line 56, The Dockerfile's ENV PATH line
hardcodes /root paths (/root/.cargo/bin and /root/.opengrep/cli/latest) which
will break if a non-root user is used; update the ENV PATH assignment (the ENV
PATH="${JAVA_HOME}/bin:...") to reference user-agnostic locations or
variables—for example use ${HOME}/.cargo/bin and ${HOME}/.opengrep/cli/latest
(and ensure HOME is set appropriately) or move/install those tools into
system-wide locations like /usr/local/bin or /opt and reference those paths
instead so the PATH works for non-root users.
24-25: Security consideration: piping curl to bash.

While common for installer scripts, piping curl | bash can be risky if the remote script is compromised. Consider pinning to a specific commit hash or downloading and verifying the script checksum first.

For a production image, you might want to:

Download the script to a file

Verify its checksum

Then execute it
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.goreleaser.deps` around lines 24 - 25, Replace the unsafe "RUN
curl ... | bash" install step for OpenGrep v1.12.1 with a download-and-verify
flow: fetch the install.sh to a file (the existing URL for v1.12.1), fetch a
pinned checksum (or pin to a specific commit tarball), verify the downloaded
script's SHA256 before executing, and only then run the installer script; update
the Dockerfile step that currently contains "RUN curl -fsSL
https://raw.githubusercontent.com/opengrep/opengrep/v1.12.1/install.sh | bash"
to perform these download, verify, and execute substeps (or switch to installing
from a pinned release artifact) so the image no longer pipes remote scripts
directly into bash.

internal/scan/root_module.go (1)

107-114: Consider stricter matching for the name key.

The check strings.HasPrefix(line, "name") would match keys like names or name_extra. For TOML parsing, a stricter check would be more robust.

♻️ Proposed fix

-		if !inSection || !strings.HasPrefix(line, "name") {
+		if !inSection {
+			continue
+		}
+		// Match "name" key exactly (not "names", "name_extra", etc.)
+		if !strings.HasPrefix(line, "name") || (len(line) > 4 && line[4] != ' ' && line[4] != '=' && line[4] != '\t') {
 			continue
 		}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/scan/root_module.go` around lines 107 - 114, The current check uses
strings.HasPrefix(line, "name") which incorrectly matches keys like "names" or
"name_extra"; update the logic in the block that uses inSection and line (around
the split into parts) to first split on "=", trim and unquote the left-hand side
and then require that the key equals exactly "name" before returning the value;
i.e., replace the strings.HasPrefix(line, "name") test with a strict key
comparison (trimSpace(parts[0]) == "name") or an equivalent regex that enforces
a full key match so only the exact TOML key "name" is accepted.

internal/dependency/pip_resolver_test.go (1)

202-275: Consider checking errors from os operations in test setup.

Several os.MkdirAll and os.WriteFile calls in test setup don't check for errors (lines 208-212). While failures here would likely cause the test to fail anyway, explicit error checking would provide clearer failure messages.

♻️ Suggested improvement

-	os.MkdirAll(filepath.Join(sitePackages, "bs4"), 0o755)
-	os.MkdirAll(filepath.Join(sitePackages, "PIL"), 0o755)
-
-	// Create a single-file module
-	os.WriteFile(filepath.Join(sitePackages, "six.py"), []byte("# six"), 0o644)
+	if err := os.MkdirAll(filepath.Join(sitePackages, "bs4"), 0o755); err != nil {
+		t.Fatalf("setup: %v", err)
+	}
+	if err := os.MkdirAll(filepath.Join(sitePackages, "PIL"), 0o755); err != nil {
+		t.Fatalf("setup: %v", err)
+	}
+	if err := os.WriteFile(filepath.Join(sitePackages, "six.py"), []byte("# six"), 0o644); err != nil {
+		t.Fatalf("setup: %v", err)
+	}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/dependency/pip_resolver_test.go` around lines 202 - 275, The test
TestResolvePackageDir_WithMapping calls os.MkdirAll and os.WriteFile without
checking errors; update the setup in that test to capture and handle return
errors (e.g., if err := os.MkdirAll(...); err != nil { t.Fatalf("mkdir failed:
%v", err) } and similarly for os.WriteFile) so failures produce clear test
diagnostics when running resolvePackageDir in the pip_resolver_test.go file.

internal/dependency/pip_resolver.go (1)

166-191: Consider extending pyproject.toml parsing for Poetry projects.

The current implementation only parses [project].name. Poetry projects use [tool.poetry].name instead. Consider adding fallback support for Poetry's format:

♻️ Suggested enhancement

 func parsePyprojectName(content string) string {
 	inProjectSection := false
+	inPoetrySection := false
 	scanner := bufio.NewScanner(strings.NewReader(content))
 	for scanner.Scan() {
 		line := strings.TrimSpace(scanner.Text())
 		if line == "[project]" {
 			inProjectSection = true
+			inPoetrySection = false
+			continue
+		}
+		if line == "[tool.poetry]" {
+			inPoetrySection = true
+			inProjectSection = false
 			continue
 		}
 		if strings.HasPrefix(line, "[") {
 			inProjectSection = false
+			inPoetrySection = false
 			continue
 		}
-		if inProjectSection && strings.HasPrefix(line, "name") {
+		if (inProjectSection || inPoetrySection) && strings.HasPrefix(line, "name") {
 			parts := strings.SplitN(line, "=", 2)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/dependency/pip_resolver.go` around lines 166 - 191, The parser
parsePyprojectName only checks for the [project] section; add fallback support
for Poetry-style pyproject.toml by recognizing and handling the [tool.poetry]
section as well. Update the scanner logic so it sets inProjectSection when
encountering either "[project]" or "[tool.poetry]" (or tracks a variable like
currentSection and accepts both names), keep the same name extraction for lines
starting with "name", and continue to reset the section flag when any other
"["..."]" header is found; ensure the function returns the extracted name from
either section.

internal/dependency/cargo_resolver.go (1)

147-160: Consider handling additional Cargo package ID formats.

The cargoPackageNameFromID function handles name@version and name version (...) formats. Cargo's package ID format has evolved across versions, and there may be edge cases with path dependencies or git dependencies that have different ID structures.

The current implementation is reasonable for common cases, but you may want to add test coverage for git-sourced dependencies (e.g., "name 0.1.0 (git+https://github.com/...#hash)").
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/dependency/cargo_resolver.go` around lines 147 - 160, The
cargoPackageNameFromID function only splits on "@" or the first space and can
mis-handle git- or path-sourced IDs like "name 0.1.0 (git+...#hash)" or local
path formats; update cargoPackageNameFromID to first strip any trailing
parenthetical source suffix (remove " (...)" if present), then apply the
existing "@" and space splitting to return the name, and add unit tests covering
git-sourced IDs, path-sourced IDs, plain "name@version" and "name version"
formats to ensure correct extraction; refer to the cargoPackageNameFromID
function when making these changes and adding tests.

internal/dependency/maven_resolver.go (2)

376-377: Unchecked error from os.ReadFile for partial results.

Static analysis flags line 376. The code intentionally ignores the error to parse whatever partial content was written. Add a brief comment explaining this is intentional for clarity.

♻️ Optional: Document intentional error suppression

 	// Even on error, parse whatever was written to the output file (partial results).
-	data, _ := os.ReadFile(tmpPath)
+	// Ignore read error: file may be empty or partial, which is acceptable.
+	data, _ := os.ReadFile(tmpPath) //nolint:errcheck
 	deps := r.parseDependencyList(string(data))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/dependency/maven_resolver.go` around lines 376 - 377, The
os.ReadFile(tmpPath) call currently ignores its returned error before calling
r.parseDependencyList(string(data)); add a brief inline comment explaining that
suppressing the ReadFile error is intentional to allow parsing of any
partially-written contents (e.g., "intentionally ignoring file read error to
parse partial results") so future readers/static analyzers understand this is
deliberate; place the comment immediately above the os.ReadFile(tmpPath) line
and mention tmpPath and r.parseDependencyList to make the intent clear.

135-135: Unchecked error return from installModules.

Static analysis flags that the error return from r.installModules is not checked. While the comment at line 865 indicates this is non-fatal, ignoring the error silently makes debugging harder. The error is already logged inside installModules, so this is acceptable but could be more explicit.

♻️ Optional: Add explicit comment or use blank identifier pattern

 		if len(deps) == 0 && isInterModuleFailure(depsResult.stderr, rootModule, moduleArtifactIDs) {
 			log.Info().Msg("Inter-module dependency failure detected, building modules locally (Tier 3)")
-			_ = r.installModules(ctx, targetDir)
+			// Error already logged inside installModules; partial install may still help
+			_ = r.installModules(ctx, targetDir)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/dependency/maven_resolver.go` at line 135, The call to
r.installModules(ctx, targetDir) currently discards the returned error, which
static analysis flags; change the call to capture the error (err :=
r.installModules(ctx, targetDir)) and then either handle it or explicitly
acknowledge it non‑fatally (if installModules already logs errors) with an if
err != nil { /* intentionally non‑fatal: installModules logs errors */ } so the
error is not silently ignored; reference the r.installModules call and add the
short comment inside the if block to make the intent clear to readers and
linters.

internal/javaruntime/config.go (1)

152-173: CacheKeyToken has implicit dependency on environment.

The function reads JAVA_HOME from the environment when in auto mode (lines 157-172). This makes the function impure and harder to test. Consider accepting JAVA_HOME as a parameter or making the environment access configurable.

However, given this is configuration code and the behavior is documented, this is acceptable for now.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/javaruntime/config.go` around lines 152 - 173, The CacheKeyToken
function currently reads JAVA_HOME from the environment (via os.Getenv) which
makes it impure and hard to test; change CacheKeyToken to accept javaHome as an
explicit string parameter (or accept a small interface/func that returns the env
value) and update callers to pass os.Getenv("JAVA_HOME") where appropriate,
keeping the existing logic that falls back to "jdk-auto" when javaHome is empty
or RuntimeVersion(javaHome) returns an error; preserve use of IsExplicitMajor(),
RequestedMajorOrAuto(), RuntimeVersion, MajorFromVersion and IsSupportedMajor
when migrating behavior.

internal/callgraph/python_parser.go (3)

154-155: Reusing Go node type constant for Python identifiers.

Line 154 uses goNodeIdentifier (presumably defined elsewhere as "identifier") for Python identifiers. While the node type string is the same in both grammars, this coupling to a Go-named constant is confusing.

Consider defining a Python-specific constant or a shared nodeIdentifier constant.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/python_parser.go` around lines 154 - 155, The code in
python_parser.go currently uses the Go-specific constant goNodeIdentifier for
Python identifier nodes; define a clearer constant (e.g. pyNodeIdentifier or a
shared nodeIdentifier) and replace the use of goNodeIdentifier in the Python
parser switch/case with the new constant (update any references in python
parsing functions such as the case handling that assigns alias =
grandchild.Content(src)); ensure the new constant is declared near other Python
parser constants so the coupling to a Go-named symbol is removed and the intent
is clear.

513-514: Double TrimPrefix for * is intentional for **kwargs.

The two consecutive TrimPrefix("*") calls handle both *args and **kwargs patterns. This is correct but could use a comment for clarity.

♻️ Optional: Add clarifying comment

+		// Handle *args and **kwargs by stripping prefix stars
 		clean = strings.TrimPrefix(clean, "*")
 		clean = strings.TrimPrefix(clean, "*")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/python_parser.go` around lines 513 - 514, The repeated
calls to strings.TrimPrefix on the variable clean are intentional to strip both
single and double asterisks (i.e., handle "*args" and "**kwargs"); add a brief
clarifying comment immediately above the two lines (where clean is trimmed)
stating that the first TrimPrefix removes a leading "*" and the second handles
the case of "**" so both varargs and kwargs are normalized, referencing the
variable name clean and the use of strings.TrimPrefix.

97-101: Using context.TODO() for tree-sitter parsing.

The parser accepts a context but passes context.TODO(). Consider propagating the context from ParseDirectory through to parseFile to support cancellation during long parses.

♻️ Optional: Propagate context for cancellation support

-func (p *PythonParser) ParseDirectory(dir, packagePath string) ([]*FileAnalysis, error) {
+func (p *PythonParser) ParseDirectory(ctx context.Context, dir, packagePath string) ([]*FileAnalysis, error) {
 	// ...
-		analysis, err := p.parseFile(fullPath, packagePath)
+		analysis, err := p.parseFile(ctx, fullPath, packagePath)
 	// ...
 }

-func (p *PythonParser) parseFile(filePath, packagePath string) (*FileAnalysis, error) {
+func (p *PythonParser) parseFile(ctx context.Context, filePath, packagePath string) (*FileAnalysis, error) {
 	// ...
-	tree, err := p.parser.ParseCtx(context.TODO(), nil, src)
+	tree, err := p.parser.ParseCtx(ctx, nil, src)

Note: This would require updating the Parser interface, which may be a larger change.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/python_parser.go` around lines 97 - 101, The parser
currently calls p.parser.ParseCtx(context.TODO(), ...) which prevents
cancellation; change parseFile to accept a context (e.g., ctx context.Context)
and pass that ctx through to p.parser.ParseCtx(ctx, nil, src), and update
ParseDirectory (and any callers) to pass its context down into parseFile; if the
Parser interface requires it, add/adjust the method signatures (e.g.,
parseFile(ctx context.Context, filePath string, src []byte) and the
ParseDirectory caller) so the real context is propagated instead of
context.TODO().

internal/engine/dependency_scanner.go (2)

339-342: Cache key construction may cause collisions for empty tokens.

When JavaRuntimeCacheToken is empty, the cache key ends with : followed by rulesHash. If rulesHash happens to start with the same characters as a Java token, this could theoretically collide with explicit tokens. Consider using a delimiter that won't appear in tokens.

However, given that the rules hash is SHA-256 hex and Java tokens are prefixed with jdk-, collision risk is minimal in practice.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/engine/dependency_scanner.go` around lines 339 - 342, The cacheKey
construction (variable cacheKey built from key, rulesHash and
opts.ScanOptions.JavaRuntimeCacheToken) can produce ambiguous values when
JavaRuntimeCacheToken is empty; change the concatenation to use an unambiguous
separator or labeled segment (e.g., append a fixed label like "|token=" + token
or include a delimiter such as '|' before the token) so an empty token doesn't
leave an extra ':' position that could collide; update the logic around
cacheKey, key, rulesHash and opts.ScanOptions.JavaRuntimeCacheToken to always
include the label/delimiter even for empty tokens to make keys unambiguous.
606-629: canonicalDependencies performs redundant sorting.

The function sorts the slice twice: once at line 608 and again at line 625. The second sort after deduplication is necessary for stable output, but the first sort before deduplication is only needed to ensure deterministic deduplication behavior. This is acceptable but could be optimized by sorting only once if needed.
♻️ Optional: Remove redundant first sort if determinism isn't required during dedup

If the preference for keeping entries with non-empty Dir is sufficient, you could skip the first sort:
 func canonicalDependencies(deps []dependency.Dependency) []dependency.Dependency {
 	ordered := append([]dependency.Dependency(nil), deps...)
-	sort.Slice(ordered, func(i, j int) bool {
-		return dependencyLess(ordered[i], ordered[j])
-	})
 
 	unique := make(map[string]dependency.Dependency, len(ordered))
 	for _, dep := range ordered {
However, the current approach ensures deterministic dedup behavior when multiple entries have Dir != "", so keeping both sorts is acceptable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/engine/dependency_scanner.go` around lines 606 - 629,
canonicalDependencies currently sorts twice (the first sort on ordered and the
final sort on result); remove the redundant initial sort.Slice call to avoid
double-sorting while keeping deterministic output by building unique via
dependencyKey and the Dir-preference logic (existing.Dir == "" && dep.Dir != "")
and retaining the final sort before return (which uses dependencyLess). In
short: in canonicalDependencies, stop calling sort.Slice on the intermediate
ordered slice, still copy deps into ordered, perform dedup into the unique map
using dependencyKey and the Dir rule, collect into result and run the final sort
using dependencyLess before returning.

internal/callgraph/builder.go (1)

336-340: Hard-coded library-specific scoring heuristic.

Lines 338-340 contain a hard-coded check for SignatureAlgorithm. and io.jsonwebtoken, which is a very specific heuristic for JJWT library. This couples the generic call graph builder to a specific library.

Consider extracting this into a configurable scoring hook or documenting why this specific case is handled.

♻️ Optional: Document or generalize the heuristic

 		if strings.HasSuffix(candidate.ID.Type, "Builder") {
 			score += 1
 		}
+		// Boost JJWT SignatureAlgorithm calls for crypto-finder's primary use case
 		if strings.Contains(call.Raw, "SignatureAlgorithm.") && strings.Contains(candidate.ID.Package, "io.jsonwebtoken") {
 			score += 2
 		}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/builder.go` around lines 336 - 340, The code currently
applies a hard-coded heuristic that bumps score when call.Raw contains
"SignatureAlgorithm." and candidate.ID.Package contains "io.jsonwebtoken"
(modifying variable score using call.Raw and candidate.ID.Package); extract this
into a configurable scoring hook or callback on the call-graph builder (e.g.,
add a ScoreCandidate(call, candidate) method or a ScoreHooks slice on the
CallGraphBuilder) and replace the inline check with a call to that hook, provide
a default implementation that preserves the existing JJWT rule, and add
documentation/config option so consumers can override or remove library-specific
heuristics instead of embedding "SignatureAlgorithm." / "io.jsonwebtoken"
directly in the builder.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8d3a006e-8d5f-42ac-a552-e6fb9da1146e

📥 Commits

Reviewing files that changed from the base of the PR and between 0e63f49 and 5bd9ced.

⛔ Files ignored due to path filters (5)

coverage_targeted.out is excluded by !**/*.out
go.sum is excluded by !**/*.sum
testdata/projects/cryptowrapper_dep/go.sum is excluded by !**/*.sum
testdata/projects/go_with_crypto_dep/go.sum is excluded by !**/*.sum
testdata/projects/go_with_dep_chain/go.sum is excluded by !**/*.sum

📒 Files selected for processing (113)

.github/workflows/lint.yml
.gitignore
.golangci-lint-version
.goreleaser.yml
CHANGELOG.md
CONTRIBUTING.md
Dockerfile
Dockerfile.deps
Dockerfile.goreleaser
Dockerfile.goreleaser.deps
Dockerfile.slim
Dockerfile.test
Makefile
README.md
docs/CONFIGURATION.md
docs/DEPENDENCY_SCANNING.md
docs/DOCKER_USAGE.md
docs/OUTPUT_FORMATS.md
go.mod
internal/api/client.go
internal/callgraph/builder.go
internal/callgraph/bytecode_cache.go
internal/callgraph/bytecode_cache_test.go
internal/callgraph/core_test.go
internal/callgraph/go_parser.go
internal/callgraph/go_parser_test.go
internal/callgraph/java_parser.go
internal/callgraph/java_parser_test.go
internal/callgraph/java_type_resolver.go
internal/callgraph/java_type_resolver_test.go
internal/callgraph/parser_registry.go
internal/callgraph/python_parser.go
internal/callgraph/python_parser_test.go
internal/callgraph/rust_parser.go
internal/callgraph/rust_parser_test.go
internal/callgraph/text_helpers.go
internal/callgraph/tracer.go
internal/callgraph/types.go
internal/cli/root.go
internal/cli/scan.go
internal/cli/scan_test.go
internal/config/config.go
internal/config/config_test.go
internal/deduplicator/deduplicator.go
internal/dependency/cargo_resolver.go
internal/dependency/cargo_resolver_integration_test.go
internal/dependency/cargo_resolver_test.go
internal/dependency/go_resolver.go
internal/dependency/go_resolver_test.go
internal/dependency/maven_resolver.go
internal/dependency/maven_resolver_test.go
internal/dependency/pip_resolver.go
internal/dependency/pip_resolver_test.go
internal/dependency/registry.go
internal/dependency/registry_test.go
internal/dependency/resolver.go
internal/dependency/source_cache.go
internal/dependency/source_cache_test.go
internal/engine/dependency_scanner.go
internal/engine/dependency_scanner_test.go
internal/engine/finding_ids_test.go
internal/engine/findings_cache.go
internal/engine/findings_cache_test.go
internal/engine/orchestrator.go
internal/engine/orchestrator_test.go
internal/engine/processor_test.go
internal/engine/rule_filter.go
internal/engine/rule_filter_test.go
internal/entities/interim.go
internal/entities/interim_additional_test.go
internal/errors/formatter.go
internal/javaruntime/config.go
internal/output/json.go
internal/output/output_test.go
internal/scan/doc.go
internal/scan/duration.go
internal/scan/ecosystem.go
internal/scan/export.go
internal/scan/report.go
internal/scan/root_module.go
internal/scan/root_module_test.go
internal/scan/scan_test.go
internal/scan/summary.go
internal/scan/validation.go
internal/scanner/interface.go
internal/scanner/interface_spinner_test.go
internal/scanner/opengrep/scanner.go
internal/scanner/opengrep/scanner_integration_test.go
internal/scanner/registry.go
internal/scanner/registry_test.go
internal/scanner/semgrep/transformer.go
internal/scanner/semgrep/transformer_test.go
internal/skip/matcher_test.go
internal/skip/source_defaults.go
internal/utils/duration.go
internal/utils/duration_test.go
internal/utils/fd.go
internal/utils/fd_test.go
internal/version/version.go
scanoss.json
schemas/interim-report-schema.json
testdata/code/python/crypto_usage.py
testdata/projects/cryptowrapper_dep/go.mod
testdata/projects/cryptowrapper_dep/wrapper.go
testdata/projects/go_with_crypto_dep/go.mod
testdata/projects/go_with_crypto_dep/main.go
testdata/projects/go_with_crypto_dep/mypkg/crypto.go
testdata/projects/go_with_dep_chain/go.mod
testdata/projects/go_with_dep_chain/main.go
testdata/projects/go_with_dep_chain/mypkg/crypto.go
testdata/projects/java_with_crypto_dep/pom.xml
testdata/projects/java_with_crypto_dep/src/main/java/com/testapp/Main.java
testdata/projects/java_with_crypto_dep/src/main/java/com/testapp/service/CryptoService.java

💤 Files with no reviewable changes (3)

internal/engine/processor_test.go
internal/output/output_test.go
internal/engine/orchestrator_test.go

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (2)

internal/engine/findings_cache.go (2)

63-85: ⚠️ Potential issue | 🟠 Major

Version the on-disk cache payload before accepting cache hits.

Get currently accepts any JSON that unmarshals into entities.InterimReport, so old/new schema drift can silently return stale or malformed cache entries. Add a versioned wrapper and evict on mismatch/corruption.

Suggested change

+const findingsCacheVersion = 1
+
+type findingsCacheEnvelope struct {
+  Version int                     `json:"version"`
+  Report  *entities.InterimReport `json:"report"`
+}
+
 func (c *DiskFindingsCache) Get(_ context.Context, key string) (*entities.InterimReport, bool, error) {
   path := filepath.Join(c.dir, cacheKeyToFilename(key))
@@
-  var report entities.InterimReport
-  if json.Unmarshal(data, &report) != nil {
+  var env findingsCacheEnvelope
+  if json.Unmarshal(data, &env) != nil || env.Version != findingsCacheVersion || env.Report == nil {
     // Corrupted cache file — treat as miss and remove it
     if removeErr := os.Remove(path); removeErr != nil && !errors.Is(removeErr, os.ErrNotExist) {
       return nil, false, fmt.Errorf("failed to remove corrupted cache file: %w", removeErr)
     }
     return nil, false, nil
   }
 
-  return &report, true, nil
+  return env.Report, true, nil
 }
@@
-  data, err := json.Marshal(report)
+  data, err := json.Marshal(findingsCacheEnvelope{
+    Version: findingsCacheVersion,
+    Report:  report,
+  })

Also applies to: 89-93

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/engine/findings_cache.go` around lines 63 - 85, The Get method on
DiskFindingsCache currently accepts any JSON that unmarshals to
entities.InterimReport; update DiskFindingsCache.Get to read a versioned wrapper
struct (e.g., {Version int, Payload entities.InterimReport}) instead, verify the
Version matches the current cache schema constant (add a const like
findingsCacheVersion), and treat mismatches as corruption: remove the file
(using the existing os.Remove logic) and return a cache miss; apply the same
version-check/evict logic to the corresponding save/load paths so cache writes
use the wrapper and cacheKeyToFilename handling remains unchanged.

114-118: ⚠️ Potential issue | 🟠 Major

Sanitize cache key filenames for cross-platform validity.

Replacing only / still leaves Windows-invalid characters (e.g. : in Maven-style coordinates), causing cache writes/reads to fail on Windows. Prefer a whitelist sanitizer.

Suggested change

+import "regexp"
+
+var cacheFilenameUnsafe = regexp.MustCompile(`[^A-Za-z0-9._-]`)
+
 func cacheKeyToFilename(key string) string {
-  safe := strings.ReplaceAll(key, "/", "_")
+  safe := cacheFilenameUnsafe.ReplaceAllString(key, "_")
   return safe + ".json"
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/engine/findings_cache.go` around lines 114 - 118, cacheKeyToFilename
currently only replaces "/" so filenames can still contain Windows-invalid
characters; update cacheKeyToFilename to perform whitelist sanitization: allow a
safe set (e.g., ASCII letters, digits, '-', '_', '.'), replace any other
character with '_' (use a regexp or explicit runes check), also trim/normalize
sequences of underscores and enforce a sensible max length before appending
".json", and ensure the returned filename is never empty; reference the
cacheKeyToFilename function when making these changes.

🧹 Nitpick comments (3)

internal/callgraph/builder.go (3)

374-386: Domain-specific heuristic couples builder to JWT library.

The hardcoded check for "SignatureAlgorithm." and "io.jsonwebtoken" works but creates implicit coupling. Consider extracting scoring heuristics to a configurable strategy if more library-specific rules accumulate.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/builder.go` around lines 374 - 386, The
fluentFallbackScore function currently embeds a library-specific heuristic
(checking call.Raw for "SignatureAlgorithm." and candidate.ID.Package for
"io.jsonwebtoken"); extract scoring rules into a configurable strategy so
library-specific rules live outside this function: define a scoring strategy
(e.g., interface or map) that maps predicate functions or package name patterns
to score adjustments, update fluentFallbackScore to iterate over those
configured rules instead of the hardcoded JWT check, and register the existing
JWT rule (matching call.Raw and candidate.ID.Package) in that configuration so
behavior remains unchanged while enabling future rules to be added without
coupling to the function.

233-241: Consider set-based duplicate tracking for large graphs.

The linear scan for duplicate detection is O(n) per call. For functions called from many sites, this could become O(n²). A map-based seen set would be O(1) but adds memory overhead.

This is acceptable for typical call graphs; flag for future optimization if profiling shows hotspots.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/builder.go` around lines 233 - 241, The addCaller function
currently does an O(n) linear scan over callers[calleeKey] to avoid duplicates;
switch to set-based duplicate tracking by changing callers from
map[string][]string to map[string]map[string]struct{} (or otherwise store a
per-callee seen map) so you can check existence in O(1) before inserting; update
addCaller to create the inner map if nil, check if callerKey exists, and add it
if not (preserving existing semantics), and note this increases memory usage but
avoids O(n^2) behavior for hot callees.

766-780: Minor: Only handles uppercase L suffix for long literals.

Java allows both 123L and 123l. The function also doesn't handle hex/octal/binary literals. This is acceptable for a heuristic, but could miss some cases.

♻️ Optional improvement

 func looksLikeIntegerLiteral(expr string) bool {
-	expr = strings.TrimSuffix(expr, "L")
+	expr = strings.TrimSuffix(strings.TrimSuffix(expr, "L"), "l")
 	if expr == "" {
 		return false
 	}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/builder.go` around lines 766 - 780, The function
looksLikeIntegerLiteral only strips an uppercase "L" suffix which misses
lowercase "l" long literals; update the logic in looksLikeIntegerLiteral to
strip either "L" or "l" (e.g., check the last rune and trim if it's 'L' or 'l'
or use a case-insensitive trim) before validating digits, while keeping the
existing handling for optional leading '+'/'-'; optionally note but do not
implement handling for hex/octal/binary as it's beyond this heuristic.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/callgraph/java_type_resolver.go`:
- Around line 362-366: The slice of sig.fullClass used to populate
FunctionID.Package can panic when sig.fullClass has no dot; update the logic
around strings.LastIndex(sig.fullClass, ".") in the FunctionID construction so
it checks for -1 and uses an empty package (or sig.fullClass as appropriate)
when no separator exists; modify the code that builds FunctionID (referencing
sig.fullClass, sig.className, call.Callee.Name, and FunctionID) to compute idx
:= strings.LastIndex(...) and conditionally set Package = "" (or the fullClass
if desired) when idx == -1, otherwise use sig.fullClass[:idx].

In `@internal/dependency/maven_resolver.go`:
- Around line 773-780: When os.CreateTemp creates tmpFile (tmpPath) and
tmpFile.Close() returns an error, the created temp file is not removed; update
the error path in the block that calls tmpFile.Close() so that before returning
the fmt.Errorf("close temp file: %w", err) you attempt to remove the temp file
(os.Remove(tmpPath)), handle or wrap any remove error appropriately (e.g.,
return a combined error or log the remove error), ensuring the cleanup runs on
Close failure; modify the code around tmpFile, tmpPath, os.CreateTemp and
tmpFile.Close to perform this removal.

In `@internal/engine/findings_cache.go`:
- Around line 152-159: The loop over rulePaths currently treats a path that adds
zero unique files (because files were already seen) as an error; change it to
track whether any rule files were found at all versus whether they were newly
added: call expandRulePathForHash(p, &files, seen) and have it (or the caller)
return/indicate both (a) whether the path contained any rule files and (b)
whether any new files were added; then only error when no rule files were found
for the path (found==false), not when added==false; update the corresponding
check in the second similar block (the one referenced at lines ~180-190) to use
the same found-vs-added distinction so overlapping inputs (parent+child dirs)
don’t produce false errors in the code that uses expandRulePathForHash.
- Around line 97-109: The current atomic-write uses a fixed tmp variable (tmp :=
path + ".tmp") which causes concurrent Put calls to collide; change the logic in
the Put function to create a unique temp file in the same directory (use
os.CreateTemp / os.WriteFile with a unique suffix or generate a
UUID/PID+timestamp) instead of path + ".tmp", write and close/sync the temp
file, then os.Rename the unique temp into path, and on rename error attempt to
remove that specific temp file (using os.Remove on the unique name) and join
errors as before; ensure all references to tmp, os.WriteFile, os.Rename and
os.Remove are updated to use the unique temp filename and its cleanup.

---

Duplicate comments:
In `@internal/engine/findings_cache.go`:
- Around line 63-85: The Get method on DiskFindingsCache currently accepts any
JSON that unmarshals to entities.InterimReport; update DiskFindingsCache.Get to
read a versioned wrapper struct (e.g., {Version int, Payload
entities.InterimReport}) instead, verify the Version matches the current cache
schema constant (add a const like findingsCacheVersion), and treat mismatches as
corruption: remove the file (using the existing os.Remove logic) and return a
cache miss; apply the same version-check/evict logic to the corresponding
save/load paths so cache writes use the wrapper and cacheKeyToFilename handling
remains unchanged.
- Around line 114-118: cacheKeyToFilename currently only replaces "/" so
filenames can still contain Windows-invalid characters; update
cacheKeyToFilename to perform whitelist sanitization: allow a safe set (e.g.,
ASCII letters, digits, '-', '_', '.'), replace any other character with '_' (use
a regexp or explicit runes check), also trim/normalize sequences of underscores
and enforce a sensible max length before appending ".json", and ensure the
returned filename is never empty; reference the cacheKeyToFilename function when
making these changes.

---

Nitpick comments:
In `@internal/callgraph/builder.go`:
- Around line 374-386: The fluentFallbackScore function currently embeds a
library-specific heuristic (checking call.Raw for "SignatureAlgorithm." and
candidate.ID.Package for "io.jsonwebtoken"); extract scoring rules into a
configurable strategy so library-specific rules live outside this function:
define a scoring strategy (e.g., interface or map) that maps predicate functions
or package name patterns to score adjustments, update fluentFallbackScore to
iterate over those configured rules instead of the hardcoded JWT check, and
register the existing JWT rule (matching call.Raw and candidate.ID.Package) in
that configuration so behavior remains unchanged while enabling future rules to
be added without coupling to the function.
- Around line 233-241: The addCaller function currently does an O(n) linear scan
over callers[calleeKey] to avoid duplicates; switch to set-based duplicate
tracking by changing callers from map[string][]string to
map[string]map[string]struct{} (or otherwise store a per-callee seen map) so you
can check existence in O(1) before inserting; update addCaller to create the
inner map if nil, check if callerKey exists, and add it if not (preserving
existing semantics), and note this increases memory usage but avoids O(n^2)
behavior for hot callees.
- Around line 766-780: The function looksLikeIntegerLiteral only strips an
uppercase "L" suffix which misses lowercase "l" long literals; update the logic
in looksLikeIntegerLiteral to strip either "L" or "l" (e.g., check the last rune
and trim if it's 'L' or 'l' or use a case-insensitive trim) before validating
digits, while keeping the existing handling for optional leading '+'/'-';
optionally note but do not implement handling for hex/octal/binary as it's
beyond this heuristic.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3bec05b8-c19e-4b66-8374-82d08c552e02

📥 Commits

Reviewing files that changed from the base of the PR and between 5bd9ced and 008af8b.

📒 Files selected for processing (23)

internal/callgraph/builder.go
internal/callgraph/bytecode_cache.go
internal/callgraph/bytecode_cache_test.go
internal/callgraph/core_test.go
internal/callgraph/java_parser.go
internal/callgraph/java_type_resolver.go
internal/callgraph/java_type_resolver_test.go
internal/callgraph/text_helpers.go
internal/callgraph/tracer.go
internal/config/config.go
internal/dependency/maven_resolver.go
internal/dependency/resolver.go
internal/dependency/resolver_test.go
internal/engine/dependency_scanner.go
internal/engine/findings_cache.go
internal/engine/rule_filter.go
internal/javaruntime/config.go
internal/javaruntime/config_test.go
internal/scan/export.go
internal/scan/root_module.go
internal/scan/root_module_test.go
internal/scan/scan_test.go
scanoss.json

✅ Files skipped from review due to trivial changes (3)

internal/callgraph/bytecode_cache_test.go
internal/engine/rule_filter.go
internal/callgraph/java_type_resolver_test.go

🚧 Files skipped from review as they are similar to previous changes (3)

internal/callgraph/text_helpers.go
internal/engine/dependency_scanner.go
internal/callgraph/tracer.go

github-actions · 2026-03-31T08:07:24Z

🔍 SCANOSS Code Similarity Detected

📄 3 snippet matches found

🔗 View detailed findings on commit 7a11bf6

Files with similarities:

internal/scan/root_module.go
internal/utils/duration.go
internal/utils/fd.go

💡 Click the commit link above to see detailed annotations for each match.

github-actions · 2026-03-31T08:07:28Z

SCANOSS SCAN Completed 🚀

Detected components: 13
Undeclared components: 9
Declared components: 4
Detected files: 63
Detected files undeclared: 3
Detected files declared: 60
Licenses detected: 12
Licenses detected with copyleft: 3
Policies: ❌ 1 fail (1 total)

View more details on SCANOSS Action Summary

coderabbitai

Actionable comments posted: 11

♻️ Duplicate comments (1)

internal/callgraph/java_parser.go (1)

1544-1552: ⚠️ Potential issue | 🟠 Major

Keep package qualifiers in overload IDs.

normalizeJavaTypeName strips com.a.Key and com.b.Key down to the same Key, so distinct overloads collapse onto one FunctionID and later declarations overwrite earlier ones.

Suggested fix

 func decorateJavaOverloadName(name string, params []FunctionParameter) string {
 	if len(params) == 0 {
 		return name + "$"
 	}
 
 	parts := make([]string, 0, len(params))
 	for _, param := range params {
-		parts = append(parts, normalizeJavaTypeName(param.Type))
+		parts = append(parts, normalizeJavaReferenceType(param.Type))
 	}
 	return name + "$" + strings.Join(parts, ",")
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/java_parser.go` around lines 1544 - 1552,
decorateJavaOverloadName is collapsing distinct overloaded types because
normalizeJavaTypeName strips package qualifiers (so com.a.Key and com.b.Key both
become Key). Update the logic so overload IDs keep package qualifiers: either
stop calling normalizeJavaTypeName here and use the parameter's fully-qualified
type (param.Type) after safe normalization that preserves package segments, or
add a new normalizer (e.g., normalizeJavaTypeNameWithPackage) and call that from
decorateJavaOverloadName; ensure the result still produces filesystem/ID-safe
strings (replace dots or illegal chars) but does not drop package prefixes so
overloads remain unique.

🧹 Nitpick comments (2)

internal/dependency/maven_resolver.go (1)
768-838: Consider extracting helper functions to reduce cognitive complexity.

Static analysis reports cognitive complexity 27 (threshold 20). The per-module loop handles temp file lifecycle, command execution, and result parsing. Extracting a resolveModuleDependencies(ctx, targetDir, module) ([]Dependency, error) helper would isolate the per-module logic and reduce complexity.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/dependency/maven_resolver.go` around lines 768 - 838, The
listDependenciesPerModule function is too complex—extract the per-module logic
into a helper like resolveModuleDependencies(ctx, targetDir, module) that
performs temp file creation/cleanup, builds and configures the mvn exec.Command
(use configureMavenCommand), runs the command capturing stderr, reads the tmp
file, parses using r.parseDependencyList, and returns ([]Dependency, bool,
error) (deps, partialFlag, err); then simplify listDependenciesPerModule to
iterate modules, call resolveModuleDependencies, merge unique dependencies using
dependencyCoordinateKey, and aggregate the anyPartial flag and errors as before.
internal/callgraph/types.go (1)
228-254: Unused sep parameter in ParseFunctionID.

The second parameter is documented as "the package path separator" but is unused (named _). Either remove it from the signature or implement separator-aware parsing if different ecosystems require it.
If the parameter is truly unnecessary
-func ParseFunctionID(s, _ string) (FunctionID, error) {
+func ParseFunctionID(s string) (FunctionID, error) {
Note: This would require updating all callers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/types.go` around lines 228 - 254, ParseFunctionID
currently accepts an unused second parameter (documented as the package path
separator) named `_`; either remove the unused parameter from the signature of
ParseFunctionID and update all callers to match, or implement separator-aware
parsing using that parameter (use it instead of hard-coded "." when searching
for method pattern and last separator, e.g., replace strings.Index(s, ".(") /
strings.LastIndex(s, ".") logic to respect the provided sep). Update the
function declaration and internal uses of "." to reference the provided sep (or
remove the param and keep existing "." logic) and run/build to fix any caller
sites.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/callgraph/builder.go`:
- Around line 554-562: The function updateFluentReturnTypeFromResolvedCall
copies callee.ReturnType into lastReturnType without normalizing qualified
package prefixes, so lookups against typePackages (keyed by fn.ID.Type) miss
types like "com.example.Builder" or "pkg.Type"; update the function (and the
similar block around lines 573-583) to normalize the return type before
assigning to *lastReturnType by stripping generic suffixes and package
qualifiers (i.e., call stripGenericSuffix and then a
package-stripper/qualified-name-normalizer so the value matches the keys in
typePackages/fn.ID.Type), keeping references to the existing symbols
updateFluentReturnTypeFromResolvedCall, callee.ReturnType, lastReturnType and
typePackages.
- Around line 233-241: addCaller currently only appends, so when
resolveFluentCall mutates call.Callee the reverse map (graph.Callers) ends up
containing both the stale unresolved key and the rewritten target; update the
logic so that when a callee is rewritten you remove the old reverse edge before
adding the new one—either enhance addCaller to accept an optional oldCalleeKey
to remove any matching callerKey from callers[oldCalleeKey] prior to appending,
or ensure resolveFluentCall (or its caller) explicitly deletes the callerKey
from callers[oldKey] then calls addCaller with the new calleeKey; apply the same
fix to the other addCaller usage site that exhibits the same behavior.

In `@internal/callgraph/go_parser.go`:
- Around line 87-94: The code silently swallows parse errors when calling
p.ParseFile for fullPath; update the error branch in the loop that processes
files (where p.ParseFile(fullPath, packagePath) returns err) to log the failure
instead of just continuing: add the "github.com/rs/zerolog/log" import and call
log.Error().Err(err).Str("file", fullPath).Str("package",
packagePath).Msg("failed to parse file") (or equivalent) before continue so
parse failures are recorded; keep the rest of the logic appending analysis to
analyses unchanged.

In `@internal/callgraph/java_parser.go`:
- Around line 1298-1328: When object == "" the resolver must check static
wildcard imports before assuming a local method: look up
BaseFunctionName(method) in analysis.WildcardImports (in the same place you
check analysis.Imports) and if found use splitQualifiedJavaType on the imported
value to build and return a FunctionID (Package/Type/Name) just like the
existing Imports branch; only fall back to currentClass or the simple local call
after this wildcard-static-import check. Ensure you reference the same symbols:
object, analysis.WildcardImports, BaseFunctionName(method),
splitQualifiedJavaType, FunctionID, analysis.Imports, and currentClass.

In `@internal/callgraph/java_type_resolver.go`:
- Around line 494-507: The resolver currently trusts whatever
r.bytecodeCache.Get returns and increments stats.cacheHits; after a successful
Get (ok == true) inspect the returned CachedBytecodeIndex.SchemaVersion against
the expected CachedBytecodeIndex schema constant (e.g.
CachedBytecodeIndexSchemaVersion or the version field on the CachedBytecodeIndex
type) and if they differ treat it as a cache miss: log a debug/warn with context
(artifact/task.artifactKey and both versions), do NOT increment stats.cacheHits
or return a jarIndexChunk, and continue to the miss path; apply the same
validation/behavior in the second hit branch later in the function (the other
block that currently returns jarIndexChunk around the 574–579 area) so no cache
implementation can silently feed stale schema entries.
- Around line 346-352: The loop incorrectly allows arity==0 to match any
signature because of the "&& arity > 0" guard; update the condition that filters
signatures to require exact parameter count by removing the arity>0 check so
that you only continue when len(sig.paramTypes) != arity, ensuring a foo() call
only matches signatures with zero parameters; keep the rest of the logic that
calls applyResolvedJavaCall(graph, fn, call, sig, methodsByQualifiedArity).
- Around line 1014-1016: Replace the blind defer of reader.Close() so its error
is checked and propagated or logged: change the defer to a closure that captures
the Close error (e.g. defer func() { if cerr := reader.Close(); cerr != nil { if
err == nil { err = fmt.Errorf("reader.Close: %w", cerr) } else { err =
fmt.Errorf("%v; reader.Close: %w", err, cerr) } } }()) — or if this function
does not use a named return error variable `err`, log the Close error instead
(using the package logger or fmt.Errorf/ fmt.Println) so archive-close failures
are not dropped; the change should reference the existing reader variable and
ensure golangci-lint/errcheck no longer reports an unchecked Close.

In `@internal/callgraph/rust_parser.go`:
- Around line 413-464: The code incorrectly treats every qualified path as a
Type::method; update the fallback logic around FunctionCall/FunctionID
construction (the block using lastTypeSep, prefix, and creating Type) to only
split into Package and Type when the last path segment looks like a type (e.g.,
starts with an uppercase letter or matches Rust type naming), otherwise treat
the entire prefix as the Package and leave Type empty; keep the earlier imports
checks (analysis.Imports, firstSep handling) but change the lastTypeSep branch
to perform the uppercase check on prefix[lastTypeSep+2:] before assigning Type
so module paths like "ring::aead" produce Package "ring::aead" and empty Type.

In `@internal/dependency/maven_resolver.go`:
- Around line 663-675: The temp-file cleanup defer is defined after calling
tmpFile.Close(), so if tmpFile.Close() returns an error (in the code around
tmpFile.Close() in maven_resolver.go) the function returns before the defer can
remove the file; move the defer that removes tmpPath to immediately after
creating tmpFile (right after tmpPath := tmpFile.Name()) so it always executes,
or alternatively, if you prefer minimal change, ensure that on tmpFile.Close()
error you explicitly call os.Remove(tmpPath) (checking for os.IsNotExist) before
returning; reference tmpFile, tmpPath, tmpFile.Close(), and os.Remove when
making the change.
- Around line 356-368: The temporary file can leak if tmpFile.Close() returns an
error before the cleanup defer is registered; move the cleanup defer so it is
set immediately after creating the temp file (after tmpPath := tmpFile.Name())
and before calling tmpFile.Close(), and keep the existing Close error handling
(returning a wrapped error) so that even when Close fails the defer removing
tmpPath runs; reference tmpFile, tmpPath, os.CreateTemp, tmpFile.Close and the
existing deferred removal log.

---

Duplicate comments:
In `@internal/callgraph/java_parser.go`:
- Around line 1544-1552: decorateJavaOverloadName is collapsing distinct
overloaded types because normalizeJavaTypeName strips package qualifiers (so
com.a.Key and com.b.Key both become Key). Update the logic so overload IDs keep
package qualifiers: either stop calling normalizeJavaTypeName here and use the
parameter's fully-qualified type (param.Type) after safe normalization that
preserves package segments, or add a new normalizer (e.g.,
normalizeJavaTypeNameWithPackage) and call that from decorateJavaOverloadName;
ensure the result still produces filesystem/ID-safe strings (replace dots or
illegal chars) but does not drop package prefixes so overloads remain unique.

---

Nitpick comments:
In `@internal/callgraph/types.go`:
- Around line 228-254: ParseFunctionID currently accepts an unused second
parameter (documented as the package path separator) named `_`; either remove
the unused parameter from the signature of ParseFunctionID and update all
callers to match, or implement separator-aware parsing using that parameter (use
it instead of hard-coded "." when searching for method pattern and last
separator, e.g., replace strings.Index(s, ".(") / strings.LastIndex(s, ".")
logic to respect the provided sep). Update the function declaration and internal
uses of "." to reference the provided sep (or remove the param and keep existing
"." logic) and run/build to fix any caller sites.

In `@internal/dependency/maven_resolver.go`:
- Around line 768-838: The listDependenciesPerModule function is too
complex—extract the per-module logic into a helper like
resolveModuleDependencies(ctx, targetDir, module) that performs temp file
creation/cleanup, builds and configures the mvn exec.Command (use
configureMavenCommand), runs the command capturing stderr, reads the tmp file,
parses using r.parseDependencyList, and returns ([]Dependency, bool, error)
(deps, partialFlag, err); then simplify listDependenciesPerModule to iterate
modules, call resolveModuleDependencies, merge unique dependencies using
dependencyCoordinateKey, and aggregate the anyPartial flag and errors as before.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 692dce10-26a3-4662-87ae-4ef5dab7202b

📥 Commits

Reviewing files that changed from the base of the PR and between 008af8b and 7a11bf6.

📒 Files selected for processing (26)

Makefile
internal/callgraph/builder.go
internal/callgraph/bytecode_cache.go
internal/callgraph/bytecode_cache_test.go
internal/callgraph/go_parser.go
internal/callgraph/go_parser_test.go
internal/callgraph/java_parser.go
internal/callgraph/java_parser_test.go
internal/callgraph/java_type_resolver.go
internal/callgraph/java_type_resolver_test.go
internal/callgraph/parser_options.go
internal/callgraph/parser_registry.go
internal/callgraph/python_parser.go
internal/callgraph/python_parser_test.go
internal/callgraph/rust_parser.go
internal/callgraph/rust_parser_test.go
internal/callgraph/types.go
internal/cli/scan.go
internal/cli/scan_test.go
internal/dependency/maven_resolver.go
internal/dependency/source_cache.go
internal/dependency/source_cache_test.go
internal/engine/findings_cache.go
internal/engine/findings_cache_test.go
internal/scan/duration.go
internal/scan/scan_test.go

✅ Files skipped from review due to trivial changes (6)

internal/callgraph/parser_registry.go
internal/callgraph/java_type_resolver_test.go
internal/engine/findings_cache_test.go
internal/callgraph/go_parser_test.go
internal/dependency/source_cache_test.go
internal/callgraph/java_parser_test.go

🚧 Files skipped from review as they are similar to previous changes (6)

internal/callgraph/rust_parser_test.go
internal/callgraph/python_parser_test.go
Makefile
internal/dependency/source_cache.go
internal/callgraph/bytecode_cache.go
internal/callgraph/python_parser.go

github-actions · 2026-03-31T09:29:45Z

SCANOSS SCAN Completed 🚀

Detected components: 4
Undeclared components: 0
Declared components: 4
Detected files: 60
Detected files undeclared: 0
Detected files declared: 60
Licenses detected: 4
Licenses detected with copyleft: 2
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

coderabbitai

Actionable comments posted: 13

♻️ Duplicate comments (3)

internal/callgraph/tracer.go (1)

73-85: ⚠️ Potential issue | 🟠 Major

Stop the BFS once a chain hits user code.

This still only appends on len(callers) == 0, so third_party -> userpkg -> target keeps expanding past the first user frame and can return non-user entry points.

💡 Minimal fix

 		// Get the current head of the chain (the function we're tracing from)
 		head := current.chain[0]
 		headKey := head.Function.String()
+
+		if len(current.chain) > 1 && chainReachesUserCode(current.chain, userPackages, t.pkgSep) {
+			results = append(results, CallChain{Steps: current.chain})
+			continue
+		}
 
 		// Find all callers of the head function
 		callers := t.graph.Callers[headKey]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/tracer.go` around lines 73 - 85, The BFS expansion
currently only appends chains when callers == 0; change the logic in the tracer
loop to check chainReachesUserCode(current.chain, userPackages, t.pkgSep) as
soon as you compute head/headKey and before iterating callers: if it returns
true (and len(current.chain) > 1) append CallChain{Steps: current.chain} to
results and continue, so chains stop expanding once they hit user code rather
than only at root callers; update the block referencing head, headKey, callers,
current.chain, chainReachesUserCode, results and CallChain accordingly.

internal/callgraph/rust_parser.go (2)

620-627: ⚠️ Potential issue | 🟠 Major

Match only real receiver forms, not any parameter containing "self".

The substring check still treats self_ref: SelfRef or myself: Foo as receivers. That drops legitimate bindings from varTypes and marks non-method impl functions as methods.

Also applies to: 776-778

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/rust_parser.go` around lines 620 - 627, The current
parseRustParameterBinding uses strings.Contains(clean, "self") which
misidentifies any parameter containing "self" as a receiver; change this to
detect only real Rust receiver forms (e.g. "self", "&self", "&mut self", "mut
self" and their optional type-annotated variants like "self: Type" or "&self:
Type"). Replace the substring check in parseRustParameterBinding with a strict
pattern match (regex or token check) that anchors to whole-token receiver forms
(handle optional leading & and mut and optional ": Type" after self), and apply
the same fix to the duplicate occurrence around the other function lines
referenced (776-778) so only true receivers set isSelf=true.

425-435: ⚠️ Potential issue | 🟠 Major

Keep imported module aliases as packages in aead::foo().

When analysis.Imports["aead"] == "ring", this branch still returns Package="ring", Type="aead". That loses free-function edges for use ring::aead; aead::seal_in_place(), which should resolve as Package="ring::aead", Type="".

💡 Minimal fix

 	if pkg, ok := analysis.Imports[prefix]; ok {
+		callee := FunctionID{Name: name}
+		if looksLikeRustTypeName(prefix) {
+			callee.Package = pkg
+			callee.Type = prefix
+		} else {
+			callee.Package = combineRustUsePrefix(pkg, prefix)
+		}
 		return &FunctionCall{
-			Callee:    FunctionID{Package: pkg, Type: prefix, Name: name},
+			Callee:    callee,
 			Raw:       content,
 			FilePath:  filePath,
 			Line:      line,
 			Arguments: args,
 		}
 	}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/rust_parser.go` around lines 425 - 435, The current branch
that handles imported prefixes (analysis.Imports[prefix]) constructs
FunctionID{Package: pkg, Type: prefix, Name: name}, which misclassifies module
aliases like `use ring::aead; aead::seal_in_place()`—when
analysis.Imports["aead"] == "ring" you should instead treat the import as a
nested package (Package="ring::aead", Type="") so the call becomes a
free-function on the module; update the branch in rust_parser.go that returns a
FunctionCall to detect imported module aliases and set FunctionID.Package = pkg
+ "::" + prefix and FunctionID.Type = "" (keeping FunctionID.Name = name and the
other fields) rather than swapping pkg and prefix.

🧹 Nitpick comments (3)

internal/dependency/source_cache_test.go (1)

44-52: Reduce repeated test setup with a shared helper.

The TempDir + setTestHome + NewSourceCache sequence repeats across multiple tests; extracting it will make future changes safer and shorter.

♻️ Suggested refactor

+func newTestSourceCache(t *testing.T) *SourceCache {
+	t.Helper()
+	home := t.TempDir()
+	setTestHome(t, home)
+	cache, err := NewSourceCache()
+	if err != nil {
+		t.Fatalf("NewSourceCache: %v", err)
+	}
+	return cache
+}
+
 func TestSourceCache_ExtractZip(t *testing.T) {
-	home := t.TempDir()
-	setTestHome(t, home)
-	cache, err := NewSourceCache()
-	if err != nil {
-		t.Fatalf("NewSourceCache: %v", err)
-	}
+	cache := newTestSourceCache(t)

Also applies to: 80-85, 125-130, 148-153

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/dependency/source_cache_test.go` around lines 44 - 52, Multiple
tests repeat the pattern of creating a temp dir, calling setTestHome, and
creating a SourceCache; extract that into a shared helper (e.g.,
newTestSourceCache or setupSourceCache) that accepts *testing.T, calls
t.TempDir(), setTestHome(t, dir), invokes NewSourceCache(), fails the test on
error, and returns the created cache (and optionally a cleanup or the dir) so
TestNewSourceCacheAndCachedDir, and the other tests referring to the same
sequence (lines ~80-85, 125-130, 148-153) can call the helper instead of
duplicating TempDir + setTestHome + NewSourceCache.

internal/engine/findings_cache_test.go (1)

212-234: Consider adding a platform skip for chmod-based permission tests.

The os.Chmod(dir, 0o500) approach to simulate removal failures is Unix-specific. On Windows, this may not actually prevent file deletion, causing the test to pass or fail unpredictably. If cross-platform CI is expected, consider adding a skip:
if runtime.GOOS == "windows" {
    t.Skip("chmod-based permission tests are not reliable on Windows")
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/engine/findings_cache_test.go` around lines 212 - 234, The test
TestDiskFindingsCache_Get_RemoveCorruptedFileFailure uses os.Chmod(dir, 0o500)
which is unreliable on Windows; update the test to skip on Windows by checking
runtime.GOOS == "windows" at the start of the test (and add the runtime import
if missing) so the chmod-based permission simulation is only run on POSIX
systems; keep the existing chmod/setup/assert logic unchanged for non-Windows
platforms.

internal/callgraph/types.go (1)

148-149: Prefer a typed enum over raw strings for SourceNode.Type.

Given this is a core tracing model, a custom type + constants would prevent typo-driven state bugs (e.g., "CALL_RESLT" silently accepted).

♻️ Suggested shape

+type SourceNodeType string
+
+const (
+	SourceNodeValue      SourceNodeType = "VALUE"
+	SourceNodeVariable   SourceNodeType = "VARIABLE"
+	SourceNodeField      SourceNodeType = "FIELD"
+	SourceNodeParameter  SourceNodeType = "PARAMETER"
+	SourceNodeCallResult SourceNodeType = "CALL_RESULT"
+	SourceNodeExpression SourceNodeType = "EXPRESSION"
+)
+
 type SourceNode struct {
 	// Type classifies the origin: VALUE, VARIABLE, FIELD, PARAMETER, CALL_RESULT, EXPRESSION
-	Type string
+	Type SourceNodeType

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/types.go` around lines 148 - 149, Replace the raw string
field SourceNode.Type with a strongly typed enum: define a new custom type (e.g.
SourceType) and declare constants for the allowed values (VALUE, VARIABLE,
FIELD, PARAMETER, CALL_RESULT, EXPRESSION); change the SourceNode.Type
declaration from string to this new SourceType and update any places that
construct, compare, or marshal/unmarshal types (constructors, setters,
switch/case comparisons, JSON handling) to use the constants so typos are caught
at compile time (look for symbol SourceNode.Type and update code that
reads/writes it).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/callgraph/bytecode_cache.go`:
- Around line 169-180: The current write uses tmp := path + ".tmp" which causes
races and Windows rename failures; instead create a unique temporary file in the
same directory (e.g., with os.CreateTemp or by appending a random suffix based
on bytecodeCacheKeyToFilename(key) and c.dir), write the data to that unique
temp file with 0o600 permissions, fsync if needed, then attempt an atomic rename
from that temp file to path; on rename failure ensure you remove the temp file
and, for Windows portability, fallback by attempting to remove the existing path
and retry the rename (or return a clear wrapped error) while always cleaning up
the temp file on both success and error so concurrent writers do not clobber
each other.

In `@internal/callgraph/go_parser.go`:
- Around line 401-413: The code is incorrectly setting FunctionID.Type to the
receiver variable name (operand), which prevents matching against method
declarations parsed by parseMethodDecl (they use the receiver type, e.g.,
Type="Cipher"); instead, stop encoding the variable name into
FunctionID.Type—set FunctionID.Type to the empty string (or nil) and keep the
method name in FunctionID.Name, and preserve the original operand by adding a
dedicated field (e.g., FunctionCall.ReceiverVar or FunctionID.ReceiverVar) to
store the receiver variable name for later resolution; update the construction
in this return site to use Type="" and populate ReceiverVar=operand so
parseMethodDecl matches by type while still retaining the runtime receiver name.

In `@internal/callgraph/java_parser_test.go`:
- Around line 34-67: The raw string variable src is left open, causing the
subsequent test function TestJavaParser_IncludeTestsIncludesTestFilesAndDirs
(and the following test) to be part of the string; close the src raw string (add
the terminating backtick) immediately after the Java fixture contents so that
TestJavaParser_IncludeTestsIncludesTestFilesAndDirs and the later tests are real
Go code, not part of src; verify the closing backtick appears before the func
TestJavaParser_IncludeTestsIncludesTestFilesAndDirs declaration and similarly
fix the other occurrence covering lines 69-96.

In `@internal/callgraph/java_parser.go`:
- Around line 1023-1037: The code strips qualifiers for javaNodeGenericType and
javaNodeScopedTypeIdentifier which loses package qualifiers needed by
resolveJavaVariableTypeCallee; update those case branches in java_parser.go so
they return the full textual content instead of truncating: in the
javaNodeGenericType branch return child.Content(src) (preserving generics like
"Map<String>") and in the javaNodeScopedTypeIdentifier branch return
child.Content(src) without applying strings.LastIndex trimming (preserving
qualifiers like "java.util.Map"). Ensure these changes target the case blocks
handling javaNodeGenericType and javaNodeScopedTypeIdentifier so
resolveJavaVariableTypeCallee can see the full qualified names.
- Around line 1536-1546: The FunctionID produced by functionIDFromQualifiedClass
uses the bare constructorMethodName which yields "<init>" and won't match parsed
constructor symbols keyed by arity; update the constructor default for
reflective zero-arg constructor calls to use the arity-suffixed name "<init>#0"
(i.e. set FunctionID.Name to "<init>#0" or change constructorMethodName to that
value) so reflective Class.newInstance paths resolve to the correct FunctionID
when constructing the FunctionID {Package, Type, Name} in
functionIDFromQualifiedClass.

In `@internal/callgraph/java_type_resolver.go`:
- Around line 1191-1204: The current normalizeArchiveEntryName function (and the
similar checks around lines 1358-1362 and 1398-1406) drops any entry whose base
name contains '$', which incorrectly excludes legitimate named inner classes
like Outer$Inner or Map$Entry; change the check so that you only reject entries
that are anonymous/local class markers (e.g., dollar followed immediately by one
or more digits like $1, $2), but allow entries where the segment after '$'
starts with a letter or valid identifier char to be indexed; update
normalizeArchiveEntryName (and the analogous checks) to detect anonymous-class
patterns (regex like '\$[0-9]+' or equivalent) and return false only for those,
leaving named-$ entries intact and preserving the '$' in returned entryName.
- Around line 795-800: The current classHierarchyParents construction appends
interfaces before the superclass so propagateInheritedMethods (which keeps the
first-seen signature) lets interface default methods override concrete
superclass methods; change the parent ordering so the superclass is placed
before interfaces (i.e., ensure info.superClass is added to the parents slice
before info.interfaces) when calling
normalizeJavaHierarchyParents(info.fullClassName, parents) so superclass methods
take precedence over interface defaults.

In `@internal/callgraph/python_parser.go`:
- Around line 282-284: parseFunctionDef currently passes body to extractCalls
without class context, causing methods like self.encrypt() to be recorded as
package-level functions; update the call to pass the current className (from
parseFunctionDef) into extractCalls and propagate that parameter through
extractCalls and the call-resolution path so that when resolving self.method()
you can tag calls with Type=className (ensure symbols: parseFunctionDef,
decl.Calls, extractCalls, and the internal self.method resolution receive and
use className); repeat the same change for the other occurrence around the
442-450 region so class context is preserved for all method call extractions.
- Around line 392-400: The code currently assumes any imported identifier call
is a constructor by returning a FunctionCall with FunctionID{Package: pkg, Type:
name, Name: constructorMethodName}; instead, detect whether the imported symbol
is actually a type/class before treating it as a constructor. Update the logic
in the block that checks analysis.Imports (the branch creating a FunctionCall
using FunctionID, constructorMethodName, raw, filePath, line, args) to inspect
the imported symbol's metadata (or a type registry) and only set
FunctionID.Type=name and Name=constructorMethodName when the imported symbol is
a class/type; for ordinary functions return a normal function call FunctionID
with Package=pkg and Name=name (leave Type empty). Ensure the change references
analysis.Imports, FunctionCall, FunctionID, and constructorMethodName so the
correct creation path is used.

In `@internal/callgraph/rust_parser_test.go`:
- Around line 59-81: The test
TestRustParser_IncludeTestsIncludesTestFilesAndDirs is mistakenly placed inside
a raw string fixture (opened at line ~48 and not closed until ~101), so the
include-tests case never runs and ParseDirectory writes Go into crypto.rs; fix
by closing the raw string literal that defines the fixture before the
TestRustParser_IncludeTestsIncludesTestFilesAndDirs declaration (or move the
test below the closing backtick), then re-run to ensure
NewRustParser(WithIncludeTests(true)), ParseDirectory(testDir,
"myproject::tests") and p.SkipDirs()["tests"] execute as a real test rather than
being part of the fixture string.

In `@internal/callgraph/types.go`:
- Around line 225-227: Update the stale doc comment for ParseFunctionID(s
string) to remove any reference to a nonexistent sep parameter and instead
describe the current behavior: ParseFunctionID takes a single string and splits
it into package/class and function name by finding the last '.' in the input
(treating that as the separator between fully-qualified type/package and the
function/method), and note that package paths themselves may contain '/' (as in
Go) while Java uses '.' in package/class names; ensure the comment references
the ParseFunctionID function name and clearly documents this splitting behavior.
- Around line 230-241: The parser branch that handles method-style IDs (the code
that checks strings.Index(s, ".(") and returns FunctionID{Package: pkg, Type:
typ, Name: name}) must reject cases where any component is empty; add validation
after extracting pkg, typ, and name in that branch (the FunctionID parsing logic
in types.go) to return an error if pkg=="" or typ=="" or name=="" instead of
returning a successful FunctionID, so malformed IDs like ".(Cipher).getInstance"
or "pkg.().m" are rejected and cannot seed invalid keys into
CallGraph.Functions/Callers.

In `@internal/cli/scan.go`:
- Around line 338-341: The call to resolveJavaRuntimeConfig currently runs
unconditionally and can fail for non-Java scans; move the
resolveJavaRuntimeConfig() invocation so it is executed lazily only when the
selected ecosystem is Java (i.e., after the code path that decides the
ecosystem/type for dependency scanning/export). Specifically, remove or defer
the current unconditional call and invoke resolveJavaRuntimeConfig() only inside
the Java branch (where you create the Java resolver or handle Java-specific
logic), preserving the existing error handling (returning the wrapped fmt.Errorf
on failure).

---

Duplicate comments:
In `@internal/callgraph/rust_parser.go`:
- Around line 620-627: The current parseRustParameterBinding uses
strings.Contains(clean, "self") which misidentifies any parameter containing
"self" as a receiver; change this to detect only real Rust receiver forms (e.g.
"self", "&self", "&mut self", "mut self" and their optional type-annotated
variants like "self: Type" or "&self: Type"). Replace the substring check in
parseRustParameterBinding with a strict pattern match (regex or token check)
that anchors to whole-token receiver forms (handle optional leading & and mut
and optional ": Type" after self), and apply the same fix to the duplicate
occurrence around the other function lines referenced (776-778) so only true
receivers set isSelf=true.
- Around line 425-435: The current branch that handles imported prefixes
(analysis.Imports[prefix]) constructs FunctionID{Package: pkg, Type: prefix,
Name: name}, which misclassifies module aliases like `use ring::aead;
aead::seal_in_place()`—when analysis.Imports["aead"] == "ring" you should
instead treat the import as a nested package (Package="ring::aead", Type="") so
the call becomes a free-function on the module; update the branch in
rust_parser.go that returns a FunctionCall to detect imported module aliases and
set FunctionID.Package = pkg + "::" + prefix and FunctionID.Type = "" (keeping
FunctionID.Name = name and the other fields) rather than swapping pkg and
prefix.

In `@internal/callgraph/tracer.go`:
- Around line 73-85: The BFS expansion currently only appends chains when
callers == 0; change the logic in the tracer loop to check
chainReachesUserCode(current.chain, userPackages, t.pkgSep) as soon as you
compute head/headKey and before iterating callers: if it returns true (and
len(current.chain) > 1) append CallChain{Steps: current.chain} to results and
continue, so chains stop expanding once they hit user code rather than only at
root callers; update the block referencing head, headKey, callers,
current.chain, chainReachesUserCode, results and CallChain accordingly.

---

Nitpick comments:
In `@internal/callgraph/types.go`:
- Around line 148-149: Replace the raw string field SourceNode.Type with a
strongly typed enum: define a new custom type (e.g. SourceType) and declare
constants for the allowed values (VALUE, VARIABLE, FIELD, PARAMETER,
CALL_RESULT, EXPRESSION); change the SourceNode.Type declaration from string to
this new SourceType and update any places that construct, compare, or
marshal/unmarshal types (constructors, setters, switch/case comparisons, JSON
handling) to use the constants so typos are caught at compile time (look for
symbol SourceNode.Type and update code that reads/writes it).

In `@internal/dependency/source_cache_test.go`:
- Around line 44-52: Multiple tests repeat the pattern of creating a temp dir,
calling setTestHome, and creating a SourceCache; extract that into a shared
helper (e.g., newTestSourceCache or setupSourceCache) that accepts *testing.T,
calls t.TempDir(), setTestHome(t, dir), invokes NewSourceCache(), fails the test
on error, and returns the created cache (and optionally a cleanup or the dir) so
TestNewSourceCacheAndCachedDir, and the other tests referring to the same
sequence (lines ~80-85, 125-130, 148-153) can call the helper instead of
duplicating TempDir + setTestHome + NewSourceCache.

In `@internal/engine/findings_cache_test.go`:
- Around line 212-234: The test
TestDiskFindingsCache_Get_RemoveCorruptedFileFailure uses os.Chmod(dir, 0o500)
which is unreliable on Windows; update the test to skip on Windows by checking
runtime.GOOS == "windows" at the start of the test (and add the runtime import
if missing) so the chmod-based permission simulation is only run on POSIX
systems; keep the existing chmod/setup/assert logic unchanged for non-Windows
platforms.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: baa5791c-7a3a-4ded-868d-7b9cde7e1a68

📥 Commits

Reviewing files that changed from the base of the PR and between 7a11bf6 and 30d2483.

📒 Files selected for processing (35)

internal/cache/metadata_test.go
internal/callgraph/builder.go
internal/callgraph/builder_extra_test.go
internal/callgraph/bytecode_cache.go
internal/callgraph/bytecode_cache_test.go
internal/callgraph/core_test.go
internal/callgraph/go_parser.go
internal/callgraph/java_parser.go
internal/callgraph/java_parser_extra_test.go
internal/callgraph/java_parser_test.go
internal/callgraph/java_type_resolver.go
internal/callgraph/java_type_resolver_test.go
internal/callgraph/parser_registry_extra_test.go
internal/callgraph/python_parser.go
internal/callgraph/rust_parser.go
internal/callgraph/rust_parser_extra_test.go
internal/callgraph/rust_parser_test.go
internal/callgraph/text_helpers.go
internal/callgraph/tracer.go
internal/callgraph/types.go
internal/callgraph/types_extra_test.go
internal/cli/scan.go
internal/dependency/maven_resolver.go
internal/dependency/source_cache_test.go
internal/engine/dependency_scanner.go
internal/engine/findings_cache.go
internal/engine/findings_cache_test.go
internal/javaruntime/config.go
internal/scan/export.go
internal/scan/root_module.go
internal/scanner/interface.go
internal/scanner/opengrep/scanner.go
internal/scanner/semgrep/scanner.go
internal/utils/fd_test.go
scanoss.json

✅ Files skipped from review due to trivial changes (7)

internal/cache/metadata_test.go
internal/callgraph/java_type_resolver_test.go
internal/engine/dependency_scanner.go
internal/dependency/maven_resolver.go
internal/engine/findings_cache.go
internal/callgraph/bytecode_cache_test.go
internal/callgraph/core_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

internal/callgraph/text_helpers.go

github-actions · 2026-03-31T10:33:17Z

SCANOSS SCAN Completed 🚀

Detected components: 4
Undeclared components: 0
Declared components: 4
Detected files: 60
Detected files undeclared: 0
Detected files declared: 60
Licenses detected: 4
Licenses detected with copyleft: 2
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

internal/callgraph/tracer.go (1)

145-149: Consider using strings.HasPrefix for clarity.

The manual prefix check is correct but could be simplified with the standard library function.

✨ Suggested simplification

+import "strings"
+
 // Check if pkg is a sub-package of any user package
 for userPkg := range userPackages {
 	prefix := userPkg + sep
-	if len(pkg) >= len(prefix) && pkg[:len(prefix)] == prefix {
+	if strings.HasPrefix(pkg, prefix) {
 		return true
 	}
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/callgraph/tracer.go` around lines 145 - 149, The loop in tracer.go
that checks prefixes manually (using prefix := userPkg + sep and
pkg[:len(prefix)] == prefix) should be simplified to use strings.HasPrefix(pkg,
prefix) for clarity and safety; update the code in the function containing the
userPackages loop to call strings.HasPrefix(pkg, prefix) and ensure the
"strings" package is imported if not already present, removing the manual
slice-based comparison.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/callgraph/tracer.go`:
- Around line 44-45: The function signature for Tracer.TraceBackLimited has two
consecutive int parameters (maxDepth int, maxChains int); update the signature
to combine their types into a single declaration (maxDepth, maxChains int) in
the Tracer.TraceBackLimited method so it satisfies gocritic's rule and fixes the
pipeline failure—ensure you update any call sites referencing TraceBackLimited
if necessary.
- Line 35: Remove the unused nolint directive by deleting the
"//nolint:gocognit" comment that sits above the BFS traversal comment in
tracer.go; ensure no other references to "//nolint:gocognit" remain in that
function so the linter no longer fails.

---

Nitpick comments:
In `@internal/callgraph/tracer.go`:
- Around line 145-149: The loop in tracer.go that checks prefixes manually
(using prefix := userPkg + sep and pkg[:len(prefix)] == prefix) should be
simplified to use strings.HasPrefix(pkg, prefix) for clarity and safety; update
the code in the function containing the userPackages loop to call
strings.HasPrefix(pkg, prefix) and ensure the "strings" package is imported if
not already present, removing the manual slice-based comparison.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a15621b5-3bee-447b-9609-c45154b1116a

📥 Commits

Reviewing files that changed from the base of the PR and between 30d2483 and 3352245.

📒 Files selected for processing (4)

internal/callgraph/core_test.go
internal/callgraph/tracer.go
internal/scan/export.go
internal/scan/scan_test.go

github-actions · 2026-03-31T11:26:28Z

SCANOSS SCAN Completed 🚀

Detected components: 4
Undeclared components: 0
Declared components: 4
Detected files: 60
Detected files undeclared: 0
Detected files declared: 60
Licenses detected: 4
Licenses detected with copyleft: 2
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

github-actions · 2026-03-31T11:45:52Z

SCANOSS SCAN Completed 🚀

Detected components: 4
Undeclared components: 0
Declared components: 4
Detected files: 60
Detected files undeclared: 0
Detected files declared: 60
Licenses detected: 4
Licenses detected with copyleft: 2
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

github-actions · 2026-03-31T11:48:35Z

SCANOSS SCAN Completed 🚀

Detected components: 4
Undeclared components: 0
Declared components: 4
Detected files: 60
Detected files undeclared: 0
Detected files declared: 60
Licenses detected: 4
Licenses detected with copyleft: 2
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

github-actions · 2026-03-31T13:05:08Z

SCANOSS SCAN Completed 🚀

Detected components: 4
Undeclared components: 0
Declared components: 4
Detected files: 60
Detected files undeclared: 0
Detected files declared: 60
Licenses detected: 4
Licenses detected with copyleft: 2
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

github-actions · 2026-03-31T13:18:03Z

SCANOSS SCAN Completed 🚀

Detected components: 4
Undeclared components: 0
Declared components: 4
Detected files: 60
Detected files undeclared: 0
Detected files declared: 60
Licenses detected: 4
Licenses detected with copyleft: 2
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

github-actions · 2026-03-31T13:35:02Z

SCANOSS SCAN Completed 🚀

Detected components: 4
Undeclared components: 0
Declared components: 4
Detected files: 60
Detected files undeclared: 0
Detected files declared: 60
Licenses detected: 4
Licenses detected with copyleft: 2
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

github-actions · 2026-03-31T13:44:57Z

SCANOSS SCAN Completed 🚀

Detected components: 4
Undeclared components: 0
Declared components: 4
Detected files: 60
Detected files undeclared: 0
Detected files declared: 60
Licenses detected: 4
Licenses detected with copyleft: 2
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

github-actions · 2026-03-31T13:53:05Z

SCANOSS SCAN Completed 🚀

Detected components: 4
Undeclared components: 0
Declared components: 4
Detected files: 60
Detected files undeclared: 0
Detected files declared: 60
Licenses detected: 4
Licenses detected with copyleft: 2
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

matiasdaloia added 3 commits March 3, 2026 11:31

feat: modify call chain schema

c526d9a

feat: initial support for java reflection, interfaces, etc

f5a85ae

matiasdaloia self-assigned this Mar 30, 2026

matiasdaloia added the enhancement New feature or request label Mar 30, 2026

coderabbitai bot reviewed Mar 30, 2026

View reviewed changes

Comment thread internal/callgraph/java_type_resolver.go

Comment thread internal/dependency/maven_resolver.go Outdated

Comment thread internal/engine/findings_cache.go Outdated

Comment thread internal/engine/findings_cache.go

coderabbitai bot reviewed Mar 31, 2026

View reviewed changes

Comment thread internal/callgraph/tracer.go Outdated

Comment thread internal/callgraph/tracer.go Outdated

feat: add support for dependency scanning

e49cec4

matiasdaloia force-pushed the feature/mdaloia/add-parallel-jar-processing branch from 25cac8e to e49cec4 Compare March 31, 2026 13:42

chore: fix lint errors + add test coverage

967841d

matiasdaloia added 2 commits April 9, 2026 10:51

feat: add entrypoint index to callgraph export

2baa9e3

chore: remove dead code

6da7d58

Conversation

matiasdaloia commented Mar 30, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

github-actions bot commented Mar 30, 2026

🔍 SCANOSS Code Similarity Detected

Uh oh!

github-actions bot commented Mar 30, 2026

SCANOSS SCAN Completed 🚀

Uh oh!

coderabbitai bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Mar 31, 2026

🔍 SCANOSS Code Similarity Detected

Uh oh!

github-actions bot commented Mar 31, 2026

SCANOSS SCAN Completed 🚀

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Mar 31, 2026

SCANOSS SCAN Completed 🚀

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Mar 31, 2026

SCANOSS SCAN Completed 🚀

Uh oh!

coderabbitai bot left a comment

matiasdaloia commented Mar 30, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 30, 2026 •

edited

Loading