refactor: Close max_string_length CWE-770 gap + strip RuleMatch.type_kind from JSON by unclesp1d3r · Pull Request #304 · EvilBit-Labs/libmagic-rs

unclesp1d3r · 2026-05-29T00:46:35Z

Summary

Closes pre-1.0 findings from the comprehensive review and the post-PR review:

2A-H1 / CWE-770 — EvaluationConfig::max_string_length was documented as a working memory cap but was never threaded into the type-read dispatchers. Now wired through both read_typed_value_with_pattern (unflagged) and read_pattern_match (flagged string variants /c, /C, /w, /W, /T, /f).
1B-H2 / 2A-M1 / CWE-200 — RuleMatch.type_kind: TypeKind leaked the parser AST into JSON output. Now #[serde(skip)]; Deserialize dropped from RuleMatch/EvaluationResult/EvaluationMetadata.
1B-C1 (architecture U4) — every public struct now carries #[non_exhaustive]. 13 structs covered: MagicDatabase, library + output EvaluationResult/EvaluationMetadata, MatchResult, EvaluationContext, RuleMatch, RegexFlags, StringFlags, SearchFlags, ValueTransform, MagicRule. New constructors added where needed (RuleMatch::new, library-side EvaluationResult::new/EvaluationMetadata::new, ValueTransform::new) to keep external migration ergonomic. ~56 struct-literal sites across tests/, benches/, and 18 doctest examples migrated to constructor or builder syntax.
SF-1 (silent-failure-hunter) — EvaluationContext::new defensively clamps max_string_length = 0 to the documented default with a warn! log. Closes the bypass path where a struct-literal or with_max_string_length(0) could silently disable the CWE-770 control. Pinned by regression test.
SF-2 (silent-failure-hunter) — replaced the buffer.get(..end).unwrap_or(buffer) fail-open fallback in the flagged-string scan_buffer with &buffer[..end] (panic-on-invariant-violation).
CA-1/2/3/4 (comment-analyzer) — documentation accuracy: CWE-400 row credit in security-assurance.md, S6.1 → S3.8 cross-reference correction, and stale pub/# Errors rustdoc trimmed.
TD-1 (type-design-analyzer) — dropped #[cfg(test)] on read_typed_value (it excluded benches and the planned v1.0 cargo-fuzz harness).
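
The SF-1 clamp described above can be sketched roughly as follows. This is a minimal illustration rather than the crate's actual code: `Context` is a hypothetical stand-in for `EvaluationContext`, and `eprintln!` replaces the `warn!` log so the sketch has no dependencies. The 8192 default is the value the PR cites.

```rust
/// Default cap, matching the 8192 value the PR cites for
/// `EvaluationConfig::default().max_string_length`.
const DEFAULT_MAX_STRING_LENGTH: usize = 8192;

/// Hypothetical stand-in for `EvaluationContext`; only the cap is modeled.
#[derive(Debug)]
struct Context {
    max_string_length: usize,
}

impl Context {
    /// Clamp an invalid cap of 0 to the default instead of letting it
    /// silently disable the limit. Not a `const fn`, because it logs.
    fn new(requested_cap: usize) -> Self {
        let max_string_length = if requested_cap == 0 {
            // Assumption: the real code logs via `warn!`; `eprintln!` is a stand-in.
            eprintln!(
                "max_string_length = 0 is invalid; using default {}",
                DEFAULT_MAX_STRING_LENGTH
            );
            DEFAULT_MAX_STRING_LENGTH
        } else {
            requested_cap
        };
        Self { max_string_length }
    }
}

fn main() {
    println!("{:?}", Context::new(0));
    println!("{:?}", Context::new(64));
}
```

Clamping in the constructor means every downstream read site can assume a cap of at least 1, whichever path built the configuration.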

Implements U1, U2, U3, U4, U5, U6 from docs/plans/2026-05-28-001-refactor-pre-1.0-api-hardening-plan.md.

Decisions documented inline (not deferred)

TD-2 (max_string_length as Option<NonZeroUsize>): evaluated and intentionally not implemented. SF-1's runtime clamp guarantees max_string_length >= 1 at every read site, so the NonZeroUsize encoding would be load-bearing-redundant — the runtime guard is the security control; the type would just shift validation from runtime to compile time without changing the invariant. Documented in the SF-1+U4 commit body.
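
To make the TD-2 trade-off concrete, here is a small sketch comparing the two encodings. Both functions are hypothetical; in particular, treating `None` as "use the default" is an assumption, since the PR does not say how an `Option<NonZeroUsize>` field would have been interpreted.

```rust
use std::num::NonZeroUsize;

const DEFAULT_MAX_STRING_LENGTH: usize = 8192;

/// Hypothetical type-level encoding: zero is unrepresentable, and `None`
/// is assumed to mean "fall back to the default".
fn effective_cap_typed(cap: Option<NonZeroUsize>) -> usize {
    cap.map_or(DEFAULT_MAX_STRING_LENGTH, NonZeroUsize::get)
}

/// Runtime encoding (the approach the PR keeps): a plain `usize`
/// that is clamped when the context is constructed.
fn effective_cap_runtime(cap: usize) -> usize {
    if cap == 0 { DEFAULT_MAX_STRING_LENGTH } else { cap }
}

fn main() {
    // `NonZeroUsize::new(0)` yields `None`, so a caller holding a raw
    // integer still goes through a zero check to build the typed value.
    let typed = effective_cap_typed(NonZeroUsize::new(0));
    let runtime = effective_cap_runtime(0);
    assert_eq!(typed, runtime);
    println!("typed = {typed}, runtime = {runtime}");
}
```

Both paths yield the same effective cap, which is the sense in which the typed encoding would duplicate the invariant the clamp already enforces.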

Commits in this PR

a492f01 — U1+U2+U5: thread max_string_length through dispatchers, regression tests, pub(crate) demotion
efc151d — U6: #[serde(skip)] on RuleMatch.type_kind, drop Deserialize from library-side types
d6af29f — U3: doc accuracy on max_string_length rustdoc + configuration.md + initial security-assurance.md §7.3
e7040af — review-fix cluster: SF-2 (defensive slice), CA-1 (CWE-400 row), CA-2 (S6.1→S3.8), CA-3/4 (stale rustdoc), TD-1 (drop #[cfg(test)]), new TS-1 test
d741318 — SF-1 (clamp invalid cap in EvaluationContext::new) + U4 (#[non_exhaustive] on 13 structs + ~56 site migration + new constructors)

Test plan

cargo nextest run --all-features — all unit + integration tests pass
cargo test --doc --all-features — 182 doctests pass (31 ignored)
cargo clippy --all-features --all-targets -- -D warnings — clean
cargo fmt --check — clean
First CI run was fully green (test, quality, audit, coverage, compatibility, cross-platform on Linux/macOS/Windows)
Verify CI green on the U4 commit (running now)

Breaking changes (for the v0.9 / v1.0 changelog)

Library-side EvaluationResult, EvaluationMetadata, RuleMatch no longer derive Deserialize (the documented JSON contract is output::JsonMatchResult).
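All 13 public structs gained #[non_exhaustive]. External callers that constructed via struct-literal syntax need to migrate to the provided new(...)/with_*(...) builders, or start from Default::default() and assign the public fields afterwards. Note that ..Default::default() struct-update syntax is itself a struct expression, which Rust rejects outside the defining crate for #[non_exhaustive] types.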
14 leaf read_* functions, the two type-read dispatchers, and coerce_value_to_type in src/evaluator/types/mod.rs are now pub(crate). External callers should go through MagicDatabase or the documented evaluator:: re-exports per GOTCHAS S4.1.
EvaluationContext::new is no longer const fn (it now contains the clamp + log).
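
As a rough migration sketch for the #[non_exhaustive] change: `Metadata` and its fields below are hypothetical stand-ins, not the crate's real types. Because everything lives in one file, the compiler does not enforce the attribute here; in a downstream crate the struct-literal form would be rejected and only the two patterns shown in `main` would compile.

```rust
mod library {
    /// Hypothetical stand-in for one of the `#[non_exhaustive]` structs.
    #[non_exhaustive]
    #[derive(Debug, Clone, Default, PartialEq)]
    pub struct Metadata {
        pub file_size: u64,
        pub rules_evaluated: usize,
        pub timed_out: bool,
    }

    impl Metadata {
        /// Constructor covering the always-set fields.
        pub fn new(file_size: u64, rules_evaluated: usize) -> Self {
            Self { file_size, rules_evaluated, timed_out: false }
        }

        /// Builder-style setter for an optional field.
        #[must_use]
        pub fn with_timed_out(mut self, timed_out: bool) -> Self {
            self.timed_out = timed_out;
            self
        }
    }
}

use library::Metadata;

fn main() {
    // Pattern 1: constructor plus builder chain.
    let built = Metadata::new(1024, 3).with_timed_out(true);

    // Pattern 2: start from Default and assign public fields.
    let mut assigned = Metadata::default();
    assigned.file_size = 1024;
    assigned.rules_evaluated = 3;
    assigned.timed_out = true;

    assert_eq!(built, assigned);
    println!("{built:?}");
}
```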

Origin

Origin doc: .full-review/05-final-report.md — finding IDs 2A-H1, 3A-C2, 3B-C1 (Sprint 1) and 1B-C1, 1B-C2, 1B-H2, 2A-M1 (Sprint 2). All originally-confirmed Sprint 1 + Sprint 2 work now lands in this single PR.

…ispatchers (closes 2A-H1)

`EvaluationConfig::max_string_length` was documented as a working memory cap in three places (`config.rs`, `evaluator/mod.rs`, `configuration.md`) and exposed via the accessor `EvaluationContext::max_string_length()`, but was never threaded into `read_typed_value_with_pattern` or `read_pattern_match`. A `0 string x %s` rule against a 1 GiB NUL-free buffer allocated the full buffer (capped only by the 1 GiB mmap limit) -- the documented CWE-770 control did nothing.

Closes origin findings 2A-H1 (memory cap) and 3A-C2 (regression test coverage). Implements U1 + U2 + U5 from `docs/plans/2026-05-28-001-refactor-pre-1.0-api-hardening-plan.md`.

Changes:

* U1: Thread `max_string_length: usize` through `evaluate_single_rule_with_anchor`, `evaluate_value_rule`, `evaluate_pattern_rule`, `read_typed_value_with_pattern`, and `read_pattern_match`. The unflagged `(None, _)` String arm now passes `Some(max_string_length)` into `read_string`. The flagged-string arm in `read_pattern_match` builds `scan_buffer` with the cap as an upper bound using the same shape as the existing `max_length: Some(n)` arm (mirroring `buffer.get(..end).unwrap_or(buffer)`, not pre-slicing from `offset` -- a pre-slice would double-offset and silently break flagged-string matches at any non-zero offset).
* U2: Six new regression tests in `tests/security_regression.rs` covering the unflagged scan-mode path, NUL-stops-before-cap, cap-larger-than-buffer, minimum-valid-cap (cap=1; cap=0 is rejected by `EvaluationConfig::validate`), and a flagged `/W` whitespace-walk smoke test. All six fail against the prior `main` and pass after U1.
* U5: Demote the 14 leaf `read_*` re-exports plus `read_typed_value`, `read_typed_value_with_pattern`, and `coerce_value_to_type` from `pub` to `pub(crate)`. `read_pattern_match` was already `pub(crate)` at HEAD.

Doctests in the `read_*` source files that imported via `libmagic_rs::evaluator::types::read_*` are marked `ignore` since those paths are no longer reachable externally; the public surface goes through `MagicDatabase` / the documented `evaluator::` re-exports per GOTCHAS S4.1.

`read_typed_value` is kept as a 3-arg convenience for `#[cfg(test)]` callers (~30 inline test sites in `src/evaluator/types/`) and delegates to the 5-arg form with `DEFAULT_MAX_STRING_LENGTH = 8192` (matching `EvaluationConfig::default().max_string_length`). The engine path supplies the user-configured cap via the explicit parameter; the test default is informational and does not affect production paths.

Remaining work from the plan:

* U3: doc verification + security-assurance.md credit
* U4: `#[non_exhaustive]` on 13 public structs + ~64 test-site migration
* U6: `#[serde(skip)]` on `RuleMatch.type_kind`

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
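
The capping behavior in U1 can be illustrated with a short sketch. The two helpers below are hypothetical and are not the crate's `read_string` or `read_pattern_match`. They show only the two ideas the commit message describes: a scan-mode read bounded by a cap, and a scan window whose upper bound is capped while absolute offsets stay valid.

```rust
/// Read bytes starting at `offset`, stopping at the first NUL, the end of
/// the buffer, or `max_len` bytes, whichever comes first.
fn read_capped_string(buffer: &[u8], offset: usize, max_len: usize) -> Option<&[u8]> {
    let tail = buffer.get(offset..)?;
    let window = &tail[..tail.len().min(max_len)];
    let end = window.iter().position(|&b| b == 0).unwrap_or(window.len());
    Some(&window[..end])
}

/// Build a scan window for a matcher that still indexes from `offset`.
/// Only the upper bound is capped, so `scan[offset..]` remains meaningful.
fn capped_scan_buffer(buffer: &[u8], offset: usize, max_len: usize) -> Option<&[u8]> {
    if offset >= buffer.len() {
        return None; // read site is past the end of the buffer
    }
    let end = offset.checked_add(max_len)?.min(buffer.len());
    buffer.get(..end)
}

fn main() {
    let buffer = b"....HELLO WORLD";
    let scan = capped_scan_buffer(buffer, 4, 5).expect("offset is in range");
    // The matcher indexes from the original offset and sees only 5 bytes.
    assert_eq!(&scan[4..], b"HELLO");
    // Pre-slicing from `offset` and indexing again would double-offset:
    let pre_sliced = &buffer[4..9];
    assert_eq!(&pre_sliced[4..], b"O");
    println!("{:?}", read_capped_string(buffer, 4, 5));
}
```

The second assertion in `main` reproduces the double-offset hazard the commit message warns about: the pre-sliced window leaves the matcher looking at the wrong bytes for any non-zero offset.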

…skip)] on type_kind (closes 1B-H2)

Library-side `EvaluationResult` carries `RuleMatch.type_kind: TypeKind` for runtime needs (`format_magic_message` width-masking, `bit_width()` derivation in output formatting). Serializing the result directly via `serde_json::to_string` was leaking the full parser AST shape into JSON output -- CWE-200 information exposure documented in origin findings 1B-H2 and 2A-M1. The documented JSON contract is `output::json::JsonMatchResult` which already omits `type_kind`.

This commit aligns the library-side type with that contract without changing the runtime API:

* Add `#[serde(skip)]` to `RuleMatch.type_kind`. Rust-side consumers continue to access the field directly; only the serialized form is affected.
* Drop `Deserialize` from `RuleMatch`, `EvaluationResult`, and `EvaluationMetadata`. A reconstructed `RuleMatch` would lack the buffer context it was produced against, so deserialization was never a meaningful operation. The only `Deserialize`-based test in the codebase (`src/output/mod.rs:973`) round-trips the **output-side** `EvaluationResult` (which contains `MatchResult`, not `RuleMatch`) and is unaffected.
* Add two regression tests in `tests/json_integration_test.rs`: one asserting the JSON output contains no `type_kind` key, one asserting the Rust field access still works for runtime consumers.

Implements U6 from `docs/plans/2026-05-28-001-refactor-pre-1.0-api-hardening-plan.md`.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

…00 countermeasure

Aligns the four documentation surfaces 2A-H1 / 3B-C1 named (rustdoc on `EvaluationConfig.max_string_length`, accessor on `EvaluationContext::max_string_length`, `configuration.md` field reference, and `security-assurance.md` CWE-400 row) with the implementation U1 landed.

* `src/config.rs` (`EvaluationConfig.max_string_length` rustdoc): describe the cap explicitly as bounding scan-mode `TypeKind::String` reads on both the unflagged and flagged variants, and list the two excluded paths (`PString` errors on overrun; `String16` has a hardcoded 16 KiB ceiling).
* `src/evaluator/mod.rs` (`EvaluationContext::max_string_length` rustdoc): same shape -- describe the threading into both dispatchers and the exclusions.
* `docs/src/configuration.md`: update the field reference table.
* `docs/src/security-assurance.md`:
  - CWE-400 §5.1 row: credit `max_string_length` as a memory-exhaustion countermeasure and cross-reference §7.3 for the coverage gaps.
  - New §7.3 "max_string_length Coverage Gaps": describe the PString/String16 exclusions and mitigation guidance for embedders.

This scoped update lands within the 2A-H1 / 3B-C1 documentation work declared in plan U3; the full 3B-H2 threat-model refresh remains a separate Sprint 5 PR per the origin report.

Implements U3 from `docs/plans/2026-05-28-001-refactor-pre-1.0-api-hardening-plan.md`.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

coderabbitai · 2026-05-29T00:46:47Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fdc3f28d-375a-40ff-818a-e91dc3c274a7

📥 Commits

Reviewing files that changed from the base of the PR and between d741318 and 2b81694.

⛔ Files ignored due to path filters (3)

docs/MAGIC_FORMAT.md is excluded by none and included by none
tests/json_integration_test.rs is excluded by none and included by none
tests/security_regression.rs is excluded by none and included by none

📒 Files selected for processing (3)

docs/src/evaluator.md
docs/src/library-api.md
src/evaluator/types/mod.rs

Summary by CodeRabbit

Documentation
- Clarified max_string_length semantics, exceptions, and coverage gaps; expanded security guidance and updated evaluator/library docs to reflect internalization of some type-reading helpers and JSON serialization notes.
Breaking Changes
- Output types are Serialize-only; certain internal fields are omitted from JSON.
- Several public structs marked non-exhaustive and new constructor APIs added; some prior struct-literal construction and a formerly const constructor behavior changed (string-length config is now clamped and may emit a warning).

Walkthrough

Threads EvaluationContext::max_string_length into evaluator rule workers and type readers, documents the cap’s scope and exceptions (PString, String16), narrows type-reader visibility, removes Deserialize from public outputs, and changes rustdoc fences to ignore doctests.

Changes

String-length enforcement and API tightening

Layer / File(s)	Summary
Documentation and configuration of max_string_length behavior `docs/src/configuration.md`, `docs/src/security-assurance.md`, `docs/src/library-api.md`, `src/config.rs`, `docs/src/evaluator.md`	Docs and config comment expanded to state that `max_string_length` caps scan-mode `TypeKind::String` reads (including `/c`/`/C`/`/w`/`/W`/`/T`/`/f` flags), and explicitly excludes `TypeKind::PString` (length-prefix overrun returns `TypeReadError::BufferOverrun`) and `TypeKind::String16` (hardcoded 8192-unit ceiling); adds §7.3 coverage-gaps and mitigation guidance.
Rustdoc doctest hygiene `src/evaluator/types/{date,float,numeric,string}.rs`, `src/evaluator/strength.rs`	Example code fences changed to ```ignore to prevent doctest execution; evaluator strength/docs examples rewritten to use `MagicRule::new(...)` / builder helpers.
Public API surface boundary enforcement `src/evaluator/types/mod.rs`	Re-exports of leaf type-reader functions are downgraded from `pub` to `pub(crate)`, reducing the public API surface for internal reader dispatchers.
Type reader signature and enforcement updates `src/evaluator/types/mod.rs`	Adds `DEFAULT_MAX_STRING_LENGTH`; `read_typed_value_with_pattern` and `read_pattern_match` accept `max_string_length`; unpatterned string scans use `Some(max_string_length)` and flagged-string pattern scans clamp their window using `max_string_length`; `coerce_value_to_type` made crate-internal.
Evaluator engine integration of max_string_length `src/evaluator/engine/mod.rs`, `src/evaluator/engine/tests/mod.rs`, `src/evaluator/mod.rs`	`evaluate_single_rule_with_anchor` accepts a `max_string_length` parameter wired from `context.max_string_length()`; dispatch now calls `evaluate_pattern_rule`/`evaluate_value_rule` with the cap; many `RuleMatch` struct literals replaced with `RuleMatch::new(...)`; legacy test helper updated to pass `DEFAULT_MAX_STRING_LENGTH`.
EvaluationContext and RuleMatch serialization changes `src/evaluator/mod.rs`	`EvaluationContext` gains `#[non_exhaustive]` and runtime clamping/logging in `new`; Serde imports limited to `Serialize`; `RuleMatch` now derives only `Serialize`, `type_kind` annotated `#[serde(skip)]`, and a `RuleMatch::new` constructor is provided.
Library output types and constructors `src/lib.rs`, `src/output/mod.rs`	`EvaluationMetadata` and `EvaluationResult` now derive only `Serialize` (removed `Deserialize`) and are `#[non_exhaustive]`; new constructors `EvaluationResult::new` and `EvaluationMetadata::new` added; rustdoc examples updated to use constructors and MatchResult helpers.
Parser AST flags and ValueTransform constructor `src/parser/ast.rs`	Adds `#[non_exhaustive]` to `RegexFlags`, `StringFlags`, `SearchFlags`; updates doc examples to builder/constructor style; adds `pub const fn ValueTransform::new(...)`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

EvilBit-Labs/libmagic-rs#293: Modifies flagged TypeKind::String scan-window clamping in the same pattern-reading path (read_pattern_match / flagged-string window logic).
EvilBit-Labs/libmagic-rs#170: Introduced TypeKind::PString, which this PR documents as excluded from the max_string_length cap.
EvilBit-Labs/libmagic-rs#288: Related work on evaluator/type dispatch for TypeKind::String and flagged-string handling; overlaps the same dispatch functions updated here.

Suggested labels

evaluator, parser, output, memory-safety, testing, rust

Poem

A cap on strings, a careful tune,
Engine threads it, quiet as noon.
Docs note the corners PString keeps,
Output serial keeps secrets deep.
Tests aligned — the lib sleeps safe and soon.

🚥 Pre-merge checks | ✅ 10

✅ Passed checks (10 passed)

Check name	Status	Explanation
Title check	✅ Passed	Title follows Conventional Commits spec with refactor type, valid scope (omitted but multi-file refactor acceptable), and clear description of main changes: CWE-770 threading and RuleMatch JSON stripping.
Description check	✅ Passed	Description is comprehensive and directly related to the changeset, detailing security fixes (CWE-770, CWE-200), API hardening (13 structs marked non_exhaustive), silent-failure fixes, documentation corrections, and test coverage.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 85.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Memory-Safety-Check	✅ Passed	No unsafe code blocks introduced; all buffer access uses .get() bounds checking or offset guards. Added defensive offset validation and safe arithmetic patterns for scan_buffer operations.
Libmagic-Compatibility-Check	✅ Passed	PR maintains libmagic compatibility: magic file parsing unchanged, default 8192-byte string cap matches libmagic, JSON output format compatible, and all changes are internal API refinements.
Performance-Regression-Check	✅ Passed	No regressions: memmap2 already in use; max_string_length passed as usize (cheap); slice-based buffers; one-time validation; constructors are zero-cost refactoring.
Test-Coverage-Check	✅ Passed	Adequate test coverage: 6 max_string_length regression tests (unflagged, flagged, boundaries); 2 JSON type_kind tests; 4 proptest tests; 47 type-reading tests.
Error-Handling-Check	✅ Passed	Public APIs return Result with thiserror-based errors, zero unwrap()/expect() in library code, only safe unwrap_or() patterns with fallbacks in production code.

mergify · 2026-05-29T00:47:14Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Require conventional commit format per https://www.conventionalcommits.org/en/v1.0.0/. Skipped for bots.

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?!?:

🟢 Full CI must pass

Wonderful, this rule succeeded.

All CI checks must pass. Release-plz PRs are exempt because they only bump versions and changelogs (code was already tested on main), and GITHUB_TOKEN-triggered force-pushes suppress CI.

check-success = coverage
check-success = quality
check-success = test
check-success = test-cross-platform (macos-latest, macOS)
check-success = test-cross-platform (ubuntu-22.04, Linux)
check-success = test-cross-platform (ubuntu-latest, Linux)
check-success = test-cross-platform (windows-latest, Windows)

🟢 Do not merge outdated PRs

Wonderful, this rule succeeded.

Make sure PRs are within 10 commits of the base branch before merging

#commits-behind <= 10

dosubot · 2026-05-29T00:52:20Z

Related Knowledge

2 documents with suggested updates are ready for review.

libMagic-rs

evaluator `/libmagic-rs/blob/main/docs/src/evaluator.md` — ⏳ Awaiting Merge

library-api `/libmagic-rs/blob/main/docs/src/library-api.md` — ⏳ Awaiting Merge

Copilot

Pull request overview

This PR hardens the evaluator’s string-reading behavior to actually enforce EvaluationConfig::max_string_length (closing a documented CWE-770 gap), and reduces JSON information exposure by preventing RuleMatch.type_kind (parser AST) from being serialized while also removing unused Deserialize derives from library-facing result types.

Changes:

Thread max_string_length through the evaluator engine into both type-read dispatchers (read_typed_value_with_pattern and read_pattern_match), and adjust scan-window behavior for flagged string rules.
Remove AST leakage in JSON by skipping RuleMatch.type_kind during serialization and dropping Deserialize on library result structs.
Add targeted regression tests and update docs to reflect the new/actual security and configuration semantics; demote various type-read helpers to pub(crate) and ignore affected doctests.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

Show a summary per file

| File | Description |
| --- | --- |
| tests/security_regression.rs | Adds regression tests pinning `max_string_length` behavior on unflagged and flagged string paths. |
| tests/json_integration_test.rs | Adds tests ensuring `RuleMatch.type_kind` is not serialized to JSON and remains accessible in Rust. |
| src/lib.rs | Drops `Deserialize` from library-facing `EvaluationResult` / `EvaluationMetadata` and removes unused serde import. |
| src/config.rs | Updates `max_string_length` rustdoc to precisely describe what is (and isn’t) capped. |
| src/evaluator/mod.rs | Drops `Deserialize` from `RuleMatch`, adds `#[serde(skip)]` on `type_kind`, and expands docs. |
| src/evaluator/engine/mod.rs | Threads `max_string_length` through engine helpers into type-read dispatch. |
| src/evaluator/engine/tests/mod.rs | Updates legacy helper to pass the default max string cap used in tests. |
| src/evaluator/types/mod.rs | Applies `max_string_length` cap to scan-mode `string x` and flagged-string scan buffers; makes helpers `pub(crate)` and adjusts test-only defaults. |
| src/evaluator/types/string.rs | Marks doctest examples as ignored due to reduced public surface. |
| src/evaluator/types/numeric.rs | Marks doctest examples as ignored due to reduced public surface. |
| src/evaluator/types/float.rs | Marks doctest examples as ignored due to reduced public surface. |
| src/evaluator/types/date.rs | Marks doctest examples as ignored due to reduced public surface. |
| docs/src/configuration.md | Updates field reference table to reflect precise `max_string_length` behavior and exclusions. |
| docs/src/security-assurance.md | Documents `max_string_length` coverage gaps for `PString` and `String16`. |

codecov · 2026-05-29T00:58:02Z

Codecov Report

❌ Patch coverage is 82.25806% with 22 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/lib.rs | 50.00% | 15 Missing ⚠️ |
| src/evaluator/types/mod.rs | 80.00% | 4 Missing ⚠️ |
| src/parser/ast.rs | 0.00% | 3 Missing ⚠️ |

…curacy)

Addresses findings from /pr-review-toolkit:review-pr after PR open:

* **SF-2 (silent-failure-hunter, P0):** flagged-string `scan_buffer` used `buffer.get(..end).unwrap_or(buffer)`, silently falling back to the full uncapped buffer if the documented invariant `end <= buffer.len()` ever broke. That's "fail open" on the CWE-770 control. Replace with `&buffer[..end]` so a future refactor that breaks the clamp panics loudly instead of silently expanding scope.
* **TS-1 (pr-test-analyzer):** add `test_max_string_length_flagged_path_works_at_non_zero_offset`. Pins that the `scan_buffer` caps the buffer's UPPER bound (not pre-slicing from `offset`) -- a future swap of `&buffer[..end]` for `&buffer[offset..end]` would silently double-offset the comparator and the test catches that regression.
* **CA-1 (comment-analyzer):** `docs/src/security-assurance.md:110` -- CWE-400 row mdformat reverted in the previous commit. Re-add the `max_string_length` mention with the §7.3 cross-reference.
* **CA-2 (comment-analyzer):** `docs/src/security-assurance.md` and `src/config.rs` cited GOTCHAS S6.1 for the pstring buffer-clamp narrative. S6.1 is about multi-byte length-prefix BYTE ORDER; the load-bearing clamp documentation is in S3.8.
* **CA-3 + CA-4 (comment-analyzer):** trim stale `pub(crate)`-era rustdoc on `read_typed_value_with_pattern` -- "external callers" advice is meaningless after the demote, and the `# Errors` claim about regex compile failures is unreachable from this dispatcher (rejected before compile; compile failures only via `read_pattern_match`).
* **TD-1 (type-design-analyzer):** drop `#[cfg(test)]` from `read_typed_value` and `DEFAULT_MAX_STRING_LENGTH`. `#[cfg(test)]` excluded integration tests, benches, and the planned v1.0 cargo-fuzz harness; `pub(crate)` + `#[allow(dead_code)]` preserves the helper without that cost.

Deferred to follow-up:

* **SF-1 (config validation in EvaluationContext::new):** changing the constructor to return `Result` is a wider API change.
* **TD-2 (Option<NonZeroUsize>):** type-system invariant strengthening with broader ripple.
* **TS-2/3, SF-3, CA-5/6/7:** smaller polish items.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

Copilot

Pull request overview

Copilot reviewed 16 out of 17 changed files in this pull request and generated 7 comments.

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/evaluator/types/mod.rs`:
- Around line 401-405: Replace the direct slice &buffer[..end] with a
bounds-checked get call: use buffer.get(..end) and handle the Option (e.g., let
scan_buffer: &[u8] = buffer.get(..end).unwrap_or(&[]); or match on
buffer.get(..end) to return an empty slice or appropriate fallback). Update the
declaration that creates scan_buffer (which references
max_length/max_string_length, offset, end) to use buffer.get(..end) so all
buffer access complies with the .get() rule.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0b5e4447-4f02-4247-ac14-27fb2409c979

📥 Commits

Reviewing files that changed from the base of the PR and between df68b30 and 1ebbcdb.

⛔ Files ignored due to path filters (2)

.gitignore is excluded by none and included by none
tests/security_regression.rs is excluded by none and included by none

📒 Files selected for processing (3)

docs/src/security-assurance.md
src/config.rs
src/evaluator/types/mod.rs

… structs + migrate test sites

Closes the deferred items from PR #304 review per user direction "do not defer or leave residuals."

* **SF-1 (silent-failure-hunter):** `EvaluationContext::new` now defensively clamps `max_string_length = 0` to `DEFAULT_MAX_STRING_LENGTH` (8192) with a `warn!` log. `EvaluationConfig::validate()` rejects 0, but struct-literal construction and the `with_max_string_length` builder bypass it. Without the clamp, an invalid 0 silently disabled the CWE-770 control. Non-breaking; embedders see a log warning if they hit the bypass path. Regression test in `tests/security_regression.rs`.
* **U4 (architecture 1B-C1):** added `#[non_exhaustive]` to all 13 public structs (`MagicDatabase`, library + output `EvaluationResult` / `EvaluationMetadata`, `MatchResult`, `EvaluationContext`, `RuleMatch`, `RegexFlags`, `StringFlags`, `SearchFlags`, `ValueTransform`, `MagicRule`). Added `RuleMatch::new`, library-side `EvaluationResult::new` / `EvaluationMetadata::new`, and `ValueTransform::new` constructors to keep external migration ergonomic. Migrated ~56 struct-literal sites across 8 test/bench files and 18 doctest examples to use constructors / builder chains. A Python migration script handled the bulk; remaining sites (`children: vec![...]`, `level: N`, complex flag patterns) were edited by hand.
* **TD-2 (type-design-analyzer):** evaluated `Option<NonZeroUsize>` for `max_string_length`. Decided not to thread it through the type system: SF-1's runtime clamp guarantees `max_string_length >= 1` at every read site, so the `NonZeroUsize` encoding would be load-bearing-redundant. The runtime guard is the security control; the type would just shift validation from runtime to compile time without changing the invariant. Documenting the decision here rather than leaving it as a residual.

Test results after all changes:

- `cargo test --all-features` passes (~360 tests + 182 doctests)
- `cargo clippy --all-features --all-targets -- -D warnings` clean
- `cargo fmt --check` clean

Per the no-defer instruction, U4 ships in this PR rather than a follow-up.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

- read_pattern_match: add explicit `offset >= buffer.len()` guard so the flagged String path matches the documented BufferOverrun contract used by regex/search. Prevents incorrect Operator::NotEqual semantics when the read site is past EOF.
- types/mod.rs SF-2: switch direct slice indexing `&buffer[..end]` to `buffer.get(..end).ok_or(BufferOverrun)?` per AGENTS.md "use .get()" rule. Preserves SF-2 fail-loud posture via typed error (rather than panic) if a future refactor breaks the clamp invariant.
- security_regression tests: align doc-comments with the actual pattern string `" X"` (was inconsistently described as `"X "`).
- security_regression tests: reword the "minimum cap" doc-comment to accurately describe validate()-must-be-called semantics + the SF-1 defense-in-depth clamp in EvaluationContext::new.
- json_integration_test: replace substring `.contains("type_kind")` checks with structural serde_json::Value walk over `matches[*]` asserting key absence/presence.
- docs/src/evaluator.md: correct `type_kind` description -- field is `pub` (public Rust API) but excluded from JSON via `#[serde(skip)]`.
- docs/MAGIC_FORMAT.md, docs/src/library-api.md: mdformat-driven table cell-padding alignment (incidental).

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
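
The difference between the original fail-open fallback and the typed-error form adopted here can be sketched as follows. `ReadError` and both helper functions are hypothetical names for illustration; the crate's real error type is only referred to in this thread as `BufferOverrun`.

```rust
/// Hypothetical error type standing in for the crate's buffer-overrun error.
#[derive(Debug, PartialEq)]
enum ReadError {
    BufferOverrun { end: usize, len: usize },
}

/// Fail-open: if `end` is out of range, the caller silently receives the
/// whole uncapped buffer.
fn window_fail_open(buffer: &[u8], end: usize) -> &[u8] {
    buffer.get(..end).unwrap_or(buffer)
}

/// Fail-closed: an out-of-range `end` surfaces as a typed error instead of
/// widening the scan window or panicking.
fn window_fail_closed(buffer: &[u8], end: usize) -> Result<&[u8], ReadError> {
    buffer
        .get(..end)
        .ok_or(ReadError::BufferOverrun { end, len: buffer.len() })
}

fn main() {
    let buffer = b"abcd";
    // In-range requests behave identically.
    assert_eq!(window_fail_open(buffer, 2), &b"ab"[..]);
    assert_eq!(window_fail_closed(buffer, 2), Ok(&b"ab"[..]));
    // A broken invariant widens the window in one case and errors in the other.
    assert_eq!(window_fail_open(buffer, 9), &b"abcd"[..]);
    println!("{:?}", window_fail_closed(buffer, 9));
}
```

Compared with direct indexing (`&buffer[..end]`), the typed error keeps the failure visible to callers without introducing a panic path in library code.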

Copilot

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 2 comments.

+pub(crate) use date::{read_date, read_qdate};
+pub(crate) use float::{read_double, read_float};
+pub(crate) use numeric::{read_byte, read_long, read_quad, read_short};
+pub(crate) use regex::read_regex;


+    /// Construct a new library-side `EvaluationMetadata` from the four
+    /// always-set fields. `magic_file` and `timed_out` default to `None`
+    /// / `false`; use struct-update syntax with [`EvaluationMetadata::default()`]
+    /// to set them explicitly.


unclesp1d3r added 3 commits May 28, 2026 20:40

Copilot AI review requested due to automatic review settings May 29, 2026 00:46

Copilot started reviewing on behalf of unclesp1d3r May 29, 2026 00:46 View session

coderabbitai Bot added the rust Rust language features and idioms label May 29, 2026

docs: Dosu updates for PR #304

df68b30

Copilot AI reviewed May 29, 2026

View reviewed changes

Comment thread src/evaluator/types/mod.rs

Comment thread tests/json_integration_test.rs

Comment thread tests/security_regression.rs

Comment thread tests/security_regression.rs Outdated

coderabbitai Bot removed evaluator Rule evaluation engine and logic output Result formatting and output generation memory-safety Memory safety improvements and guarantees testing Test infrastructure and coverage rust Rust language features and idioms labels May 29, 2026

unclesp1d3r self-assigned this May 29, 2026

coderabbitai Bot previously approved these changes May 29, 2026

View reviewed changes

unclesp1d3r dismissed coderabbitai[bot]’s stale review via e7040af May 29, 2026 01:12

chore(.gitignore): add claude/scheduled_tasks.lock to ignore list

1ebbcdb

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

Copilot AI review requested due to automatic review settings May 29, 2026 02:28

Copilot started reviewing on behalf of unclesp1d3r May 29, 2026 02:28 View session

coderabbitai Bot added evaluator Rule evaluation engine and logic output Result formatting and output generation memory-safety Memory safety improvements and guarantees testing Test infrastructure and coverage rust Rust language features and idioms labels May 29, 2026

Copilot AI reviewed May 29, 2026

View reviewed changes

coderabbitai Bot reviewed May 29, 2026

View reviewed changes

Comment thread src/evaluator/types/mod.rs

dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels May 29, 2026

coderabbitai Bot removed evaluator Rule evaluation engine and logic output Result formatting and output generation memory-safety Memory safety improvements and guarantees testing Test infrastructure and coverage rust Rust language features and idioms labels May 29, 2026

Copilot AI review requested due to automatic review settings May 29, 2026 13:46

Copilot started reviewing on behalf of unclesp1d3r May 29, 2026 13:47 View session

Copilot AI reviewed May 29, 2026

View reviewed changes

coderabbitai Bot approved these changes May 29, 2026

View reviewed changes
