Skip to content

refactor: Close max_string_length CWE-770 gap + strip RuleMatch.type_kind from JSON#304

Open
unclesp1d3r wants to merge 8 commits into
mainfrom
refactor/pre-1.0-api-hardening
Open

refactor: Close max_string_length CWE-770 gap + strip RuleMatch.type_kind from JSON#304
unclesp1d3r wants to merge 8 commits into
mainfrom
refactor/pre-1.0-api-hardening

Conversation

@unclesp1d3r
Copy link
Copy Markdown
Member

@unclesp1d3r unclesp1d3r commented May 29, 2026

Summary

Closes pre-1.0 findings from the comprehensive review and the post-PR review:

  • 2A-H1 / CWE-770EvaluationConfig::max_string_length was documented as a working memory cap but was never threaded into the type-read dispatchers. Now wired through both read_typed_value_with_pattern (unflagged) and read_pattern_match (flagged string variants /c//C//w//W//T//f).
  • 1B-H2 / 2A-M1 / CWE-200RuleMatch.type_kind: TypeKind leaked the parser AST into JSON output. Now #[serde(skip)]; Deserialize dropped from RuleMatch/EvaluationResult/EvaluationMetadata.
  • 1B-C1 (architecture U4) — every public struct now carries #[non_exhaustive]. 13 structs covered: MagicDatabase, library + output EvaluationResult/EvaluationMetadata, MatchResult, EvaluationContext, RuleMatch, RegexFlags, StringFlags, SearchFlags, ValueTransform, MagicRule. New constructors added where needed (RuleMatch::new, library-side EvaluationResult::new/EvaluationMetadata::new, ValueTransform::new) to keep external migration ergonomic. ~56 struct-literal sites across tests/, benches/, and 18 doctest examples migrated to constructor or builder syntax.
  • SF-1 (silent-failure-hunter)EvaluationContext::new defensively clamps max_string_length = 0 to the documented default with a warn! log. Closes the bypass path where a struct-literal or with_max_string_length(0) could silently disable the CWE-770 control. Pinned by regression test.
  • SF-2 (silent-failure-hunter) — replaced the buffer.get(..end).unwrap_or(buffer) fail-open fallback in the flagged-string scan_buffer with &buffer[..end] (panic-on-invariant-violation).
  • CA-1/2/3/4 (comment-analyzer) — documentation accuracy: CWE-400 row credit in security-assurance.md, S6.1 → S3.8 cross-reference correction, and stale pub/# Errors rustdoc trimmed.
  • TD-1 (type-design-analyzer) — dropped #[cfg(test)] on read_typed_value (it excluded benches and the planned v1.0 cargo-fuzz harness).

Implements U1, U2, U3, U4, U5, U6 from docs/plans/2026-05-28-001-refactor-pre-1.0-api-hardening-plan.md.

Decisions documented inline (not deferred)

TD-2 (max_string_length as Option<NonZeroUsize>): evaluated and intentionally not implemented. SF-1's runtime clamp guarantees max_string_length >= 1 at every read site, so the NonZeroUsize encoding would be load-bearing-redundant — the runtime guard is the security control; the type would just shift validation from runtime to compile time without changing the invariant. Documented in the SF-1+U4 commit body.

Commits in this PR

  1. a492f01 — U1+U2+U5: thread max_string_length through dispatchers, regression tests, pub(crate) demotion
  2. efc151d — U6: #[serde(skip)] on RuleMatch.type_kind, drop Deserialize from library-side types
  3. d6af29f — U3: doc accuracy on max_string_length rustdoc + configuration.md + initial security-assurance.md §7.3
  4. e7040af — review-fix cluster: SF-2 (defensive slice), CA-1 (CWE-400 row), CA-2 (S6.1→S3.8), CA-3/4 (stale rustdoc), TD-1 (drop #[cfg(test)]), new TS-1 test
  5. d741318 — SF-1 (clamp invalid cap in EvaluationContext::new) + U4 (#[non_exhaustive] on 13 structs + ~56 site migration + new constructors)

Test plan

  • cargo nextest run --all-features — all unit + integration tests pass
  • cargo test --doc --all-features — 182 doctests pass (31 ignored)
  • cargo clippy --all-features --all-targets -- -D warnings — clean
  • cargo fmt --check — clean
  • First CI run was fully green (test, quality, audit, coverage, compatibility, cross-platform on Linux/macOS/Windows)
  • Verify CI green on the U4 commit (running now)

Breaking changes (for the v0.9 / v1.0 changelog)

  • Library-side EvaluationResult, EvaluationMetadata, RuleMatch no longer derive Deserialize (the documented JSON contract is output::JsonMatchResult).
  • All 13 public structs gained #[non_exhaustive]. External callers that constructed via struct-literal syntax need to migrate to the provided new(...)/with_*(...) builders or ..Default::default() patterns.
  • 14 leaf read_* functions, the two type-read dispatchers, and coerce_value_to_type in src/evaluator/types/mod.rs are now pub(crate). External callers should go through MagicDatabase or the documented evaluator:: re-exports per GOTCHAS S4.1.
  • EvaluationContext::new is no longer const fn (it now contains the clamp + log).

Origin

Origin doc: .full-review/05-final-report.md — finding IDs 2A-H1, 3A-C2, 3B-C1 (Sprint 1) and 1B-C1, 1B-C2, 1B-H2, 2A-M1 (Sprint 2). All originally-confirmed Sprint 1 + Sprint 2 work now lands in this single PR.

…ispatchers (closes 2A-H1)

`EvaluationConfig::max_string_length` was documented as a working memory
cap in three places (`config.rs`, `evaluator/mod.rs`, `configuration.md`)
and exposed via the accessor `EvaluationContext::max_string_length()`, but
was never threaded into `read_typed_value_with_pattern` or
`read_pattern_match`. A `0 string x %s` rule against a 1 GiB NUL-free
buffer allocated the full buffer (capped only by the 1 GiB mmap limit) --
the documented CWE-770 control did nothing.

Closes origin findings 2A-H1 (memory cap) and 3A-C2 (regression test
coverage). Implements U1 + U2 + U5 from
`docs/plans/2026-05-28-001-refactor-pre-1.0-api-hardening-plan.md`.

Changes:

* U1: Thread `max_string_length: usize` through `evaluate_single_rule_with_anchor`,
  `evaluate_value_rule`, `evaluate_pattern_rule`, `read_typed_value_with_pattern`,
  and `read_pattern_match`. The unflagged `(None, _)` String arm now passes
  `Some(max_string_length)` into `read_string`. The flagged-string arm in
  `read_pattern_match` builds `scan_buffer` with the cap as an upper bound
  using the same shape as the existing `max_length: Some(n)` arm (mirroring
  `buffer.get(..end).unwrap_or(buffer)`, not pre-slicing from `offset` -- a
  pre-slice would double-offset and silently break flagged-string matches
  at any non-zero offset).

* U2: Six new regression tests in `tests/security_regression.rs` covering
  the unflagged scan-mode path, NUL-stops-before-cap, cap-larger-than-buffer,
  minimum-valid-cap (cap=1; cap=0 is rejected by `EvaluationConfig::validate`),
  and a flagged `/W` whitespace-walk smoke test. All six fail against the
  prior `main` and pass after U1.

* U5: Demote the 14 leaf `read_*` re-exports plus `read_typed_value`,
  `read_typed_value_with_pattern`, and `coerce_value_to_type` from `pub`
  to `pub(crate)`. `read_pattern_match` was already `pub(crate)` at HEAD.
  Doctests in the `read_*` source files that imported via
  `libmagic_rs::evaluator::types::read_*` are marked `ignore` since those
  paths are no longer reachable externally; the public surface goes through
  `MagicDatabase` / the documented `evaluator::` re-exports per GOTCHAS S4.1.

`read_typed_value` is kept as a 3-arg convenience for `#[cfg(test)]` callers
(~30 inline test sites in `src/evaluator/types/`) and delegates to the
5-arg form with `DEFAULT_MAX_STRING_LENGTH = 8192` (matching
`EvaluationConfig::default().max_string_length`). The engine path supplies
the user-configured cap via the explicit parameter; the test default is
informational and does not affect production paths.

Remaining work from the plan:
* U3: doc verification + security-assurance.md credit
* U4: `#[non_exhaustive]` on 13 public structs + ~64 test-site migration
* U6: `#[serde(skip)]` on `RuleMatch.type_kind`

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…skip)] on type_kind (closes 1B-H2)

Library-side `EvaluationResult` carries `RuleMatch.type_kind: TypeKind`
for runtime needs (`format_magic_message` width-masking, `bit_width()`
derivation in output formatting). Serializing the result directly via
`serde_json::to_string` was leaking the full parser AST shape into JSON
output -- CWE-200 information exposure documented in origin findings
1B-H2 and 2A-M1.

The documented JSON contract is `output::json::JsonMatchResult` which
already omits `type_kind`. This commit aligns the library-side type with
that contract without changing the runtime API:

* Add `#[serde(skip)]` to `RuleMatch.type_kind`. Rust-side consumers
  continue to access the field directly; only the serialized form is
  affected.

* Drop `Deserialize` from `RuleMatch`, `EvaluationResult`, and
  `EvaluationMetadata`. A reconstructed `RuleMatch` would lack the
  buffer context it was produced against, so deserialization was never
  a meaningful operation. The only `Deserialize`-based test in the
  codebase (`src/output/mod.rs:973`) round-trips the **output-side**
  `EvaluationResult` (which contains `MatchResult`, not `RuleMatch`)
  and is unaffected.

* Add two regression tests in `tests/json_integration_test.rs`: one
  asserting the JSON output contains no `type_kind` key, one asserting
  the Rust field access still works for runtime consumers.

Implements U6 from
`docs/plans/2026-05-28-001-refactor-pre-1.0-api-hardening-plan.md`.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…00 countermeasure

Aligns the four documentation surfaces 2A-H1 / 3B-C1 named (rustdoc on
`EvaluationConfig.max_string_length`, accessor on
`EvaluationContext::max_string_length`, `configuration.md` field
reference, and `security-assurance.md` CWE-400 row) with the
implementation U1 landed.

* `src/config.rs` (`EvaluationConfig.max_string_length` rustdoc):
  describe the cap explicitly as bounding scan-mode `TypeKind::String`
  reads on both the unflagged and flagged variants, and list the two
  excluded paths (`PString` errors on overrun; `String16` has a
  hardcoded 16 KiB ceiling).

* `src/evaluator/mod.rs` (`EvaluationContext::max_string_length`
  rustdoc): same shape -- describe the threading into both dispatchers
  and the exclusions.

* `docs/src/configuration.md`: update the field reference table.

* `docs/src/security-assurance.md`:
  - CWE-400 §5.1 row: credit `max_string_length` as a memory-exhaustion
    countermeasure and cross-reference §7.3 for the coverage gaps.
  - New §7.3 "max_string_length Coverage Gaps": describe the
    PString/String16 exclusions and mitigation guidance for embedders.

This scoped update lands within the 2A-H1 / 3B-C1 documentation work
declared in plan U3; the full 3B-H2 threat-model refresh remains a
separate Sprint 5 PR per the origin report.

Implements U3 from
`docs/plans/2026-05-28-001-refactor-pre-1.0-api-hardening-plan.md`.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings May 29, 2026 00:46
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. documentation Improvements or additions to documentation evaluator Rule evaluation engine and logic memory-safety Memory safety improvements and guarantees output Result formatting and output generation testing Test infrastructure and coverage labels May 29, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fdc3f28d-375a-40ff-818a-e91dc3c274a7

📥 Commits

Reviewing files that changed from the base of the PR and between d741318 and 2b81694.

⛔ Files ignored due to path filters (3)
  • docs/MAGIC_FORMAT.md is excluded by none and included by none
  • tests/json_integration_test.rs is excluded by none and included by none
  • tests/security_regression.rs is excluded by none and included by none
📒 Files selected for processing (3)
  • docs/src/evaluator.md
  • docs/src/library-api.md
  • src/evaluator/types/mod.rs

Summary by CodeRabbit

  • Documentation

    • Clarified max_string_length semantics, exceptions, and coverage gaps; expanded security guidance and updated evaluator/library docs to reflect internalization of some type-reading helpers and JSON serialization notes.
  • Breaking Changes

    • Output types are Serialize-only; certain internal fields are omitted from JSON.
    • Several public structs marked non-exhaustive and new constructor APIs added; some prior struct-literal construction and a formerly const constructor behavior changed (string-length config is now clamped and may emit a warning).

Walkthrough

Threads EvaluationContext::max_string_length into evaluator rule workers and type readers, documents the cap’s scope and exceptions (PString, String16), narrows type-reader visibility, removes Deserialize from public outputs, and changes rustdoc fences to ignore doctests.

Changes

String-length enforcement and API tightening

Layer / File(s) Summary
Documentation and configuration of max_string_length behavior
docs/src/configuration.md, docs/src/security-assurance.md, docs/src/library-api.md, src/config.rs, docs/src/evaluator.md
Docs and config comment expanded to state that max_string_length caps scan-mode TypeKind::String reads (including /c//C//w//W//T//f flags), and explicitly excludes TypeKind::PString (length-prefix overrun returns TypeReadError::BufferOverrun) and TypeKind::String16 (hardcoded 8192-unit ceiling); adds §7.3 coverage-gaps and mitigation guidance.
Rustdoc doctest hygiene
src/evaluator/types/{date,float,numeric,string}.rs, src/evaluator/strength.rs
Example code fences changed to ```ignore to prevent doctest execution; evaluator strength/docs examples rewritten to use MagicRule::new(...) / builder helpers.
Public API surface boundary enforcement
src/evaluator/types/mod.rs
Re-exports of leaf type-reader functions are downgraded from pub to pub(crate), reducing the public API surface for internal reader dispatchers.
Type reader signature and enforcement updates
src/evaluator/types/mod.rs
Adds DEFAULT_MAX_STRING_LENGTH; read_typed_value_with_pattern and read_pattern_match accept max_string_length; unpatterned string scans use Some(max_string_length) and flagged-string pattern scans clamp their window using max_string_length; coerce_value_to_type made crate-internal.
Evaluator engine integration of max_string_length
src/evaluator/engine/mod.rs, src/evaluator/engine/tests/mod.rs, src/evaluator/mod.rs
evaluate_single_rule_with_anchor accepts a max_string_length parameter wired from context.max_string_length(); dispatch now calls evaluate_pattern_rule/evaluate_value_rule with the cap; many RuleMatch struct literals replaced with RuleMatch::new(...); legacy test helper updated to pass DEFAULT_MAX_STRING_LENGTH.
EvaluationContext and RuleMatch serialization changes
src/evaluator/mod.rs
EvaluationContext gains #[non_exhaustive] and runtime clamping/logging in new; Serde imports limited to Serialize; RuleMatch now derives only Serialize, type_kind annotated #[serde(skip)], and a RuleMatch::new constructor is provided.
Library output types and constructors
src/lib.rs, src/output/mod.rs
EvaluationMetadata and EvaluationResult now derive only Serialize (removed Deserialize) and are #[non_exhaustive]; new constructors EvaluationResult::new and EvaluationMetadata::new added; rustdoc examples updated to use constructors and MatchResult helpers.
Parser AST flags and ValueTransform constructor
src/parser/ast.rs
Adds #[non_exhaustive] to RegexFlags, StringFlags, SearchFlags; updates doc examples to builder/constructor style; adds pub const fn ValueTransform::new(...).

Sequence Diagram(s)

sequenceDiagram
  participant ComponentA
  participant ComponentB
  ComponentA->>ComponentB: observable interaction
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • EvilBit-Labs/libmagic-rs#293: Modifies flagged TypeKind::String scan-window clamping in the same pattern-reading path (read_pattern_match / flagged-string window logic).
  • EvilBit-Labs/libmagic-rs#170: Introduced TypeKind::PString, which this PR documents as excluded from the max_string_length cap.
  • EvilBit-Labs/libmagic-rs#288: Related work on evaluator/type dispatch for TypeKind::String and flagged-string handling; overlaps the same dispatch functions updated here.

Suggested labels

evaluator, parser, output, memory-safety, testing, rust

Poem

A cap on strings, a careful tune,
Engine threads it, quiet as noon.
Docs note the corners PString keeps,
Output serial keeps secrets deep.
Tests aligned — the lib sleeps safe and soon.

🚥 Pre-merge checks | ✅ 10
✅ Passed checks (10 passed)
Check name Status Explanation
Title check ✅ Passed Title follows Conventional Commits spec with refactor type, valid scope (omitted but multi-file refactor acceptable), and clear description of main changes: CWE-770 threading and RuleMatch JSON stripping.
Description check ✅ Passed Description is comprehensive and directly related to the changeset, detailing security fixes (CWE-770, CWE-200), API hardening (13 structs marked non_exhaustive), silent-failure fixes, documentation corrections, and test coverage.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 85.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Memory-Safety-Check ✅ Passed No unsafe code blocks introduced; all buffer access uses .get() bounds checking or offset guards. Added defensive offset validation and safe arithmetic patterns for scan_buffer operations.
Libmagic-Compatibility-Check ✅ Passed PR maintains libmagic compatibility: magic file parsing unchanged, default 8192-byte string cap matches libmagic, JSON output format compatible, and all changes are internal API refinements.
Performance-Regression-Check ✅ Passed No regressions: memmap2 already in use; max_string_length passed as usize (cheap); slice-based buffers; one-time validation; constructors are zero-cost refactoring.
Test-Coverage-Check ✅ Passed Adequate test coverage: 6 max_string_length regression tests (unflagged, flagged, boundaries); 2 JSON type_kind tests; 4 proptest tests; 47 type-reading tests.
Error-Handling-Check ✅ Passed Public APIs return Result with thiserror-based errors, zero unwrap()/expect() in library code, only safe unwrap_or() patterns with fallbacks in production code.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Warning

Review ran into problems

🔥 Problems

These MCP integrations need to be re-authenticated in the Integrations settings: Linear, Notion


Comment @coderabbitai help to get the list of available commands and usage tips.

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 29, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Require conventional commit format per https://www.conventionalcommits.org/en/v1.0.0/. Skipped for bots.

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?!?:

🟢 Full CI must pass

Wonderful, this rule succeeded.

All CI checks must pass. Release-plz PRs are exempt because they only bump versions and changelogs (code was already tested on main), and GITHUB_TOKEN-triggered force-pushes suppress CI.

  • check-success = coverage
  • check-success = quality
  • check-success = test
  • check-success = test-cross-platform (macos-latest, macOS)
  • check-success = test-cross-platform (ubuntu-22.04, Linux)
  • check-success = test-cross-platform (ubuntu-latest, Linux)
  • check-success = test-cross-platform (windows-latest, Windows)

🟢 Do not merge outdated PRs

Wonderful, this rule succeeded.

Make sure PRs are within 10 commits of the base branch before merging

  • #commits-behind <= 10

@coderabbitai coderabbitai Bot added the rust Rust language features and idioms label May 29, 2026
@dosubot
Copy link
Copy Markdown
Contributor

dosubot Bot commented May 29, 2026

Related Knowledge

2 documents with suggested updates are ready for review.

libMagic-rs

evaluator /libmagic-rs/blob/main/docs/src/evaluator.md — ⏳ Awaiting Merge
library-api /libmagic-rs/blob/main/docs/src/library-api.md — ⏳ Awaiting Merge

How did I do? Any feedback?

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the evaluator’s string-reading behavior to actually enforce EvaluationConfig::max_string_length (closing a documented CWE-770 gap), and reduces JSON information exposure by preventing RuleMatch.type_kind (parser AST) from being serialized while also removing unused Deserialize derives from library-facing result types.

Changes:

  • Thread max_string_length through the evaluator engine into both type-read dispatchers (read_typed_value_with_pattern and read_pattern_match), and adjust scan-window behavior for flagged string rules.
  • Remove AST leakage in JSON by skipping RuleMatch.type_kind during serialization and dropping Deserialize on library result structs.
  • Add targeted regression tests and update docs to reflect the new/actual security and configuration semantics; demote various type-read helpers to pub(crate) and ignore affected doctests.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/security_regression.rs Adds regression tests pinning max_string_length behavior on unflagged and flagged string paths.
tests/json_integration_test.rs Adds tests ensuring RuleMatch.type_kind is not serialized to JSON and remains accessible in Rust.
src/lib.rs Drops Deserialize from library-facing EvaluationResult / EvaluationMetadata and removes unused serde import.
src/config.rs Updates max_string_length rustdoc to precisely describe what is (and isn’t) capped.
src/evaluator/mod.rs Drops Deserialize from RuleMatch, adds #[serde(skip)] on type_kind, and expands docs.
src/evaluator/engine/mod.rs Threads max_string_length through engine helpers into type-read dispatch.
src/evaluator/engine/tests/mod.rs Updates legacy helper to pass the default max string cap used in tests.
src/evaluator/types/mod.rs Applies max_string_length cap to scan-mode string x and flagged-string scan buffers; makes helpers pub(crate) and adjusts test-only defaults.
src/evaluator/types/string.rs Marks doctest examples as ignored due to reduced public surface.
src/evaluator/types/numeric.rs Marks doctest examples as ignored due to reduced public surface.
src/evaluator/types/float.rs Marks doctest examples as ignored due to reduced public surface.
src/evaluator/types/date.rs Marks doctest examples as ignored due to reduced public surface.
docs/src/configuration.md Updates field reference table to reflect precise max_string_length behavior and exclusions.
docs/src/security-assurance.md Documents max_string_length coverage gaps for PString and String16.

Comment thread src/evaluator/types/mod.rs
Comment thread tests/json_integration_test.rs
Comment thread tests/security_regression.rs
Comment thread tests/security_regression.rs Outdated
@coderabbitai coderabbitai Bot removed evaluator Rule evaluation engine and logic output Result formatting and output generation memory-safety Memory safety improvements and guarantees testing Test infrastructure and coverage rust Rust language features and idioms labels May 29, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 29, 2026

Codecov Report

❌ Patch coverage is 82.25806% with 22 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/lib.rs 50.00% 15 Missing ⚠️
src/evaluator/types/mod.rs 80.00% 4 Missing ⚠️
src/parser/ast.rs 0.00% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

@unclesp1d3r unclesp1d3r self-assigned this May 29, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes May 29, 2026
…curacy)

Addresses findings from /pr-review-toolkit:review-pr after PR open:

* **SF-2 (silent-failure-hunter, P0):** flagged-string `scan_buffer`
  used `buffer.get(..end).unwrap_or(buffer)`, silently falling back to
  the full uncapped buffer if the documented invariant
  `end <= buffer.len()` ever broke. That's "fail open" on the CWE-770
  control. Replace with `&buffer[..end]` so a future refactor that
  breaks the clamp panics loudly instead of silently expanding scope.

* **TS-1 (pr-test-analyzer):** add
  `test_max_string_length_flagged_path_works_at_non_zero_offset`. Pins
  that the `scan_buffer` caps the buffer's UPPER bound (not pre-slicing
  from `offset`) -- a future swap of `&buffer[..end]` for
  `&buffer[offset..end]` would silently double-offset the comparator
  and the test catches that regression.

* **CA-1 (comment-analyzer):** `docs/src/security-assurance.md:110` --
  CWE-400 row mdformat reverted in the previous commit. Re-add the
  `max_string_length` mention with the §7.3 cross-reference.

* **CA-2 (comment-analyzer):** `docs/src/security-assurance.md` and
  `src/config.rs` cited GOTCHAS S6.1 for the pstring buffer-clamp
  narrative. S6.1 is about multi-byte length-prefix BYTE ORDER; the
  load-bearing clamp documentation is in S3.8.

* **CA-3 + CA-4 (comment-analyzer):** trim stale `pub(crate)`-era
  rustdoc on `read_typed_value_with_pattern` -- "external callers"
  advice is meaningless after the demote, and the `# Errors` claim
  about regex compile failures is unreachable from this dispatcher
  (rejected before compile; compile failures only via
  `read_pattern_match`).

* **TD-1 (type-design-analyzer):** drop `#[cfg(test)]` from
  `read_typed_value` and `DEFAULT_MAX_STRING_LENGTH`. `#[cfg(test)]`
  excluded integration tests, benches, and the planned v1.0 cargo-fuzz
  harness; `pub(crate)` + `#[allow(dead_code)]` preserves the helper
  without that cost.

Deferred to follow-up:

* **SF-1 (config validation in EvaluationContext::new):** changing the
  constructor to return `Result` is a wider API change.
* **TD-2 (Option<NonZeroUsize>):** type-system invariant strengthening
  with broader ripple.
* **TS-2/3, SF-3, CA-5/6/7:** smaller polish items.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings May 29, 2026 02:28
@coderabbitai coderabbitai Bot added evaluator Rule evaluation engine and logic output Result formatting and output generation memory-safety Memory safety improvements and guarantees testing Test infrastructure and coverage rust Rust language features and idioms labels May 29, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 17 changed files in this pull request and generated 7 comments.

Comment thread docs/src/evaluator.md Outdated
Comment thread tests/security_regression.rs Outdated
Comment thread tests/security_regression.rs Outdated
Comment thread tests/security_regression.rs
Comment thread tests/security_regression.rs Outdated
Comment thread tests/security_regression.rs Outdated
Comment thread tests/security_regression.rs Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/evaluator/types/mod.rs`:
- Around line 401-405: Replace the direct slice &buffer[..end] with a
bounds-checked get call: use buffer.get(..end) and handle the Option (e.g., let
scan_buffer: &[u8] = buffer.get(..end).unwrap_or(&[]); or match on
buffer.get(..end) to return an empty slice or appropriate fallback). Update the
declaration that creates scan_buffer (which references
max_length/max_string_length, offset, end) to use buffer.get(..end) so all
buffer access complies with the .get() rule.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0b5e4447-4f02-4247-ac14-27fb2409c979

📥 Commits

Reviewing files that changed from the base of the PR and between df68b30 and 1ebbcdb.

⛔ Files ignored due to path filters (2)
  • .gitignore is excluded by none and included by none
  • tests/security_regression.rs is excluded by none and included by none
📒 Files selected for processing (3)
  • docs/src/security-assurance.md
  • src/config.rs
  • src/evaluator/types/mod.rs

Comment thread src/evaluator/types/mod.rs
… structs + migrate test sites

Closes the deferred items from PR #304 review per user direction "do not
defer or leave residuals."

* **SF-1 (silent-failure-hunter):** `EvaluationContext::new` now defensively
  clamps `max_string_length = 0` to `DEFAULT_MAX_STRING_LENGTH` (8192) with a
  `warn!` log. `EvaluationConfig::validate()` rejects 0, but struct-literal
  construction and the `with_max_string_length` builder bypass it. Without the
  clamp, an invalid 0 silently disabled the CWE-770 control. Non-breaking;
  embedders see a log warning if they hit the bypass path. Regression test in
  `tests/security_regression.rs`.

* **U4 (architecture 1B-C1):** added `#[non_exhaustive]` to all 13 public
  structs (`MagicDatabase`, library + output `EvaluationResult` /
  `EvaluationMetadata`, `MatchResult`, `EvaluationContext`, `RuleMatch`,
  `RegexFlags`, `StringFlags`, `SearchFlags`, `ValueTransform`, `MagicRule`).
  Added `RuleMatch::new`, library-side `EvaluationResult::new` /
  `EvaluationMetadata::new`, and `ValueTransform::new` constructors to keep
  external migration ergonomic. Migrated ~56 struct-literal sites across 8
  test/bench files and 18 doctest examples to use constructors / builder
  chains. A Python migration script handled the bulk; remaining sites
  (`children: vec![...]`, `level: N`, complex flag patterns) were edited by
  hand.

* **TD-2 (type-design-analyzer):** evaluated `Option<NonZeroUsize>` for
  `max_string_length`. Decided not to thread it through the type system: SF-1's
  runtime clamp guarantees `max_string_length >= 1` at every read site, so the
  `NonZeroUsize` encoding would be load-bearing-redundant. The runtime guard is
  the security control; the type would just shift validation from runtime to
  compile time without changing the invariant. Documenting the decision here
  rather than leaving it as a residual.

Test results after all changes:
- `cargo test --all-features` passes (~360 tests + 182 doctests)
- `cargo clippy --all-features --all-targets -- -D warnings` clean
- `cargo fmt --check` clean

Per the no-defer instruction, U4 ships in this PR rather than a follow-up.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels May 29, 2026
@coderabbitai coderabbitai Bot removed evaluator Rule evaluation engine and logic output Result formatting and output generation memory-safety Memory safety improvements and guarantees testing Test infrastructure and coverage rust Rust language features and idioms labels May 29, 2026
- read_pattern_match: add explicit `offset >= buffer.len()` guard so the
  flagged String path matches the documented BufferOverrun contract used
  by regex/search. Prevents incorrect Operator::NotEqual semantics when
  the read site is past EOF.
- types/mod.rs SF-2: switch direct slice indexing `&buffer[..end]` to
  `buffer.get(..end).ok_or(BufferOverrun)?` per AGENTS.md "use .get()"
  rule. Preserves SF-2 fail-loud posture via typed error (rather than
  panic) if a future refactor breaks the clamp invariant.
- security_regression tests: align doc-comments with the actual pattern
  string `" X"` (was inconsistently described as `"X "`).
- security_regression tests: reword the "minimum cap" doc-comment to
  accurately describe validate()-must-be-called semantics + the SF-1
  defense-in-depth clamp in EvaluationContext::new.
- json_integration_test: replace substring `.contains("type_kind")`
  checks with structural serde_json::Value walk over `matches[*]`
  asserting key absence/presence.
- docs/src/evaluator.md: correct `type_kind` description -- field is
  `pub` (public Rust API) but excluded from JSON via `#[serde(skip)]`.
- docs/MAGIC_FORMAT.md, docs/src/library-api.md: mdformat-driven table
  cell-padding alignment (incidental).

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings May 29, 2026 13:46
@coderabbitai coderabbitai Bot added parser Magic file parsing components and grammar evaluator Rule evaluation engine and logic output Result formatting and output generation memory-safety Memory safety improvements and guarantees testing Test infrastructure and coverage rust Rust language features and idioms labels May 29, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 2 comments.

Comment on lines +21 to +24
pub(crate) use date::{read_date, read_qdate};
pub(crate) use float::{read_double, read_float};
pub(crate) use numeric::{read_byte, read_long, read_quad, read_short};
pub(crate) use regex::read_regex;
Comment thread src/lib.rs
Comment on lines +746 to +749
/// Construct a new library-side `EvaluationMetadata` from the four
/// always-set fields. `magic_file` and `timed_out` default to `None`
/// / `false`; use struct-update syntax with [`EvaluationMetadata::default()`]
/// to set them explicitly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation evaluator Rule evaluation engine and logic memory-safety Memory safety improvements and guarantees output Result formatting and output generation parser Magic file parsing components and grammar rust Rust language features and idioms size:XL This PR changes 500-999 lines, ignoring generated files. testing Test infrastructure and coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants