Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -150,3 +150,4 @@ todos/
 .tessl/tiles/
 .tessl/RULES.md
 .claude/instincts/
+.claude/scheduled_tasks.lock
18 changes: 7 additions & 11 deletions benches/regex_bench.rs
@@ -30,20 +30,16 @@ use libmagic_rs::{EvaluationConfig, MagicRule, OffsetSpec, Operator, TypeKind, V
 use std::hint::black_box;

 fn regex_rule(pattern: &str) -> MagicRule {
-    MagicRule {
-        offset: OffsetSpec::Absolute(0),
-        typ: TypeKind::Regex {
+    MagicRule::new(
+        OffsetSpec::Absolute(0),
+        TypeKind::Regex {
             flags: RegexFlags::default(),
             count: RegexCount::Default,
         },
-        op: Operator::Equal,
-        value: Value::String(pattern.to_string()),
-        message: "bench-match".to_string(),
-        children: vec![],
-        level: 0,
-        strength_modifier: None,
-        value_transform: None,
-    }
+        Operator::Equal,
+        Value::String(pattern.to_string()),
+        "bench-match".to_string(),
+    )
 }

 fn make_context() -> EvaluationContext {
20 changes: 10 additions & 10 deletions docs/MAGIC_FORMAT.md
@@ -368,17 +368,17 @@ The range is MANDATORY (`NonZeroUsize`). Bare `search` and `search/0` are parse

Flags for `search` type modify comparison and anchor behavior. Most flags share semantics with `string` type flags; `/s` is search-specific.

-| Flag | Description |
-| ---- | -------------------------------------------------------------------------------------------------------------------- |
-| `/s` | Anchor advance lands at match-START instead of match-END (required for TGA footer, sfnt name table) |
-| `/c` | Case-insensitive match (lowercase pattern chars fold file bytes to lower; uppercase pattern chars are literal) |
-| `/C` | Case-insensitive match (uppercase pattern chars fold file bytes to upper; lowercase pattern chars are literal) |
-| `/w` | Whitespace-optional (pattern whitespace matches zero or more file whitespace) |
+| Flag | Description |
+| ---- | ---------------------------------------------------------------------------------------------------------------------- |
+| `/s` | Anchor advance lands at match-START instead of match-END (required for TGA footer, sfnt name table) |
+| `/c` | Case-insensitive match (lowercase pattern chars fold file bytes to lower; uppercase pattern chars are literal) |
+| `/C` | Case-insensitive match (uppercase pattern chars fold file bytes to upper; lowercase pattern chars are literal) |
+| `/w` | Whitespace-optional (pattern whitespace matches zero or more file whitespace) |
 | `/W` | Whitespace-required-compact (pattern whitespace requires at least one file whitespace; additional whitespace consumed) |
-| `/T` | Trim leading/trailing ASCII whitespace from pattern before comparison |
-| `/f` | Full-word match (post-match word-boundary check; next byte must be EOF or non-word char) |
-| `/t` | Force text test (MIME-output hint; no effect on comparison) |
-| `/b` | Force binary test (MIME-output hint; no effect on comparison) |
+| `/T` | Trim leading/trailing ASCII whitespace from pattern before comparison |
+| `/f` | Full-word match (post-match word-boundary check; next byte must be EOF or non-word char) |
+| `/t` | Force text test (MIME-output hint; no effect on comparison) |
+| `/b` | Force binary test (MIME-output hint; no effect on comparison) |

**`/c` vs `/C` asymmetry:** The pattern character controls fold direction. `/c` with lowercase pattern chars folds the file byte to lowercase; uppercase pattern chars in the same pattern are compared literally. See String Flags section above for details.
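
The asymmetry described above can be sketched as a per-byte comparison. This is a minimal illustration, not the crate's implementation: the helper names `byte_matches_lower_fold`, `byte_matches_upper_fold`, and `matches_with_fold` are hypothetical, and the sketch assumes ASCII-only folding.

```rust
/// Hypothetical `/c` comparison: a lowercase pattern byte folds the file
/// byte to lowercase; any other pattern byte is compared literally.
fn byte_matches_lower_fold(pattern: u8, file: u8) -> bool {
    if pattern.is_ascii_lowercase() {
        file.to_ascii_lowercase() == pattern
    } else {
        file == pattern
    }
}

/// Hypothetical `/C` comparison: an uppercase pattern byte folds the file
/// byte to uppercase; any other pattern byte is compared literally.
fn byte_matches_upper_fold(pattern: u8, file: u8) -> bool {
    if pattern.is_ascii_uppercase() {
        file.to_ascii_uppercase() == pattern
    } else {
        file == pattern
    }
}

/// Compare `pattern` against the start of `data` using the given per-byte rule.
fn matches_with_fold(pattern: &[u8], data: &[u8], fold: fn(u8, u8) -> bool) -> bool {
    data.len() >= pattern.len() && pattern.iter().zip(data).all(|(&p, &f)| fold(p, f))
}

fn main() {
    // Under `/c`, the lowercase `b` accepts `b` or `B`; the uppercase `A` is literal.
    assert!(matches_with_fold(b"Ab", b"AB", byte_matches_lower_fold));
    assert!(!matches_with_fold(b"Ab", b"ab", byte_matches_lower_fold));

    // Under `/C`, the uppercase `A` accepts `a` or `A`; the lowercase `b` is literal.
    assert!(matches_with_fold(b"Ab", b"ab", byte_matches_upper_fold));
    assert!(!matches_with_fold(b"Ab", b"AB", byte_matches_upper_fold));
}
```

With a mixed-case pattern, the two flags therefore accept different inputs, which is why neither is a plain "ignore case" switch.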

14 changes: 7 additions & 7 deletions docs/src/configuration.md
@@ -18,13 +18,13 @@ pub struct EvaluationConfig {

### Field Reference

-| Field | Type | Default | Bounds | Purpose |
-| --------------------- | ------------- | ------- | -------------- | ---------------------------------------------------------------------------------------------------------------------------- |
-| `max_recursion_depth` | `u32` | 20 | 1 -- 1000 | Limits nested rule traversal depth to prevent stack overflow |
-| `max_string_length` | `usize` | 8192 | 1 -- 1_048_576 | Caps bytes read for string types to prevent memory exhaustion |
-| `stop_at_first_match` | `bool` | `true` | -- | When true, evaluation stops after the first matching top-level rule (children of that rule are still evaluated -- see below) |
-| `enable_mime_types` | `bool` | `false` | -- | When true, maps file type descriptions to standard MIME types |
-| `timeout_ms` | `Option<u64>` | `None` | 1 -- 300_000 | Per-file evaluation timeout in milliseconds; `None` disables |
+| Field | Type | Default | Bounds | Purpose |
+| --------------------- | ------------- | ------- | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `max_recursion_depth` | `u32` | 20 | 1 -- 1000 | Limits nested rule traversal depth to prevent stack overflow |
+| `max_string_length` | `usize` | 8192 | 1 -- 1_048_576 | Caps bytes read for `TypeKind::String` reads (both unflagged and flagged `/c`/`/C`/`/w`/`/W`/`/T`/`/f` variants). Does NOT apply to `TypeKind::PString` (returns `TypeReadError::BufferOverrun` on oversized prefix) or `TypeKind::String16` (hardcoded 8192-unit ceiling) |
+| `stop_at_first_match` | `bool` | `true` | -- | When true, evaluation stops after the first matching top-level rule (children of that rule are still evaluated -- see below) |
+| `enable_mime_types` | `bool` | `false` | -- | When true, maps file type descriptions to standard MIME types |
+| `timeout_ms` | `Option<u64>` | `None` | 1 -- 300_000 | Per-file evaluation timeout in milliseconds; `None` disables |
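
The defaults and bounds in the table can be read as a simple range check. The sketch below is an assumption-laden illustration: `ConfigSketch` and `check_bounds` are hypothetical names that only mirror the documented fields, and the crate's real validation API and error type may differ.

```rust
/// Hypothetical mirror of the documented fields; not the crate's definition.
#[derive(Debug, Clone)]
struct ConfigSketch {
    max_recursion_depth: u32,
    max_string_length: usize,
    stop_at_first_match: bool,
    enable_mime_types: bool,
    timeout_ms: Option<u64>,
}

impl Default for ConfigSketch {
    /// Defaults copied from the table above.
    fn default() -> Self {
        Self {
            max_recursion_depth: 20,
            max_string_length: 8192,
            stop_at_first_match: true,
            enable_mime_types: false,
            timeout_ms: None,
        }
    }
}

/// Check each bounded field against the range listed in the table.
fn check_bounds(cfg: &ConfigSketch) -> Result<(), String> {
    if !(1..=1000).contains(&cfg.max_recursion_depth) {
        return Err(format!("max_recursion_depth out of range: {}", cfg.max_recursion_depth));
    }
    if !(1..=1_048_576).contains(&cfg.max_string_length) {
        return Err(format!("max_string_length out of range: {}", cfg.max_string_length));
    }
    // `None` disables the timeout, so only a present value is range-checked.
    if let Some(ms) = cfg.timeout_ms {
        if !(1..=300_000).contains(&ms) {
            return Err(format!("timeout_ms out of range: {ms}"));
        }
    }
    Ok(())
}

fn main() {
    let defaults = ConfigSketch::default();
    assert!(defaults.stop_at_first_match && !defaults.enable_mime_types);
    assert!(check_bounds(&defaults).is_ok());

    let too_deep = ConfigSketch { max_recursion_depth: 1001, ..ConfigSketch::default() };
    assert!(check_bounds(&too_deep).is_err());

    let zero_timeout = ConfigSketch { timeout_ms: Some(0), ..ConfigSketch::default() };
    assert!(check_bounds(&zero_timeout).is_err());
}
```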

### `stop_at_first_match` Semantics

26 changes: 4 additions & 22 deletions docs/src/evaluator.md
@@ -23,7 +23,7 @@ The evaluator module separates public interface from implementation:
- **`evaluator/offset/mod.rs`** - Offset resolution
- **`evaluator/operators/mod.rs`** - Operator application
- **`evaluator/types/`** - Type reading and coercion (organized as submodules as of v0.4.2)
-- **`types/mod.rs`** - Public API surface: `read_typed_value`, `coerce_value_to_type`, re-exports type functions
+- **`types/mod.rs`** - Internal type-reading API: `pub(crate)` dispatchers (`read_typed_value_with_pattern`, `read_pattern_match`, `coerce_value_to_type`) plus re-exports of leaf `read_*` functions
- **`types/numeric.rs`** - Numeric type handling: `read_byte`, `read_short`, `read_long`, `read_quad` with endianness and signedness support
- **`types/float.rs`** - Floating-point type handling: `read_float` (32-bit IEEE 754), `read_double` (64-bit IEEE 754) with endianness support
- **`types/date.rs`** - Date and timestamp type handling: `read_date` (32-bit Unix timestamps), `read_qdate` (64-bit Unix timestamps) with endianness and UTC/local time support
@@ -91,6 +91,8 @@ pub struct RuleMatch {

The `Value` type is from `parser::ast::Value` and represents the actual matched content according to the rule's type specification. Note that `Value` implements only `PartialEq` (not `Eq`) due to floating-point NaN semantics.

+`RuleMatch` also carries a `pub type_kind: TypeKind` field used by the engine for width calculations and format substitution. The field is part of the public Rust API (accessible to consumers via field access) but is excluded from JSON serialization via `#[serde(skip)]` so the parser AST does not leak into structured output.
+
### Offset Resolution (`evaluator/offset.rs`)

- **Absolute offsets**: Direct file positions (`0`, `0x100`)
@@ -127,27 +129,7 @@ The types module is organized into submodules for numeric, floating-point, date/
- **Search**: Bounded literal pattern scan with flag support. `search/N` caps the scan window to `N` bytes from the offset; range is mandatory and non-zero (`NonZeroUsize`). Accepts nine flag suffixes (`/s`, `/c`, `/C`, `/w`, `/W`, `/T`, `/f`, `/t`, `/b`) that control scan behavior and anchor advancement. When only anchor-only flags (`/s`, `/t`, `/b`) are set or no flags are present, the SIMD-accelerated `memchr::memmem::find` fast path is used. When comparison-altering flags (`/c`, `/C`, `/w`, `/W`, `/T`, `/f`) are set, a byte-by-byte comparison through `compare_string_with_flags` is used. The `/s` flag sets the previous-match anchor for relative-offset children to match-START instead of match-END.
- **Bounds checking**: Prevents buffer overruns
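
The Search bullet above describes a bounded scan plus a choice of anchor. The following sketch illustrates only that idea under stated assumptions: `search_window` is a hypothetical helper, it uses a naive `slice::windows` scan in place of `memchr::memmem::find`, it ignores the comparison-altering flags, and it assumes the whole match must fit inside the `N`-byte window.

```rust
use std::num::NonZeroUsize;

/// Hypothetical bounded literal search. Returns `(match_start, anchor)`,
/// where `anchor` is the match start when `anchor_at_start` is set (the
/// `/s` behavior) and the match end otherwise.
fn search_window(
    buffer: &[u8],
    offset: usize,
    range: NonZeroUsize,
    pattern: &[u8],
    anchor_at_start: bool,
) -> Option<(usize, usize)> {
    // `windows(0)` would panic, so reject an empty pattern up front.
    if pattern.is_empty() {
        return None;
    }
    // Cap the window at `range` bytes from `offset`, clamped to the buffer end.
    let window_end = offset.saturating_add(range.get()).min(buffer.len());
    // `get` returns `None` when `offset` lies past the end of the buffer.
    let window = buffer.get(offset..window_end)?;
    let rel = window.windows(pattern.len()).position(|w| w == pattern)?;
    let start = offset + rel;
    let anchor = if anchor_at_start { start } else { start + pattern.len() };
    Some((start, anchor))
}

fn main() {
    let data = b"xxTRUEVISION";
    let wide = NonZeroUsize::new(8).unwrap();
    let narrow = NonZeroUsize::new(4).unwrap();

    // Default anchor lands at match-END; `/s` moves it to match-START.
    assert_eq!(search_window(data, 0, wide, b"TRUE", false), Some((2, 6)));
    assert_eq!(search_window(data, 0, wide, b"TRUE", true), Some((2, 2)));

    // A 4-byte window ("xxTR") cannot contain the full pattern.
    assert_eq!(search_window(data, 0, narrow, b"TRUE", false), None);
}
```

Taking `NonZeroUsize` for the range mirrors the documented rule that the range is mandatory and non-zero, so a zero-width window cannot be expressed at the type level.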

-```rust
-// Non-pattern types use the 3-arg convenience wrapper:
-pub fn read_typed_value(
-    buffer: &[u8],
-    offset: usize,
-    type_kind: &TypeKind,
-) -> Result<Value, TypeReadError>
-
-// Pattern-bearing types (Regex, Search) thread the rule's value operand
-// through as the match pattern:
-pub fn read_typed_value_with_pattern(
-    buffer: &[u8],
-    offset: usize,
-    type_kind: &TypeKind,
-    pattern: Option<&Value>,
-) -> Result<Value, TypeReadError>
-```
-
-The engine uses `read_typed_value_with_pattern` uniformly and passes `Some(&rule.value)` for every rule; the convenience `read_typed_value` is a thin wrapper that forwards `pattern: None`. For pattern-bearing types a genuine "no match" is collapsed to `Value::String(String::new())` in the `read_typed_value_with_pattern` return so the back-compat `Value` shape is preserved; the engine instead calls `read_pattern_match` directly, which returns `Result<Option<Value>, _>` so zero-width matches (e.g. `^`, `a*`) can be distinguished from genuine misses.
-
-The `read_byte` function signature changed in v0.2.0 to accept three parameters (`buffer`, `offset`, and `signed`) instead of two, allowing explicit control over signed vs unsigned byte interpretation.
+The type-reading functions are internal (`pub(crate)`) engine helpers. External library users evaluate rules through `evaluate_rules` or `evaluate_rules_with_config`.

**Floating-Point Type Reading (`evaluator/types/float.rs`):**

14 changes: 7 additions & 7 deletions docs/src/library-api.md
@@ -70,13 +70,13 @@ When no rules match, the description defaults to `"data"` with confidence `0.0`.

Controls evaluation behavior with these fields:

-| Field | Type | Default | Description |
-| --------------------- | ------------- | ------- | ---------------------------------------- |
-| `max_recursion_depth` | `u32` | `20` | Maximum depth for nested rule evaluation |
-| `max_string_length` | `usize` | `8192` | Maximum bytes read for string types |
-| `stop_at_first_match` | `bool` | `true` | Stop after the first matching rule |
-| `enable_mime_types` | `bool` | `false` | Map descriptions to MIME types |
-| `timeout_ms` | `Option<u64>` | `None` | Evaluation timeout in milliseconds |
+| Field | Type | Default | Description |
+| --------------------- | ------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `max_recursion_depth` | `u32` | `20` | Maximum depth for nested rule evaluation |
+| `max_string_length` | `usize` | `8192` | Caps bytes read for `TypeKind::String` reads (both unflagged and with `/c`/`/C`/`/w`/`/W`/`/T`/`/f` flags); does NOT apply to `TypeKind::PString` or `TypeKind::String16` |
+| `stop_at_first_match` | `bool` | `true` | Stop after the first matching rule |
+| `enable_mime_types` | `bool` | `false` | Map descriptions to MIME types |
+| `timeout_ms` | `Option<u64>` | `None` | Evaluation timeout in milliseconds |

### Presets
