Commit 6426d18

feat(evaluator): implement indirect offset resolution (#37) (#199)
## Summary

Implements indirect offset resolution (`OffsetSpec::Indirect`) for the evaluator, enabling detection of complex binary formats like PE executables where a pointer at a fixed offset must be dereferenced to locate the actual header.

- **4-step evaluation pipeline**: resolve base offset → read pointer value (with endianness) → apply adjustment (checked arithmetic) → validate bounds
- **Full parser support**: indirect offset syntax `(base.type)+adj` with GNU `file` semantics — lowercase specifiers = little-endian, uppercase = big-endian, signed by default
- **35 unit tests + 9 integration tests** covering all pointer types, endiannesses, signed/unsigned values, adjustments, overflow, and PE-header-style scenarios

### Key files

- `src/evaluator/offset/indirect.rs` — core implementation (740 lines)
- `src/parser/grammar/mod.rs` — `parse_indirect_offset()` and `pointer_specifier_to_type()`
- `tests/indirect_offset_integration.rs` — end-to-end tests

## Test Plan

- [x] All 1,299 tests pass (`cargo test`)
- [x] All pointer types (byte, short, long, quad) with both endiannesses
- [x] Signed and unsigned pointer values (signed reinterpreted as raw unsigned per libmagic)
- [x] Positive and negative adjustments with overflow protection
- [x] Buffer overrun detection at both pointer-read and final-offset stages
- [x] PE-header-style real-world scenario (0x3C pointer dereference)
- [x] Integration tests verifying full parser → evaluator pipeline

Closes #37

---------

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
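
The four-step pipeline named in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in `src/evaluator/offset/indirect.rs`: the names `Endian`, `ResolveError`, and `resolve_indirect` are hypothetical, the base offset is assumed to be already resolved to an absolute position, and the pointer is treated as a raw unsigned value.

```rust
use std::convert::TryFrom;

/// Byte order used to decode the pointer (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Endian {
    Little,
    Big,
}

/// Failure modes of the sketch (hypothetical; the crate's error type may differ).
#[derive(Debug, PartialEq)]
pub enum ResolveError {
    /// The pointer read or the final offset fell outside the buffer.
    OutOfBounds,
    /// Offset arithmetic overflowed or underflowed.
    Overflow,
}

/// Resolve an indirect offset. `width` is the pointer size in bytes (1, 2, 4, or 8).
pub fn resolve_indirect(
    buf: &[u8],
    base: usize,
    width: usize,
    endian: Endian,
    adjustment: i64,
) -> Result<usize, ResolveError> {
    // Step 1: `base` is assumed to be the already-resolved base offset.
    // Step 2: read `width` bytes at `base`, rejecting reads past the buffer end.
    let end = base.checked_add(width).ok_or(ResolveError::Overflow)?;
    let bytes = buf.get(base..end).ok_or(ResolveError::OutOfBounds)?;
    let mut pointer: u64 = 0;
    match endian {
        Endian::Big => {
            for &b in bytes {
                pointer = (pointer << 8) | u64::from(b);
            }
        }
        Endian::Little => {
            for &b in bytes.iter().rev() {
                pointer = (pointer << 8) | u64::from(b);
            }
        }
    }
    // Step 3: apply the adjustment with checked arithmetic.
    let adjusted = if adjustment >= 0 {
        pointer.checked_add(adjustment as u64)
    } else {
        pointer.checked_sub(adjustment.unsigned_abs())
    };
    let target = adjusted.ok_or(ResolveError::Overflow)?;
    // Step 4: the final offset must land inside the buffer.
    let target = usize::try_from(target).map_err(|_| ResolveError::Overflow)?;
    if target < buf.len() {
        Ok(target)
    } else {
        Err(ResolveError::OutOfBounds)
    }
}
```

For a PE-style buffer whose four bytes at `0x3C` hold `0x80` in little-endian order, `resolve_indirect(&buf, 0x3C, 4, Endian::Little, 0)` yields `Ok(0x80)`, provided the buffer is longer than `0x80` bytes.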
1 parent 20b23c0 commit 6426d18

26 files changed

Lines changed: 1965 additions & 217 deletions

.devcontainer/devcontainer.json

Lines changed: 1 addition & 4 deletions
@@ -69,10 +69,7 @@
 "--all-features"
 ],
 "rust-analyzer.cargo.features": "all",
-"rust-analyzer.rustfmt.extraArgs": [
-"--edition",
-"2024"
-],
+"rust-analyzer.rustfmt.extraArgs": ["--edition", "2024"],
 "editor.formatOnSave": true,
 "editor.codeActionsOnSave": {
 "source.fixAll": "explicit"

.gemini/settings.json

Lines changed: 6 additions & 9 deletions
@@ -1,12 +1,9 @@
 {
-"mcpServers": {
-"tessl": {
-"type": "stdio",
-"command": "tessl",
-"args": [
-"mcp",
-"start"
-]
+"mcpServers": {
+"tessl": {
+"type": "stdio",
+"command": "tessl",
+"args": ["mcp", "start"]
+}
 }
-}
 }

.mcp.json

Lines changed: 6 additions & 9 deletions
@@ -1,12 +1,9 @@
 {
-"mcpServers": {
-"tessl": {
-"type": "stdio",
-"command": "tessl",
-"args": [
-"mcp",
-"start"
-]
+"mcpServers": {
+"tessl": {
+"type": "stdio",
+"command": "tessl",
+"args": ["mcp", "start"]
+}
 }
-}
 }

.mdformat.toml

Lines changed: 12 additions & 9 deletions
@@ -10,20 +10,23 @@ exclude = [
 "megalinter-reports/**",
 "**/*.result",
 "**/*.testfile",
+"**/SKILL.md", # AI stuff
+".claude/**/*", # AI stuff
+".tessl/**/*", # AI stuff
 ]
 validate = true
 number = true
 wrap = "no"
 end_of_line = "lf"
-# extensions = [
-# "gfm",
-# "frontmatter",
-# "footnote",
-# "simple_breaks",
-# "gfm_alerts",
-# "toc",
-# "wikilink",
-# ]
+extensions = [
+"gfm",
+"footnote",
+"front_matters",
+"simple_breaks",
+"wikilink",
+"gfm_alerts",
+"toc",
+]
 
 [plugin.mkdocs]
 align_semantic_breaks_in_lists = true

.tessl/.gitignore

Lines changed: 0 additions & 2 deletions
This file was deleted.

.vscode/settings.json

Lines changed: 3 additions & 7 deletions
@@ -21,12 +21,8 @@
 "git.rebaseWhenSync": true,
 "git.replaceTagsWhenPull": true,
 "githubPullRequests.codingAgent.uiIntegration": true,
-"ruff.path": [
-"${workspaceFolder}/.vscode/mise-tools/ruff"
-],
-"ruff.interpreter": [
-"${workspaceFolder}/.vscode/mise-tools/python"
-],
+"ruff.path": ["${workspaceFolder}/.vscode/mise-tools/ruff"],
+"ruff.interpreter": ["${workspaceFolder}/.vscode/mise-tools/python"],
 "python.defaultInterpreterPath": "${workspaceFolder}/.vscode/mise-tools/python",
 "bun.runtime": "${workspaceFolder}/.vscode/mise-tools/bun"
-}
+}

AGENTS.md

Lines changed: 6 additions & 3 deletions
@@ -204,7 +204,7 @@ cargo test --doc # Test documentation examples
 
 ### Currently Implemented (v0.1.0)
 
-- **Offsets**: Absolute and from-end specifications (indirect and relative are parsed but not yet evaluated)
+- **Offsets**: Absolute, from-end, and indirect specifications (relative offsets are parsed but not yet evaluated)
 - **Types**: `byte`, `short`, `long`, `quad`, `float`, `double`, `string`, `pstring` with endianness support; unsigned variants `ubyte`, `ushort`/`ubeshort`/`uleshort`, `ulong`/`ubelong`/`ulelong`, `uquad`/`ubequad`/`ulequad`; float/double endian variants `befloat`/`lefloat`, `bedouble`/`ledouble`; 32-bit date/timestamp types `date`/`ldate`/`bedate`/`beldate`/`ledate`/`leldate`; 64-bit date/timestamp types `qdate`/`qldate`/`beqdate`/`beqldate`/`leqdate`/`leqldate`; `pstring` is a Pascal string (length-prefixed) with support for 1/2/4-byte length prefixes via `/B`, `/H` (2-byte BE), `/h` (2-byte LE), `/L` (4-byte BE), `/l` (4-byte LE) suffixes, and the `/J` flag (stored length includes prefix width, JPEG convention) which is combinable with width suffixes (e.g., `pstring/HJ`); date values formatted as "Www Mmm DD HH:MM:SS YYYY" matching GNU `file` output; types are signed by default (libmagic-compatible)
 - **Operators**: `=` (equal), `!=` (not equal), `<` (less than), `>` (greater than), `<=` (less equal), `>=` (greater equal), `&` (bitwise AND with optional mask), `^` (bitwise XOR), `~` (bitwise NOT), `x` (any value)
 - **Nested Rules**: Hierarchical rule evaluation with proper indentation
@@ -245,9 +245,8 @@ impl BinaryRegex for regex::bytes::Regex {
 
 ### Offset Specifications
 
-- Indirect offsets are parsed into the AST but evaluation is not yet implemented (#37)
+- Indirect offsets are fully implemented (parsing + evaluation) with specifiers: `.b/.B` (byte), `.s/.S` (short), `.l/.L` (long), `.q/.Q` (quad); lowercase = little-endian, uppercase = big-endian (GNU `file` semantics); pointer types signed by default; adjustment after closing paren: `(base.type)+adj`
 - Relative offsets are parsed into the AST but evaluation is not yet implemented (#38)
-- Only absolute and from-end offsets are fully functional
 
 ### Magic File Syntax
 
@@ -570,3 +569,7 @@ This project has the OSSF Best Practices passing badge. Maintain these standards
 - SECURITY.md documents vulnerability reporting with scope, safe harbor, and PGP key
 - AGENTS.md must accurately reflect implemented features (not aspirational)
 - `docs/src/release-verification.md` documents artifact signing for users
+
+## Agent Rules <!-- tessl-managed -->
+
+@.tessl/RULES.md follow the [instructions](.tessl/RULES.md)

AI_POLICY.md

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
# AI Usage Policy

We build operator-focused security tools. AI coding assistants are part of how we do that. This policy is not anti-AI -- it is pro-accountability.

Think of AI assistance like spellcheck. It catches typos, suggests corrections, and speeds up the mechanical parts of writing. But you are still responsible for your words and their consequences.

## The Rule

**You own every line you submit.** You must be able to explain what it does and how it interacts with the rest of the system without asking your AI to explain it back to you.

Everything else follows from that.

## How We Work

- **Disclose your tools.** Note what you used in your PR description -- Claude Code, Copilot, Cursor, whatever. No specific format required.

- **Review AI-generated text before posting.** Issues, discussions, and PR descriptions must reflect your understanding, not a language model's first draft. Read it, cut the filler, make sure it says what you mean.

- **No AI-generated media.** No generated images, logos, audio, or video. Text-based diagrams (ASCII art, Mermaid) and code are acceptable.

- **Unreviewed output gets closed.** Hallucinated APIs, boilerplate that ignores project conventions, suggestions you clearly did not run -- these get closed without review. We are not a QA service for your AI's output.

## Why

Transparent by design means knowing what the code does and why it is there. Tested under pressure means every change was understood by the person who submitted it. AI makes capable engineers faster. It does not replace the understanding that makes contributions trustworthy.

Every pull request is reviewed by a human. Submitting work you do not understand shifts that burden onto maintainers. That is not how we operate.

## New Contributors

Use AI to learn the codebase. Read the code it generates. Run it. Break it. Then submit work that reflects your understanding. We will help you through review -- that deal only works if the code is yours.

GOTCHAS.md

Lines changed: 12 additions & 0 deletions
@@ -59,6 +59,18 @@ The nom `tuple` combinator is deprecated. Use bare tuple syntax `(a, b, c)` dire
 
 `type_keyword_to_kind` has `#[allow(clippy::too_many_lines)]` because it exceeds 100 lines with all date keywords.
 
+### 3.5 `parse_number` Does Not Handle `+` Prefix
+
+`parse_number` handles `-` signs but not `+`. When parsing syntax like `+4` (e.g., indirect offset adjustments), consume the `+` character manually before calling `parse_number`.
+
+### 3.6 `parse_value` Requires Quoted Strings
+
+`parse_value()` does not accept bare unquoted strings. String values in magic file rules must be quoted (e.g., `string "MZ"` not `string MZ`). Integration tests writing magic files must use `r#"0 string "MZ" description"#` format.
+
+### 3.7 Indirect Offset Pointer Specifiers Follow GNU `file` Semantics
+
+Lowercase pointer specifiers (`.s`, `.l`, `.q`) map to **little-endian**, not native endian. Uppercase (`.S`, `.L`, `.Q`) map to big-endian. All numeric pointer types are **signed by default** (per S6.3). The adjustment is parsed **after** the closing paren: `(base.type)+adj`, not `(base.type+adj)`.
+
 ## 4. Module Visibility & Re-exports
 
 ### 4.1 Private Engine Module
Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
---
title: Implement indirect offset parsing in magic file grammar
date: 2026-03-30
status: resolved
severity: high
category: integration-issues
components:
- parser/grammar
- evaluator/offset
- integration
tags:
- parser
- indirect-offset
- nom
- magic-file-syntax
- pointer-specifier
issue: '#37'
branch: 37-evaluator-implement-indirect-offset-resolution
symptoms:
- parse_offset("(0x3c.l)") fails with parse error
- Magic files containing indirect offset syntax cannot be loaded via MagicDatabase::load_from_file()
- resolve_indirect_offset() is unreachable dead code from text-magic loading path
root_cause: parse_offset() had no branch for '('-prefixed input; always delegated to parse_number() which only handles numeric literals
solution_files:
- src/parser/grammar/mod.rs
- src/parser/grammar/tests.rs
- tests/indirect_offset_integration.rs
related_gotchas:
- parse_number() handles '-' prefix but not '+'; positive adjustments need manual '+' consumption
- parse_value() requires quoted strings; bare string literals cause integration test failures
---

# Indirect Offset Parser-Evaluator Sync

## Problem

The evaluator for indirect offsets (`resolve_indirect_offset()` in `src/evaluator/offset/indirect.rs`) was fully implemented with 35 unit tests, but the parser in `src/parser/grammar/mod.rs` could not produce `OffsetSpec::Indirect` AST nodes. The `parse_offset()` function only handled absolute numeric offsets and had no branch for `(`-prefixed indirect offset syntax like `(0x3c.l)` or `(0x3c.l)+4`.

This meant the feature was unreachable through the public `MagicDatabase::load_from_file()` API -- the primary way users load text magic files.

## Root Cause

`parse_offset()` unconditionally delegated to `parse_number()`, which only parses numeric literals. Input starting with `(` was rejected as a parse error. The evaluator code was effectively dead code from the text-magic loading path.

## Solution

### 1. Added `pointer_specifier_to_type()` helper

Maps single-character pointer specifiers to `(TypeKind, Endianness)` per libmagic convention:

| Specifier  | Width  | Endianness |
| ---------- | ------ | ---------- |
| `.b`, `.B` | 1 byte | Native     |
| `.s`       | 2 byte | Little     |
| `.S`       | 2 byte | Big        |
| `.l`       | 4 byte | Little     |
| `.L`       | 4 byte | Big        |
| `.q`       | 8 byte | Little     |
| `.Q`       | 8 byte | Big        |

All pointer types are signed by default. Lowercase = little-endian, uppercase = big-endian (GNU `file` semantics, see `GOTCHAS.md` section 3.7); byte order has no effect on the single-byte specifiers.

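A minimal sketch of such a mapping, following the GNU `file` semantics stated in the commit summary (lowercase little-endian, uppercase big-endian). The names `ByteOrder` and `pointer_spec` are hypothetical, and the real helper returns `(TypeKind, Endianness)` rather than a byte width:

```rust
/// Byte order selected by the specifier's case (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ByteOrder {
    Little,
    Big,
}

/// Map a pointer specifier character to (width in bytes, byte order).
/// Returns `None` for characters that are not pointer specifiers.
pub fn pointer_spec(spec: char) -> Option<(usize, ByteOrder)> {
    let width = match spec.to_ascii_lowercase() {
        'b' => 1,
        's' => 2,
        'l' => 4,
        'q' => 8,
        _ => return None,
    };
    // Case selects the byte order; for width 1 the choice is irrelevant.
    let order = if spec.is_ascii_uppercase() {
        ByteOrder::Big
    } else {
        ByteOrder::Little
    };
    Some((width, order))
}
```

With this sketch, `pointer_spec('l')` gives a 4-byte little-endian pointer and `pointer_spec('L')` a 4-byte big-endian one.
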
### 2. Added `parse_indirect_offset()` function

Parses `(base.type)` with an optional trailing adjustment, `(base.type)+adj` or `(base.type)-adj`:

1. Consume `(`
2. Parse base offset via `parse_number()`
3. Consume `.` and type specifier character
4. Consume `)`
5. Optionally parse the adjustment after the closing paren (see gotcha below)
6. Return `OffsetSpec::Indirect { base_offset, pointer_type, adjustment, endian }`

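The parsing steps can be sketched without nom, using only the standard library. This is an illustration rather than the code in `src/parser/grammar/mod.rs`: `IndirectSketch`, `number`, and `parse_indirect` are hypothetical names, and the sketch follows `GOTCHAS.md` section 3.7 in reading the adjustment after the closing paren.

```rust
/// Parsed pieces of an indirect offset (hypothetical struct for this sketch).
#[derive(Debug, PartialEq)]
pub struct IndirectSketch {
    pub base: i64,
    pub specifier: char,
    pub adjustment: i64,
}

/// Parse a decimal or `0x`-prefixed number with an optional leading `-`.
/// Like the `parse_number()` described above, it does not accept a leading `+`.
fn number(input: &str) -> Option<(i64, &str)> {
    let (negative, rest) = match input.strip_prefix('-') {
        Some(r) => (true, r),
        None => (false, input),
    };
    let (radix, digits) = match rest.strip_prefix("0x") {
        Some(r) => (16, r),
        None => (10, rest),
    };
    // Take the longest run of digits valid in this radix.
    let len = digits
        .find(|c: char| !c.is_digit(radix))
        .unwrap_or(digits.len());
    let value = i64::from_str_radix(&digits[..len], radix).ok()?;
    Some((if negative { -value } else { value }, &digits[len..]))
}

/// Parse `(base.type)` optionally followed by `+adj` or `-adj`.
pub fn parse_indirect(input: &str) -> Option<(IndirectSketch, &str)> {
    let rest = input.strip_prefix('(')?;
    let (base, rest) = number(rest)?;
    let rest = rest.strip_prefix('.')?;
    let specifier = rest.chars().next().filter(|c| "bBsSlLqQ".contains(*c))?;
    // The specifier is ASCII, so it occupies exactly one byte.
    let rest = rest[1..].strip_prefix(')')?;
    let (adjustment, rest) = if let Some(r) = rest.strip_prefix('+') {
        // The `+` is consumed by hand before parsing the number.
        number(r)?
    } else if rest.starts_with('-') {
        number(rest)?
    } else {
        (0, rest)
    };
    Some((IndirectSketch { base, specifier, adjustment }, rest))
}
```

Calling `parse_indirect("(0x3c.l)+4")` returns base `0x3c`, specifier `'l'`, and adjustment `4`, with nothing left over.
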
### 3. Updated `parse_offset()` to branch on leading `(`

```rust
pub fn parse_offset(input: &str) -> IResult<&str, OffsetSpec> {
    let (input, _) = multispace0(input)?;
    // A leading `(` marks an indirect offset; anything else is an absolute number.
    if input.starts_with('(') {
        let (input, spec) = parse_indirect_offset(input)?;
        let (input, _) = multispace0(input)?;
        Ok((input, spec))
    } else {
        let (input, offset_value) = parse_number(input)?;
        let (input, _) = multispace0(input)?;
        Ok((input, OffsetSpec::Absolute(offset_value)))
    }
}
```

### 4. No changes needed to `parse_rule_offset()`

It delegates to `parse_offset()`, so hierarchical forms like `>(0x3c.l)` work automatically.

## Gotchas Discovered

### `parse_number()` does not handle `+` prefix

`parse_number()` handles `-` internally but not `+`. For `+N` adjustments, the `+` must be consumed manually:

```rust
let (input, adjustment) = if input.starts_with('+') {
    // `parse_number()` rejects a leading `+`, so consume it here.
    let (input, _) = char('+')(input)?;
    parse_number(input)?
} else if input.starts_with('-') {
    // `parse_number()` handles the `-` sign itself.
    parse_number(input)?
} else {
    // No adjustment present.
    (input, 0)
};
```

Do NOT modify `parse_number()` globally -- it is shared by offset and value parsing, and adding `+` support would change semantics elsewhere.

114+
### `parse_value()` requires quoted strings
115+
116+
Integration tests initially failed because `parse_value()` does not accept bare strings. Magic file string values must be quoted:
117+
118+
```text
119+
# Correct
120+
0 string "MZ" DOS executable
121+
122+
# Wrong -- parse_value() rejects bare "MZ"
123+
0 string MZ DOS executable
124+
```
125+
126+
### Use big-endian specifiers in cross-platform tests
127+
128+
Prefer `.L` (big-endian long) over `.l` (native) in integration test magic files so byte buffers are deterministic across architectures.
129+
130+
## Prevention Strategies
131+
132+
### Parser-Evaluator Parity Checklist
133+
134+
When adding a new AST variant, ensure:
135+
136+
1. **Parser produces it** -- unit test parses raw syntax, asserts correct AST node
137+
2. **Evaluator consumes it** -- unit test constructs AST node, asserts evaluation result
138+
3. **End-to-end test exists** -- integration test through `MagicDatabase::load_from_file()` proves the full pipeline works
139+
4. **Codegen handles it** -- if it can appear in built-in rules, update `src/parser/codegen.rs`
140+
5. **Strength calculation covers it** -- update `src/evaluator/strength.rs` if scoring changes
141+
142+
### Integration Test Template
143+
144+
```rust
145+
#[test]
146+
fn test_feature_end_to_end() {
147+
let temp_dir = TempDir::new().unwrap();
148+
let magic_path = temp_dir.path().join("test.magic");
149+
let mut f = fs::File::create(&magic_path).unwrap();
150+
writeln!(f, r#"0 string "MAGIC" Test match"#).unwrap();
151+
152+
let db = MagicDatabase::load_from_file(&magic_path).unwrap();
153+
let result = db.evaluate_buffer(b"MAGIC\x00data").unwrap();
154+
assert!(result.description.contains("Test match"));
155+
}
156+
```
157+
158+
## Cross-References
159+
160+
- **Evaluator solution**: `docs/solutions/logic-errors/indirect-offset-resolution.md`
161+
- **Magic format spec**: `docs/MAGIC_FORMAT.md` (lines 106-126, indirect offset section)
162+
- **Gotchas**: `GOTCHAS.md` sections 3.5 (`parse_number` `+` limitation) and 3.6 (quoted strings)
163+
- **Architecture**: `AGENTS.md` offset specifications section
164+
- **Issue**: #37 (indirect offset resolution)
165+
- **Related gotchas**: S2 (enum variant checklists), S3 (parser architecture split), S5 (numeric type pitfalls)
