-
-
Notifications
You must be signed in to change notification settings - Fork 1
feat(evaluator): implement indirect offset resolution (#37) #199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
unclesp1d3r
merged 29 commits into
main
from
37-evaluator-implement-indirect-offset-resolution
Mar 30, 2026
Merged
Changes from all commits
Commits
Show all changes
29 commits
Select commit
Hold shift + click to select a range
0e592fd
chore(deps): update libmagic-rs name and dependencies versions
unclesp1d3r 281b643
chore(deps): update cargo-binstall to version 1.17.7 and rust to 1.94.1
unclesp1d3r d10cdb6
docs(agents): add agent rules section and reference to RULES.md
unclesp1d3r 4469117
docs(policy): add AI usage policy to clarify accountability and guide…
unclesp1d3r 11cd856
chore(ci): update mise-action to version 4.0.1 for improved functiona…
unclesp1d3r ad857ae
feat(offset): implement indirect offset resolution functionality
unclesp1d3r 014fa8a
docs(ast): enhance PString documentation with structure and behavior …
unclesp1d3r 25754b0
docs(evaluator): clarify length interpretation in type reading section
unclesp1d3r 082b017
docs(parser): improve formatting for date and timestamp types
unclesp1d3r f420dfc
chore(gitignore): remove unnecessary entries from .gitignore
unclesp1d3r 66c0723
feat(offset): implement parsing and evaluation for indirect offsets
unclesp1d3r a779736
feat(parser): implement parsing for indirect offset specifications
unclesp1d3r f24206b
test(offset): add integration tests for indirect offset resolution
unclesp1d3r 96b0a1e
feat(parser): implement indirect offset parsing for magic file grammar
unclesp1d3r a9e90eb
feat(evaluator): implement indirect offset resolution for binary formats
unclesp1d3r 2e8b034
docs(gotchas): document limitations of parse_number and parse_value f…
unclesp1d3r 94d8b91
docs(agents): clarify indirect offset specifications and GNU semantics
unclesp1d3r 9bd7759
docs(gotchas): document indirect offset pointer specifiers and GNU se…
unclesp1d3r b926bec
fix(parser): correct indirect offset parser for GNU file semantics
unclesp1d3r 5ea12f2
feat(parser): update indirect offset parsing to align with GNU semantics
unclesp1d3r c481d42
Merge branch 'main' into 37-evaluator-implement-indirect-offset-resol…
mergify[bot] 2befffa
feat(deps): update bun and cargo-binstall versions in mise.lock
unclesp1d3r 41763f2
refactor: address PR review feedback for indirect offset implementation
unclesp1d3r 585ad21
fix: split oversized table-driven test to satisfy clippy too_many_lines
unclesp1d3r 8ca4bd2
chore(deps): update mdformat version to 1.0.0 with new arguments
unclesp1d3r aa5a249
chore(deps): update bun and cargo-binstall versions in mise.lock
unclesp1d3r 25f35b1
style(settings): format JSON files for consistency and readability
unclesp1d3r e0a7dd2
style(devcontainer): format rust-analyzer extraArgs for consistency
unclesp1d3r 36b7444
chore(format): update mdformat configuration for improved consistency
unclesp1d3r File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,12 +1,9 @@ | ||
| { | ||
| "mcpServers": { | ||
| "tessl": { | ||
| "type": "stdio", | ||
| "command": "tessl", | ||
| "args": [ | ||
| "mcp", | ||
| "start" | ||
| ] | ||
| "mcpServers": { | ||
| "tessl": { | ||
| "type": "stdio", | ||
| "command": "tessl", | ||
| "args": ["mcp", "start"] | ||
| } | ||
| } | ||
| } | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,12 +1,9 @@ | ||
| { | ||
| "mcpServers": { | ||
| "tessl": { | ||
| "type": "stdio", | ||
| "command": "tessl", | ||
| "args": [ | ||
| "mcp", | ||
| "start" | ||
| ] | ||
| "mcpServers": { | ||
| "tessl": { | ||
| "type": "stdio", | ||
| "command": "tessl", | ||
| "args": ["mcp", "start"] | ||
| } | ||
| } | ||
| } | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file was deleted.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| # AI Usage Policy | ||
|
|
||
| We build operator-focused security tools. AI coding assistants are part of how we do that. This policy is not anti-AI -- it is pro-accountability. | ||
|
|
||
| Think of AI assistance like spellcheck. It catches typos, suggests corrections, and speeds up the mechanical parts of writing. But you are still responsible for your words and their consequences. | ||
|
|
||
| ## The Rule | ||
|
|
||
| **You own every line you submit.** You must be able to explain what it does and how it interacts with the rest of the system without asking your AI to explain it back to you. | ||
|
|
||
| Everything else follows from that. | ||
|
|
||
| ## How We Work | ||
|
|
||
| - **Disclose your tools.** Note what you used in your PR description -- Claude Code, Copilot, Cursor, whatever. No specific format required. | ||
|
|
||
| - **Review AI-generated text before posting.** Issues, discussions, and PR descriptions must reflect your understanding, not a language model's first draft. Read it, cut the filler, make sure it says what you mean. | ||
|
|
||
| - **No AI-generated media.** No generated images, logos, audio, or video. Text-based diagrams (ASCII art, Mermaid) and code are acceptable. | ||
|
|
||
| - **Unreviewed output gets closed.** Hallucinated APIs, boilerplate that ignores project conventions, suggestions you clearly did not run -- these get closed without review. We are not a QA service for your AI's output. | ||
|
|
||
| ## Why | ||
|
|
||
| Transparent by design means knowing what the code does and why it is there. Tested under pressure means every change was understood by the person who submitted it. AI makes capable engineers faster. It does not replace the understanding that makes contributions trustworthy. | ||
|
|
||
| Every pull request is reviewed by a human. Submitting work you do not understand shifts that burden onto maintainers. That is not how we operate. | ||
|
|
||
| ## New Contributors | ||
|
|
||
| Use AI to learn the codebase. Read the code it generates. Run it. Break it. Then submit work that reflects your understanding. We will help you through review -- that deal only works if the code is yours. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
165 changes: 165 additions & 0 deletions
165
docs/solutions/integration-issues/indirect-offset-parser-evaluator-sync.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,165 @@ | ||
| --- | ||
| title: Implement indirect offset parsing in magic file grammar | ||
| date: 2026-03-30 | ||
| status: resolved | ||
| severity: high | ||
| category: integration-issues | ||
| components: | ||
| - parser/grammar | ||
| - evaluator/offset | ||
| - integration | ||
| tags: | ||
| - parser | ||
| - indirect-offset | ||
| - nom | ||
| - magic-file-syntax | ||
| - pointer-specifier | ||
| issue: '#37' | ||
| branch: 37-evaluator-implement-indirect-offset-resolution | ||
| symptoms: | ||
| - parse_offset("(0x3c.l)") fails with parse error | ||
| - Magic files containing indirect offset syntax cannot be loaded via MagicDatabase::load_from_file() | ||
| - resolve_indirect_offset() is unreachable dead code from text-magic loading path | ||
| root_cause: parse_offset() had no branch for '('-prefixed input; always delegated to parse_number() which only handles numeric literals | ||
| solution_files: | ||
| - src/parser/grammar/mod.rs | ||
| - src/parser/grammar/tests.rs | ||
| - tests/indirect_offset_integration.rs | ||
| related_gotchas: | ||
| - parse_number() handles '-' prefix but not '+'; positive adjustments need manual '+' consumption | ||
| - parse_value() requires quoted strings; bare string literals cause integration test failures | ||
| --- | ||
|
|
||
| # Indirect Offset Parser-Evaluator Sync | ||
|
|
||
| ## Problem | ||
|
|
||
| The evaluator for indirect offsets (`resolve_indirect_offset()` in `src/evaluator/offset/indirect.rs`) was fully implemented with 35 unit tests, but the parser in `src/parser/grammar/mod.rs` could not produce `OffsetSpec::Indirect` AST nodes. The `parse_offset()` function only handled absolute numeric offsets and had no branch for `(`-prefixed indirect offset syntax like `(0x3c.l)` or `(0x3c.l+4)`. | ||
|
|
||
| This meant the feature was unreachable through the public `MagicDatabase::load_from_file()` API -- the primary way users load text magic files. | ||
|
|
||
| ## Root Cause | ||
|
|
||
| `parse_offset()` unconditionally delegated to `parse_number()`, which only parses numeric literals. Input starting with `(` was rejected as a parse error. The evaluator code was effectively dead code from the text-magic loading path. | ||
|
|
||
| ## Solution | ||
|
|
||
| ### 1. Added `pointer_specifier_to_type()` helper | ||
|
|
||
| Maps single-character pointer specifiers to `(TypeKind, Endianness)` per libmagic convention: | ||
|
|
||
| | Specifier | Width | Endianness | | ||
| | ---------- | ------ | ---------- | | ||
| | `.b`, `.B` | 1 byte | Native | | ||
| | `.s` | 2 byte | Native | | ||
| | `.S` | 2 byte | Big | | ||
| | `.l` | 4 byte | Native | | ||
| | `.L` | 4 byte | Big | | ||
| | `.q` | 8 byte | Native | | ||
| | `.Q` | 8 byte | Big | | ||
|
|
||
| All pointer types are unsigned (`signed: false`). Lowercase = native endian, uppercase = big-endian. | ||
|
|
||
| ### 2. Added `parse_indirect_offset()` function | ||
|
|
||
| Parses `(base.type)` and `(base.type+/-adj)` syntax: | ||
|
|
||
| 1. Consume `(` | ||
| 2. Parse base offset via `parse_number()` | ||
| 3. Consume `.` and type specifier character | ||
| 4. Optionally parse adjustment (see gotcha below) | ||
| 5. Consume `)` | ||
| 6. Return `OffsetSpec::Indirect { base_offset, pointer_type, adjustment, endian }` | ||
|
|
||
| ### 3. Updated `parse_offset()` to branch on leading `(` | ||
|
|
||
| ```rust | ||
| pub fn parse_offset(input: &str) -> IResult<&str, OffsetSpec> { | ||
| let (input, _) = multispace0(input)?; | ||
| if input.starts_with('(') { | ||
| let (input, spec) = parse_indirect_offset(input)?; | ||
| let (input, _) = multispace0(input)?; | ||
| Ok((input, spec)) | ||
| } else { | ||
| let (input, offset_value) = parse_number(input)?; | ||
| let (input, _) = multispace0(input)?; | ||
| Ok((input, OffsetSpec::Absolute(offset_value))) | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| ### 4. No changes needed to `parse_rule_offset()` | ||
|
|
||
| It delegates to `parse_offset()`, so hierarchical forms like `>(0x3c.l)` work automatically. | ||
|
|
||
| ## Gotchas Discovered | ||
|
|
||
| ### `parse_number()` does not handle `+` prefix | ||
|
|
||
| `parse_number()` handles `-` internally but not `+`. For `+N` adjustments, the `+` must be consumed manually: | ||
|
|
||
| ```rust | ||
| let (input, adjustment) = if input.starts_with('+') { | ||
| let (input, _) = char('+')(input)?; | ||
| parse_number(input)? | ||
| } else if input.starts_with('-') { | ||
| parse_number(input)? | ||
| } else { | ||
| (input, 0) | ||
| }; | ||
| ``` | ||
|
|
||
| Do NOT modify `parse_number()` globally -- it is shared by offset and value parsing, and adding `+` support would change semantics elsewhere. | ||
|
|
||
| ### `parse_value()` requires quoted strings | ||
|
|
||
| Integration tests initially failed because `parse_value()` does not accept bare strings. Magic file string values must be quoted: | ||
|
|
||
| ```text | ||
| # Correct | ||
| 0 string "MZ" DOS executable | ||
|
|
||
| # Wrong -- parse_value() rejects bare "MZ" | ||
| 0 string MZ DOS executable | ||
| ``` | ||
|
|
||
| ### Use big-endian specifiers in cross-platform tests | ||
|
|
||
| Prefer `.L` (big-endian long) over `.l` (native) in integration test magic files so byte buffers are deterministic across architectures. | ||
|
|
||
| ## Prevention Strategies | ||
|
|
||
| ### Parser-Evaluator Parity Checklist | ||
|
|
||
| When adding a new AST variant, ensure: | ||
|
|
||
| 1. **Parser produces it** -- unit test parses raw syntax, asserts correct AST node | ||
| 2. **Evaluator consumes it** -- unit test constructs AST node, asserts evaluation result | ||
| 3. **End-to-end test exists** -- integration test through `MagicDatabase::load_from_file()` proves the full pipeline works | ||
| 4. **Codegen handles it** -- if it can appear in built-in rules, update `src/parser/codegen.rs` | ||
| 5. **Strength calculation covers it** -- update `src/evaluator/strength.rs` if scoring changes | ||
|
|
||
| ### Integration Test Template | ||
|
|
||
| ```rust | ||
| #[test] | ||
| fn test_feature_end_to_end() { | ||
| let temp_dir = TempDir::new().unwrap(); | ||
| let magic_path = temp_dir.path().join("test.magic"); | ||
| let mut f = fs::File::create(&magic_path).unwrap(); | ||
| writeln!(f, r#"0 string "MAGIC" Test match"#).unwrap(); | ||
|
|
||
| let db = MagicDatabase::load_from_file(&magic_path).unwrap(); | ||
| let result = db.evaluate_buffer(b"MAGIC\x00data").unwrap(); | ||
| assert!(result.description.contains("Test match")); | ||
| } | ||
| ``` | ||
|
|
||
| ## Cross-References | ||
|
|
||
| - **Evaluator solution**: `docs/solutions/logic-errors/indirect-offset-resolution.md` | ||
| - **Magic format spec**: `docs/MAGIC_FORMAT.md` (lines 106-126, indirect offset section) | ||
| - **Gotchas**: `GOTCHAS.md` sections 3.5 (`parse_number` `+` limitation) and 3.6 (quoted strings) | ||
| - **Architecture**: `AGENTS.md` offset specifications section | ||
| - **Issue**: #37 (indirect offset resolution) | ||
| - **Related gotchas**: S2 (enum variant checklists), S3 (parser architecture split), S5 (numeric type pitfalls) |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.