feat(evaluator): implement indirect offset resolution (#37)#199
feat(evaluator): implement indirect offset resolution (#37)#199unclesp1d3r merged 29 commits intomainfrom
Conversation
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…lines Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…lity Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…details Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…unctions Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…mantics Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
|
Important Review skippedAuto reviews are limited based on label configuration. 🏷️ Required labels (at least one) (3)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR implements full indirect offset resolution (evaluating pointer-chasing offsets like Changes
Sequence DiagramsequenceDiagram
participant Parser as Parser
participant Evaluator as Evaluator
participant Buffer as Buffer
participant TypeReader as Type Reader
Parser->>Parser: Detect '(' prefix in offset
Parser->>Parser: Parse (base_offset.pointer_spec)
Parser->>Parser: Parse optional +/- adjustment
Parser->>Evaluator: Return OffsetSpec::Indirect{base_offset, pointer_type, adjustment, endian}
Evaluator->>Evaluator: resolve_offset(OffsetSpec::Indirect)
Evaluator->>Evaluator: Resolve base_offset to absolute index
Evaluator->>Buffer: Read at base_offset position
Buffer->>TypeReader: Read pointer value (Byte/Short/Long/Quad)
TypeReader->>TypeReader: Apply endianness (lowercase=LE, uppercase=BE)
TypeReader->>TypeReader: Handle signedness (default signed)
TypeReader->>Evaluator: Return pointer value
Evaluator->>Evaluator: Apply adjustment (+/- arithmetic)
Evaluator->>Evaluator: Bounds check final offset
Evaluator->>Parser: Return final offset or error
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 8 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (8 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commitWonderful, this rule succeeded.Require conventional commit format per https://www.conventionalcommits.org/en/v1.0.0/. Skipped for bots.
🟢 Full CI must passWonderful, this rule succeeded.All CI checks must pass. Release-plz PRs are exempt because they only bump versions and changelogs (code was already tested on main), and GITHUB_TOKEN-triggered force-pushes suppress CI.
🟢 Do not merge outdated PRsWonderful, this rule succeeded.Make sure PRs are within 10 commits of the base branch before merging
|
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
docs/src/parser.md (1)
38-58:⚠️ Potential issue | 🟡 MinorParser docs still advertise indirect offsets as unimplemented.
Both the
parse_offsetsection and the limitations list still describe indirect offsets as planned / not yet implemented, but this PR addsparse_indirect_offset()plus evaluator support. Please update those sections to document the accepted indirect grammar and supported pointer specifiers so the guide matches the shipped behavior.As per coding guidelines,
docs/**: Review documentation quality, completeness, and accuracy. Ensure comprehensive API documentation, usage examples, and migration guides.Also applies to: 393-401
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/src/parser.md` around lines 38 - 58, Update the parse_offset docs to reflect that indirect offsets are implemented: replace the "planned" notes with a description of the indirect grammar accepted by parse_indirect_offset (include example strings like "@0x10", "@(reg+4)" or whatever pointer syntax your parser accepts), list supported pointer specifiers (e.g., '@', '&', register/label forms) and show at least one example and its return (e.g., parse_indirect_offset("@0x10") -> Ok(...)). Mark "Indirect offset parsing" as implemented in the Features checklist, update any references in the limitations section (lines ~393-401) to document evaluator support and accepted pointer specifiers, and ensure you reference the functions parse_offset and parse_indirect_offset in the prose so readers can find implementation details.docs/src/evaluator.md (1)
9-14:⚠️ Potential issue | 🟡 MinorMark indirect offset resolution as supported in this guide.
The overview, offset-resolution summary, and implementation checklist still omit or leave indirect offsets unchecked even though
src/evaluator/offset/indirect.rsnow implements them. Right now the evaluator docs understate the feature set exposed by this PR.As per coding guidelines,
docs/**: Review documentation quality, completeness, and accuracy. Ensure comprehensive API documentation, usage examples, and migration guides.Also applies to: 92-100, 549-561
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/src/evaluator.md` around lines 9 - 14, Update the evaluator documentation to mark indirect offset resolution as supported: edit the overview, the offset-resolution summary table/section, and the implementation checklist to indicate "indirect offsets" (or "indirect offset resolution") is implemented; reference the implemented module/file indirect.rs and any API names that expose indirect resolution so readers know it's available; ensure examples and the list (the steps like Load file, Resolve offsets, Read typed values, Apply operators, Process children, Collect results) and any checklist or migration notes explicitly mention indirect offsets as supported and include a short usage note or link to the indirect resolver implementation for further details.docs/src/magic-format.md (1)
97-107:⚠️ Potential issue | 🟠 MajorDocument
)+adj, not+adj)for indirect offsets.The implemented grammar is
(base.type)+adj, with the adjustment after the closing), and it accepts uppercase specifiers for big-endian pointer reads. The current reference shows(base.type+adj)and lowercase-only specifiers, so readers following this page will write syntax the parser rejects.As per coding guidelines,
docs/**: Review documentation quality, completeness, and accuracy. Ensure comprehensive API documentation, usage examples, and migration guides.Also applies to: 590-593
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/src/magic-format.md` around lines 97 - 107, Update the indirect offset syntax examples and type list to match the implemented grammar: change any occurrences of "(base.type+adj)" to "(base.type)+adj" (i.e., the adjustment appears after the closing parenthesis) and note that the parser accepts uppercase type specifiers for big-endian pointer reads; also update the types section listing ".b .s .l .q" to mention their uppercase equivalents (".B .S .L .Q") are valid. Ensure all other occurrences of the old syntax and lowercase-only type documentation (including the similar block referenced later) are corrected to the "(base.type)+adj" form and mention uppercase specifiers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Around line 207-208: Update AGENTS.md to accurately describe implemented
offset support: remove or reword the claim that indirect offsets are fully
implemented and instead state that only absolute and from-end offsets are
functional and that relative offsets are parsed but not evaluated; mention the
parser behavior in src/parser/grammar/mod.rs which currently only branches on
indirect '(' and numeric absolute forms to justify the limitation, and apply the
same correction to the other referenced bullet block (lines 248-250) so the
document matches the current implementation rather than aspirational features.
In `@docs/src/ast-structures.md`:
- Around line 245-250: The fenced code block showing the example currently lacks
a language tag (MD040); update the triple-backtick fence that surrounds the
snippet containing "0 pstring JPEG" and "0 pstring/H JPEG" to include
the language identifier (use "text") by replacing ``` with ```text so
markdownlint stops flagging it.
In `@src/evaluator/offset/indirect.rs`:
- Around line 35-41: OffsetSpec::Indirect carries endian twice but
read_pointer() only uses the outer endian for Short/Long/Quad; add a consistency
check in the pattern match where OffsetSpec::Indirect is destructured (the
binding of base_offset, pointer_type, adjustment, endian) to verify that any
inner/endian metadata (the one inside pointer_type for Short/Long/Quad) matches
the outer endian, and return an error (or panic) when they contradict, or
alternatively derive the read endianness from a single canonical field and
remove/ignore the duplicate; update the same logic for the second occurrence
around the 80-84 area so all indirect pointer paths enforce the
single-source-of-truth endianness.
- Around line 147-735: Many tests in the tests module duplicate permutations
(sizes/endian/signed) and push the file beyond the repo size cap; refactor by
converting repetitive one-assertion tests into table-driven tests that iterate
over a Vec/array of cases (use a struct Case { name: &str, buffer: Vec<u8> or
&'static [u8], spec_params... } and call resolve_indirect_offset per case)
reusing the existing indirect() helper and asserting with the case name in the
failure message, and/or split the huge tests module into smaller submodules
(e.g., pointer_read_tests, adjustment_tests, unsupported_type_tests) to keep
each source file under ~500–600 lines; update references to
resolve_indirect_offset, indirect, and test case names accordingly.
In `@src/parser/grammar/mod.rs`:
- Around line 158-175: The match arm in pointer_specifier_to_type currently
collapses 'b' and 'B' into the same tuple, losing the distinction between
lowercase and uppercase pointer specifiers; update pointer_specifier_to_type so
'b' returns Some((TypeKind::Byte { signed: true }, Endianness::Little)) and 'B'
returns Some((TypeKind::Byte { signed: true }, Endianness::Big)) (i.e., split
the 'b' | 'B' arm into two arms) so OffsetSpec::Indirect's endian field
preserves the original `.b` vs `.B` spelling in the AST and round-trips
correctly.
In `@src/parser/grammar/tests.rs`:
- Around line 284-289: Move the indirect-offset test suite out of the oversized
tests.rs into a new focused test module file named indirect_offset_tests.rs:
create the new file (e.g., src/parser/grammar/tests/indirect_offset_tests.rs),
add "use super::*;" at top and copy the indirect-offset test functions/comments
unchanged (they reference parsing behaviors like indirect offset parsing and any
test functions in tests.rs that mention "indirect offset" / GNU file semantics),
then in the original tests.rs replace the moved block with a single module
declaration (mod indirect_offset_tests;) so the tests compile and run; ensure
the new file is under the same parent cfg(test) test module scope so visibility
from super::* remains correct.
---
Outside diff comments:
In `@docs/src/evaluator.md`:
- Around line 9-14: Update the evaluator documentation to mark indirect offset
resolution as supported: edit the overview, the offset-resolution summary
table/section, and the implementation checklist to indicate "indirect offsets"
(or "indirect offset resolution") is implemented; reference the implemented
module/file indirect.rs and any API names that expose indirect resolution so
readers know it's available; ensure examples and the list (the steps like Load
file, Resolve offsets, Read typed values, Apply operators, Process children,
Collect results) and any checklist or migration notes explicitly mention
indirect offsets as supported and include a short usage note or link to the
indirect resolver implementation for further details.
In `@docs/src/magic-format.md`:
- Around line 97-107: Update the indirect offset syntax examples and type list
to match the implemented grammar: change any occurrences of "(base.type+adj)" to
"(base.type)+adj" (i.e., the adjustment appears after the closing parenthesis)
and note that the parser accepts uppercase type specifiers for big-endian
pointer reads; also update the types section listing ".b .s .l .q" to mention
their uppercase equivalents (".B .S .L .Q") are valid. Ensure all other
occurrences of the old syntax and lowercase-only type documentation (including
the similar block referenced later) are corrected to the "(base.type)+adj" form
and mention uppercase specifiers.
In `@docs/src/parser.md`:
- Around line 38-58: Update the parse_offset docs to reflect that indirect
offsets are implemented: replace the "planned" notes with a description of the
indirect grammar accepted by parse_indirect_offset (include example strings like
"@0x10", "@(reg+4)" or whatever pointer syntax your parser accepts), list
supported pointer specifiers (e.g., '@', '&', register/label forms) and show at
least one example and its return (e.g., parse_indirect_offset("@0x10") ->
Ok(...)). Mark "Indirect offset parsing" as implemented in the Features
checklist, update any references in the limitations section (lines ~393-401) to
document evaluator support and accepted pointer specifiers, and ensure you
reference the functions parse_offset and parse_indirect_offset in the prose so
readers can find implementation details.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4045876d-1fd3-4d8f-9d19-86b4d404478b
⛔ Files ignored due to path filters (6)
.tessl/.gitignoreis excluded by none and included by nonedocs/solutions/integration-issues/indirect-offset-parser-evaluator-sync.mdis excluded by none and included by nonedocs/solutions/logic-errors/indirect-offset-gnu-file-semantics.mdis excluded by none and included by nonedocs/solutions/logic-errors/indirect-offset-resolution.mdis excluded by none and included by nonemise.lockis excluded by!**/*.lockand included by nonetests/indirect_offset_integration.rsis excluded by none and included by none
📒 Files selected for processing (12)
AGENTS.mdAI_POLICY.mdGOTCHAS.mddocs/src/ast-structures.mddocs/src/evaluator.mddocs/src/magic-format.mddocs/src/parser.mdsrc/evaluator/offset/indirect.rssrc/evaluator/offset/mod.rssrc/parser/grammar/mod.rssrc/parser/grammar/tests.rstessl.json
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Fix AGENTS.md: remove contradictory "only absolute and from-end are fully functional" line since indirect offsets are now implemented - Fix ast-structures.md: add `text` language tag to fenced code block (MD040) - Split .b/.B byte pointer specifiers into separate match arms to preserve GNU file endianness distinction in the AST - Add debug_assert for endian consistency between inner TypeKind and outer OffsetSpec::Indirect endian field - Consolidate indirect.rs tests into table-driven format (735 -> 538 lines) - Extract indirect offset parser tests into dedicated submodule Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
Documentation Updates 3 document(s) were updated by changes in this PR: architectureView Changes@@ -71,7 +71,7 @@
- ✅ **File parsing**: Complete magic file parsing with `parse_text_magic_file()`
- ✅ **Hierarchy building**: Parent-child relationships via `build_rule_hierarchy()`
- ✅ **Format detection**: Text, directory, and binary format detection
-- 📋 **Indirect offsets**: Pointer dereferencing patterns
+- ✅ **Indirect offsets**: Pointer dereferencing patterns — fully implemented with `.b/.B`, `.s/.S`, `.l/.L`, `.q/.Q` specifiers; lowercase=little-endian, uppercase=big-endian; adjustment after closing paren: `(base.type)+adj`
### 2. AST Data Structures (`src/parser/ast.rs`)
@@ -164,7 +164,7 @@
- `offset/`: Offset resolution submodule
- `mod.rs`: Dispatcher (`resolve_offset`) and re-exports
- `absolute.rs`: `OffsetError`, `resolve_absolute_offset`
- - `indirect.rs`: `resolve_indirect_offset` stub (issue #37)
+ - `indirect.rs`: `resolve_indirect_offset` — indirect offset resolution, fully implemented.
- `relative.rs`: `resolve_relative_offset` stub (issue #38)
- `operators/`: Operator application submodule
- `mod.rs`: Dispatcher (`apply_operator`) and re-exports
@@ -185,7 +185,7 @@
- ✅ **Recursion Limiting**: Prevent stack overflow from deep nesting
- ✅ **Signedness Coercion**: Automatic value coercion for signed type comparisons (e.g., `0xff` → `-1` for signed byte)
- ✅ **Comparison Operators**: Full support for `<`, `>`, `<=`, `>=` with numeric and lexicographic ordering
-- 📋 **Indirect Offsets**: Pointer dereferencing (planned)
+- ✅ **Indirect Offsets**: Pointer dereferencing — 4-step pipeline: resolve base offset → read pointer value → apply adjustment (checked arithmetic) → validate bounds.
### 4. I/O Module (`src/io/`)
compatibilityView Changes@@ -73,7 +73,7 @@
| ------------------ | -------- | ------ | ----------- | ---------------------------- |
| Basic patterns | ✅ | ✅ | Complete | String, numeric matching |
| Hierarchical rules | ✅ | 🔄 | In Progress | Parent-child relationships |
-| Indirect offsets | ✅ | 📋 | Planned | Pointer dereferencing |
+| Indirect offsets | ✅ | ✅ | Complete | Pointer types byte/short/long/quad; lowercase specifiers = LE, uppercase = BE; optional adjustment with checked arithmetic; buffer bounds validation |
| Relative offsets | ✅ | 📋 | Planned | Position-relative addressing |
| Search patterns | ✅ | 📋 | Planned | Pattern searching in ranges |
| Bitwise operations | ✅ | ✅ | Complete | AND, OR operations |MAGIC_FORMATView Changes@@ -115,14 +115,20 @@
Indirect offset syntax:
- `(base.type)` - Read pointer at base, interpret as type
-- `(base.type+adj)` - Add adjustment to pointer value
-
-Types for indirect offsets:
-
-- `.b` - byte (1 byte)
-- `.s` - short (2 bytes)
-- `.l` - long (4 bytes)
-- `.q` - quad (8 bytes)
+- `(base.type)+adj` - Add adjustment to pointer value after dereferencing
+
+Pointer type specifiers (lowercase = little-endian, uppercase = big-endian; all signed by default):
+
+| Specifier | Size | Endianness |
+| --------- | ------- | ------------- |
+| `.b` | 1 byte | little-endian |
+| `.B` | 1 byte | big-endian |
+| `.s` | 2 bytes | little-endian |
+| `.S` | 2 bytes | big-endian |
+| `.l` | 4 bytes | little-endian |
+| `.L` | 4 bytes | big-endian |
+| `.q` | 8 bytes | little-endian |
+| `.Q` | 8 bytes | big-endian |
### Relative Offset
@@ -557,7 +563,7 @@
- Absolute offsets
- Relative offsets
-- Indirect offsets (basic)
+- Indirect offsets
- Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers)
- String types (`string`, `pstring`)
- Date and timestamp types (32-bit and 64-bit Unix timestamps)How did I do? Any feedback? |
Summary
Implements indirect offset resolution (
OffsetSpec::Indirect) for the evaluator, enabling detection of complex binary formats like PE executables where a pointer at a fixed offset must be dereferenced to locate the actual header.(base.type)+adjwith GNUfilesemantics — lowercase specifiers = little-endian, uppercase = big-endian, signed by defaultKey files
src/evaluator/offset/indirect.rs— core implementation (740 lines)src/parser/grammar/mod.rs—parse_indirect_offset()andpointer_specifier_to_type()tests/indirect_offset_integration.rs— end-to-end testsTest Plan
cargo test)Closes #37