|
| 1 | +--- |
| 2 | +title: Implement indirect offset parsing in magic file grammar |
| 3 | +date: 2026-03-30 |
| 4 | +status: resolved |
| 5 | +severity: high |
| 6 | +category: integration-issues |
| 7 | +components: |
| 8 | + - parser/grammar |
| 9 | + - evaluator/offset |
| 10 | + - integration |
| 11 | +tags: |
| 12 | + - parser |
| 13 | + - indirect-offset |
| 14 | + - nom |
| 15 | + - magic-file-syntax |
| 16 | + - pointer-specifier |
| 17 | +issue: '#37' |
| 18 | +branch: 37-evaluator-implement-indirect-offset-resolution |
| 19 | +symptoms: |
| 20 | + - parse_offset("(0x3c.l)") fails with parse error |
| 21 | + - Magic files containing indirect offset syntax cannot be loaded via MagicDatabase::load_from_file() |
| 22 | + - resolve_indirect_offset() is unreachable dead code from text-magic loading path |
| 23 | +root_cause: parse_offset() had no branch for '('-prefixed input; always delegated to parse_number() which only handles numeric literals |
| 24 | +solution_files: |
| 25 | + - src/parser/grammar/mod.rs |
| 26 | + - src/parser/grammar/tests.rs |
| 27 | + - tests/indirect_offset_integration.rs |
| 28 | +related_gotchas: |
| 29 | + - parse_number() handles '-' prefix but not '+'; positive adjustments need manual '+' consumption |
| 30 | + - parse_value() requires quoted strings; bare string literals cause integration test failures |
| 31 | +--- |
| 32 | + |
| 33 | +# Indirect Offset Parser-Evaluator Sync |
| 34 | + |
| 35 | +## Problem |
| 36 | + |
| 37 | +The evaluator for indirect offsets (`resolve_indirect_offset()` in `src/evaluator/offset/indirect.rs`) was fully implemented with 35 unit tests, but the parser in `src/parser/grammar/mod.rs` could not produce `OffsetSpec::Indirect` AST nodes. The `parse_offset()` function only handled absolute numeric offsets and had no branch for `(`-prefixed indirect offset syntax like `(0x3c.l)` or `(0x3c.l+4)`. |
| 38 | + |
| 39 | +This meant the feature was unreachable through the public `MagicDatabase::load_from_file()` API -- the primary way users load text magic files. |
| 40 | + |
| 41 | +## Root Cause |
| 42 | + |
| 43 | +`parse_offset()` unconditionally delegated to `parse_number()`, which only parses numeric literals. Input starting with `(` was rejected as a parse error. The evaluator code was effectively dead code from the text-magic loading path. |
| 44 | + |
| 45 | +## Solution |
| 46 | + |
| 47 | +### 1. Added `pointer_specifier_to_type()` helper |
| 48 | + |
| 49 | +Maps single-character pointer specifiers to `(TypeKind, Endianness)` per libmagic convention: |
| 50 | + |
| 51 | +| Specifier | Width | Endianness | |
| 52 | +| ---------- | ------ | ---------- | |
| 53 | +| `.b`, `.B` | 1 byte | Native | |
| 54 | +| `.s` | 2 byte | Native | |
| 55 | +| `.S` | 2 byte | Big | |
| 56 | +| `.l` | 4 byte | Native | |
| 57 | +| `.L` | 4 byte | Big | |
| 58 | +| `.q` | 8 byte | Native | |
| 59 | +| `.Q` | 8 byte | Big | |
| 60 | + |
| 61 | +All pointer types are unsigned (`signed: false`). Lowercase = native endian, uppercase = big-endian. |
| 62 | + |
| 63 | +### 2. Added `parse_indirect_offset()` function |
| 64 | + |
| 65 | +Parses `(base.type)` and `(base.type+/-adj)` syntax: |
| 66 | + |
| 67 | +1. Consume `(` |
| 68 | +2. Parse base offset via `parse_number()` |
| 69 | +3. Consume `.` and type specifier character |
| 70 | +4. Optionally parse adjustment (see gotcha below) |
| 71 | +5. Consume `)` |
| 72 | +6. Return `OffsetSpec::Indirect { base_offset, pointer_type, adjustment, endian }` |
| 73 | + |
| 74 | +### 3. Updated `parse_offset()` to branch on leading `(` |
| 75 | + |
| 76 | +```rust |
| 77 | +pub fn parse_offset(input: &str) -> IResult<&str, OffsetSpec> { |
| 78 | + let (input, _) = multispace0(input)?; |
| 79 | + if input.starts_with('(') { |
| 80 | + let (input, spec) = parse_indirect_offset(input)?; |
| 81 | + let (input, _) = multispace0(input)?; |
| 82 | + Ok((input, spec)) |
| 83 | + } else { |
| 84 | + let (input, offset_value) = parse_number(input)?; |
| 85 | + let (input, _) = multispace0(input)?; |
| 86 | + Ok((input, OffsetSpec::Absolute(offset_value))) |
| 87 | + } |
| 88 | +} |
| 89 | +``` |
| 90 | + |
| 91 | +### 4. No changes needed to `parse_rule_offset()` |
| 92 | + |
| 93 | +It delegates to `parse_offset()`, so hierarchical forms like `>(0x3c.l)` work automatically. |
| 94 | + |
| 95 | +## Gotchas Discovered |
| 96 | + |
| 97 | +### `parse_number()` does not handle `+` prefix |
| 98 | + |
| 99 | +`parse_number()` handles `-` internally but not `+`. For `+N` adjustments, the `+` must be consumed manually: |
| 100 | + |
| 101 | +```rust |
| 102 | +let (input, adjustment) = if input.starts_with('+') { |
| 103 | + let (input, _) = char('+')(input)?; |
| 104 | + parse_number(input)? |
| 105 | +} else if input.starts_with('-') { |
| 106 | + parse_number(input)? |
| 107 | +} else { |
| 108 | + (input, 0) |
| 109 | +}; |
| 110 | +``` |
| 111 | + |
| 112 | +Do NOT modify `parse_number()` globally -- it is shared by offset and value parsing, and adding `+` support would change semantics elsewhere. |
| 113 | + |
| 114 | +### `parse_value()` requires quoted strings |
| 115 | + |
| 116 | +Integration tests initially failed because `parse_value()` does not accept bare strings. Magic file string values must be quoted: |
| 117 | + |
| 118 | +```text |
| 119 | +# Correct |
| 120 | +0 string "MZ" DOS executable |
| 121 | +
|
| 122 | +# Wrong -- parse_value() rejects bare "MZ" |
| 123 | +0 string MZ DOS executable |
| 124 | +``` |
| 125 | + |
| 126 | +### Use big-endian specifiers in cross-platform tests |
| 127 | + |
| 128 | +Prefer `.L` (big-endian long) over `.l` (native) in integration test magic files so byte buffers are deterministic across architectures. |
| 129 | + |
| 130 | +## Prevention Strategies |
| 131 | + |
| 132 | +### Parser-Evaluator Parity Checklist |
| 133 | + |
| 134 | +When adding a new AST variant, ensure: |
| 135 | + |
| 136 | +1. **Parser produces it** -- unit test parses raw syntax, asserts correct AST node |
| 137 | +2. **Evaluator consumes it** -- unit test constructs AST node, asserts evaluation result |
| 138 | +3. **End-to-end test exists** -- integration test through `MagicDatabase::load_from_file()` proves the full pipeline works |
| 139 | +4. **Codegen handles it** -- if it can appear in built-in rules, update `src/parser/codegen.rs` |
| 140 | +5. **Strength calculation covers it** -- update `src/evaluator/strength.rs` if scoring changes |
| 141 | + |
| 142 | +### Integration Test Template |
| 143 | + |
| 144 | +```rust |
| 145 | +#[test] |
| 146 | +fn test_feature_end_to_end() { |
| 147 | + let temp_dir = TempDir::new().unwrap(); |
| 148 | + let magic_path = temp_dir.path().join("test.magic"); |
| 149 | + let mut f = fs::File::create(&magic_path).unwrap(); |
| 150 | + writeln!(f, r#"0 string "MAGIC" Test match"#).unwrap(); |
| 151 | + |
| 152 | + let db = MagicDatabase::load_from_file(&magic_path).unwrap(); |
| 153 | + let result = db.evaluate_buffer(b"MAGIC\x00data").unwrap(); |
| 154 | + assert!(result.description.contains("Test match")); |
| 155 | +} |
| 156 | +``` |
| 157 | + |
| 158 | +## Cross-References |
| 159 | + |
| 160 | +- **Evaluator solution**: `docs/solutions/logic-errors/indirect-offset-resolution.md` |
| 161 | +- **Magic format spec**: `docs/MAGIC_FORMAT.md` (lines 106-126, indirect offset section) |
| 162 | +- **Gotchas**: `GOTCHAS.md` sections 3.5 (`parse_number` `+` limitation) and 3.6 (quoted strings) |
| 163 | +- **Architecture**: `AGENTS.md` offset specifications section |
| 164 | +- **Issue**: #37 (indirect offset resolution) |
| 165 | +- **Related gotchas**: S2 (enum variant checklists), S3 (parser architecture split), S5 (numeric type pitfalls) |
0 commit comments