Commit 5279d9b

githubrobbi and claude committed
fix(mft): apply snapshot/restore pattern to directory index merge
Fixes remaining LIVE parser parity mismatches by applying the same snapshot/restore pattern used for $DATA to directory index sizes.

Root cause: Directory index merging used unconditional += at direct_index_extension.rs:742-743, causing data loss when IOCP delivered extension records before base records.

When extension arrives before base:
- Extension adds dir_index to first_stream.size (0 + ext = ext) ✓
- Base overwrites with = SizeInfo {...}, losing extension data ✗

Fix: Check if first_stream.size is empty (0, 0):
- If empty → write extension's dir_index values
- Otherwise → accumulate using saturating_add

This mirrors the proven fix from commit e90aade that reduced mismatches from 16,517 → 422. Expected to resolve remaining small directory size deltas (+51, +11 bytes).

Changes:
- Apply snapshot/restore to dir_index merge (direct_index_extension.rs:737-766)
- Add chaos test harness for reproducing LIVE out-of-order scenarios
- Add regression tests for extension-before-base merging
- Add CHAOS_TEST_HARNESS.md documentation

Validation:
- All 116 tests pass (OFFLINE correctness preserved)
- Code formatted and linted (ultra-strict)
- Ready for Windows LIVE validation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
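The merge rule in the commit message can be sketched as follows. This is a minimal illustration, not the code in `direct_index_extension.rs`: the `SizeInfo` field names (`length`, `allocated`) and the function `merge_dir_index` are assumptions made for the example.

```rust
/// Hypothetical stand-in for the parser's size pair; field names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct SizeInfo {
    pub length: u64,
    pub allocated: u64,
}

impl SizeInfo {
    /// True when nothing has been written yet, i.e. the pair is (0, 0).
    pub fn is_empty(&self) -> bool {
        self.length == 0 && self.allocated == 0
    }
}

/// Merge an extension record's directory-index sizes into the first stream.
/// Empty target: write the extension's values. Otherwise: accumulate with
/// saturating addition so an overflow clamps instead of wrapping.
pub fn merge_dir_index(first_stream: &mut SizeInfo, dir_index: SizeInfo) {
    if first_stream.is_empty() {
        *first_stream = dir_index;
    } else {
        first_stream.length = first_stream.length.saturating_add(dir_index.length);
        first_stream.allocated = first_stream.allocated.saturating_add(dir_index.allocated);
    }
}
```

Calling `merge_dir_index` twice on a default `SizeInfo` writes the first extension's values and then adds the second's, which is the accumulation path the commit message describes.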
1 parent 1edc6b1 commit 5279d9b

File tree

7 files changed

+875
-4
lines changed

CHAOS_TEST_HARNESS.md

Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
# Chaos Test Harness - Deterministic MFT Out-of-Order Processing

## Overview

The chaos test harness (`crates/uffs-mft/src/io/readers/parallel/tests_chaos.rs`) simulates the out-of-order record processing that occurs in Windows LIVE parsing due to:
- **IOCP overlapped I/O**: Chunks can complete in any order
- **Parallel rayon parsing**: Extension records may be processed before their base records

This allows reproducible testing of race conditions and merge bugs **without requiring Windows**.

## Architecture

```
Offline MFT File

Split into chunks (8MB default)

Reorder chunks (controlled chaos)

Process through same pipeline as LIVE

MftIndex output
```
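The "split into chunks" step of the pipeline above might look like the sketch below. The names `ChunkRange`, `DEFAULT_CHUNK_SIZE`, and `split_into_chunks` are hypothetical and not taken from the harness; only the 8MB default comes from the diagram.

```rust
/// Byte range of one chunk within the offline MFT file (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ChunkRange {
    pub offset: u64,
    pub len: u64,
}

/// Default chunk size from the diagram above (8 MB).
pub const DEFAULT_CHUNK_SIZE: u64 = 8 * 1024 * 1024;

/// Split `file_len` bytes into consecutive chunks of at most `chunk_size`
/// bytes. The final chunk may be shorter than the others.
pub fn split_into_chunks(file_len: u64, chunk_size: u64) -> Vec<ChunkRange> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut chunks = Vec::new();
    let mut offset = 0u64;
    while offset < file_len {
        let len = chunk_size.min(file_len - offset);
        chunks.push(ChunkRange { offset, len });
        offset += len;
    }
    chunks
}
```

A 20 MB file with the default size yields three chunks, the last one 4 MB long. The sketch assumes the chunk size is a multiple of the MFT record size so that no record straddles a chunk boundary.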
## Chaos Strategies

1. **Random** - Seeded shuffle (most realistic)
   - Uses ChaCha8Rng for deterministic randomization
   - Same seed → same chunk order → reproducible failures

2. **Reverse** - Process chunks in reverse order
   - Simple but effective for testing
   - Guaranteed extension-before-base for end-of-drive files

3. **Interleaved** - Swap adjacent chunks
   - Controlled chaos
   - Good for boundary conditions
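The three strategies can be illustrated as a function that returns the order in which chunk indices are processed. This is a sketch under stated assumptions: `chunk_order` is a hypothetical name, and the Random case swaps in a small SplitMix64 generator with a Fisher-Yates shuffle so the example needs no external crates, whereas the harness is described as using ChaCha8Rng.

```rust
/// The three orderings described above.
#[derive(Debug, Clone, Copy)]
pub enum ChaosStrategy {
    Random { seed: u64 },
    Reverse,
    Interleaved,
}

/// SplitMix64: a tiny seeded generator used here only to keep the sketch
/// dependency-free. It is a stand-in for ChaCha8Rng, not the harness's RNG.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Return the order in which chunk indices `0..n` should be processed.
pub fn chunk_order(strategy: ChaosStrategy, n: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..n).collect();
    match strategy {
        ChaosStrategy::Random { seed } => {
            // Fisher-Yates shuffle driven by the seeded generator. The modulo
            // introduces a slight bias, which is acceptable for a sketch.
            let mut rng = SplitMix64(seed);
            for i in (1..n).rev() {
                let j = (rng.next_u64() % (i as u64 + 1)) as usize;
                order.swap(i, j);
            }
        }
        ChaosStrategy::Reverse => order.reverse(),
        ChaosStrategy::Interleaved => {
            // Swap each adjacent pair: [0, 1, 2, 3, 4] -> [1, 0, 3, 2, 4].
            for pair in order.chunks_exact_mut(2) {
                pair.swap(0, 1);
            }
        }
    }
    order
}
```

Because the generator state depends only on the seed, two calls with the same seed and chunk count return the same permutation, which is the property that makes a failing run reproducible.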
## Usage

### Running Tests

```bash
# Run all chaos tests (requires offline MFT)
cargo test -p uffs-mft -- chaos --ignored --nocapture

# Run specific strategy
cargo test -p uffs-mft -- test_random_order_d_drive --ignored --nocapture
cargo test -p uffs-mft -- test_reverse_order_d_drive --ignored --nocapture
cargo test -p uffs-mft -- test_interleaved_order_d_drive --ignored --nocapture
```

### Requirements

- **Offline MFT**: `/Users/rnio/uffs_data/drive_d/D_mft.bin`
- **Platform**: macOS (cross-platform testing)
- **Dependencies**: `rand`, `rand_chacha` (dev dependencies)

### Test Output

Each test shows:
- Total chunks processed
- Chunk reordering statistics
- Extension-before-base occurrences
- Final record count
- Success/failure status

Example output:
```
✅ RANDOM-ORDER parsing completed (seed=42)
Chunks processed: 128
Extension-before-base: 47 occurrences
Total records: 1,234,567
```
## Finding Bugs

### Comparing with Reference

```bash
# 1. Run chaos test
cargo test -p uffs-mft -- test_random_order_d_drive --ignored --nocapture > chaos_output.txt

# 2. Compare with C++ reference
# The chaos harness outputs can be compared with:
# /Users/rnio/uffs_data/drive_d/cpp_d.txt

# 3. Look for discrepancies in:
# - Directory sizes
# - Extension record counts
# - Data run totals
```

### Debugging Specific FRS

The harness logs extension-before-base events:
```rust
tracing::debug!(frs = ext_rec.frs, "Extension arrived before base");
```

Use `RUST_LOG=debug` to see these:
```bash
RUST_LOG=uffs_mft=debug cargo test -p uffs-mft -- test_random_order_d_drive --ignored --nocapture 2>&1 | grep "Extension arrived"
```
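An extension-before-base event, as logged above, could be detected along the following lines. The `Record` type and its fields are assumptions for this sketch and do not reflect the harness's actual record representation.

```rust
use std::collections::HashSet;

/// Simplified record: `base_frs` is `Some(..)` for an extension record and
/// `None` for a base record. Both fields are assumptions for this sketch.
#[derive(Debug, Clone, Copy)]
pub struct Record {
    pub frs: u64,
    pub base_frs: Option<u64>,
}

/// Return the FRS of every extension record that is seen before its base
/// record, in arrival order.
pub fn extensions_before_base(records: &[Record]) -> Vec<u64> {
    let mut seen_bases = HashSet::new();
    let mut early = Vec::new();
    for rec in records {
        match rec.base_frs {
            // Base record: remember that it has arrived.
            None => {
                seen_bases.insert(rec.frs);
            }
            // Extension record: flag it if its base has not arrived yet.
            Some(base) => {
                if !seen_bases.contains(&base) {
                    early.push(rec.frs);
                }
            }
        }
    }
    early
}
```

The length of the returned vector corresponds to the kind of "Extension-before-base" count shown in the example test output, and each entry is an FRS worth inspecting in the debug log.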
## Customizing Tests

### Different Chunk Sizes

```rust
let chaos_reader = ChaosMftReader::new(
    ChaosStrategy::Random { seed: 42 },
    2 * 1024 * 1024, // 2MB chunks (more fine-grained chaos)
);
```

### Different Seeds

```rust
ChaosStrategy::Random { seed: 123456 } // Try different seeds
```

### Custom Strategies

Add new variants to `ChaosStrategy`:
```rust
enum ChaosStrategy {
    // ...
    BlockSwap { block_size: usize }, // Swap N-chunk blocks
    DelayedExtensions,               // Always process extensions last
}
```
134+
## Known Issues
135+
136+
1. **Memory usage**: Large MFTs with small chunks use more memory
137+
2. **Performance**: Chaos tests are slower than normal parsing (~2-3x)
138+
3. **Determinism**: Only applies to chunk order, not within-chunk rayon parallelism
139+
140+
## Integration with CI
141+
142+
These tests are `#[ignore]` by default (require offline MFT). To run in CI:
143+
144+
```bash
145+
# In .github/workflows/ci.yml
146+
- name: Chaos tests
147+
if: env.HAS_OFFLINE_MFT == 'true'
148+
run: cargo test -p uffs-mft -- chaos --ignored
149+
```
150+
151+
## References
152+
153+
- LIVE parser: `crates/uffs-mft/src/parse/direct_index.rs`
154+
- Extension merger: `crates/uffs-mft/src/parse/direct_index_extension.rs`
155+
- Parallel reader: `crates/uffs-mft/src/io/readers/parallel/`
156+
- C++ reference: `_trash/cpp_*.txt`
157+
158+
## Troubleshooting
159+
160+
**Test panics with "offline MFT not found"**
161+
- Ensure `/Users/rnio/uffs_data/drive_d/D_mft.bin` exists
162+
- Or update the path in the test
163+
164+
**Compilation errors**
165+
- Run `cargo check -p uffs-mft --tests`
166+
- Ensure `rand` and `rand_chacha` are in `[dev-dependencies]`
167+
168+
**No output**
169+
- Add `--nocapture` flag
170+
- Use `RUST_LOG=info` or `RUST_LOG=debug`
171+
172+
**Non-deterministic results**
173+
- Rayon parallelism within chunks is not controlled
174+
- Use single-threaded mode: `RAYON_NUM_THREADS=1`

crates/uffs-mft/Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -68,6 +68,8 @@ crossbeam-channel = "0.5.15"
 criterion.workspace = true
 proptest.workspace = true
 tokio = { workspace = true, features = ["test-util", "macros"] }
+rand = "0.8.5"
+rand_chacha = "0.3.1"

 [[bench]]
 name = "mft_read"

crates/uffs-mft/src/io/readers/parallel/mod.rs

Lines changed: 3 additions & 0 deletions
@@ -15,6 +15,9 @@ mod to_index_parallel;
 #[cfg(test)]
 mod tests;

+#[cfg(test)]
+mod tests_chaos;
+
 pub struct ReadParseTiming {
     /// Time spent in I/O operations (reading chunks from disk).
     /// This is the cumulative time spent in `ReadFile` calls.
