Merge fix-f-drive-parity: single-pass MFT pipeline matching C++ architecture #2

Merged: githubrobbi merged 8 commits into main from fix-f-drive-parity on Mar 14, 2026
@githubrobbi

Summary

This PR merges the fix-f-drive-parity branch, which implements:

  • Single-pass MFT pipeline matching C++ architecture
  • Extension record merging for multi-attribute files
  • Proper handling of hard links, ADS, and sparse files
  • F: drive parity fixes

Testing

  • Cross-platform CI passes
  • Ready for Windows parity testing

githubrobbi and others added 8 commits March 14, 2026 03:58
Un-deprecate and modernize parse_record_to_index() and parse_extension_to_index()
to handle ALL attribute types that parse_record_full() handles. This restores the
single-pass C++-style inline parsing approach as the primary path.

Key Changes:
- Remove #[deprecated] annotations from both parsers
- Add complete attribute handling:
  - $REPARSE_POINT - extract reparse tag, add as stream
  - $INDEX_ROOT, $INDEX_ALLOCATION, $BITMAP - directory index handling
  - $OBJECT_ID, $VOLUME_NAME, $VOLUME_INFORMATION, $PROPERTY_SET
  - $EA, $EA_INFORMATION, $LOGGED_UTILITY_STREAM
  - $SECURITY_DESCRIPTOR, $ATTRIBUTE_LIST
  - Unknown attribute types (default: case in C++)
- Set reparse_tag and total_stream_count in FileRecord
- Handle directory size from accumulated $I30 attributes
- Merge directory index sizes in extension records
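
For orientation, here is a minimal sketch of the kind of attribute walk the list above implies. The type codes are the standard NTFS attribute type codes; the function names (`attribute_name`, `list_attributes`) and the simplified header handling are illustrative assumptions, not the actual code in parse_record_to_index(). In a real FILE record the first-attribute offset is the u16 at byte 0x14 of the record header; here it is passed in by the caller.

```rust
/// Maps a standard NTFS attribute type code to its name.
/// Anything unrecognised falls through to "unknown", similar in spirit
/// to the `default:` case mentioned for the C++ parser.
fn attribute_name(type_code: u32) -> &'static str {
    match type_code {
        0x10 => "$STANDARD_INFORMATION",
        0x20 => "$ATTRIBUTE_LIST",
        0x30 => "$FILE_NAME",
        0x40 => "$OBJECT_ID",
        0x50 => "$SECURITY_DESCRIPTOR",
        0x60 => "$VOLUME_NAME",
        0x70 => "$VOLUME_INFORMATION",
        0x80 => "$DATA",
        0x90 => "$INDEX_ROOT",
        0xA0 => "$INDEX_ALLOCATION",
        0xB0 => "$BITMAP",
        0xC0 => "$REPARSE_POINT",
        0xD0 => "$EA_INFORMATION",
        0xE0 => "$EA",
        0xF0 => "$PROPERTY_SET",
        0x100 => "$LOGGED_UTILITY_STREAM",
        _ => "unknown",
    }
}

/// Reads a little-endian u32 at `off`, or None if out of bounds.
fn read_u32(buf: &[u8], off: usize) -> Option<u32> {
    let bytes = buf.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Walks attribute headers (type at +0, total length at +4) until the
/// 0xFFFFFFFF end marker or a malformed length, collecting type names.
fn list_attributes(record: &[u8], first_attr_offset: usize) -> Vec<&'static str> {
    let mut names = Vec::new();
    let mut off = first_attr_offset;
    while let Some(type_code) = read_u32(record, off) {
        if type_code == 0xFFFF_FFFF {
            break; // end-of-attributes marker
        }
        let Some(len) = read_u32(record, off + 4) else { break };
        let len = len as usize;
        if len < 8 || off + len > record.len() {
            break; // malformed length: stop rather than read past the record
        }
        names.push(attribute_name(type_code));
        off += len;
    }
    names
}
```

A single-pass parser would dispatch on the type code inside this loop and write straight into the index, rather than collecting names.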

Files Modified:
- crates/uffs-mft/src/io/parser/index.rs (845 LOC)
- crates/uffs-mft/src/io/parser/index_extension.rs (727 LOC)
- scripts/ci/file_size_exceptions.txt (add exceptions for large parsers)

Achieves feature parity with parse_record_full() + MftRecordMerger pipeline.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements a new file-based reader path using the direct-to-index parser from
Wave 1. This provides a single-pass parsing path that works on both
Windows and macOS, bypassing the old multi-pass pipeline.

Key changes:
- Added load_raw_to_index_direct() in reader/persistence.rs
- Copied direct-to-index parsers to cross-platform parse/ module
- Added env var switch in commands/load.rs (UFFS_LEGACY_PARSE=1 for old path)
- Both parser paths available for validation and comparison
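
A minimal sketch of how such an environment-variable switch could look. Only the variable name UFFS_LEGACY_PARSE and the value 1 come from the commit message; the helper names and the choice to treat every other value as "not set" are assumptions.

```rust
use std::env;

/// Pure decision function so the rule can be tested without touching
/// the process environment. Assumption: only the exact value "1"
/// selects the legacy path.
fn legacy_requested(value: Option<&str>) -> bool {
    matches!(value, Some("1"))
}

/// Reads UFFS_LEGACY_PARSE from the environment. A missing or
/// non-UTF-8 value is treated as "use the new direct-to-index path".
fn use_legacy_parse() -> bool {
    legacy_requested(env::var("UFFS_LEGACY_PARSE").ok().as_deref())
}
```

Keeping the decision in a pure function makes it easy to check both parser paths in tests without mutating global process state.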

New path: raw MFT → fixup → parse_record_to_index() → MftIndex
Old path: raw MFT → parse_record_full() → MftRecordMerger → from_parsed_records()

The new direct parser eliminates intermediate ParsedRecord allocations
and matches the C++ single-pass architecture.
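
The "fixup" stage in the new path refers to NTFS update-sequence fixups. The sketch below shows the standard technique under stated assumptions: 512-byte sectors, the update sequence array offset and count stored as little-endian u16 values at bytes 4 and 6 of the record header, and a hypothetical function name `apply_fixup`. It is not the project's implementation.

```rust
const SECTOR_SIZE: usize = 512; // assumption: 512-byte sector stride

/// Verifies and undoes the update-sequence fixup in place.
/// USA entry 0 is the update sequence number (USN); entries 1..count
/// hold the original last two bytes of each sector.
fn apply_fixup(record: &mut [u8]) -> Result<(), &'static str> {
    if record.len() < 8 {
        return Err("record too short");
    }
    let usa_offset = u16::from_le_bytes([record[4], record[5]]) as usize;
    let usa_count = u16::from_le_bytes([record[6], record[7]]) as usize;
    if usa_count == 0 {
        return Err("empty update sequence array");
    }
    let usa_end = usa_offset + usa_count * 2;
    if usa_end > record.len() || (usa_count - 1) * SECTOR_SIZE > record.len() {
        return Err("update sequence array out of bounds");
    }
    let usn = [record[usa_offset], record[usa_offset + 1]];
    for i in 1..usa_count {
        let tail = i * SECTOR_SIZE - 2;
        // Each sector must end with the USN; a mismatch indicates a torn write.
        if record[tail..tail + 2] != usn[..] {
            return Err("sector tail does not match update sequence number");
        }
        // Restore the bytes that were displaced into the USA.
        let src = usa_offset + i * 2;
        let saved = [record[src], record[src + 1]];
        record[tail..tail + 2].copy_from_slice(&saved);
    }
    Ok(())
}
```

Only after this step are the record's attribute bytes trustworthy, which is why it precedes parse_record_to_index() in the new path.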

All tests pass (105/105). F-drive parity verification requires Windows.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…parsers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace the multi-pass pipeline (parse_record_full → MftRecordMerger →
from_parsed_records) with single-pass direct-to-index parsing in the
SlidingIocpInline path.

Changes:
- Pre-allocate MftIndex upfront instead of MftRecordMerger
- Call parse_record_to_index() directly during I/O completions
- Eliminate intermediate ParsedRecord allocation and merge phase
- Simplify logging (no separate io_ms/merge_ms split)

This completes Wave 3 — the IOCP path now uses the same zero-copy
parser as the file-based reader (Wave 2), eliminating redundant
allocations and improving parity across code paths.

Tests: 105/105 pass, just check clean, just lint-prod clean

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…eline

Wave 4 cleanup: Remove legacy multi-pass pipeline from hot path.

- Added UFFS_LEGACY_PARSE=1 environment variable to force legacy pipeline
  - When set, forces SlidingIocp mode instead of SlidingIocpInline
  - Allows debugging/comparison with old parse_record_full path

- Documented legacy pipeline components with clear markers:
  - parse_record_full(): part of old parse → merger → from_parsed_records
  - MftRecordMerger: merges extension records in legacy path
  - from_parsed_records(): final stage of legacy multi-pass pipeline
  - read_all_parallel_with_progress(): uses legacy pipeline

- Legacy pipeline still used by:
  - Legacy read modes (Parallel, Pipelined, PipelinedParallel, SlidingIocp)
  - File-based readers (load_raw_to_index_with_options)
  - Tests and diagnostic tools
  - UFFS_LEGACY_PARSE=1 escape hatch

- Hot path (SlidingIocpInline) bypasses legacy pipeline entirely:
  - Uses direct-to-index parsers (parse_record_to_index)
  - Builds index incrementally during I/O
  - Creates parent placeholders on-demand
  - No intermediate Vec<ParsedRecord> allocation
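
To illustrate what building the index incrementally with on-demand parent placeholders can mean, here is a small sketch. The `Index` and `Record` types, their fields, and the `NO_ENTRY` sentinel are hypothetical stand-ins, not MftIndex itself; only the idea of an FRS-indexed lookup array and placeholder parents comes from the commit messages.

```rust
const NO_ENTRY: u32 = u32::MAX; // hypothetical sentinel for "no record yet"

#[derive(Debug)]
struct Record {
    frs: u64,
    parent_idx: u32,
    is_placeholder: bool,
}

#[derive(Default)]
struct Index {
    records: Vec<Record>,
    frs_to_idx: Vec<u32>, // sparse lookup: FRS number -> position in `records`
}

impl Index {
    /// Returns the slot for `frs`, creating a placeholder if none exists.
    fn get_or_create(&mut self, frs: u64) -> usize {
        let slot = frs as usize;
        if slot >= self.frs_to_idx.len() {
            self.frs_to_idx.resize(slot + 1, NO_ENTRY);
        }
        if self.frs_to_idx[slot] == NO_ENTRY {
            self.frs_to_idx[slot] = self.records.len() as u32;
            self.records.push(Record { frs, parent_idx: NO_ENTRY, is_placeholder: true });
        }
        self.frs_to_idx[slot] as usize
    }

    /// Inserts a parsed record. The parent may not have been parsed yet,
    /// so it gets a placeholder that is filled in when its record arrives.
    fn insert_parsed(&mut self, frs: u64, parent_frs: u64) -> usize {
        let parent = self.get_or_create(parent_frs);
        let idx = self.get_or_create(frs);
        self.records[idx].parent_idx = parent as u32;
        self.records[idx].is_placeholder = false;
        idx
    }
}
```

Because I/O completions can deliver a child before its parent directory, placeholders let the parser link records immediately instead of deferring to a later merge phase.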

Verified: just check, just lint-prod, cargo test -p uffs-mft all pass

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add wait_ms/parse_ms/overlap_pct metrics to the sliding window IOCP
reader for measuring I/O overlap effectiveness on Windows. The IOCP
sliding window already provides optimal overlap (matching C++ design),
so no structural changes needed — just observability.
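
The commit does not say how overlap_pct is derived from wait_ms and parse_ms. The sketch below shows one plausible definition, purely as an assumption: the share of loop time spent parsing (while further reads are in flight) rather than blocked waiting for completions. The `OverlapStats` type is hypothetical.

```rust
use std::time::Duration;

/// Hypothetical accumulator for the two timers named in the commit.
#[derive(Default)]
struct OverlapStats {
    wait: Duration,  // time blocked waiting for I/O completions
    parse: Duration, // time spent parsing completed buffers
}

impl OverlapStats {
    /// Assumed definition: parse time as a percentage of wait + parse.
    /// Values near 100 suggest the reader rarely stalls on I/O.
    fn overlap_pct(&self) -> f64 {
        let total = self.wait + self.parse;
        if total.is_zero() {
            return 0.0;
        }
        100.0 * self.parse.as_secs_f64() / total.as_secs_f64()
    }
}
```

In a reader loop, each timer would be advanced with `Instant::now()` deltas around the completion wait and the parse call respectively.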

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add `MftIndex::with_capacity_optimized()` that pre-allocates ALL vectors
based on MFT bitmap statistics to eliminate Vec resizing during the hot
parse loop. This matches the C++ pre-allocation strategy.

Pre-allocation ratios (matching C++ ntfs_index_accessors.hpp lines 525-544):
- records: estimated_records + 5% safety margin
- frs_to_idx: max_frs + 1 (sparse lookup array)
- names: estimated_records * 23 (~23 chars avg)
- links: estimated_records / 16 (6% have hardlinks)
- streams: estimated_records / 4 (25% have ADS)
- internal_streams: estimated_records / 20 (5% internal)
- children: estimated_records * 3/2 (dirs have multiple children)
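
The ratios above translate directly into capacity arithmetic. The following sketch computes them with integer math; the `Capacities` struct and `capacities` function are illustrative names, and the real `with_capacity_optimized()` may round or clamp differently.

```rust
/// Hypothetical container for the per-vector capacities listed above.
#[derive(Debug, PartialEq)]
struct Capacities {
    records: usize,
    frs_to_idx: usize,
    names: usize,
    links: usize,
    streams: usize,
    internal_streams: usize,
    children: usize,
}

/// Applies the documented ratios using integer arithmetic.
fn capacities(estimated_records: usize, max_frs: usize) -> Capacities {
    Capacities {
        records: estimated_records + estimated_records / 20, // +5% safety margin
        frs_to_idx: max_frs + 1,                              // one slot per FRS number
        names: estimated_records * 23,                        // ~23 chars per name
        links: estimated_records / 16,                        // ~6% have hard links
        streams: estimated_records / 4,                       // 25% have ADS
        internal_streams: estimated_records / 20,             // 5% internal
        children: estimated_records * 3 / 2,                  // 1.5 child slots per record
    }
}
```

Each value would then be passed to `Vec::with_capacity` for the corresponding vector, so the hot parse loop never triggers a reallocation as long as the estimates hold.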

Added MftBitmap::max_frs_in_use() to scan backwards and find the highest
in-use FRS number, used for frs_to_idx sizing.
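
A backward bitmap scan of this kind can be sketched as follows. It assumes the usual NTFS bitmap layout (bit 0 of byte 0 is FRS 0, least significant bit first) and uses a free function over a byte slice; the actual MftBitmap method signature is not shown in the commit.

```rust
/// Returns the highest FRS number whose bit is set, or None if the
/// bitmap is empty or all zero. Scans from the last byte backwards so
/// the common case (trailing unused space) exits early.
fn max_frs_in_use(bitmap: &[u8]) -> Option<u64> {
    // Last byte that has any bit set.
    let i = bitmap.iter().rposition(|&b| b != 0)?;
    let byte = bitmap[i];
    // Index of the highest set bit within that byte (0..=7).
    let top_bit = 7 - u64::from(byte.leading_zeros());
    Some(i as u64 * 8 + top_bit)
}
```

The result plus one gives the frs_to_idx length, which is why this scan feeds the pre-allocation step.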

IOCP direct-to-index reader now uses optimized pre-allocation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Replace Vec<u16> with SmallVec<[u16; 64]> for UTF-16 filename decoding
in both direct-to-index parsers. This avoids a heap allocation for typical
filenames (up to 64 UTF-16 code units), reducing per-record overhead in the
hot parse loop.

Matches the optimization already present in the full parser (parse/full.rs).
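
To show the idea without depending on the smallvec crate, the sketch below uses a std-only stand-in: a fixed 64-element stack buffer for short names and a Vec fallback for longer ones. This is a substitute technique for illustration, not the SmallVec-based code in the parsers, and `decode_utf16le_name` is a hypothetical name.

```rust
const INLINE_UNITS: usize = 64; // same inline capacity as SmallVec<[u16; 64]>

/// Decodes a UTF-16LE filename. Names of up to 64 code units are staged
/// in a stack buffer; longer names fall back to a heap-allocated Vec.
/// A trailing odd byte, if any, is ignored.
fn decode_utf16le_name(raw: &[u8]) -> String {
    let unit_count = raw.len() / 2;
    let units = raw
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));
    if unit_count <= INLINE_UNITS {
        let mut buf = [0u16; INLINE_UNITS];
        for (slot, unit) in buf.iter_mut().zip(units) {
            *slot = unit;
        }
        String::from_utf16_lossy(&buf[..unit_count])
    } else {
        let heap: Vec<u16> = units.collect();
        String::from_utf16_lossy(&heap)
    }
}
```

The resulting String still allocates; the saving described in the commit concerns only the temporary UTF-16 buffer used during decoding.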

Files modified:
- crates/uffs-mft/src/io/parser/index.rs (IOCP hot path)
- crates/uffs-mft/src/parse/direct_index.rs (file-based hot path)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
githubrobbi merged commit f231a35 into main on Mar 14, 2026
4 of 6 checks passed
githubrobbi deleted the fix-f-drive-parity branch on March 14, 2026 12:35