Merge fix-f-drive-parity: single-pass MFT pipeline matching C++ architecture by githubrobbi · Pull Request #2 · githubrobbi/UltraFastFileSearch

githubrobbi · 2026-03-14T12:34:44Z

Summary

This PR merges the fix-f-drive-parity branch which implements:

Single-pass MFT pipeline matching C++ architecture
Extension record merging for multi-attribute files
Proper handling of hard links, ADS, and sparse files
F: drive parity fixes

Testing

Cross-platform CI passes
Ready for Windows parity testing

Un-deprecate and modernize parse_record_to_index() and parse_extension_to_index() to handle ALL attribute types that parse_record_full() handles. This restores the single-pass C++-style inline parsing approach as the primary path.

Key Changes:
- Remove #[deprecated] annotations from both parsers
- Add complete attribute handling:
  - $REPARSE_POINT - extract reparse tag, add as stream
  - $INDEX_ROOT, $INDEX_ALLOCATION, $BITMAP - directory index handling
  - $OBJECT_ID, $VOLUME_NAME, $VOLUME_INFORMATION, $PROPERTY_SET
  - $EA, $EA_INFORMATION, $LOGGED_UTILITY_STREAM
  - $SECURITY_DESCRIPTOR, $ATTRIBUTE_LIST
  - Unknown attribute types (default: case in C++)
- Set reparse_tag and total_stream_count in FileRecord
- Handle directory size from accumulated $I30 attributes
- Merge directory index sizes in extension records

Files Modified:
- crates/uffs-mft/src/io/parser/index.rs (845 LOC)
- crates/uffs-mft/src/io/parser/index_extension.rs (727 LOC)
- scripts/ci/file_size_exceptions.txt (add exceptions for large parsers)

Achieves feature parity with parse_record_full() + MftRecordMerger pipeline.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
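To illustrate the attribute handling listed in this commit, here is a minimal sketch of a type-code dispatch with a catch-all arm that plays the role of the C++ default: case. The numeric codes are the standard NTFS attribute type codes; the names AttrKind, classify_attribute, and is_directory_index_attr are hypothetical and are not taken from the uffs-mft crate.

```rust
/// Attribute kinds named in the commit message, plus a fallback.
/// Hypothetical type for illustration; not the crate's actual enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AttrKind {
    StandardInformation,
    AttributeList,
    FileName,
    ObjectId,
    SecurityDescriptor,
    VolumeName,
    VolumeInformation,
    Data,
    IndexRoot,
    IndexAllocation,
    Bitmap,
    ReparsePoint,
    EaInformation,
    Ea,
    PropertySet,
    LoggedUtilityStream,
    /// Any type code not listed above (the `default:` case).
    Unknown(u32),
}

/// Map an NTFS attribute type code to a kind.
fn classify_attribute(type_code: u32) -> AttrKind {
    match type_code {
        0x10 => AttrKind::StandardInformation,
        0x20 => AttrKind::AttributeList,
        0x30 => AttrKind::FileName,
        0x40 => AttrKind::ObjectId,
        0x50 => AttrKind::SecurityDescriptor,
        0x60 => AttrKind::VolumeName,
        0x70 => AttrKind::VolumeInformation,
        0x80 => AttrKind::Data,
        0x90 => AttrKind::IndexRoot,
        0xA0 => AttrKind::IndexAllocation,
        0xB0 => AttrKind::Bitmap,
        0xC0 => AttrKind::ReparsePoint,
        0xD0 => AttrKind::EaInformation,
        0xE0 => AttrKind::Ea,
        0xF0 => AttrKind::PropertySet,
        0x100 => AttrKind::LoggedUtilityStream,
        other => AttrKind::Unknown(other),
    }
}

/// Assumption: these three kinds are the ones grouped under
/// "directory index handling" in the commit message.
fn is_directory_index_attr(kind: AttrKind) -> bool {
    matches!(
        kind,
        AttrKind::IndexRoot | AttrKind::IndexAllocation | AttrKind::Bitmap
    )
}
```

Keeping an explicit Unknown variant means an unrecognized attribute is still counted rather than silently dropped, which is one way the parser could match the C++ behavior described above.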

Implements new file-based reader path using direct-to-index parser from Wave 1. This provides a single-pass parsing path that works on both Windows and macOS, bypassing the old multi-pass pipeline.

Key changes:
- Added load_raw_to_index_direct() in reader/persistence.rs
- Copied direct-to-index parsers to cross-platform parse/ module
- Added env var switch in commands/load.rs (UFFS_LEGACY_PARSE=1 for old path)
- Both parser paths available for validation and comparison

New path: raw MFT → fixup → parse_record_to_index() → MftIndex
Old path: raw MFT → parse_record_full() → MftRecordMerger → from_parsed_records()

The new direct parser eliminates intermediate ParsedRecord allocations and matches the C++ single-pass architecture.

All tests pass (105/105). F-drive parity verification requires Windows.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
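The env var switch described in this commit could look roughly like the sketch below. Only the variable name UFFS_LEGACY_PARSE and the value 1 come from the commit message; ParsePath, select_parse_path, and parse_path_from_env are hypothetical names, and treating every value other than "1" as "use the new path" is an assumption.

```rust
use std::env;

/// Which parsing pipeline to run. Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ParsePath {
    /// raw MFT -> fixup -> parse_record_to_index() -> MftIndex
    DirectToIndex,
    /// raw MFT -> parse_record_full() -> MftRecordMerger -> from_parsed_records()
    LegacyMultiPass,
}

/// Pure decision function, kept separate from the environment lookup
/// so it can be tested without mutating process state.
/// Assumption: only the exact value "1" selects the legacy path.
fn select_parse_path(legacy_flag: Option<&str>) -> ParsePath {
    match legacy_flag {
        Some("1") => ParsePath::LegacyMultiPass,
        _ => ParsePath::DirectToIndex,
    }
}

/// Read UFFS_LEGACY_PARSE from the process environment.
/// An unset or non-UTF-8 value falls back to the direct path.
fn parse_path_from_env() -> ParsePath {
    let value = env::var("UFFS_LEGACY_PARSE").ok();
    select_parse_path(value.as_deref())
}
```

Running both paths over the same raw MFT dump and comparing the resulting indexes is the validation use case the commit message mentions.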

…parsers Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace the multi-pass pipeline (parse_record_full → MftRecordMerger → from_parsed_records) with single-pass direct-to-index parsing in the SlidingIocpInline path.

Changes:
- Pre-allocate MftIndex upfront instead of MftRecordMerger
- Call parse_record_to_index() directly during I/O completions
- Eliminate intermediate ParsedRecord allocation and merge phase
- Simplify logging (no separate io_ms/merge_ms split)

This completes Wave 3: the IOCP path now uses the same zero-copy parser as the file-based reader (Wave 2), eliminating redundant allocations and improving parity across code paths.

Tests: 105/105 pass, just check clean, just lint-prod clean

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…eline

Wave 4 cleanup: Remove legacy multi-pass pipeline from hot path.

- Added UFFS_LEGACY_PARSE=1 environment variable to force legacy pipeline
  - When set, forces SlidingIocp mode instead of SlidingIocpInline
  - Allows debugging/comparison with old parse_record_full path
- Documented legacy pipeline components with clear markers:
  - parse_record_full(): part of old parse → merger → from_parsed_records
  - MftRecordMerger: merges extension records in legacy path
  - from_parsed_records(): final stage of legacy multi-pass pipeline
  - read_all_parallel_with_progress(): uses legacy pipeline
- Legacy pipeline still used by:
  - Legacy read modes (Parallel, Pipelined, PipelinedParallel, SlidingIocp)
  - File-based readers (load_raw_to_index_with_options)
  - Tests and diagnostic tools
  - UFFS_LEGACY_PARSE=1 escape hatch
- Hot path (SlidingIocpInline) bypasses legacy pipeline entirely:
  - Uses direct-to-index parsers (parse_record_to_index)
  - Builds index incrementally during I/O
  - Creates parent placeholders on-demand
  - No intermediate Vec<ParsedRecord> allocation

Verified: just check, just lint-prod, cargo test -p uffs-mft all pass

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
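The hot-path behavior described above (building the index incrementally and creating parent placeholders on demand) can be sketched as follows. This is a simplified illustration under stated assumptions, not the crate's MftIndex: the Record and Index types, their fields, and the NO_ENTRY sentinel are all hypothetical.

```rust
/// Sentinel for "no record stored for this FRS yet" (hypothetical).
const NO_ENTRY: u32 = u32::MAX;

/// Simplified record; the real index stores far more per file.
#[derive(Debug, Default)]
struct Record {
    frs: u64,
    parent_frs: u64,
    name: String,
    /// True while only a child has referenced this FRS.
    is_placeholder: bool,
    children: Vec<u32>,
}

#[derive(Debug, Default)]
struct Index {
    records: Vec<Record>,
    /// Sparse lookup: FRS number -> position in `records`.
    frs_to_idx: Vec<u32>,
}

impl Index {
    /// Return the slot for `frs`, creating a placeholder if none exists.
    fn get_or_create(&mut self, frs: u64) -> u32 {
        let slot = frs as usize;
        if slot >= self.frs_to_idx.len() {
            self.frs_to_idx.resize(slot + 1, NO_ENTRY);
        }
        if self.frs_to_idx[slot] == NO_ENTRY {
            let idx = self.records.len() as u32;
            self.records.push(Record {
                frs,
                is_placeholder: true,
                ..Record::default()
            });
            self.frs_to_idx[slot] = idx;
        }
        self.frs_to_idx[slot]
    }

    /// Insert one parsed record in a single pass. Records may arrive
    /// in any order, so the parent may not have been parsed yet.
    fn insert(&mut self, frs: u64, parent_frs: u64, name: &str) {
        let idx = self.get_or_create(frs);
        {
            let rec = &mut self.records[idx as usize];
            rec.parent_frs = parent_frs;
            rec.name = name.to_owned();
            rec.is_placeholder = false;
        }
        // The NTFS root directory is its own parent; skip the self-link.
        if parent_frs != frs {
            let parent_idx = self.get_or_create(parent_frs);
            self.records[parent_idx as usize].children.push(idx);
        }
    }
}
```

Because a placeholder is filled in place when its own record arrives later, no second pass or intermediate Vec<ParsedRecord> is needed to link children to parents.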

Add wait_ms/parse_ms/overlap_pct metrics to the sliding window IOCP reader for measuring I/O overlap effectiveness on Windows. The IOCP sliding window already provides optimal overlap (matching the C++ design), so no structural changes are needed; this commit adds observability only.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
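The commit message names the three metrics but does not say how overlap_pct is computed. The sketch below shows one plausible definition, assumed here purely for illustration: the share of reader-loop time spent parsing rather than blocked waiting on a completion. OverlapStats and its methods are hypothetical names.

```rust
use std::time::Duration;

/// Accumulated timings for a reader loop (hypothetical type).
#[derive(Debug, Default, Clone, Copy)]
struct OverlapStats {
    /// Time spent blocked waiting for an I/O completion.
    wait: Duration,
    /// Time spent parsing completed buffers.
    parse: Duration,
}

impl OverlapStats {
    fn record_wait(&mut self, elapsed: Duration) {
        self.wait += elapsed;
    }

    fn record_parse(&mut self, elapsed: Duration) {
        self.parse += elapsed;
    }

    fn wait_ms(&self) -> f64 {
        self.wait.as_secs_f64() * 1000.0
    }

    fn parse_ms(&self) -> f64 {
        self.parse.as_secs_f64() * 1000.0
    }

    /// ASSUMED definition: parse time as a percentage of wait + parse.
    /// A high value means the loop rarely stalls on I/O.
    fn overlap_pct(&self) -> f64 {
        let total = self.wait_ms() + self.parse_ms();
        if total == 0.0 {
            0.0
        } else {
            100.0 * self.parse_ms() / total
        }
    }
}
```

In a real loop, each record_wait and record_parse call would be fed from std::time::Instant measurements taken around the completion wait and the parse call respectively.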

Add `MftIndex::with_capacity_optimized()` that pre-allocates ALL vectors based on MFT bitmap statistics to eliminate Vec resizing during the hot parse loop. This matches the C++ pre-allocation strategy.

Pre-allocation ratios (matching C++ ntfs_index_accessors.hpp lines 525-544):
- records: estimated_records + 5% safety margin
- frs_to_idx: max_frs + 1 (sparse lookup array)
- names: estimated_records * 23 (~23 chars avg)
- links: estimated_records / 16 (6% have hardlinks)
- streams: estimated_records / 4 (25% have ADS)
- internal_streams: estimated_records / 20 (5% internal)
- children: estimated_records * 3/2 (dirs have multiple children)

Added MftBitmap::max_frs_in_use() to scan backwards and find the highest in-use FRS number, used for frs_to_idx sizing.

IOCP direct-to-index reader now uses optimized pre-allocation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
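The ratios and the backward bitmap scan above can be sketched as two small functions. The arithmetic follows the ratios listed in the commit message; CapacityPlan and plan_capacity are hypothetical names, the free function max_frs_in_use only mirrors the idea of MftBitmap::max_frs_in_use() and is not its actual signature, and the least-significant-bit-first layout of the bitmap is stated as an assumption.

```rust
/// Element counts to reserve for each vector (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
struct CapacityPlan {
    records: usize,
    frs_to_idx: usize,
    names: usize,
    links: usize,
    streams: usize,
    internal_streams: usize,
    children: usize,
}

/// Apply the ratios from the commit message using integer arithmetic.
fn plan_capacity(estimated_records: usize, max_frs: usize) -> CapacityPlan {
    CapacityPlan {
        // estimated_records + 5% safety margin
        records: estimated_records + estimated_records / 20,
        // sparse lookup array indexed by FRS number
        frs_to_idx: max_frs + 1,
        // roughly 23 characters per name on average
        names: estimated_records * 23,
        links: estimated_records / 16,
        streams: estimated_records / 4,
        internal_streams: estimated_records / 20,
        children: estimated_records * 3 / 2,
    }
}

/// Scan the bitmap from the end and return the highest set bit.
/// Assumption: bit `i` of byte `j` (least significant bit first)
/// marks FRS number `j * 8 + i` as in use.
fn max_frs_in_use(bitmap: &[u8]) -> Option<u64> {
    let (byte_idx, byte) = bitmap
        .iter()
        .enumerate()
        .rev()
        .find(|(_, b)| **b != 0)?;
    // Position of the highest set bit within this byte (0..=7).
    let top_bit = 7 - byte.leading_zeros() as u64;
    Some(byte_idx as u64 * 8 + top_bit)
}
```

Each field of the plan would then feed a Vec::with_capacity call, so the vectors do not need to grow during the parse loop as long as the estimates hold.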

Replace Vec<u16> with SmallVec<[u16; 64]> for UTF-16 filename decoding in both direct-to-index parsers. This avoids heap allocation for typical filenames (<= 64 chars), reducing per-record overhead in the hot parse loop.

Matches the optimization already present in the full parser (parse/full.rs).

Files modified:
- crates/uffs-mft/src/io/parser/index.rs (IOCP hot path)
- crates/uffs-mft/src/parse/direct_index.rs (file-based hot path)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
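The idea behind the SmallVec change can be shown with the standard library alone. The sketch below does not use the smallvec crate; it swaps in a fixed stack array with a Vec fallback to illustrate the same inline-then-spill pattern. decode_utf16le_name and INLINE_CAP are hypothetical names, and the lossy conversion is an assumption about how invalid surrogates are handled.

```rust
/// Inline capacity in UTF-16 code units, mirroring the 64 in the commit.
const INLINE_CAP: usize = 64;

/// Decode a little-endian UTF-16 filename as stored in a $FILE_NAME
/// attribute. Short names use a stack buffer for the intermediate
/// code units; longer names fall back to a heap-allocated Vec.
/// A trailing odd byte, if any, is ignored.
fn decode_utf16le_name(raw: &[u8]) -> String {
    let units = raw.len() / 2;
    let code_units = raw
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));

    if units <= INLINE_CAP {
        // Typical case: no heap allocation for the u16 buffer.
        let mut buf = [0u16; INLINE_CAP];
        for (slot, unit) in buf.iter_mut().zip(code_units) {
            *slot = unit;
        }
        String::from_utf16_lossy(&buf[..units])
    } else {
        // Rare case: spill to the heap, as SmallVec would.
        let heap: Vec<u16> = code_units.collect();
        String::from_utf16_lossy(&heap)
    }
}
```

The returned String still allocates; what this pattern saves is the temporary u16 buffer, which is the allocation the commit targets in the hot parse loop.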

githubrobbi and others added 8 commits March 14, 2026 03:58

fix(lint): correct module-level lint expectations in direct-to-index …

56e2b17

…parsers Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

githubrobbi merged commit f231a35 into main Mar 14, 2026
4 of 6 checks passed

githubrobbi deleted the fix-f-drive-parity branch March 14, 2026 12:35

githubrobbi merged 8 commits into main from fix-f-drive-parity
