Merge fix-f-drive-parity: single-pass MFT pipeline matching C++ architecture #2
Merged
githubrobbi merged 8 commits into main on Mar 14, 2026
Un-deprecate and modernize parse_record_to_index() and parse_extension_to_index() to handle ALL attribute types that parse_record_full() handles. This restores the single-pass C++-style inline parsing approach as the primary path.

Key Changes:
- Remove #[deprecated] annotations from both parsers
- Add complete attribute handling:
  - $REPARSE_POINT - extract reparse tag, add as stream
  - $INDEX_ROOT, $INDEX_ALLOCATION, $BITMAP - directory index handling
  - $OBJECT_ID, $VOLUME_NAME, $VOLUME_INFORMATION, $PROPERTY_SET
  - $EA, $EA_INFORMATION, $LOGGED_UTILITY_STREAM
  - $SECURITY_DESCRIPTOR, $ATTRIBUTE_LIST
  - Unknown attribute types (default: case in C++)
- Set reparse_tag and total_stream_count in FileRecord
- Handle directory size from accumulated $I30 attributes
- Merge directory index sizes in extension records

Files Modified:
- crates/uffs-mft/src/io/parser/index.rs (845 LOC)
- crates/uffs-mft/src/io/parser/index_extension.rs (727 LOC)
- scripts/ci/file_size_exceptions.txt (add exceptions for large parsers)

Achieves feature parity with the parse_record_full() + MftRecordMerger pipeline.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implements a new file-based reader path using the direct-to-index parser from Wave 1. This provides a single-pass parsing path that works on both Windows and macOS, bypassing the old multi-pass pipeline.

Key changes:
- Added load_raw_to_index_direct() in reader/persistence.rs
- Copied direct-to-index parsers to cross-platform parse/ module
- Added env var switch in commands/load.rs (UFFS_LEGACY_PARSE=1 for old path)
- Both parser paths available for validation and comparison

New path: raw MFT → fixup → parse_record_to_index() → MftIndex
Old path: raw MFT → parse_record_full() → MftRecordMerger → from_parsed_records()

The new direct parser eliminates intermediate ParsedRecord allocations and matches the C++ single-pass architecture.

All tests pass (105/105). F-drive parity verification requires Windows.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
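The env var switch described above could be sketched roughly as follows. This is only an illustration, not the code in commands/load.rs: the `ParsePath` enum and both function names are hypothetical, and the assumption that only the exact value `1` selects the old path is inferred from the `UFFS_LEGACY_PARSE=1` wording.

```rust
/// Which parsing pipeline to run (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ParsePath {
    /// raw MFT → fixup → parse_record_to_index() → MftIndex
    DirectToIndex,
    /// raw MFT → parse_record_full() → MftRecordMerger → from_parsed_records()
    Legacy,
}

/// Pure decision function so the switch can be tested without touching
/// the process environment. Assumption: only the literal "1" opts in.
pub fn select_parse_path(legacy_flag: Option<&str>) -> ParsePath {
    match legacy_flag {
        Some("1") => ParsePath::Legacy,
        _ => ParsePath::DirectToIndex,
    }
}

/// Reads UFFS_LEGACY_PARSE from the environment and picks a path.
pub fn parse_path_from_env() -> ParsePath {
    let value = std::env::var("UFFS_LEGACY_PARSE").ok();
    select_parse_path(value.as_deref())
}
```

Keeping the decision separate from the environment lookup makes it straightforward to run both paths side by side for the validation and comparison mentioned in the commit.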
…parsers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the multi-pass pipeline (parse_record_full → MftRecordMerger → from_parsed_records) with single-pass direct-to-index parsing in the SlidingIocpInline path.

Changes:
- Pre-allocate MftIndex upfront instead of MftRecordMerger
- Call parse_record_to_index() directly during I/O completions
- Eliminate intermediate ParsedRecord allocation and merge phase
- Simplify logging (no separate io_ms/merge_ms split)

This completes Wave 3: the IOCP path now uses the same zero-copy parser as the file-based reader (Wave 2), eliminating redundant allocations and improving parity across code paths.

Tests: 105/105 pass, just check clean, just lint-prod clean

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…eline

Wave 4 cleanup: Remove legacy multi-pass pipeline from hot path.

- Added UFFS_LEGACY_PARSE=1 environment variable to force legacy pipeline
  - When set, forces SlidingIocp mode instead of SlidingIocpInline
  - Allows debugging/comparison with old parse_record_full path
- Documented legacy pipeline components with clear markers:
  - parse_record_full(): part of old parse → merger → from_parsed_records
  - MftRecordMerger: merges extension records in legacy path
  - from_parsed_records(): final stage of legacy multi-pass pipeline
  - read_all_parallel_with_progress(): uses legacy pipeline
- Legacy pipeline still used by:
  - Legacy read modes (Parallel, Pipelined, PipelinedParallel, SlidingIocp)
  - File-based readers (load_raw_to_index_with_options)
  - Tests and diagnostic tools
  - UFFS_LEGACY_PARSE=1 escape hatch
- Hot path (SlidingIocpInline) bypasses legacy pipeline entirely:
  - Uses direct-to-index parsers (parse_record_to_index)
  - Builds index incrementally during I/O
  - Creates parent placeholders on-demand
  - No intermediate Vec<ParsedRecord> allocation

Verified: just check, just lint-prod, cargo test -p uffs-mft all pass

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add wait_ms/parse_ms/overlap_pct metrics to the sliding window IOCP reader for measuring I/O overlap effectiveness on Windows. The IOCP sliding window already provides optimal overlap (matching the C++ design), so no structural changes are needed, only observability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add `MftIndex::with_capacity_optimized()` that pre-allocates ALL vectors based on MFT bitmap statistics to eliminate Vec resizing during the hot parse loop. This matches the C++ pre-allocation strategy.

Pre-allocation ratios (matching C++ ntfs_index_accessors.hpp lines 525-544):
- records: estimated_records + 5% safety margin
- frs_to_idx: max_frs + 1 (sparse lookup array)
- names: estimated_records * 23 (~23 chars avg)
- links: estimated_records / 16 (6% have hardlinks)
- streams: estimated_records / 4 (25% have ADS)
- internal_streams: estimated_records / 20 (5% internal)
- children: estimated_records * 3/2 (dirs have multiple children)

Added MftBitmap::max_frs_in_use() to scan backwards and find the highest in-use FRS number, used for frs_to_idx sizing.

IOCP direct-to-index reader now uses optimized pre-allocation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
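The ratios above can be written out as a small capacity calculation. The sketch below is not the actual `MftIndex::with_capacity_optimized()` or `MftBitmap::max_frs_in_use()` code: the `IndexCapacities` struct and the free functions are hypothetical, integer division is assumed for the fractions, and the bitmap is assumed to be a plain byte slice in which bit i of byte j (least significant bit first) marks record j * 8 + i as in use.

```rust
/// Capacities for each index vector (hypothetical struct, illustration only).
#[derive(Debug, PartialEq, Eq)]
pub struct IndexCapacities {
    pub records: usize,
    pub frs_to_idx: usize,
    pub names: usize,
    pub links: usize,
    pub streams: usize,
    pub internal_streams: usize,
    pub children: usize,
}

/// Applies the ratios listed in the commit message.
/// Assumption: fractions are computed with truncating integer division.
pub fn estimate_capacities(estimated_records: usize, max_frs: usize) -> IndexCapacities {
    IndexCapacities {
        records: estimated_records + estimated_records / 20, // + 5% safety margin
        frs_to_idx: max_frs + 1,                             // sparse lookup array
        names: estimated_records * 23,                       // ~23 chars per name
        links: estimated_records / 16,                       // ~6% have hardlinks
        streams: estimated_records / 4,                      // ~25% have ADS
        internal_streams: estimated_records / 20,            // ~5% internal
        children: estimated_records * 3 / 2,                 // dirs have multiple children
    }
}

/// Scans the bitmap backwards and returns the highest in-use record number,
/// or None if no bit is set. Assumes LSB-first bit order within each byte.
pub fn max_frs_in_use(bitmap: &[u8]) -> Option<u64> {
    for (byte_idx, &byte) in bitmap.iter().enumerate().rev() {
        if byte != 0 {
            // Position of the highest set bit in this byte (0..=7).
            let high_bit = 7 - u64::from(byte.leading_zeros());
            return Some(byte_idx as u64 * 8 + high_bit);
        }
    }
    None
}
```

Scanning from the end means the loop usually stops after skipping only the trailing run of unused bytes, which keeps the cost small relative to the parse itself.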
Replace Vec<u16> with SmallVec<[u16; 64]> for UTF-16 filename decoding in both direct-to-index parsers. This avoids heap allocation for typical filenames (<= 64 chars), reducing per-record overhead in the hot parse loop.

Matches the optimization already present in the full parser (parse/full.rs).

Files modified:
- crates/uffs-mft/src/io/parser/index.rs (IOCP hot path)
- crates/uffs-mft/src/parse/direct_index.rs (file-based hot path)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
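The idea behind the inline buffer can be shown without the smallvec crate. The sketch below is a standard-library stand-in, not the parser code from this PR: it uses a fixed `[u16; 64]` stack array for short names and falls back to a heap `Vec<u16>` for longer ones, which is roughly the behaviour SmallVec<[u16; 64]> provides automatically. The function name and the lossy decoding are assumptions made for the example.

```rust
/// Number of UTF-16 code units kept on the stack, matching the 64 in the commit.
const INLINE_UNITS: usize = 64;

/// Decodes a UTF-16LE filename from raw attribute bytes (hypothetical helper).
/// Short names are staged in a stack array; longer names spill to the heap.
/// A trailing odd byte, if any, is ignored.
pub fn decode_utf16le_name(raw: &[u8]) -> String {
    let unit_count = raw.len() / 2;
    let units = raw
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));

    if unit_count <= INLINE_UNITS {
        // Fast path: no heap allocation for the code-unit buffer.
        let mut buf = [0u16; INLINE_UNITS];
        for (slot, unit) in buf.iter_mut().zip(units) {
            *slot = unit;
        }
        String::from_utf16_lossy(&buf[..unit_count])
    } else {
        // Slow path: the name does not fit inline, so allocate.
        let heap: Vec<u16> = units.collect();
        String::from_utf16_lossy(&heap)
    }
}
```

The resulting `String` still allocates in both branches; what the fast path saves is the temporary code-unit buffer, which is the allocation the commit targets.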
Summary
This PR merges the fix-f-drive-parity branch which implements:
Testing