Document Purpose: Detailed gap analysis and implementation roadmap to achieve 100% feature parity between the reference baseline (old_cpp_reference/uffs/UltraFastFileSearch-code/file.cpp in the optional repo-root, gitignored, local-only tree) and the Rust implementation (crates/uffs-mft/).
| Category | Reference Features | Rust Implemented | Parity |
|---|---|---|---|
| I/O Layer | 5 | 5 | ✅ 100% |
| Data Extraction | 10 | 10 | ✅ 100% |
| Data Representation (MFT) | 5 | 5 | ✅ 100% |
| Overall (uffs-mft) | 20 | 20 | ✅ 100% |
Status: MFT reading implementation completed on 2026-01-16. All MFT extraction features implemented.
| Crate | Responsibility | Status |
|---|---|---|
| uffs-mft | MFT reading, DataFrame building, CSV/Parquet/RAW persistence | ✅ 100% |
| uffs-core | Post-processing (tree structure, treesize, bulkiness), queries | ✅ 100% |
Design Principle: uffs-mft does pure MFT reading and storage. No post-processing. Tree calculations and derived metrics belong in the uffs-core query engine.
| ID | Task | Status | File(s) |
|---|---|---|---|
| 1.1 | Add ExtensionAttributes struct to hold pending attributes | [x] | io.rs |
| 1.2 | Add ParseResult enum (Base/Extension/Skip) | [x] | io.rs |
| 1.3 | Create parse_record_full() for full parsing | [x] | io.rs |
| 1.4 | Create MftRecordMerger for two-pass processing | [x] | io.rs |
| 1.5 | Merge extension attributes into base records | [x] | io.rs |
| ID | Task | Status | File(s) |
|---|---|---|---|
| 2.1 | Add NameInfo struct matching the baseline | [x] | ntfs.rs |
| 2.2 | Add names: Vec<NameInfo> to ParsedRecord | [x] | io.rs |
| 2.3 | Add name_count() method to ParsedRecord | [x] | io.rs |
| 2.4 | Collect ALL $FILE_NAME attributes (except DOS-only) | [x] | io.rs |
| 2.5 | Add name_count column to DataFrame | [x] | reader.rs |
| ID | Task | Status | File(s) |
|---|---|---|---|
| 3.1 | Add StreamInfo struct matching the baseline | [x] | ntfs.rs |
| 3.2 | Add streams: Vec<StreamInfo> to ParsedRecord | [x] | io.rs |
| 3.3 | Add stream_count() method to ParsedRecord | [x] | io.rs |
| 3.4 | Parse ALL $DATA attributes (named and unnamed) | [x] | io.rs |
| 3.5 | Extract stream name from attribute header | [x] | io.rs |
| 3.6 | Add stream_count column to DataFrame | [x] | reader.rs |
| ID | Task | Status | File(s) |
|---|---|---|---|
| 4.1 | Add allocated_size field to ParsedRecord | [x] | io.rs |
| 4.2 | Add allocated_size to StreamInfo | [x] | ntfs.rs |
| 4.3 | Handle CompressionUnit for compressed files | [x] | io.rs |
| 4.4 | Add allocated_size column to DataFrame | [x] | reader.rs |
| ID | Task | Status | File(s) |
|---|---|---|---|
| 5.1 | Add ExtendedStandardInfo struct with 18 boolean flags | [x] | ntfs.rs |
| 5.2 | Add from_attributes() constructor | [x] | ntfs.rs |
| 5.3 | Add to_raw_flags() method | [x] | ntfs.rs |
| 5.4 | Parse all flags from $STANDARD_INFORMATION | [x] | io.rs |
| 5.5 | Add 11 individual flag columns to DataFrame | [x] | reader.rs |
| ID | Task | Status | File(s) | Notes |
|---|---|---|---|---|
| 6.1 | Add NodeInfo struct | [x] | uffs-core/tree.rs | Post-processing |
| 6.2 | Build TreeIndex from DataFrame | [x] | uffs-core/tree.rs | Query-time |
| 6.3 | Calculate treesize | [x] | uffs-core/tree.rs | Lazy computation |
| 6.4 | Add bulkiness calculation | [x] | uffs-core/tree.rs | Derived metric |
| 6.5 | Add tree columns via query | [x] | uffs-core/tree.rs | On-demand |
| 6.6 | Add descendants column | [x] | uffs-core/tree.rs | Count of items |
| 6.7 | Add tree_allocated column | [x] | uffs-core/tree.rs | Sum of allocated |
Architecture Decision: Tree structure is post-processing, not MFT reading. Moved to the uffs-core crate, which handles queries and derived calculations.
| ID | Task | Status | File(s) |
|---|---|---|---|
| 8.1 | Add save_raw_mft() to save complete MFT bytes | [x] | raw.rs, reader.rs |
| 8.2 | Add load_raw_mft() to load saved MFT bytes | [x] | raw.rs, reader.rs |
| 8.3 | Handle fragmented MFT (reassemble extents) | [x] | reader.rs |
| 8.4 | Add save-raw / load-raw CLI commands | [x] | uffs-cli/commands.rs |
| 8.5 | Compress raw MFT (optional, zstd) | [x] | raw.rs |
Purpose: Allow saving/loading the complete raw MFT bytes for offline analysis without requiring admin privileges or access to the original volume.
Implementation: New raw.rs module with:
- RawMftHeader - 64-byte header with magic, version, flags, sizes
- RawMftData - Loaded raw MFT with record iteration
- save_raw_mft() / load_raw_mft() - File I/O with optional zstd compression
- MftReader::read_raw() - Read MFT as raw bytes (handles fragmented MFT)
- MftReader::save_raw_to_file() - Convenience method to read and save
- MftReader::load_raw_to_dataframe() - Load saved MFT and parse to DataFrame
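For illustration, a 64-byte header like the one described above might look as follows. The field layout, ordering, and magic value here are assumptions for the sketch, not the shipped on-disk format:

```rust
/// Sketch of a 64-byte raw-MFT file header (layout and magic are assumed).
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct RawMftHeader {
    pub magic: [u8; 8],        // e.g. b"UFFSRAW\0" (assumed value)
    pub record_count: u64,     // number of MFT records stored
    pub payload_len: u64,      // payload size in bytes (after compression)
    pub version: u32,          // format version
    pub flags: u32,            // e.g. bit 0 = zstd-compressed payload
    pub bytes_per_record: u32, // typically 1024 on NTFS
    pub reserved: [u8; 28],    // pad to exactly 64 bytes
}

fn main() {
    // The header must stay exactly 64 bytes for a stable file format.
    assert_eq!(std::mem::size_of::<RawMftHeader>(), 64);
    let header = RawMftHeader {
        magic: *b"UFFSRAW\0",
        record_count: 0,
        payload_len: 0,
        version: 1,
        flags: 1, // zstd-compressed
        bytes_per_record: 1024,
        reserved: [0; 28],
    };
    println!("v{} header, {} bytes/record", header.version, header.bytes_per_record);
}
```

Placing the u64 fields first keeps the repr(C) layout free of alignment padding between fields.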
| ID | Task | Status | File(s) |
|---|---|---|---|
| 7.1 | Add mft_changed field to ExtendedStandardInfo | [x] | ntfs.rs |
| 7.2 | Extract from $STANDARD_INFORMATION | [x] | io.rs |
| 7.3 | Add mft_changed column to DataFrame | [x] | reader.rs |
Current Rust Behavior:
// crates/uffs-mft/src/io.rs:535-538
if !header.is_base_record() {
return None; // ❌ Discards extension records entirely
}

Reference Behavior:
// old_cpp_reference/uffs/UltraFastFileSearch-code/file.cpp:2371-2372
unsigned int const frs_base = frsh->BaseFileRecordSegment
? static_cast<unsigned int>(frsh->BaseFileRecordSegment) : frs;
Records::iterator base_record = this->at(frs_base);
// All attributes from extension record are merged into base_record

Why This Matters:
- NTFS files can span multiple MFT records when they have many attributes
- Large files with many alternate data streams use extension records
- Files with very long names or many hard links use extension records
- Without merging, these files have INCOMPLETE data
Implementation Strategy:
- First pass: Parse all records, storing extension records separately
- Second pass: Merge extension attributes into their base records
- Use the base_file_record_segment field to find the base record
New Structs Required:
/// Attributes extracted from an extension record, pending merge.
pub struct ExtensionAttributes {
/// The base FRS this extension belongs to.
pub base_frs: u64,
/// Additional file names found in this extension.
pub names: Vec<NameInfo>,
/// Additional streams found in this extension.
pub streams: Vec<StreamInfo>,
/// Size contributions from this extension.
pub size_delta: u64,
pub allocated_delta: u64,
}

Current Rust Behavior:
// crates/uffs-mft/src/io.rs:663-671
let is_better_name = match namespace {
1 | 3 => true, // Win32 or Win32+DOS
0 => result.name.is_empty(), // POSIX only if no name yet
2 => result.name.is_empty() && namespace != 1, // DOS only if no better name
_ => false,
};
if !is_better_name && !result.name.is_empty() {
return; // ❌ Discards additional names (hard links)
}

Reference Behavior:
// old_cpp_reference/uffs/UltraFastFileSearch-code/file.cpp:2394-2419
if (fn->Flags != 0x02 /* FILE_NAME_DOS */) {
// Push existing first_name to nameinfos list
if (LinkInfos::value_type *const si = this->nameinfo(&*base_record)) {
this->nameinfos.push_back(base_record->first_name);
base_record->first_name.next_entry = link_index;
}
// Store new name
info->name.offset(...);
info->parent = frs_parent;
++base_record->name_count; // ✅ Tracks ALL names
}

Why This Matters:
- Hard links are common in Windows (e.g., C:\Windows\System32 has many)
- Each hard link has a DIFFERENT parent directory
- Without tracking all names, files appear to be in only one location
- Security/forensics tools need to see all file locations
Implementation Strategy:
- Store ALL $FILE_NAME attributes (except DOS-only namespace=2)
- Add names: Vec<NameInfo> to ParsedRecord
- Each NameInfo contains: name, parent_frs, namespace
- DataFrame can either:
  - Expand to multiple rows (one per name) - easier for queries
  - Store as List column - more compact
New Structs Required:
/// Information about a single file name (hard link).
#[derive(Debug, Clone)]
pub struct NameInfo {
/// The file name.
pub name: String,
/// Parent directory FRS.
pub parent_frs: u64,
/// Namespace (0=POSIX, 1=Win32, 2=DOS, 3=Win32+DOS).
pub namespace: u8,
}

Current Rust Behavior:
// crates/uffs-mft/src/io.rs:583-596
Some(AttributeType::Data) => {
if attr_header.is_non_resident != 0 {
// Only extracts size from unnamed $DATA
let data_size = ...;
result.size = data_size as u64;
}
}
// ❌ Named streams (ADS) are completely ignored

Reference Behavior:
// old_cpp_reference/uffs/UltraFastFileSearch-code/file.cpp:2464-2512
StreamInfo *info = NULL;
// ... creates StreamInfo for EACH $DATA attribute
info->type_name_id = ah->Type >> 4;
info->name.length = ah->NameLength; // ✅ Stores stream name
info->name.offset(...);
++base_record->stream_count; // ✅ Tracks ALL streams

Why This Matters:
- Alternate Data Streams are used for:
  - Zone.Identifier - marks files downloaded from the internet
  - Thumbnails and metadata
  - Malware hiding (security concern!)
- Without ADS support, these are invisible to searches
- Critical for security/forensics use cases
Implementation Strategy:
- Parse ALL $DATA attributes, not just unnamed
- Extract stream name from attribute header
- Add streams: Vec<StreamInfo> to ParsedRecord
- Each StreamInfo contains: name, size, allocated_size, is_sparse
New Structs Required:
/// Information about a single data stream.
#[derive(Debug, Clone)]
pub struct StreamInfo {
/// Stream name (empty for default stream).
pub name: String,
/// Logical size in bytes.
pub size: u64,
/// Allocated size on disk.
pub allocated_size: u64,
/// Whether this stream is sparse.
pub is_sparse: bool,
/// Attribute type code (0x80 for $DATA, 0xC0 for $REPARSE_POINT, etc.).
pub attribute_type: u32,
}

Current Rust Behavior:
// crates/uffs-mft/src/io.rs:589-593
let data_size = i64::from_le_bytes(...); // Only DataSize
result.size = data_size as u64;
// ❌ No allocated_size, no compressed_size handling

Reference Behavior:
// old_cpp_reference/uffs/UltraFastFileSearch-code/file.cpp:2517-2525
info->allocated += ah->IsNonResident
? ah->NonResident.CompressionUnit
? static_cast<file_size_type>(ah->NonResident.CompressedSize)
: static_cast<file_size_type>(ah->NonResident.AllocatedSize)
: 0;
info->length += ah->IsNonResident
? static_cast<file_size_type>(ah->NonResident.DataSize)
: ah->Resident.ValueLength;
info->bulkiness += info->allocated;

Why This Matters:
- allocated_size shows actual disk usage (important for disk analysis)
- compressed_size is needed for compressed files (NTFS compression)
- Without these, disk usage calculations are wrong
- bulkiness helps identify fragmented files
Implementation Strategy:
- Add allocated_size and compressed_size to ParsedRecord
- Check CompressionUnit to determine which size to use
- Sum sizes across all streams for total file size
Fields to Add:
pub struct ParsedRecord {
// ... existing fields ...
/// Allocated size on disk (physical).
pub allocated_size: u64,
/// Compressed size (for compressed files).
pub compressed_size: u64,
/// Bulkiness metric (for fragmentation analysis).
pub bulkiness: u64,
}

Current Rust Behavior:
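The reference's size-selection rule (CompressedSize when the attribute is compressed, AllocatedSize otherwise) could be mirrored in Rust roughly like this. The NonResidentHeader struct and its field names are simplified assumptions for the sketch, not the crate's actual types:

```rust
/// Simplified view of a non-resident attribute header (illustrative only).
struct NonResidentHeader {
    compression_unit: u16, // non-zero => attribute is compressed
    compressed_size: u64,  // meaningful only when compression_unit != 0
    allocated_size: u64,
    data_size: u64,
}

/// Returns (logical size, physical allocation), mirroring the reference:
/// compressed files report CompressedSize as their on-disk usage.
fn stream_sizes(nr: &NonResidentHeader) -> (u64, u64) {
    let allocated = if nr.compression_unit != 0 {
        nr.compressed_size
    } else {
        nr.allocated_size
    };
    (nr.data_size, allocated)
}

fn main() {
    let compressed = NonResidentHeader {
        compression_unit: 4,
        compressed_size: 4096,
        allocated_size: 16384,
        data_size: 10000,
    };
    // Compressed file: physical usage is the compressed size, not allocation.
    assert_eq!(stream_sizes(&compressed), (10000, 4096));
    println!("{:?}", stream_sizes(&compressed));
}
```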
// crates/uffs-mft/src/io.rs:634
result.flags = (si.file_attributes & 0xFFFF) as u16;
// Stores as single u16 bitmask

Reference Behavior:
// old_cpp_reference/uffs/UltraFastFileSearch-code/file.cpp:1920-1935 (StandardInfo struct)
struct StandardInfo {
unsigned long long created, written, accessed : 0x40 - 6;
is_readonly : 1, is_archive : 1, is_system : 1, is_hidden : 1,
is_offline : 1, is_notcontentidx : 1, is_noscrubdata : 1,
is_integretystream : 1, is_pinned : 1, is_unpinned : 1,
is_directory : 1, is_compressed : 1, is_encrypted : 1,
is_sparsefile : 1, is_reparsepoint : 1;
};

Why This Matters:
- Individual boolean columns are easier to query in Polars
- df.filter(col("is_hidden") & col("is_system")) is cleaner than bit manipulation
- Some flags are not in the standard 16-bit mask (e.g., is_pinned, is_unpinned)
Implementation Strategy:
- Extract all 15+ flags from $STANDARD_INFORMATION
- Add individual boolean fields to ParsedRecord
- Add individual boolean columns to DataFrame
Flags to Extract:
| Flag | Bit | Description |
|---|---|---|
| is_readonly | 0x0001 | Read-only file |
| is_hidden | 0x0002 | Hidden file |
| is_system | 0x0004 | System file |
| is_archive | 0x0020 | Archive flag |
| is_directory | 0x0010 | Directory (from record flags) |
| is_device | 0x0040 | Device |
| is_normal | 0x0080 | Normal file |
| is_temporary | 0x0100 | Temporary file |
| is_sparse | 0x0200 | Sparse file |
| is_reparse | 0x0400 | Reparse point |
| is_compressed | 0x0800 | Compressed |
| is_offline | 0x1000 | Offline |
| is_notcontentidx | 0x2000 | Not content indexed |
| is_encrypted | 0x4000 | Encrypted |
| is_integritystream | 0x8000 | Integrity stream |
| is_virtual | 0x10000 | Virtual |
| is_noscrubdata | 0x20000 | No scrub data |
| is_pinned | 0x80000 | Pinned |
| is_unpinned | 0x100000 | Unpinned |
Current Rust Behavior:
- Only stores parent_frs for each record
- No child tracking
- No tree traversal capability
Reference Behavior:
// old_cpp_reference/uffs/UltraFastFileSearch-code/file.cpp:2407-2417
if (frs_parent != frs_base) {
Records::iterator const parent = this->at(frs_parent, &base_record);
ChildInfo *const child_info = &this->childinfos.back();
child_info->record_number = frs_base;
child_info->name_index = base_record->name_count;
child_info->next_entry = parent->first_child;
parent->first_child = child_index;
}

Why This Matters:
- Enables efficient tree traversal from root
- Required for treesize calculation (items in subtree)
- Useful for directory size calculations
Implementation Strategy:
- Build child index as HashMap<parent_frs, Vec<child_frs>>
- Post-process to calculate treesize for each directory
- Add treesize column to DataFrame
Note: This is marked OPTIONAL because:
- Polars can do parent-child joins efficiently
- Tree traversal can be done with recursive queries
- The main use case (file search) doesn't need tree structure
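The child-index-plus-treesize strategy above can be sketched in a few lines. This is a minimal standalone version (function names are illustrative, not the uffs-core API):

```rust
use std::collections::HashMap;

/// Build a child index from (frs, parent_frs) pairs.
fn build_child_index(records: &[(u64, u64)]) -> HashMap<u64, Vec<u64>> {
    let mut index: HashMap<u64, Vec<u64>> = HashMap::new();
    for &(frs, parent_frs) in records {
        // The root (FRS 5 on NTFS) is its own parent; skip the self-edge.
        if frs != parent_frs {
            index.entry(parent_frs).or_default().push(frs);
        }
    }
    index
}

/// Recursively count all descendants of `frs` (the "treesize" metric).
fn treesize(index: &HashMap<u64, Vec<u64>>, frs: u64) -> u64 {
    index.get(&frs).map_or(0, |children| {
        children.iter().map(|&c| 1 + treesize(index, c)).sum()
    })
}

fn main() {
    // (frs, parent_frs): root 5 contains dir 10; dir 10 contains files 11, 12.
    let records = [(5, 5), (10, 5), (11, 10), (12, 10)];
    let index = build_child_index(&records);
    assert_eq!(treesize(&index, 10), 2);
    assert_eq!(treesize(&index, 5), 3);
    println!("treesize(root) = {}", treesize(&index, 5));
}
```

A production version would iterate instead of recurse (real MFTs can be deep) and accumulate sizes alongside counts, but the data flow is the same.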
Current Rust Behavior:
- Extracts: created, modified, accessed
- Missing: mft_changed (when the MFT record was last modified)
Reference Behavior:
- Has access to all 4 timestamps from $STANDARD_INFORMATION
Implementation Strategy:
- Add mft_changed field to ParsedRecord
- Extract from $STANDARD_INFORMATION.LastChangeTime
- Add mft_changed column to DataFrame
┌─────────────┬───────────┬─────────────────────────────────────┐
│ Column │ Type │ Description │
├─────────────┼───────────┼─────────────────────────────────────┤
│ frs │ UInt64 │ File Record Segment number │
│ parent_frs │ UInt64 │ Parent directory FRS │
│ name │ String │ File name (ONE only) │
│ size │ UInt64 │ Logical file size │
│ created │ Datetime │ Creation time │
│ modified │ Datetime │ Modification time │
│ accessed │ Datetime │ Access time │
│ flags │ UInt16 │ File attributes bitmask │
└─────────────┴───────────┴─────────────────────────────────────┘
┌──────────────────┬────────────────┬─────────────────────────────────────┐
│ Column │ Type │ Description │
├──────────────────┼────────────────┼─────────────────────────────────────┤
│ frs │ UInt64 │ File Record Segment number │
│ parent_frs │ UInt64 │ Primary parent directory FRS │
│ name │ String │ Primary file name │
│ name_count │ UInt16 │ Number of hard links │
│ names │ List[Struct] │ All names [{name, parent, ns}] │
│ stream_count │ UInt16 │ Number of data streams │
│ streams │ List[Struct] │ All streams [{name, size, alloc}] │
│ size │ UInt64 │ Logical file size │
│ allocated_size │ UInt64 │ Physical allocation │
│ compressed_size │ UInt64 │ Compressed size (if applicable) │
│ bulkiness │ UInt64 │ Fragmentation metric │
│ treesize │ UInt32 │ Items in subtree (directories) │
│ created │ Datetime │ Creation time │
│ modified │ Datetime │ Modification time │
│ accessed │ Datetime │ Access time │
│ mft_changed │ Datetime │ MFT record change time │
│ is_readonly │ Boolean │ Read-only flag │
│ is_hidden │ Boolean │ Hidden flag │
│ is_system │ Boolean │ System flag │
│ is_archive │ Boolean │ Archive flag │
│ is_directory │ Boolean │ Directory flag │
│ is_compressed │ Boolean │ Compressed flag │
│ is_encrypted │ Boolean │ Encrypted flag │
│ is_sparse │ Boolean │ Sparse file flag │
│ is_reparse │ Boolean │ Reparse point flag │
│ is_offline │ Boolean │ Offline flag │
│ is_notcontentidx │ Boolean │ Not content indexed flag │
└──────────────────┴────────────────┴─────────────────────────────────────┘
File: crates/uffs-mft/src/io.rs
Step 1.1: Add ExtensionRecord struct
/// Attributes extracted from an extension record.
#[derive(Debug, Clone, Default)]
pub struct ExtensionAttributes {
/// The base FRS this extension belongs to.
pub base_frs: u64,
/// The extension's own FRS.
pub extension_frs: u64,
/// File names found in this extension.
pub names: Vec<NameInfo>,
/// Streams found in this extension.
pub streams: Vec<StreamInfo>,
}

Step 1.2: Modify parse_record() signature
/// Result of parsing an MFT record.
pub enum ParseResult {
/// A base record with all its data.
Base(ParsedRecord),
/// An extension record with attributes to merge.
Extension(ExtensionAttributes),
/// Record is not in use or invalid.
Skip,
}
pub fn parse_record(data: &[u8], frs: u64) -> ParseResult {
// ... existing validation ...
if !header.is_base_record() {
// Parse extension record instead of skipping
return parse_extension_record(data, frs, header);
}
// ... rest of base record parsing ...
}

Step 1.3: Create MftRecordMerger
/// Merges extension record attributes into base records.
pub struct MftRecordMerger {
/// Base records indexed by FRS.
base_records: HashMap<u64, ParsedRecord>,
/// Pending extension attributes.
extensions: Vec<ExtensionAttributes>,
}
impl MftRecordMerger {
pub fn new() -> Self { ... }
pub fn add_result(&mut self, result: ParseResult) {
match result {
ParseResult::Base(record) => {
self.base_records.insert(record.frs, record);
}
ParseResult::Extension(ext) => {
self.extensions.push(ext);
}
ParseResult::Skip => {}
}
}
pub fn merge(mut self) -> Vec<ParsedRecord> {
// Merge all extensions into their base records
for ext in self.extensions {
if let Some(base) = self.base_records.get_mut(&ext.base_frs) {
base.names.extend(ext.names);
base.streams.extend(ext.streams);
}
}
self.base_records.into_values().collect()
}
}

File: crates/uffs-mft/src/ntfs.rs
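To make the two-pass flow concrete, here is a self-contained, pared-down version of the merger (names only, so it runs standalone; the real structs also carry streams and size deltas):

```rust
use std::collections::HashMap;

// Simplified stand-ins for the types sketched above.
#[derive(Debug)]
struct ParsedRecord { frs: u64, names: Vec<String> }
struct ExtensionAttributes { base_frs: u64, names: Vec<String> }
enum ParseResult { Base(ParsedRecord), Extension(ExtensionAttributes), Skip }

#[derive(Default)]
struct MftRecordMerger {
    base_records: HashMap<u64, ParsedRecord>,
    extensions: Vec<ExtensionAttributes>,
}

impl MftRecordMerger {
    // Pass 1: collect base records and pending extension attributes.
    fn add_result(&mut self, result: ParseResult) {
        match result {
            ParseResult::Base(r) => { self.base_records.insert(r.frs, r); }
            ParseResult::Extension(e) => self.extensions.push(e),
            ParseResult::Skip => {}
        }
    }
    // Pass 2: fold every extension into its base record.
    fn merge(mut self) -> Vec<ParsedRecord> {
        for ext in self.extensions {
            if let Some(base) = self.base_records.get_mut(&ext.base_frs) {
                base.names.extend(ext.names);
            }
        }
        self.base_records.into_values().collect()
    }
}

fn main() {
    let mut merger = MftRecordMerger::default();
    merger.add_result(ParseResult::Base(ParsedRecord { frs: 100, names: vec!["a.txt".into()] }));
    merger.add_result(ParseResult::Extension(ExtensionAttributes { base_frs: 100, names: vec!["b.txt".into()] }));
    merger.add_result(ParseResult::Skip);
    let records = merger.merge();
    assert_eq!(records.len(), 1);
    assert_eq!(records[0].names, ["a.txt", "b.txt"]);
    println!("merged {} names", records[0].names.len());
}
```

Because extensions can appear before their base record in the MFT, deferring all merging to the second pass (rather than merging eagerly) is what makes the single sequential scan safe.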
Step 2.1: Add NameInfo struct
/// Information about a single file name (hard link).
#[derive(Debug, Clone)]
pub struct NameInfo {
/// The file name.
pub name: String,
/// Parent directory FRS.
pub parent_frs: u64,
/// Namespace (0=POSIX, 1=Win32, 2=DOS, 3=Win32+DOS).
pub namespace: u8,
}

File: crates/uffs-mft/src/io.rs
Step 2.2-2.4: Update ParsedRecord and parsing
pub struct ParsedRecord {
pub frs: u64,
/// All file names (hard links). First is "primary".
pub names: Vec<NameInfo>,
/// Convenience: primary parent FRS.
pub parent_frs: u64,
/// Convenience: primary name.
pub name: String,
// ... other fields ...
}
fn parse_file_name(...) {
    // Don't skip additional names!
    if namespace != 2 { // Not DOS-only
        let name_info = NameInfo {
            name: extract_name(...),
            parent_frs: fn_attr.parent_directory,
            namespace,
        };
        // Win32 (1) or Win32+DOS (3) replaces an existing POSIX name.
        let is_better_name = result.name.is_empty() || matches!(namespace, 1 | 3);
        // Capture the primary name before the push moves name_info into the Vec.
        if is_better_name {
            result.name = name_info.name.clone();
            result.parent_frs = name_info.parent_frs;
        }
        result.names.push(name_info);
    }
}

File: crates/uffs-mft/src/reader.rs
Step 2.5: Add names column to DataFrame
// Option A: Expand to multiple rows (one per name)
// Easier for queries but larger DataFrame
// Option B: Store as List column (recommended)
use polars::prelude::*;

// Build a List[String] column: one inner Series of names per record.
let names_series = Series::new(
    "names".into(),
    parsed_records
        .iter()
        .map(|r| {
            Series::new(
                "".into(),
                r.names.iter().map(|n| n.name.as_str()).collect::<Vec<_>>(),
            )
        })
        .collect::<Vec<Series>>(),
);

File: crates/uffs-mft/src/ntfs.rs
Step 3.1: Add StreamInfo struct
/// Information about a single data stream.
#[derive(Debug, Clone)]
pub struct StreamInfo {
/// Stream name (empty for default stream).
pub name: String,
/// Logical size in bytes.
pub size: u64,
/// Allocated size on disk.
pub allocated_size: u64,
/// Whether this stream is sparse.
pub is_sparse: bool,
/// Attribute type code.
pub attribute_type: u32,
}

File: crates/uffs-mft/src/io.rs
Step 3.3-3.5: Parse all $DATA attributes
fn parse_data_attribute(
data: &[u8],
attr_offset: usize,
header: &AttributeRecordHeader,
result: &mut ParsedRecord,
) {
// Extract stream name from attribute header
let name = if header.name_length > 0 {
let name_offset = attr_offset + header.name_offset as usize;
let name_bytes = &data[name_offset..name_offset + header.name_length as usize * 2];
String::from_utf16_lossy(
&name_bytes.chunks(2)
.map(|c| u16::from_le_bytes([c[0], c[1]]))
.collect::<Vec<_>>()
)
} else {
String::new()
};
let (size, allocated_size) = if header.is_non_resident != 0 {
// Non-resident: get sizes from header
let nr = parse_non_resident_header(...);
(nr.data_size, nr.allocated_size)
} else {
// Resident: size is value_length, allocated is 0
(header.value_length as u64, 0)
};
    let is_default_data = name.is_empty() && header.type_code == 0x80;
    let stream_info = StreamInfo {
        name,
        size,
        allocated_size,
        is_sparse: (header.flags & 0x8000) != 0,
        attribute_type: header.type_code,
    };
    // Capture the primary size from the unnamed $DATA stream
    // before the push moves stream_info into the Vec.
    if is_default_data {
        result.size = stream_info.size;
        result.allocated_size = stream_info.allocated_size;
    }
    result.streams.push(stream_info);
}

Each phase should include unit tests:
#[cfg(test)]
mod tests {
use super::*;
// Phase 1: Extension record merging
#[test]
fn test_extension_record_detection() {
let mut header = FileRecordSegmentHeader::default();
header.base_file_record_segment = 0x12345;
assert!(!header.is_base_record());
}
#[test]
fn test_extension_merge() {
    // base_record / ext_attrs: test fixtures constructed elsewhere
let mut merger = MftRecordMerger::new();
merger.add_result(ParseResult::Base(base_record));
merger.add_result(ParseResult::Extension(ext_attrs));
let records = merger.merge();
assert_eq!(records[0].names.len(), 2);
}
// Phase 2: Hard links
#[test]
fn test_multiple_names() {
    // parse_record() now returns ParseResult (see Phase 1)
    let ParseResult::Base(record) = parse_record(&data_with_two_names, 100) else {
        panic!("expected a base record");
    };
    assert_eq!(record.names.len(), 2);
}
// Phase 3: Alternate data streams
#[test]
fn test_alternate_data_stream() {
    let ParseResult::Base(record) = parse_record(&data_with_ads, 100) else {
        panic!("expected a base record");
    };
    assert!(record.streams.iter().any(|s| s.name == "Zone.Identifier"));
}
}

Test with real MFT data:
#[tokio::test]
#[cfg(windows)]
async fn test_hard_link_detection() {
// Create a file with hard link
// Read MFT
// Verify both names are found
}
#[tokio::test]
#[cfg(windows)]
async fn test_ads_detection() {
// Create a file with ADS
// Read MFT
// Verify stream is found
}

| Phase | Complexity | Estimated Time | Priority |
|---|---|---|---|
| Phase 1: Extension Records | High | 4-6 hours | 🔴 Critical |
| Phase 2: Hard Links | Medium | 2-3 hours | 🔴 Critical |
| Phase 3: Alternate Streams | Medium | 2-3 hours | 🔴 Critical |
| Phase 4: Extended Sizes | Low | 1-2 hours | 🟡 Important |
| Phase 5: Extended Flags | Low | 1-2 hours | 🟡 Important |
| Phase 6: Directory Tree | High | 4-6 hours | ✅ Complete |
| Phase 7: MFT Change Time | Low | 0.5 hours | 🟢 Moderate |
| Total | | 15-23 hours | |
The implementation is complete when:
- ✅ Extension records are merged into base records
- ✅ All hard links are captured (multiple names per file)
- ✅ All alternate data streams are captured
- ✅ Allocated and compressed sizes are tracked
- ✅ All 15+ attribute flags are extracted
- ✅ DataFrame schema matches target schema
- ✅ All unit tests pass
- ✅ Integration tests with real MFT data pass
- ✅ Performance is not significantly degraded
- Reference baseline: old_cpp_reference/uffs/UltraFastFileSearch-code/file.cpp (optional repo-root, gitignored, local-only, never pushed)
  - Lines 939-1193: NTFS structures
  - Lines 1884-2090: NtfsIndex data structures
  - Lines 2370-2530: Record parsing logic
- Rust Implementation: crates/uffs-mft/src/
  - ntfs.rs: NTFS structures
  - io.rs: Record parsing
  - reader.rs: DataFrame building
  - platform.rs: Windows API wrappers