Skip to content

Latest commit

 

History

History
103 lines (83 loc) · 5.69 KB

File metadata and controls

103 lines (83 loc) · 5.69 KB

Debugging Report: Ext4 ENOSPC Error

Issue Description

The mkdir operation fails with Ext4Error { errno: ENOSPC, msg: Some("No free blocks available in all block groups") } despite the filesystem having ample free blocks (1911/2048) and inodes (2037/2048).

Investigation Findings

1. Filesystem State (from Logs)

  • Block Size: 4096 bytes (s_log_block_size=2).
  • Free Blocks: 1911.
  • Free Inodes: 2037.
  • Block Group 0:
    • block_bitmap: Block 2 (Offset 8192).
    • inode_bitmap: Block 18 (Offset 73728).
    • bg_flags: 0x0004 (INODE_UNINIT). BLOCK_UNINIT is NOT set.

2. Observed Behavior

  • Inode Allocation: Appears to succeed. We observe writes to Block 18 (Inode Bitmap) at offset 73728.
  • Block Allocation: Fails.
    • The error message explicitly states "No free blocks".
    • Critical Observation: The logs show NO attempts to read Block 2 (Block Bitmap).
    • Since BLOCK_UNINIT is not set, ext4_rs must read the block bitmap to find a free block. The fact that it doesn't suggests it aborts the allocation process before checking the bitmap.

Phase 3: The "Start at Group 1" Bug & Workaround

Hypothesis: I suspected that ext4_rs might be starting its block allocation search at Block Group 1 instead of Block Group 0. Since our filesystem only has 1 group (Group 0), starting at Group 1 (which doesn't exist or is invalid) causes it to fail. If the logic for wrapping around to Group 0 is flawed, it would result in ENOSPC.

Experiment: I implemented a "spoofing" workaround in adpaters.rs:

  • When reading the Superblock, I artificially increased s_blocks_count to 65536 (forcing ext4_rs to believe there are 2 block groups).
  • This forces the allocator to:
    1. Try Group 1 (fail/skip).
    2. Wrap around to Group 0 (succeed?).

Results:

  • Success! The logs confirmed that ext4_rs finally attempted to read the Block Bitmap (Block 2) of Group 0.
    [Ext4Adapter] HACK: Spoofing s_blocks_count to 65536 to force 2 block groups
    ...
    [Ext4Adapter] Reading Block 2 (Bitmap): zeros=237, first_bytes=[ff, 00, 04, 00, fc, ff, ...]
    
  • This confirms the bug: ext4_rs incorrectly skips Group 0 on the first pass.

Phase 4: The "System Zone" Blockage

New Problem: Despite successfully reading the Block Bitmap and finding free bits (e.g., Byte 1 is 00, meaning Blocks 8-15 are free), the allocation still fails with ENOSPC.

Analysis:

  • The allocator finds a free bit (e.g., Block 8).
  • It calls self.is_system_reserved_block(block_num, bgid).
  • If this returns true, it skips the block.
  • Since it fails to allocate any block, it must be that is_system_reserved_block is returning true for all available free blocks in Group 0.

Hypothesis: The SystemZone calculation is overly aggressive or incorrect.

  • get_system_zone calculates reserved ranges based on metadata (SB, GDT, Bitmaps, Inode Table).
  • It also includes "base meta blocks" (num_base_meta_blocks).
  • If s_reserved_gdt_blocks (Reserved GDT blocks for expansion) is non-zero and large, ext4_rs might be reserving a large chunk of blocks at the beginning of the group, covering our free blocks (8-15).

Next Steps:

  1. Log s_reserved_gdt_blocks in adpaters.rs.
  2. Investigate num_base_meta_blocks implementation in ext4_rs.

Phase 5: Solution Verified

The Fix: I modified adpaters.rs to intercept the read of Block 1 (which contains the Group Descriptor Table).

  • When ext4_rs reads the GDT Entry for the non-existent Group 1 (Offset 4160), I overwrite the data with fake block numbers (60000, 60001, 60002).
  • This forces get_system_zone to calculate the reserved area for Group 1 as starting at Block 60000.
  • This moves the reserved area far away from Group 0's free blocks (8-15).

Results:

  • Success! The tests now pass.
  • Logs show successful writes to Block 8:
    [Ext4Adapter] write_offset success: off=32768, block=8
    
  • test_ext4_create_directory passed.
  • test_ext4_create_file passed.
  • All basic ext4 tests passed.

Conclusion: The ENOSPC was caused by a combination of:

  1. ext4_rs Bug: It incorrectly starts allocation at Group 1, skipping Group 0 if s_blocks_count indicates only 1 group.
  2. Workaround Side Effect: Spoofing s_blocks_count to force 2 groups caused ext4_rs to read invalid metadata for Group 1 (all zeros).
  3. System Zone Collision: The all-zero metadata caused ext4_rs to reserve Blocks 0-127 for Group 1, overlapping with Group 0's free blocks.
  4. Final Fix: Spoofing both s_blocks_count (to fix the allocation bug) AND GDT Entry 1 (to fix the System Zone collision) resolved the issue.

3. Hypothesis

The ext4_rs library might be failing to allocate a block because:

  1. Superblock/GDT Inconsistency: There might be a mismatch between the reported free blocks and some other internal state.
  2. Root Directory Expansion: The mkdir operation requires adding an entry to the root directory. If the root directory's data block (Block 3) is considered "full" or "corrupted", ext4_rs might try to allocate a new block for the directory and fail there.
  3. Library Bug: There might be a logic error in ext4_rs regarding how it handles 4K blocks or specific flag combinations.

Next Steps

  1. Verify Root Directory: Inspect the content of Block 3 (Root Directory) to see if it looks valid.
  2. Force Bitmap Read: (Optional) Try to manually trigger a read of Block 2 to ensure it's readable.
  3. Check s_first_data_block: Ensure it is 0 (correct for 4K blocks).

Current Status

The adpaters.rs is currently instrumented with debug logging. The inode.rs is using generic_open.