
feat(optical): remote optical-disc ripping (desktop-driven, device-streamed)#47

Merged
danifunker merged 19 commits into main from remote-optical-ripping on Jun 29, 2026


danifunker commented Jun 27, 2026


This branch carries two independent bodies of work that grew up together: the remote optical ripping feature it started as, and a substantial HFS / HFS+ catalog B-tree scaling + correctness effort that landed on top while validating against real Apple disks. Both are summarized below.


1. Remote optical-disc ripping (desktop-driven, device-streamed)

Drive a remote optical drive from the desktop app / CLI: the machine with the CD/DVD drive runs rb-cli serve and only issues SCSI reads, while the desktop pulls raw sectors over the LAN and does all the encoding (ISO / BIN-CUE assembly + CHD compression). A weak device (e.g. a MiSTer SuperStation One, ~800 MHz Cortex-A9) never gets taxed by compression.

Full design + a phase-by-phase tracker live in docs/remote_ripping.md (all [x] except the two hardware/GUI-runtime checks).

How it works

The rip pipeline touches the drive through one tiny seam — OpticalSource (read_toc / read_data_sectors / eject) — so swapping a LocalCdReader for a RemoteCdReader moves the reader over the wire while every encode step stays on the desktop, unchanged.

machine with the drive (rb-cli serve)     your desktop (GUI / rb-cli)
─────────────────────────────────────     ──────────────────────────
cd-da-reader: READ TOC / READ CD     ──►   RemoteCdReader (proxies the 3 ops)
retry/backoff loop (next to drive)         write local .bin/.cue / .iso
ship raw 2352-byte sectors           ──►   CHD compress (libchdman)  ← stays here
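To make the seam concrete, here is a minimal sketch of the idea, assuming simplified signatures. `OpticalSource` and its three method names come from this PR, but the argument types, the `TrackInfo` struct, `MemorySource` and `rip_raw` are hypothetical stand-ins, not the crate's real API. The point is that the encode-side function only ever sees `&dyn OpticalSource`.

```rust
use std::cell::Cell;
use std::io;

/// Simplified TOC entry (hypothetical; the real type comes from cd-da-reader).
#[derive(Debug, Clone, PartialEq)]
pub struct TrackInfo {
    pub number: u8,
    pub start_lba: u32,
}

/// Raw CD sector size: the desktop receives full 2352-byte sectors.
pub const RAW_SECTOR: usize = 2352;

/// The three drive operations the rip pipeline needs (signatures assumed).
pub trait OpticalSource {
    fn read_toc(&self) -> io::Result<Vec<TrackInfo>>;
    fn read_data_sectors(&self, lba: u32, count: u32) -> io::Result<Vec<u8>>;
    fn eject(&self) -> io::Result<()>;
}

/// In-memory stand-in for a LocalCdReader or RemoteCdReader.
pub struct MemorySource {
    pub tracks: Vec<TrackInfo>,
    pub image: Vec<u8>,
    pub ejected: Cell<bool>,
}

impl OpticalSource for MemorySource {
    fn read_toc(&self) -> io::Result<Vec<TrackInfo>> {
        Ok(self.tracks.clone())
    }

    fn read_data_sectors(&self, lba: u32, count: u32) -> io::Result<Vec<u8>> {
        let start = lba as usize * RAW_SECTOR;
        let end = start + count as usize * RAW_SECTOR;
        self.image
            .get(start..end)
            .map(|s| s.to_vec())
            .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "read past end of disc"))
    }

    fn eject(&self) -> io::Result<()> {
        self.ejected.set(true);
        Ok(())
    }
}

/// Encode-side code: it cannot tell a local drive from a remote one.
pub fn rip_raw(src: &dyn OpticalSource, total_sectors: u32, chunk: u32) -> io::Result<Vec<u8>> {
    assert!(chunk > 0, "chunk must be non-zero");
    let _toc = src.read_toc()?; // a real rip would build the .cue from this
    let mut out = Vec::with_capacity(total_sectors as usize * RAW_SECTOR);
    let mut lba = 0;
    while lba < total_sectors {
        let n = chunk.min(total_sectors - lba);
        out.extend_from_slice(&src.read_data_sectors(lba, n)?);
        lba += n;
    }
    src.eject()?;
    Ok(out)
}
```

Swapping `MemorySource` for a type that forwards each call over a socket changes nothing in `rip_raw`, which is the property the PR relies on.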

Usage

On the box with the drive: rb-cli serve (needs optical+remote — the desktop release and rb-cli-mini both have them; the daemon must run elevated to open /dev/sr0).

From the desktop — CLI:

rb-cli optical drives --remote that-box:7341          # discover (prints rb:// args)
rb-cli optical rip --device rb://that-box:7341/dev/sr0 --output disc.cue --format bincue

From the desktop — GUI: Optical tab → Add remote daemon… (with an MRU quick-pick of past daemons) → the drive appears in the unified pulldown tagged [host:port] … → Start Rip.

What's inside

  • Daemon optical tier (CAP_FAMILY_O): ListOpticalDrives / OpenOptical / ReadToc / ReadOpticalSectors / EjectOptical / CloseOptical, reusing the existing control-frame + FileBegin/chunk framing. One session per daemon (cd-da-reader keeps a global handle), guarded process-wide.
  • RemoteCdReader proxying the seam; OpticalTarget { Local | Remote } dispatch; CLI rb:// device parse.
  • Unified picker (model::optical_devices) merging local + per-daemon drives; CLI optical drives --remote; GUI unified pulldown + worker-thread "Add remote daemon…" + MRU persisted to config.json.
  • Transfer speed + ETA in the CLI rip/convert progress line (e.g. 45% (315.0 MiB/700.0 MiB) - 28.4 MiB/s, ETA 13s). RateTracker lifted out of gui/progress.rs into model::rate_tracker so the CLI / mini build can use it; the GUI re-exports it unchanged. For a remote rip the rate is LAN throughput; for CHD it's the local encode rate.
  • No new container/file types; CHD encode reuses the existing path-based to_chd (temp BIN/CUE materialized locally, exactly like the existing rip-to-CHD flow).

Tests

17 lib optical/picker/protocol unit tests + 3 loopback integration tests (tests/remote_optical.rs: handshake/ListOpticalDrives round-trip, OpenOptical error + busy-guard release, remote enumeration) + the MRU unit test + RateTracker unit tests. Builds clean across default(GUI) / optical+remote / optical-only / remote-only.

Not yet validated (needs hardware / a GUI run)

  • A real disc rip over the network — byte-good ISO/CHD with the device CPU idle (the loopback tests cover the wire path, not a physical drive).
  • The GUI at runtime — compile- and clippy-clean here, but egui wasn't driven.

Companion change already on main: opticaldiscs 0.4.5 + the cd-da-reader armv7 fix that made the MiSTer optical build possible (PR #46).


2. HFS / HFS+ catalog B-tree scaling + correctness

A chain of HFS/HFS+ fixes surfaced while validating rusty-backup against real Apple-formatted disks (a MacPack 1.8 GB HFS volume and a Mac OS 9.2.2 HFS+ install). Both now fsck with 0 errors, and large imports no longer corrupt or stall. Plan + tracker: docs/hfsplus_btree_growth_plan.md (P1–P5 complete) and the deferred-step writeup docs/todo_hfsplus_fork_growth.md.

Classic HFS catalog B-tree (the reported corruption)

  • insert_catalog_record now uses the shared incremental inserter instead of rebuilding the whole index after every leaf split. The per-split rebuild was O(n²) and leaked index nodes living past the header-bitmap window, exhausting free nodes mid-rebuild and corrupting the tree at ~7.4k records (disk full: no free B-tree nodes / IndexSiblingLinkBroken).
  • split_index_node maintains fLink/bLink sibling links; rebuild_index_nodes (delete / fsck-repair) frees index nodes across all bitmap segments, not just the header window.

B*-tree density (matching how a real Mac packs)

  • btree_try_rotate_leaf — before allocating on a full leaf, redistribute records with an adjacent sibling that shares the parent, patching the one separator key in place. Lifts random-insert occupancy ~69% → ~88%. Reused for index nodes too.
  • btree_split_index_with_insert — append-aware index split (greedy pack-left on a tail append, rebalance only a genuine middle insert), replacing the fixed 50/50 split. Sequential index occupancy ~45% → ~96%.
  • Applies to the incremental per-put path as well as bulk import: 20,000 shuffled multi-dir files via individual rb-cli put calls land fsck-clean with no IndexSiblingLinkBroken (85% leaf / 82% index occupancy), matching the bulk untar path.

Import speed (untar of tens of thousands of files)

  • Duplicate check descends the index (was an O(n) leaf walk → O(n²) import); bulk mode skips the per-file full-catalog snapshot; tar_import caches each directory's child names; ensure_catalog_initialized stops re-reading the extents fork on every create. Net: a 9,000-file untar went 2m25s → 1.6s.
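The duplicate-check change amounts to a root-to-leaf descent instead of a scan over every leaf. In the sketch below, `Node` and `contains` are hypothetical, and keys are compared as plain strings rather than with the HFS (parentID, name) key and Mac collation. Each index level uses a binary search to choose one child, so a lookup touches O(depth) nodes instead of all of them.

```rust
/// Toy B-tree (hypothetical): index nodes hold (separator, child) pairs sorted
/// by separator; leaves hold sorted keys.
pub enum Node {
    Index(Vec<(String, Node)>),
    Leaf(Vec<String>),
}

/// Duplicate check by descending from the root rather than walking leaves.
pub fn contains(mut node: &Node, key: &str) -> bool {
    loop {
        match node {
            Node::Index(children) => {
                // Follow the last child whose separator is <= key.
                let i = children.partition_point(|(sep, _)| sep.as_str() <= key);
                if i == 0 {
                    return false; // smaller than every key in the tree
                }
                node = &children[i - 1].1;
            }
            Node::Leaf(keys) => return keys.binary_search_by(|k| k.as_str().cmp(key)).is_ok(),
        }
    }
}
```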

Mac-faithful collation + fsck (validated against real disks)

  • Classic HFS catalog keys use the real Mac Roman collation order (hfs_charorder, confirmed against Apple's _RelString and hfsutils), not an ASCII-uppercase table — fixes KeysOutOfOrder on accented / curly-quote / nbsp names.
  • HFS+ names use Apple's exact TN1150 case-fold + canonical decomposition tables (ported from the Linux kernel) for both comparison and on-disk form, replacing char::to_lowercase + Rust NFD. Fixes underscore-vs-letter ordering and matches Mac OS for ß, Hangul, and the decomposition-excluded ranges. Drops the now-unused unicode-normalization dependency.
  • Null bytes in catalog names are a warning (UnusualCatalogName), not an error: valid on classic HFS and present on real disks.

HFS+ B-tree growth (P1–P5 of the growth plan)

The classic helpers were hardwired for the classic-HFS key shape, so an HFS+ catalog couldn't grow past a single index level without corrupting. Fixed across five phases:

  • P1 — BTreeKeyFormat descriptor (big_keys / variable_index_keys / max_key_len) threaded through every split/grow/rotate helper, so HFS+ variable-length 2-byte keys are handled correctly; classic output byte-identical.
  • P2 — blank HFS+ catalog is now volume-scaled (~0.5% of volume, like classic HFS) instead of a fixed 4 nodes, so the live insert path no longer exhausts it after ~24 files. New create_blank_hfsplus_sized for clone targets/tests.
  • P3 — extents-overflow B-tree splits past depth-1 (was split-on-overflow not yet implemented); attributes tree splits too.
  • P4 — regression test proving the streamed defrag builder produces a multi-level catalog that's fsck-clean and round-trips byte-for-byte.
  • P5 — all three HFS+ live inserters delegate to the shared btree_insert_full (B*-rotation density), deleting ~250 lines of duplicated split/grow code; shuffled inserts pack ~0.69 → ~0.84 occupancy.
  • §4b grow-on-full is intentionally deferred (classic HFS ships without it too); design + a fsck_hfs-clean-on-real-Mac validation recipe captured in docs/todo_hfsplus_fork_growth.md.

Tests

Regression tests throughout: 20k+ record imports (bulk and per-put), bulk mode, random/sequential packing density, the exact real-disk collation cases, null-byte names, the TN1150 fold/decompose tables, buffer-level multi-level growth for catalog/extents/attributes trees, an extents-overflow round-trip of a maximally fragmented file (64 overflow records), and the multi-level defrag clone.

🤖 Generated with Claude Code

danifunker and others added 19 commits June 27, 2026 07:19
Kicks off the desktop-driven / device-streamed remote ripping feature
(docs/remote_ripping.md): the MiSTer streams raw CD sectors and the desktop
does all encoding, so a weak armv7 device isn't taxed by CHD compression.

P1.1 — the read seam, a pure local refactor with no behavior change:
- New src/optical/source.rs: `OpticalSource` trait (read_toc /
  read_data_sectors / eject) + `LocalCdReader` wrapping cd-da-reader.
- rip_iso / rip_bin_cue take `&dyn OpticalSource`; `run_rip` builds the source
  via `open_optical_source` and ejects through the trait. eject_disc moved into
  source.rs.
- RipConfig is unchanged (still device_path) so the GUI/CLI are untouched; the
  OpticalTarget switch lands in P1.7 with the remote dispatch.

Optical unit tests green (14/14); behavior byte-identical.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds the daemon side of remote ripping: a desktop client can list a remote
machine's optical drives, open one, read its TOC, stream raw sectors, and
eject — while the daemon only issues SCSI reads.

- protocol.rs (P1.2/P1.3): CAP_FAMILY_O = 1<<2; six optical Request verbs
  (ListOpticalDrives / OpenOptical / ReadToc / ReadOpticalSectors /
  EjectOptical / CloseOptical) + OpticalOpened / Toc / OpticalDrives responses;
  serde-mirror DTOs (WireToc/WireTrack/WireSectorMode/WireRetryConfig/
  WireOpticalDrive) always compiled under `remote`, with cd-da-reader From
  conversions gated behind `optical`. Sector data reuses FileBegin + chunk
  stream. Round-trip + conversion tests.
- server.rs (P1.4): `optical_server` module — a per-connection OpticalState
  wrapping a LocalCdReader (reuses the P1.1 read/eject ops) plus a process-global
  AtomicBool busy guard (cd-da-reader keeps a global drive handle, so only one
  session per daemon; released on drop / disconnect). Dispatch arms for all six
  verbs; non-optical builds reply cleanly. Hello advertises CAP_FAMILY_O.
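A process-global busy guard that is released on drop can be sketched with a `static AtomicBool` and an RAII slot. The names `OPTICAL_BUSY` and `OpticalSlot` are hypothetical; the PR only says the guard is an `AtomicBool` released on drop or disconnect. `compare_exchange` makes acquisition atomic, so two connections cannot both claim the drive.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical process-wide flag: cd-da-reader holds one global drive handle,
/// so only one optical session may exist per daemon.
static OPTICAL_BUSY: AtomicBool = AtomicBool::new(false);

/// Owning this value means owning the daemon's single optical session.
pub struct OpticalSlot {
    _private: (),
}

impl OpticalSlot {
    /// Atomically flips false -> true; returns None if another session holds it.
    pub fn try_acquire() -> Option<OpticalSlot> {
        OPTICAL_BUSY
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| OpticalSlot { _private: () })
    }
}

impl Drop for OpticalSlot {
    /// Runs on CloseOptical, on disconnect (connection state dropped), or when
    /// an OpenOptical fails after acquiring, so the slot is never leaked.
    fn drop(&mut self) {
        OPTICAL_BUSY.store(false, Ordering::Release);
    }
}
```

Because release happens in `Drop`, an `OpenOptical` on a bogus device that returns an error also frees the slot. That is the behavior the loopback test checks when a second open fails at open rather than reporting "busy".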

Builds clean in optical, remote-only, and default (GUI) configs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the desktop side of remote ripping — `rb-cli optical rip --device
rb://host:port/dev/sr0` now streams raw sectors from a remote daemon and does
all encoding locally.

- client.rs / connection.rs (P1.5): RemoteSession + RemoteConnection optical
  methods (list_optical_drives / open_optical / read_toc / read_optical_sectors
  / eject_optical / close_optical). They return the Wire DTOs, so no `optical`
  gate on the client.
- source.rs (P1.6): RemoteCdReader (gated `remote`) implements OpticalSource
  over an Arc<Mutex<RemoteConnection>>; the retry config is sent at open, and Drop frees the
  daemon's optical slot.
- rip.rs (P1.7): OpticalTarget { Local | Remote{conn,device_path} } with a
  manual Debug + resolve() that parses an rb:// device arg; RipConfig.device
  replaces device_path; open_optical_source branches Local/Remote.
- CLI + GUI updated to the new RipConfig.device (local call sites build
  OpticalTarget::Local); CLI --device help documents the rb:// form.
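Parsing the `rb://host:port/dev/sr0` form is small enough to sketch. `DeviceArg` and `parse_device_arg` are hypothetical stand-ins for `OpticalTarget::resolve()`; the real function also opens the connection, which is omitted here. Anything without the `rb://` prefix is treated as a local device path. The authority ends at the first `/`, which also begins the device path, and the port is split off with `rsplit_once(':')`.

```rust
/// Hypothetical result of parsing a `--device` argument.
#[derive(Debug, PartialEq)]
pub enum DeviceArg {
    Local(String),
    Remote { host: String, port: u16, device_path: String },
}

pub fn parse_device_arg(arg: &str) -> Result<DeviceArg, String> {
    let Some(rest) = arg.strip_prefix("rb://") else {
        return Ok(DeviceArg::Local(arg.to_string()));
    };
    // The authority ends at the first '/', which also starts the device path.
    let slash = rest.find('/').ok_or_else(|| format!("missing device path in {arg}"))?;
    let (authority, device_path) = rest.split_at(slash);
    if device_path.len() < 2 {
        return Err(format!("missing device path in {arg}"));
    }
    let (host, port) = authority
        .rsplit_once(':')
        .ok_or_else(|| format!("missing :port in {arg}"))?;
    if host.is_empty() {
        return Err(format!("missing host in {arg}"));
    }
    let port: u16 = port.parse().map_err(|e| format!("bad port in {arg}: {e}"))?;
    Ok(DeviceArg::Remote {
        host: host.to_string(),
        port,
        device_path: device_path.to_string(),
    })
}
```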

Builds clean across optical+remote / optical-only / default(GUI); 14 optical
unit tests + protocol round-trip tests green. Hardware rip validation (P1.8)
pending a real drive on a networked box.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tests/remote_optical.rs drives the client<->daemon optical path over a port-0
rb-cli serve listener without a physical drive:
- the optical-built daemon handles ListOpticalDrives (round-trips, doesn't
  reply "built without the optical feature");
- OpenOptical of a bogus device errors cleanly and releases the process-global
  busy guard, so a second open fails at open rather than reporting "busy".

This validates the wire path + the single-session guard. The byte-identical
rip-a-real-disc validation still needs an optical drive on a networked box.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…3.3)

- model/optical_devices.rs (P3.1): RipDevice / DeviceLocation (Local |
  Remote{conn,label}) + list_local_rip_devices / append_remote_rip_devices /
  list_rip_devices merging local + per-daemon drives. picker_label /
  cli_device_arg / into_target helpers. Remote enumeration errors are swallowed,
  so an offline or non-optical daemon contributes nothing — which also
  capability-gates the picker without inspecting handshake bits.
- cli/verbs/optical.rs (P3.3): `optical drives --remote host:port` (repeatable)
  lists local + each daemon's drives, printing a feedable device arg
  (rb://host:port/dev/sr0 for remote rows). Routed through the picker core.
- tests/remote_optical.rs: remote_rip_device_enumeration_over_loopback validates
  the remote arm; optical_devices unit test covers local label/arg/target.
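The "errors are swallowed" merge can be sketched as follows. The `RipDevice` fields, the local label format and the closure-based `list_drives` parameter are assumptions made for illustration; only the remote label shape `[host:port] name (path)` and the `rb://host:port/dev/sr0` argument come from this PR. A daemon that fails to enumerate contributes no rows, which is also what capability-gates the picker.

```rust
/// Hypothetical picker row; `daemon` is None for a local drive.
#[derive(Debug, PartialEq)]
pub struct RipDevice {
    pub daemon: Option<String>, // "host:port" for remote drives
    pub name: String,
    pub path: String,
}

impl RipDevice {
    pub fn picker_label(&self) -> String {
        match &self.daemon {
            Some(addr) => format!("[{addr}] {} ({})", self.name, self.path),
            None => format!("{} ({})", self.name, self.path), // local format assumed
        }
    }

    /// The argument a user can paste into `rb-cli optical rip --device`.
    pub fn cli_device_arg(&self) -> String {
        match &self.daemon {
            Some(addr) => format!("rb://{addr}{}", self.path),
            None => self.path.clone(),
        }
    }
}

/// Appends each daemon's drives; any enumeration error is silently skipped.
pub fn append_remote_rip_devices<F>(out: &mut Vec<RipDevice>, daemons: &[&str], list_drives: F)
where
    F: Fn(&str) -> Result<Vec<(String, String)>, String>,
{
    for &addr in daemons {
        // Offline or non-optical daemons simply contribute nothing.
        let Ok(drives) = list_drives(addr) else { continue };
        out.extend(drives.into_iter().map(|(name, path)| RipDevice {
            daemon: Some(addr.to_string()),
            name,
            path,
        }));
    }
}
```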

Builds clean across default(GUI) / optical+remote / optical-only; loopback +
unit tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…2, P3.4)

The Optical tab's drive picker now lists local AND remote drives in one
pulldown, so a desktop user can rip from a networked daemon's drive (e.g. the
SuperStation) with all encoding staying local.

- optical_tab: unified `rip_devices: Vec<RipDevice>` (+ `remote_daemons`)
  replaces the local-only `Vec<OpticalDrive>`; the combo labels entries via
  picker_label (`[host:port] name (path)` for remote). An "Add remote daemon..."
  modal connects on a worker thread (ConnectStatus / poll_add_remote, so an
  unreachable host can't freeze egui) and unlocks the Physical-drive mode.
- Rip dispatch goes through RipDevice::to_target(); start_rip_to_chd /
  rip_to_chd_worker now take an OpticalTarget (CHD encode still local).
- Remote drives are rip-only here: get_browsable_path returns None for them
  (disc-info/browse open the device locally; remote browse is the Inspect tab).
- model::optical_devices: add to_target(&self) (borrowing) + is_remote().
- P3.4: capability gating falls out of append_remote_rip_devices (a non-optical
  daemon contributes no drives); eject is location-aware via OpticalSource::eject
  (eject checkbox gained a hover note).
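The non-blocking connect pattern (a worker thread plus a per-frame poll) can be sketched with only `std`. `ConnectStatus` and `poll_add_remote` are names from this PR, but their shapes here are assumptions, and the `connect` closure stands in for the real handshake. The poll uses `try_recv`, so the UI thread never waits on the network.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;

/// Hypothetical dialog state for "Add remote daemon...".
pub enum ConnectStatus {
    Idle,
    Connecting(Receiver<Result<Vec<String>, String>>),
    Done(Result<Vec<String>, String>),
}

/// Starts the slow connect on a worker thread and returns immediately.
pub fn start_connect<F>(addr: String, connect: F) -> ConnectStatus
where
    F: FnOnce(&str) -> Result<Vec<String>, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The network call runs here, never on the UI thread.
        let _ = tx.send(connect(&addr));
    });
    ConnectStatus::Connecting(rx)
}

/// Called once per UI frame; never blocks.
pub fn poll_add_remote(status: &mut ConnectStatus) {
    let next = match status {
        ConnectStatus::Connecting(rx) => match rx.try_recv() {
            Ok(result) => Some(ConnectStatus::Done(result)),
            Err(TryRecvError::Empty) => None, // still connecting
            Err(TryRecvError::Disconnected) => {
                Some(ConnectStatus::Done(Err("worker thread exited".into())))
            }
        },
        _ => None,
    };
    if let Some(next) = next {
        *status = next;
    }
}
```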

Builds + clippy clean on default(GUI); GUI runtime behavior pending a user check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cross-cutting doc-sync for the remote-ripping feature (CLAUDE.md rule):
- README MiSTer build list: "Remote ripping off-device" bullet (run rb-daemon
  on the device, drive it from the desktop; device only does SCSI reads).
- docs/full_MiSTer_support_status.md: "Remote optical ripping" capability line.
- docs/remote_ripping.md: P2.1 done (rip_to_chd_worker takes OpticalTarget, so
  remote -> CHD encodes locally), P2.2 ~ (hardware), done-criteria all checked.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "Add remote daemon" dialog now remembers daemon addresses across sessions,
so a recurring drive (e.g. the SuperStation) is one click away.

- update.rs: UpdateConfig.recent_daemon_addrs (config.json) + remember_daemon()
  (dedup, newest-first, capped at 8). Unit-tested.
- optical_tab: on a successful connect, record the address (in-memory MRU +
  persisted); the dialog shows a "Recent:" quick-pick list — clicking an entry
  re-connects. It's a pick list, not auto-reconnect, so an offline daemon never
  blocks startup.
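The MRU rules described above (dedup, newest-first, capped at 8) fit in a few lines. The signature below is a hypothetical sketch operating on a plain `Vec<String>`; the real `remember_daemon` lives on `UpdateConfig` and also persists to config.json.

```rust
/// Cap from the PR description.
pub const MAX_RECENT_DAEMONS: usize = 8;

/// Moves (or inserts) `addr` to the front, drops duplicates, trims to the cap.
pub fn remember_daemon(recent: &mut Vec<String>, addr: &str) {
    let addr = addr.trim();
    if addr.is_empty() {
        return;
    }
    recent.retain(|a| a != addr); // dedup: remove any older copy
    recent.insert(0, addr.to_string()); // newest-first
    recent.truncate(MAX_RECENT_DAEMONS);
}
```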

Builds + clippy clean (GUI + mini); MRU unit test green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The GUI progress bar already showed rate + ETA; the CLI only printed percent.
Lift the rolling-window estimator into a shared, non-GUI module so both
surfaces use one implementation, and wire it into the CLI.

- model/rate_tracker.rs (new): RateTracker moved verbatim out of gui/progress.rs
  (record / rate_bytes_per_sec / eta_secs / suffix / reset) + its unit tests.
  Pure std, ungated — available to the CLI/mini build.
- gui/progress.rs: re-exports RateTracker from the model layer (the backup /
  restore / inspect / export tabs that use `progress::RateTracker` are unchanged).
- cli/verbs/optical.rs: drain_rip + drain_convert sample the tracker each tick
  and append " - <rate>/s, ETA <eta>" to the progress line, e.g.
  `  progress:  45% (315.0 MiB/700.0 MiB) - 28.4 MiB/s, ETA 13s`.
  For a remote rip the rate reflects LAN throughput; for CHD it's the local
  encode rate.
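A rolling-window rate and ETA estimator of the kind described can be sketched as follows. It takes explicit timestamps in seconds so it is deterministic to test. The method names follow the PR (`record`, `rate_bytes_per_sec`, `eta_secs`), but the signatures, the window policy and this particular implementation are assumptions, not the code that was moved out of gui/progress.rs.

```rust
use std::collections::VecDeque;

/// Hypothetical rolling-window throughput tracker.
pub struct RateTracker {
    window_secs: f64,
    samples: VecDeque<(f64, u64)>, // (timestamp in seconds, bytes done so far)
}

impl RateTracker {
    pub fn new(window_secs: f64) -> Self {
        Self { window_secs, samples: VecDeque::new() }
    }

    pub fn record(&mut self, t: f64, done: u64) {
        self.samples.push_back((t, done));
        // Drop samples older than the window, but keep at least two points.
        while self.samples.len() > 2 && t - self.samples[0].0 > self.window_secs {
            self.samples.pop_front();
        }
    }

    /// Average throughput across the window, or None with too little data.
    pub fn rate_bytes_per_sec(&self) -> Option<f64> {
        let (t0, b0) = *self.samples.front()?;
        let (t1, b1) = *self.samples.back()?;
        let dt = t1 - t0;
        if dt <= 0.0 || b1 < b0 {
            return None;
        }
        Some((b1 - b0) as f64 / dt)
    }

    /// Seconds remaining to reach `total` bytes at the current rate.
    pub fn eta_secs(&self, total: u64) -> Option<u64> {
        let rate = self.rate_bytes_per_sec()?;
        if rate <= 0.0 {
            return None;
        }
        let done = self.samples.back()?.1;
        Some((total.saturating_sub(done) as f64 / rate) as u64)
    }
}
```

A windowed average reacts to a stalled LAN or a slow CHD stage within a few seconds, whereas a whole-run average would keep reporting the early rate.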

Builds + clippy clean (GUI + mini); rate_tracker tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ation/fsck

Resolves PROMPT-hfs-catalog-btree-scaling.md plus a chain of HFS/HFS+
correctness issues surfaced while validating rusty-backup against real
Apple-formatted disks (a MacPack 1.8 GB HFS volume and a Mac OS 9.2.2
HFS+ install). Both now fsck with 0 errors.

Catalog B-tree (the reported corruption):
- insert_catalog_record uses the shared incremental inserter instead of
  rebuilding the whole index after every leaf split. The per-split rebuild
  was O(n^2) and leaked index nodes living past the 2048-node header
  bitmap, exhausting free nodes mid-rebuild and corrupting the tree at
  ~7.4k records ("disk full: no free B-tree nodes" / IndexSiblingLinkBroken).
- split_index_node now maintains fLink/bLink sibling links.
- rebuild_index_nodes (delete / fsck-repair) frees index nodes across all
  bitmap segments, not just the header window.

Import speed (untar of tens of thousands of files):
- duplicate check descends the index (was an O(n) leaf walk -> O(n^2) import)
- bulk mode skips the per-file full-catalog snapshot
- tar_import caches each directory's child names (was a list_directory call per entry)
- ensure_catalog_initialized stops re-reading the extents fork on every create
  Net: a 9,000-file untar went 2m25s -> 1.6s.

B-tree leaf packing:
- append-aware split: dense pack-left for sequential inserts, balanced for
  random ones (random imports 1.6 -> 2.6 records/node, matching Mac OS;
  the default 2 GB catalog now holds ~48k random-order files, was ~28k).

Collation + fsck (validated against real disks):
- classic HFS catalog keys use the real Mac Roman collation order
  (hfs_charorder; confirmed against Apple OS/HFS/CMMAINT.a _RelString and
  hfsutils/Linux), not an ASCII-only uppercase table -- fixes KeysOutOfOrder
  on accented / curly-quote / nbsp names.
- HFS+ names use Apple's exact TN1150 case-fold + canonical decomposition
  tables (ported from the Linux kernel) for both comparison and the on-disk
  form, replacing char::to_lowercase + Rust NFD. Fixes underscore-vs-letter
  ordering on real HFS+ volumes and matches Mac OS for ß (1:1 fold), Hangul,
  and the decomposition-excluded ranges. Drops the now-unused
  unicode-normalization dependency.
- null bytes in catalog names are a warning (UnusualCatalogName), not an
  error: valid on classic HFS, and real disks carry them.

Adds regression tests throughout (20k+ record imports, bulk mode, random
packing density, the exact real-disk collation cases, null-byte names, and
the TN1150 fold/decompose tables).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The bulk-import scaling fix (PROMPT-hfs-catalog-btree-scaling) left the
incremental, one-record-at-a-time `put` path packing the catalog B-tree far
more loosely than a sequential build, so a catalog grown by many separate
`rb-cli put` calls (MacAtrium's per-title art build) ran out of catalog nodes
("disk full: no free B-tree nodes") at a fraction of the file count and left
IndexSiblingLinkBroken near the ceiling.

Root cause was two naive 50/50 splits, both of which freeze a node at ~half
full when later inserts land elsewhere:

- Leaf splits left non-sequential (mid-leaf) inserts at the classic ~69%
  B-tree occupancy. On a 100M volume, shuffled `put` calls died at ~2384 records
  vs ~3281 for a sequential build.
- The index split was append-blind: sequential separators (what every leaf
  split emits in key order) froze each non-rightmost index node at ~45%,
  doubling index nodes and adding a whole tree level — this hurt *every*
  workload, sequential included.

Fix, mirroring how a real Mac packs a B*-tree:

- `btree_try_rotate_leaf` — before allocating a node on a full leaf, redistribute
  records with an adjacent sibling that shares the parent and has room, updating
  the one parent separator key in place. Lifts random-insert occupancy ~69%→~88%.
  Reused for index nodes too (record-agnostic).
- `btree_split_index_with_insert` — append-aware index split (greedy pack-left on
  a tail append, rebalance only a genuine middle insert), replacing the fixed
  50/50 `split_index_node`. Sequential index occupancy ~45%→~96%.
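The effect of an append-aware split on sequential input can be shown with a small simulation. `split_with_insert`, `fifty_fifty` and `sequential_fill` are hypothetical. The rule here is a simplification of `btree_split_index_with_insert`: a tail append leaves the left node completely full and starts the right node with the new key, while any other insert splits evenly. Appending keys in order then compares node occupancy under the two policies.

```rust
/// Splits a full node (`keys.len() == cap`) while inserting `key`.
/// Tail append: pack left. Middle insert: balance both halves.
pub fn split_with_insert(mut keys: Vec<u32>, key: u32, cap: usize) -> (Vec<u32>, Vec<u32>) {
    assert_eq!(keys.len(), cap, "only called on a full node");
    let pos = keys.partition_point(|&k| k < key);
    keys.insert(pos, key);
    let at = if pos == cap { cap } else { (cap + 1) / 2 };
    let right = keys.split_off(at);
    (keys, right)
}

/// The naive policy: always split the over-full node in half.
fn fifty_fifty(mut keys: Vec<u32>, key: u32) -> (Vec<u32>, Vec<u32>) {
    let pos = keys.partition_point(|&k| k < key);
    keys.insert(pos, key);
    let right = keys.split_off(keys.len() / 2);
    (keys, right)
}

/// Average fill of the nodes produced by appending 0..n in key order.
pub fn sequential_fill(n: u32, cap: usize, append_aware: bool) -> f64 {
    let mut nodes: Vec<Vec<u32>> = vec![Vec::new()];
    for k in 0..n {
        let last = nodes.len() - 1;
        if nodes[last].len() < cap {
            nodes[last].push(k);
            continue;
        }
        let full = std::mem::take(&mut nodes[last]);
        let (l, r) = if append_aware { split_with_insert(full, k, cap) } else { fifty_fifty(full, k) };
        nodes[last] = l;
        nodes.push(r);
    }
    let used: usize = nodes.iter().map(Vec::len).sum();
    used as f64 / (nodes.len() * cap) as f64
}
```

With 50/50 splits every non-rightmost node is left about half full and never receives another key, because later separators always land at the tail. The append-aware rule leaves those nodes full instead.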

The rotation only patches a separator in place when the existing key is the same
length as the normalized classic-HFS key; for a real variable-length HFS+ index
it bails to the normal split, so HFS+ behaviour is unchanged.

Results (100M volume, was/now): flat shuffled put 2384→3107, nested-dir longer
names 1893→2378, sequential 3281→3713 — all fsck-clean. End-to-end: 20,000
shuffled multi-dir files via individual `rb-cli put` calls land fsck-clean with
no IndexSiblingLinkBroken (85% leaf / 82% index occupancy), matching the bulk
`untar` path. Adds a 20k shuffled multi-dir regression test and tightens the
random-insert density assertion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…x keys (P1)

The shared HFS B-tree split/grow/rotate helpers in hfs_common.rs were hardwired
for the classic-HFS key shape: a 1-byte key-length prefix and the fixed 0x25
normalized index separator. HFS+ reuses those helpers verbatim, so the moment an
HFS+ catalog leaf split and needed an index separator, the helper read the high
byte of the 2-byte key length as the whole length and ran the key through the
classic normalize -- producing a malformed 38-byte separator. Descent then
misrouted (parentID read at the wrong offset), records landed in the wrong leaf,
and fsck reported "leaves must be strictly ascending". An HFS+ catalog therefore
could not grow past a single index level without corrupting.

Introduce BTreeKeyFormat -- a small Copy descriptor (big_keys, variable_index_keys,
max_key_len) derived from the BTHeaderRec attributes + maxKeyLength -- and thread
&BTreeKeyFormat through btree_split_leaf_with_insert, btree_split_index_with_insert,
btree_insert_into_index, btree_grow_root, btree_try_rotate_leaf,
btree_update_index_separator, and btree_insert_full. Separator extraction now reads
the key portion via kf.key_portion(); index records are built via kf.make_index_key(),
which keeps the classic fixed-0x25 key for CLASSIC_CATALOG (byte-identical to the
old normalize_catalog_index_key path) and stores the child's variable-length 2-byte
key verbatim for the HFS+ catalog/attributes trees.
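The core of the key-format bug is where a record's key length lives. In classic HFS it is a 1-byte `keyLength`; with the big-keys attribute (HFS+) it is a 2-byte big-endian field. The sketch below mirrors the descriptor's field names from this commit, but `key_portion`'s signature and the constant values are assumptions: 0x25 (37) is the classic catalog maximum key length, and 516 is the HFS+ catalog maximum (6 bytes + 255 UTF-16 units). Reading an HFS+ record with the classic shape takes the high byte of the length as the whole length, which is the misparse this commit describes.

```rust
/// Hypothetical mirror of the descriptor's fields.
#[derive(Clone, Copy, Debug)]
pub struct BTreeKeyFormat {
    pub big_keys: bool,            // 2-byte (HFS+) vs 1-byte (classic) key length
    pub variable_index_keys: bool, // index records keep the child's real key length
    pub max_key_len: u16,
}

impl BTreeKeyFormat {
    pub const CLASSIC_CATALOG: Self = Self { big_keys: false, variable_index_keys: false, max_key_len: 0x25 };
    pub const HFSPLUS_CATALOG: Self = Self { big_keys: true, variable_index_keys: true, max_key_len: 516 };

    /// Returns the key bytes of a record, excluding the length prefix.
    pub fn key_portion<'a>(&self, record: &'a [u8]) -> Option<&'a [u8]> {
        let (len, prefix) = if self.big_keys {
            (u16::from_be_bytes([*record.first()?, *record.get(1)?]) as usize, 2)
        } else {
            (*record.first()? as usize, 1)
        };
        if len > self.max_key_len as usize {
            return None; // corrupt or mis-shaped record
        }
        record.get(prefix..prefix + len)
    }
}
```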

Callers pass the matching constant: classic HFS -> CLASSIC_CATALOG, the HFS+
catalog insert -> HFSPLUS_CATALOG, the attributes insert -> HFSPLUS_ATTRIBUTES,
and the streamed defrag builders -> the same per-tree constants. The B*-rotation
stays effectively classic-only on HFS+ for now: btree_update_index_separator only
patches a separator in place when the new key matches the old length, otherwise it
abandons the rotation and splits -- so a variable-key index is never left with a
stale separator (full density rotation is P5).

This is the §4a "key-format descriptor" step of docs/hfsplus_btree_growth_plan.md.
Verified at the catalog-buffer level: btree_insert_full with HFSPLUS_CATALOG grows
a shuffled multi-parent catalog to depth >= 3 with strictly-ascending leaves and
every record findable by descent. The volume-level fsck gate lands in P2, once
blank catalogs are sized to hold enough records to reach depth >= 3 (a blank
catalog is only 4 nodes today). Classic HFS output is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sands of records (P2 §4c)

A blank HFS+ volume reserved a fixed 4-node catalog B-tree regardless of volume
size, so the live insert path exhausted it after ~24 files ("disk full: no free
B-tree nodes") -- the second defect in the growth plan's probe. Classic HFS
avoids this by pre-sizing the catalog to ~0.5% of the volume (default_btree_sizes
/ create_blank_hfs_sized) and has no grow-on-full path at all; mirror that.

build_blank_hfsplus_front now sizes the catalog from the volume via
default_hfsplus_catalog_bytes (~0.5%), clamped to whole nodes in [4, header-bitmap
capacity] so the blank still needs no dedicated map nodes (~30,544 nodes / 117 MiB
at node_size 4096). The extents-overflow tree keeps its 4-node default (its own
scaling is P3). create_blank_hfsplus is unchanged in signature (auto-sizes); a new
create_blank_hfsplus_sized lets clone targets and tests pin a larger catalog into
a modest image, and the streamed write_blank_hfsplus_into auto-sizes too.
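The sizing rule reduces to a one-line clamp. `default_catalog_nodes` is a hypothetical stand-in for `default_hfsplus_catalog_bytes`. The header-bitmap capacity is passed in rather than computed, because it depends on the node size and on which capacity formula is used.

```rust
/// Hypothetical sizing helper: ~0.5% of the volume in whole nodes, clamped to
/// [4, bitmap_capacity] so the header node's map record covers every node.
pub fn default_catalog_nodes(volume_bytes: u64, node_size: u32, bitmap_capacity: u32) -> u32 {
    let target_bytes = volume_bytes / 200; // 0.5% of the volume
    let nodes = target_bytes / node_size as u64; // round down to whole nodes
    nodes.max(4).min(bitmap_capacity as u64) as u32
}
```

For example, a 64 MiB volume at node_size 4096 gets 81 nodes under this rule, instead of the old fixed 4.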

Verified end-to-end: 20k files in shuffled key order across 50 directories insert
into a 64 MiB volume with a 16 MiB catalog, the catalog grows to depth >= 3, and
hfsplus_fsck reports zero errors -- the volume-level gate P1 deferred, now passing
because the variable-length index keys (P1) and the sized catalog (this commit)
work together. Updated test_create_blank_hfsplus_32mib for the new (larger,
volume-scaled) reserved-block layout.

§4b (grow the fork when free_nodes hits 0) is deferred: classic HFS ships without
it, the blank auto-sizes, and the clone path over-sizes its target, so live growth
is only needed for a foreign under-sized catalog -- a pre-existing classic-HFS
limitation. Rationale recorded in docs/hfsplus_btree_growth_plan.md §4b.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… key-format descriptor (P3)

insert_extents_overflow_record was pinned to a single leaf level: on a full leaf
it returned InvalidData ("split-on-overflow not yet implemented") because the
shared B-tree growth helpers used to force the classic 1-byte / fixed-0x25 index
key shape onto HFS+ records. P1's BTreeKeyFormat fixed that, so wire the extents
path through the full split/grow machinery (btree_split_leaf_with_insert /
btree_grow_root / btree_insert_into_index) with BTreeKeyFormat::HFSPLUS_EXTENTS --
2-byte big keys with fixed 10-byte index separators -- mirroring
insert_catalog_record / insert_xattr_record. The attributes path already routed
through HFSPLUS_ATTRIBUTES in P1, so it splits too.

Tests:
- buffer-level: the extents-overflow tree (fixed index keys) and the attributes
  tree (variable index keys) each grow to depth >= 2 under shuffled inserts, with
  strictly-ascending leaves and every record findable by root-to-leaf descent.
- real-path integration: a 520-block maximally-fragmented file generates 64
  extents-overflow records, splitting that B-tree past one leaf via the real
  insert_extents_overflow_record, and reads back byte-for-byte through the
  resulting multi-level tree.
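The 520-block / 64-record figure follows from the HFS+ fork layout in TN1150: the catalog's fork data holds the first 8 extents, and each extents-overflow record (`HFSPlusExtentRecord`) holds 8 more. The helper below is hypothetical and assumes "maximally fragmented" means one allocation block per extent.

```rust
/// Extent descriptors per HFSPlusExtentRecord (and in the fork data itself).
pub const EXTENTS_PER_RECORD: u32 = 8;

/// Overflow records needed once the first 8 extents in the fork are used up.
pub fn overflow_records_needed(extent_count: u32) -> u32 {
    let spilled = extent_count.saturating_sub(EXTENTS_PER_RECORD);
    (spilled + EXTENTS_PER_RECORD - 1) / EXTENTS_PER_RECORD // round up
}
```

A 520-block file with one block per extent spills 512 extents, which is 64 full overflow records.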

§4a/P3 of docs/hfsplus_btree_growth_plan.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(P4)

The streamed defrag builder constructs its target catalog with
hfs_common::btree_insert_full + BTreeKeyFormat::HFSPLUS_CATALOG (wired in P1), so
it inherited the variable-length index-key fix automatically. Add a round-trip
test that proves it end-to-end: a 64 MiB source with 300 files across 10 dirs has
a multi-level catalog (depth >= 2), and after stream_defragmented_hfsplus the
target's defrag-built catalog is itself multi-level, fsck-clean, and round-trips
byte-for-byte. Before P1 a defrag-built catalog that exceeded one leaf level would
have carried the same malformed classic-shaped separators as the live path.

§P4 of docs/hfsplus_btree_growth_plan.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tation density (P5)

The HFS+ live insert methods (insert_catalog_record, insert_xattr_record,
insert_extents_overflow_record) each carried a hand-copied find -> insert ->
split -> grow dance that, unlike classic HFS's insert_catalog_record, never tried
a sibling rotation before splitting. A plain split leaves a randomly-inserted
(per-put) leaf at the classic ~69% B-tree occupancy, so a catalog grown one
record at a time used ~1.4x the leaves of a packed tree.

Delegate all three to the shared hfs_common::btree_insert_full (threading the
matching BTreeKeyFormat) -- the exact path classic HFS already uses, which
attempts a B*-style rotation into an adjacent sibling sharing the leaf's parent
before allocating a new node. btree_try_rotate_leaf's separator update is already
key-format-aware (P1): it patches the parent separator in place when the new and
old separators are the same length -- always true for the fixed-length extents
keys and for same-length catalog/attribute names -- and otherwise abandons the
rotation and splits, so a variable-key index is never left with a stale
separator.
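The "patch in place or abandon" rule can be sketched at the byte level. `try_update_separator` and its parameters are hypothetical, and a real index record also carries the child pointer and may need record offsets rewritten, which this ignores. A same-length key is overwritten where it sits. A different length returns `false` so the caller can split instead of leaving a stale separator.

```rust
/// Overwrites the separator at `offset` only if `new_key` has the same length
/// as the existing one; otherwise leaves the node untouched and returns false.
pub fn try_update_separator(index_node: &mut [u8], offset: usize, old_len: usize, new_key: &[u8]) -> bool {
    if new_key.len() != old_len {
        return false; // would shift later records: abandon rotation, split instead
    }
    match index_node.get_mut(offset..offset + old_len) {
        Some(slot) => {
            slot.copy_from_slice(new_key);
            true
        }
        None => false, // offset out of range: treat as not patchable
    }
}
```

Fixed-length extents keys always take the in-place path, which is why rotation always applies to that tree.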

Result: shuffled multi-dir catalog inserts now pack to ~0.84 leaf occupancy (was
~0.69), measured by test_hfsplus_catalog_shuffled_inserts_pack_densely. The change
also deletes ~250 lines of duplicated split/grow code, leaving one tested insert
implementation shared by classic HFS, HFS+, and the defrag builder.

§4d/P5 of docs/hfsplus_btree_growth_plan.md. With this, P1-P5 are complete; only
the optional §4b grow-on-full (deferred, matching classic HFS) remains.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
All five phases are implemented and tested; only the optional §4b grow-on-full
is intentionally deferred (classic HFS has no grow path either). Flip the top
status line from "in progress" to complete.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ation recipe

The one deferred step of the HFS+ B-tree growth plan: growing a catalog /
extents-overflow / attributes fork when it runs out of nodes (today
btree_alloc_node just returns DiskFull). Because this writes new on-disk-mutation
machinery in the node-bitmap / map-node area that historically produced
IndexSiblingLinkBroken corruption, its acceptance bar is fsck_hfs-clean and
mountable on a real Mac (HFS+/HFSX is still fully supported there) -- not just our
own hfsplus_fsck.

The doc covers: the single allocation chokepoint and why growth must live in the
HfsPlusFilesystem insert methods; a phased design (Phase A contiguous tail growth
within the header-bitmap node cap -- the high-value 90% case; Phase B >8-extent
overflow spill + write-path overflow; Phase C map-node appending past the cap);
risks/gotchas (write-path overflow, the two conflicting bitmap-capacity formulas,
journaling, atomicity, clump alignment); in-repo tests; and a concrete macOS
validation recipe. It also flags the real CLI gaps -- there is no
`new --fs hfsplus` and `put` copies a host file (not stdin) -- and adds a Phase 0
prerequisite to expose HFS+ creation with a --min-catalog knob so the grow path
can be driven from rb-cli for the Mac recipe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
danifunker merged commit e7f0424 into main on Jun 29, 2026
14 checks passed
danifunker deleted the remote-optical-ripping branch on June 29, 2026 02:22