
feat: expose io_buffer_size in CompactionOptions #7226

Open
aimanmalib wants to merge 1 commit into lance-format:main from aimanmalib:feat/compaction-io-buffer-size


Conversation

@aimanmalib


Summary

CompactionOptions did not expose io_buffer_size, even though the scanner used during compaction supports it. Compaction builds its scan reader in prepare_reader, which only forwarded batch_size to the scanner — the io_buffer_size knob was never set.

This matters because a single batch larger than the I/O buffer size causes the scanner to deadlock (documented on Scanner::io_buffer_size). Since the backpressure warning was downgraded to debug level, this deadlock is now silent under default logging. Users had no way to raise the buffer during compaction to avoid it.
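To see why an oversized batch hangs rather than fails, here is a deliberately simplified model of byte-budgeted backpressure. `IoBudget`, `try_reserve`, `release`, and `can_ever_admit` are hypothetical names, and the model is an assumption about the general mechanism, not Lance's actual scheduler code: a read may start only when its bytes fit in the remaining budget, so a request larger than the whole budget is refused even when nothing else is in flight.

```rust
/// Hypothetical model of byte-budget backpressure (not Lance's scheduler).
/// A read may only start when its bytes fit in the remaining budget.
pub struct IoBudget {
    capacity: u64,
    in_flight: u64,
}

impl IoBudget {
    pub fn new(capacity: u64) -> Self {
        Self { capacity, in_flight: 0 }
    }

    /// Try to reserve `bytes`; returns false if they do not fit right now.
    pub fn try_reserve(&mut self, bytes: u64) -> bool {
        if self.in_flight + bytes <= self.capacity {
            self.in_flight += bytes;
            true
        } else {
            false
        }
    }

    /// Return bytes to the budget once the consumer has taken the batch.
    /// Callers must only release what they previously reserved.
    pub fn release(&mut self, bytes: u64) {
        debug_assert!(bytes <= self.in_flight);
        self.in_flight -= bytes;
    }

    /// A request larger than the entire budget can never be admitted,
    /// even with nothing in flight, so a waiter on it blocks forever.
    pub fn can_ever_admit(&self, bytes: u64) -> bool {
        bytes <= self.capacity
    }
}
```

In a real scheduler the reader would wait for capacity instead of polling `try_reserve`, which is why the failure surfaces as a hang rather than an error; raising the buffer above the largest batch removes the impossible request.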

Resolves #4946.

Changes

  • Add io_buffer_size: Option<u64> to CompactionOptions (with Default = None).
  • Support the lance.compaction.io_buffer_size manifest config key in apply_dataset_config.
  • Plumb the value through prepare_reader → scanner.io_buffer_size(...), mirroring the existing batch_size handling.
  • Update the Python binding to keep parameter names consistent across languages:
    • parse_compaction_options accepts io_buffer_size
    • CompactionOptions TypedDict documents the field
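A minimal sketch of the options change and the manifest-config parsing follows. The struct is a trimmed, hypothetical stand-in for `CompactionOptions` that models only the two reader knobs, and the error type and parse behavior of `apply_dataset_config` are assumptions rather than the PR's exact code; only the field name, its `Option<u64>` type, the `None` default, and the `lance.compaction.io_buffer_size` key come from the description above.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down stand-in for Lance's `CompactionOptions`;
/// only the two reader-related fields are modeled here.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct CompactionOptions {
    /// Batch size forwarded to the scanner (already exposed before this PR).
    pub batch_size: Option<usize>,
    /// New: I/O buffer size in bytes; `None` keeps the scanner's default.
    pub io_buffer_size: Option<u64>,
}

impl CompactionOptions {
    /// Sketch of reading the manifest config key. The `String` error type
    /// and strict `u64` parsing are assumptions for illustration.
    pub fn apply_dataset_config(
        &mut self,
        config: &HashMap<String, String>,
    ) -> Result<(), String> {
        if let Some(raw) = config.get("lance.compaction.io_buffer_size") {
            let value = raw
                .parse::<u64>()
                .map_err(|e| format!("invalid lance.compaction.io_buffer_size {raw:?}: {e}"))?;
            self.io_buffer_size = Some(value);
        }
        Ok(())
    }
}
```

Keeping the field an `Option` means a dataset without the key leaves the value unset, which is what lets the change stay additive.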

Testing

  • Extended test_from_dataset_config to assert the lance.compaction.io_buffer_size key round-trips.
  • Added test_compact_with_io_buffer_size (parametrized over Legacy/Stable file versions) that runs compact_files with an explicit io_buffer_size and verifies the compaction succeeds and preserves all rows.
  • cargo test -p lance --lib dataset::optimize::tests → 76 passed, 0 failed.
  • cargo clippy -p lance --lib --tests → clean.
  • cargo fmt --all -- --check → clean.

Notes

This is a non-breaking, additive change — the new field defaults to None, preserving existing behavior when unset.
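The "preserves existing behavior when unset" property comes from forwarding each knob only when it is `Some`. The sketch below assumes a hypothetical `ScannerStub` builder and a hypothetical `configure_scanner` helper standing in for the relevant part of `prepare_reader`; the method names mirror `batch_size` and `io_buffer_size` as described in this PR, but the types and signatures are illustrative, not Lance's.

```rust
/// Hypothetical stand-in for the scanner builder. It records which knobs
/// were set so the plumbing can be checked; `None` means "scanner default".
#[derive(Debug, Default)]
pub struct ScannerStub {
    pub batch_size: Option<usize>,
    pub io_buffer_size: Option<u64>,
}

impl ScannerStub {
    pub fn batch_size(&mut self, n: usize) -> &mut Self {
        self.batch_size = Some(n);
        self
    }

    pub fn io_buffer_size(&mut self, n: u64) -> &mut Self {
        self.io_buffer_size = Some(n);
        self
    }
}

/// Hypothetical options carrying just the two reader knobs.
#[derive(Debug, Default)]
pub struct ReaderOptions {
    pub batch_size: Option<usize>,
    pub io_buffer_size: Option<u64>,
}

/// Sketch of the plumbing: each knob is forwarded only when set, so an
/// unset option leaves the scanner's own default untouched.
pub fn configure_scanner(scanner: &mut ScannerStub, opts: &ReaderOptions) {
    if let Some(n) = opts.batch_size {
        scanner.batch_size(n);
    }
    if let Some(n) = opts.io_buffer_size {
        scanner.io_buffer_size(n);
    }
}
```

Using `if let Some(..)` rather than substituting a hard-coded fallback avoids duplicating the scanner's default in the compaction code.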

Compaction builds its scan reader via prepare_reader, which previously
only forwarded batch_size to the scanner. The scanner's io_buffer_size
knob was never set during compaction, so users had no way to increase
the I/O buffer. This matters because a single batch larger than the I/O
buffer size causes the scanner to deadlock (documented in
Scanner::io_buffer_size), and with backpressure warnings downgraded to
debug level, this deadlock is silent by default.

Add an io_buffer_size field to CompactionOptions, plumb it through
prepare_reader to scanner.io_buffer_size, and support the
lance.compaction.io_buffer_size manifest config key. The Python binding
(parse_compaction_options + CompactionOptions TypedDict) is updated to
keep parameter names consistent across languages.

Closes lance-format#4946
github-actions bot added the A-python (Python bindings) and enhancement (New feature or request) labels on Jun 11, 2026


Development

Successfully merging this pull request may close these issues.

CompactionOptions should expose io_buffer_size
