Route storage IO through the configured Iceberg runtime #2601

@geoffreyclaude

Description

Is your feature request related to a problem or challenge?

Iceberg has a Runtime abstraction with separate handles for IO-bound and CPU-bound work. However, FileIO and the storage layer do not currently route storage operations through that runtime.

This makes it difficult for users who deliberately run separate Tokio runtimes for CPU work and IO work to keep the two isolated. Even when a catalog or table is constructed with a runtime, storage operations such as metadata file reads and data-file byte-range reads may still execute on the caller's current runtime.

This is especially relevant for DataFusion scans. The Parquet decode, filtering, projection, and batch transformation work should remain on the runtime that polls the returned RecordBatchStream, but storage byte-range reads should be able to run through runtime.io().

In other words, runtime routing should happen at the storage boundary, not by moving the whole scan stream onto an IO runtime.

Describe the solution you'd like

Add runtime-aware storage routing under FileIO.

A possible shape:

  • allow FileIO / FileIOBuilder to receive an Iceberg Runtime, or an IO runtime handle;
  • keep concrete storage backends runtime-agnostic;
  • wrap the raw Storage implementation in a private runtime-aware adapter when an IO runtime is configured;
  • wrap returned FileRead / FileWrite objects so delayed range reads, writes, and close operations also route through the IO runtime;
  • ensure tables built with a configured Runtime also bind their FileIO to that runtime;
  • add DataFusion runtime-aware constructors so catalog-backed providers can propagate the runtime to loaded tables.
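
To make the adapter idea in the list above concrete, here is a minimal sketch under simplifying assumptions. Everything in it is hypothetical: the `Storage` and `FileRead` traits are synchronous stand-ins rather than the real iceberg-rust traits, and `IoHandle` replaces a Tokio runtime handle with a short-lived named thread so the example needs only the standard library. The names `RoutedStorage`, `RoutedRead`, and `Recording` are invented for illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for `runtime.io()`: each call runs on a short-lived
// thread with a fixed name. Real code would spawn onto a Tokio handle.
#[derive(Clone)]
struct IoHandle { thread_name: String }

impl IoHandle {
    // Run `f` on an IO thread and block the caller until it returns.
    fn run<T: Send + 'static>(&self, f: impl FnOnce() -> T + Send + 'static) -> T {
        thread::Builder::new()
            .name(self.thread_name.clone())
            .spawn(f)
            .expect("failed to spawn IO thread")
            .join()
            .expect("IO job panicked")
    }
}

// Simplified, hypothetical versions of the storage traits (synchronous here).
trait FileRead: Send + Sync {
    fn read(&self, start: usize, end: usize) -> Vec<u8>;
}

trait Storage: Send + Sync {
    fn reader(&self, path: &str) -> Arc<dyn FileRead>;
}

// Adapter around the raw backend; the backend itself stays runtime-agnostic.
struct RoutedStorage {
    inner: Arc<dyn Storage>,
    io: IoHandle,
}

impl Storage for RoutedStorage {
    fn reader(&self, path: &str) -> Arc<dyn FileRead> {
        let (inner, path) = (Arc::clone(&self.inner), path.to_string());
        let raw = self.io.run(move || inner.reader(&path));
        // Wrap the returned reader so delayed range reads are routed too.
        Arc::new(RoutedRead { inner: raw, io: self.io.clone() })
    }
}

struct RoutedRead {
    inner: Arc<dyn FileRead>,
    io: IoHandle,
}

impl FileRead for RoutedRead {
    fn read(&self, start: usize, end: usize) -> Vec<u8> {
        let inner = Arc::clone(&self.inner);
        self.io.run(move || inner.read(start, end))
    }
}

// In-memory backend that logs which thread served each call.
#[derive(Clone)]
struct Recording {
    data: Arc<Vec<u8>>,
    log: Arc<Mutex<Vec<String>>>,
}

impl Recording {
    fn note(&self, op: &str) {
        let name = thread::current().name().unwrap_or("unnamed").to_string();
        self.log.lock().unwrap().push(format!("{op}@{name}"));
    }
}

impl Storage for Recording {
    fn reader(&self, _path: &str) -> Arc<dyn FileRead> {
        self.note("reader");
        Arc::new(self.clone())
    }
}

impl FileRead for Recording {
    fn read(&self, start: usize, end: usize) -> Vec<u8> {
        self.note("read");
        self.data[start..end].to_vec()
    }
}

// Read bytes 3..7 through the adapter and return them with the call log.
fn demo() -> (Vec<u8>, Vec<String>) {
    let log = Arc::new(Mutex::new(Vec::new()));
    let raw = Recording { data: Arc::new(b"iceberg".to_vec()), log: Arc::clone(&log) };
    let io = IoHandle { thread_name: "iceberg-io".to_string() };
    let storage = RoutedStorage { inner: Arc::new(raw), io };
    let bytes = storage.reader("metadata.json").read(3, 7);
    let calls = log.lock().unwrap().clone();
    (bytes, calls)
}
```

In this sketch both the `reader` call and the later `read` are logged on the `iceberg-io` thread, while `Recording` contains no runtime-specific code. An async version would presumably spawn the inner future on the IO handle and await the join handle instead of blocking.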

The expected behavior would be:

DataFusion / caller runtime
  |
  +-- poll Iceberg scan stream
  |     --> Parquet decode / filtering / projection
  |
  +-- Iceberg FileIO
        |
        v
      runtime.io()
        |
        +-- metadata file reads
        +-- FileRead::read(range)
        +-- FileWrite::{write, close}

Non-goals:

  • changing scan partitioning;
  • adding eager file planning;
  • changing DataFusion physical-plan shape;
  • moving all Iceberg metadata processing onto a CPU runtime.

Testing ideas:

  • verify FileIO::exists routes through the configured IO runtime;
  • verify InputFile::reader and later FileRead::read(range) both route through the IO runtime;
  • verify OutputFile::writer, FileWrite::write, and FileWrite::close route through the IO runtime;
  • verify DataFusion catalog-backed table construction can propagate a runtime;
  • verify existing memory/local filesystem behavior remains unchanged without a configured runtime.
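
One way these checks could be written is to have a probe backend report the thread it ran on and compare that with the caller's thread. The sketch below is only an illustration of that assertion pattern: `probe_exists`, `exists_via`, and `check_routing` are hypothetical names, and a separate named thread stands in for `runtime.io()`.

```rust
use std::thread::{self, ThreadId};

// Hypothetical probe backend: reports the thread that served the call.
fn probe_exists(_path: &str) -> ThreadId {
    thread::current().id()
}

// Hypothetical dispatcher. With `io_thread_name` set, the call runs on a
// separate named thread (a stand-in for `runtime.io()`); with `None`, it
// runs inline on the caller, which is the unchanged default.
fn exists_via(io_thread_name: Option<&str>, path: &str) -> ThreadId {
    match io_thread_name {
        Some(name) => {
            let path = path.to_string();
            thread::Builder::new()
                .name(name.to_string())
                .spawn(move || probe_exists(&path))
                .expect("failed to spawn IO thread")
                .join()
                .expect("IO call panicked")
        }
        None => probe_exists(path),
    }
}

// Returns (ran inline without a runtime, ran elsewhere with a runtime).
fn check_routing() -> (bool, bool) {
    let caller = thread::current().id();
    let inline = exists_via(None, "metadata.json") == caller;
    let routed = exists_via(Some("iceberg-io"), "metadata.json") != caller;
    (inline, routed)
}
```

With real Tokio runtimes the same comparison could be made against the IO runtime's worker threads, for example by giving them a recognizable thread name and asserting on it inside the probe.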

Disclosure: this issue text was drafted with assistance from Codex and reviewed before filing.

Willingness to contribute

I can contribute to this feature independently.
