
Route FileIO through Iceberg runtime #22

Merged
geoffreyclaude merged 1 commit into DataDog:branch-0.9 from geoffreyclaude:geoffrey.claude/runtime-storage-io
Jun 8, 2026

@geoffreyclaude

@geoffreyclaude geoffreyclaude commented Jun 5, 2026

Summary

This adds runtime-aware FileIO construction for callers that keep storage IO on a dedicated Tokio runtime.

When an IO runtime is configured, FileIO still caches the raw backend storage, but exposes operations through a private RuntimeStorage adapter. Direct storage calls, reader creation, byte-range reads through FileRead::read(range), writer creation, writer chunks, and writer close are dispatched through runtime.io().
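
As a rough illustration of the adapter idea, the sketch below wraps a runtime-agnostic backend and forwards each read to a dedicated IO executor. It is std-only and does not use Tokio or the real Iceberg types: `Storage`, `MemoryStorage`, `IoHandle`, and this `RuntimeStorage` are hypothetical stand-ins, and the "IO runtime" is modeled as a single named thread draining a job queue.

```rust
use std::collections::HashMap;
use std::ops::Range;
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical stand-in for a raw, runtime-agnostic storage backend.
pub trait Storage: Send + Sync + 'static {
    fn read(&self, path: &str, range: Range<usize>) -> Option<Vec<u8>>;
}

/// In-memory backend used only for illustration.
pub struct MemoryStorage {
    pub files: HashMap<String, Vec<u8>>,
}

impl Storage for MemoryStorage {
    fn read(&self, path: &str, range: Range<usize>) -> Option<Vec<u8>> {
        self.files.get(path)?.get(range).map(|bytes| bytes.to_vec())
    }
}

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical IO executor handle: one named thread draining a job queue.
#[derive(Clone)]
pub struct IoHandle {
    jobs: mpsc::Sender<Job>,
}

impl IoHandle {
    pub fn spawn(name: &str) -> Self {
        let (jobs, queue) = mpsc::channel::<Job>();
        thread::Builder::new()
            .name(name.to_string())
            .spawn(move || {
                // The loop ends once every sender has been dropped.
                for job in queue {
                    job();
                }
            })
            .expect("failed to start IO thread");
        IoHandle { jobs }
    }

    /// Runs `f` on the IO thread and blocks the caller until the result is ready.
    pub fn run<T: Send + 'static>(&self, f: impl FnOnce() -> T + Send + 'static) -> T {
        let (tx, rx) = mpsc::channel();
        self.jobs
            .send(Box::new(move || {
                let _ = tx.send(f());
            }))
            .expect("IO thread stopped");
        rx.recv().expect("IO job panicked")
    }
}

/// Adapter: keeps the raw backend, but every call goes through the IO handle.
pub struct RuntimeStorage {
    inner: Arc<dyn Storage>,
    io: IoHandle,
}

impl RuntimeStorage {
    pub fn new(inner: Arc<dyn Storage>, io: IoHandle) -> Self {
        RuntimeStorage { inner, io }
    }

    /// Returns the bytes plus the name of the thread that performed the read.
    pub fn read(&self, path: &str, range: Range<usize>) -> (Option<Vec<u8>>, Option<String>) {
        let inner = Arc::clone(&self.inner);
        let path = path.to_string();
        self.io.run(move || {
            let bytes = inner.read(&path, range);
            let thread_name = thread::current().name().map(str::to_string);
            (bytes, thread_name)
        })
    }
}
```

In this model, `MemoryStorage` never learns about the executor; only the wrapper decides where the call runs, which mirrors the "concrete backends stay runtime-agnostic" goal.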

The public API has two entry points:

  • FileIO::with_runtime(runtime) / FileIOBuilder::with_runtime(runtime) for callers that already have a full Iceberg Runtime.
  • FileIO::with_io_runtime(handle) / FileIOBuilder::with_io_runtime(handle) for callers that only need to route storage IO.
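
A minimal sketch of how the two entry points could relate, assuming the full-runtime path simply hands its IO half to the storage layer. The types below are hypothetical stand-ins named after the description (`Handle` is a plain label, not a Tokio handle), and the real signatures and stored fields may differ.

```rust
/// Hypothetical executor handle; here it is only a label.
#[derive(Clone, Debug, PartialEq)]
pub struct Handle(pub &'static str);

/// Hypothetical full runtime with separate IO and CPU halves.
#[derive(Clone, Debug)]
pub struct Runtime {
    io: Handle,
    cpu: Handle,
}

impl Runtime {
    pub fn new_with_handles(io: Handle, cpu: Handle) -> Self {
        Runtime { io, cpu }
    }

    pub fn io(&self) -> &Handle {
        &self.io
    }

    pub fn cpu(&self) -> &Handle {
        &self.cpu
    }
}

/// Hypothetical builder exposing both entry points.
#[derive(Default)]
pub struct FileIOBuilder {
    io: Option<Handle>,
}

impl FileIOBuilder {
    /// Full-runtime entry point. Assumption: storage routing only needs the IO half.
    pub fn with_runtime(mut self, runtime: Runtime) -> Self {
        self.io = Some(runtime.io().clone());
        self
    }

    /// IO-only entry point for callers that have no full runtime to hand over.
    pub fn with_io_runtime(mut self, handle: Handle) -> Self {
        self.io = Some(handle);
        self
    }

    pub fn build(self) -> FileIO {
        FileIO { io: self.io }
    }
}

pub struct FileIO {
    io: Option<Handle>,
}

impl FileIO {
    /// `None` models the unconfigured case: storage calls run wherever the caller is.
    pub fn io_handle(&self) -> Option<&Handle> {
        self.io.as_ref()
    }
}
```

Under these assumptions, `FileIOBuilder::default().with_runtime(runtime)` and `FileIOBuilder::default().with_io_runtime(io)` end up routing storage to the same handle when `runtime` was built from that `io`.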

RestCatalogBuilder::with_file_io_runtime(handle) uses the IO-only path for long-lived REST catalogs without assigning a full table runtime. Existing CatalogBuilder::with_runtime(runtime) behavior remains the full-runtime path for loaded tables.

Concrete storage backends remain runtime-agnostic.

Runtime Impact

This PR intentionally routes storage operations, not every piece of Iceberg metadata processing.

Callers with separate runtimes can pass explicit handles with Runtime::new_with_handles(io, cpu). The storage adapter uses the IO half for storage scheduling.

Data-file Parquet scan CPU stays where the returned RecordBatchStream is polled. For DataFusion callers, decode, decompression, row filtering, projection, and Iceberg batch transformation remain on the query runtime polling the stream, while byte-range reads run through the IO runtime.

Catalog-backed DataFusion provider paths now use the IO runtime only for catalog reloads. Scan building and plan_files() collection stay on the caller runtime; loaded tables still carry the Iceberg runtime, so manifest and FileIO operations dispatch through their own runtime-aware paths.

Manifest planning and delete metadata processing keep their existing scheduling behavior. Existing Iceberg tasks that already use runtime.cpu() continue to use the supplied CPU handle, but this PR does not add a custom CPU spawn/accounting hook and does not broaden metadata offloading.

Shape

Before, callers that wanted storage IO off the query runtime had to move the whole scan stream onto IO:

DataFusion / query runtime
  |
  +-- IOExec / wrapper
        |
        v
      IO runtime
        |
        +-- scan stream polling
        |     --> Parquet decode/decompression
        |     --> row filtering / projection
        |     --> Iceberg batch transformation
        |
        +-- FileIO / raw Storage backend
              --> byte-range reads

After, only storage work crosses into the IO runtime; scan CPU remains on the caller runtime:

DataFusion / caller runtime
  |
  +-- scan planning and stream polling
  |     --> Parquet decode/decompression
  |     --> row filtering / projection
  |     --> Iceberg batch transformation
  |
  +-- FileIO / InputFile / OutputFile
        |
        v
      RuntimeStorage (private adapter)
        |
        +-- runtime.io() --> raw Storage backend
                         --> FileRead::read(range)
                         --> FileWrite::{write, close}
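
The split in the "after" diagram can be mimicked with plain threads. This is only an analogy under stated assumptions: `read_range_on_io`, `decode`, and `scan_once` are hypothetical helpers, named threads stand in for the IO and query runtimes, and a fresh thread is spawned per read purely to keep the sketch short.

```rust
use std::thread;

/// Hypothetical byte-range read: runs on a thread named "io" and reports who ran it.
fn read_range_on_io(data: Vec<u8>, start: usize, end: usize) -> (Vec<u8>, String) {
    thread::Builder::new()
        .name("io".to_string())
        .spawn(move || {
            let who = thread::current().name().unwrap_or("unnamed").to_string();
            (data[start..end].to_vec(), who)
        })
        .expect("failed to spawn IO thread")
        .join()
        .expect("IO thread panicked")
}

/// Stand-in for decode / filtering / projection: runs on whichever thread calls it.
fn decode(bytes: &[u8]) -> (u32, String) {
    let who = thread::current().name().unwrap_or("unnamed").to_string();
    (bytes.iter().map(|b| u32::from(*b)).sum(), who)
}

/// Runs a one-batch "scan" on a thread named "query".
/// Returns (decoded sum, thread that read the bytes, thread that decoded them).
pub fn scan_once(data: Vec<u8>) -> (u32, String, String) {
    thread::Builder::new()
        .name("query".to_string())
        .spawn(move || {
            // Only the byte-range read crosses over to the IO side.
            let (bytes, reader) = read_range_on_io(data, 1, 3);
            // CPU work stays on the thread that drives the scan.
            let (sum, decoder) = decode(&bytes);
            (sum, reader, decoder)
        })
        .expect("failed to spawn query thread")
        .join()
        .expect("query thread panicked")
}
```

Calling `scan_once(vec![1, 2, 3, 4])` reads bytes `[2, 3]` on the "io" thread and sums them on the "query" thread, which is the same division of labor the diagram describes for byte-range reads versus scan CPU.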

Why This Layer

Parquet readers perform byte-range reads through Iceberg FileRead objects after scan planning. Routing at the storage adapter layer covers those IO operations without moving the whole DataFusion scan stream onto the IO runtime and without making concrete storage backends runtime-aware.

The REST catalog IO-only hook is for catalog construction paths that should route FileIO storage work to IO without assigning a long-lived CPU runtime to every loaded table. Query-created tables can still be rebound later with a full runtime.

Validation

  • cargo fmt --check
  • cargo check -p iceberg -p iceberg-catalog-rest -p iceberg-storage-opendal -p iceberg-datafusion --locked
  • cargo test -p iceberg file_io --locked
  • cargo test -p iceberg test_runtime_with_handles_uses_explicit_cpu_handle --locked
  • cargo test -p iceberg-catalog-rest test_load_table_with_file_io_runtime_routes_storage_to_io --locked
  • cargo test -p iceberg-datafusion test_catalog_backed_provider --locked
  • cargo test -p iceberg test_plan_files --locked

@datadog-datadog-prod-us1-2

datadog-datadog-prod-us1-2 Bot commented Jun 5, 2026

Pipelines

⚠️ Warnings

🚦 1 Pipeline job failed

GitHub Actions Security Analysis with zizmor 🌈 | Run zizmor 🌈

🔗 Commit SHA: 7ae0d92

@geoffreyclaude geoffreyclaude force-pushed the geoffrey.claude/runtime-storage-io branch 4 times, most recently from 8e8f82a to 878a9a9 Compare June 5, 2026 14:13
@geoffreyclaude geoffreyclaude marked this pull request as ready for review June 5, 2026 14:16
@geoffreyclaude geoffreyclaude force-pushed the geoffrey.claude/runtime-storage-io branch 5 times, most recently from c96a871 to 4c1d3b1 Compare June 6, 2026 11:40
@geoffreyclaude

Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@geoffreyclaude geoffreyclaude force-pushed the geoffrey.claude/runtime-storage-io branch 3 times, most recently from 9a14913 to 6e0b315 Compare June 7, 2026 15:04
@gabotechs

Any chance to contribute something like this upstream? I can imagine how this is a problem that anyone with CPU/IO runtime separation will have in the community

@geoffreyclaude

Author

> Any chance to contribute something like this upstream? I can imagine how this is a problem that anyone with CPU/IO runtime separation will have in the community

@gabotechs That's the goal for sure. But as discussed with @toutane about contributing #20 upstream, the runtime changes in our fork are currently depending on apache#2298, which isn't merged upstream yet.

@gabotechs gabotechs left a comment

👍 Sounds good. Before merging, I'd probably leave @toutane a chance to review this, as he is more familiar with this code.

Just skimmed through it, but saw nothing obviously wrong, so +1 for me

@geoffreyclaude geoffreyclaude force-pushed the geoffrey.claude/runtime-storage-io branch from 6e0b315 to a434203 Compare June 8, 2026 08:31
@geoffreyclaude

Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep them coming!

@toutane toutane left a comment

Looks very good indeed!

I'm not very familiar with the Storage trait but I think your wrapper looks very neat.

I think a refactor is now possible in DataFusion's table/mod.rs to more precisely separate what needs to run on the IO vs. the CPU runtime. Wdyt?

.run_on_io(async move {
catalog
let table = catalog
.load_table(&table_ident)

Same thing here, we could use load_table_on_io now and get rid of the run_on_io call.

Author

Codex here: agreed. I updated insert_into to use load_table_on_io directly and removed run_on_io entirely.

The new shape is:

let table = Self::load_table_on_io(
    self.catalog.clone(),
    self.table_ident.clone(),
    self.runtime.as_ref(),
)
.await
.map_err(to_datafusion_error)?;

I think the run_on_io wrapping here is now redundant, and maybe even counter-productive given that load_table_on_io is available.

The only operations in this closure that genuinely benefit from being routed to the IO runtime are the load_table call and get_manifest_list inside plan_files.

Either we modify plan_files to spawn get_manifest_list on the IO runtime itself, or we narrow run_on_io down to just wrapping the plan_files call.

Happy to address it in a follow-up PR if it sounds good to you!

Author

That's a great point! Looks like we can indeed get rid of run_on_io completely, which will simplify things quite a bit.

Author

Codex here: agreed, and I took the narrower version now instead of leaving it for a follow-up.

scan() now routes only the catalog refresh through load_table_on_io; scan building and plan_files() collection stay on the caller runtime. The loaded Table still carries the Iceberg runtime, so manifest/FileIO operations dispatch through their own runtime-aware paths.

let table = Self::load_table_on_io(...).await.map_err(to_datafusion_error)?;

let tasks: Vec<FileScanTask> = builder
    .build()
    .map_err(to_datafusion_error)?
    .plan_files()
    .await
    .map_err(to_datafusion_error)?
    .try_collect::<Vec<_>>()
    .await
    .map_err(to_datafusion_error)?;

@toutane

toutane commented Jun 8, 2026

> > Any chance to contribute something like this upstream? I can imagine how this is a problem that anyone with CPU/IO runtime separation will have in the community
>
> @gabotechs That's the goal for sure. But as discussed with @toutane about contributing #20 upstream, the runtime changes in our fork are currently depending on apache#2298, which isn't merged upstream yet.

@geoffreyclaude I think what we're lacking upstream is the try_new_with_runtime constructor (table/mod.rs).
It was added in https://github.com/DataDog/iceberg-rust/pull/20/changes, which itself depends on apache#2298, as it also brings the run_on_io wrapper for the scan method.

One possible solution would be to include try_new_with_runtime directly in the upstream PR?

@geoffreyclaude geoffreyclaude force-pushed the geoffrey.claude/runtime-storage-io branch from a434203 to 7ae0d92 Compare June 8, 2026 09:48
@geoffreyclaude

Author

Codex update, replying to #22 (comment):

@toutane @gabotechs we now have a dedicated upstream path for this, independent of apache#2298:

The draft PR includes the DataFusion try_new_with_runtime constructors discussed here, plus the FileIO/Storage runtime routing needed to keep storage work on runtime.io(). It deliberately does not change scan partitioning, eager file planning, or DataFusion physical-plan shape.

Could you both take an initial look when you have a chance, especially at whether the API shape matches the direction you had in mind?

@geoffreyclaude geoffreyclaude merged commit 79b97fd into DataDog:branch-0.9 Jun 8, 2026
20 of 22 checks passed