fix(arrow): reconcile evolved nested structs by field id, not position by jordepic · Pull Request #2647 · apache/iceberg-rust

jordepic · 2026-06-14T21:16:22Z

Closes #2617 (Type casting error when reading files persisted with old schema for complex type).

What changes are included in this PR?

When a table's struct (or nested list/map) column has gained fields over time via schema evolution, reading data files written under the older schema fails with an Arrow cast error such as Cast error: Casting from Utf8 to Struct(...). The record-batch transformer reconciles a file's nested children to the table schema by position within the struct rather than by Iceberg field id, so once a nested struct adds a field, the children no longer line up and a mismatched cast is attempted (e.g. casting a string child into a struct slot). The files themselves are valid and readable by Iceberg-Java/Spark.

For example, if a struct evolves from (a, c) to (a, b, c), reading an old file that contains only a and c attempts to cast c to the type of b.
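To make the a, c to a, b, c example concrete, here is a minimal, stdlib-only sketch of the two matching strategies. It is a toy model, not the code in this PR and not arrow-rs types: the `Child` struct, the function names, the field ids (a=1, c=2, b=3), and the choice of `int` for a are all assumptions made for illustration. The `string` and `struct` types for c and b follow the error message quoted above.

```rust
// Toy model of a struct child: NOT an arrow-rs type.
// Field ids and type names are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Child {
    field_id: i32,
    name: &'static str,
    type_name: &'static str,
}

/// Pair each table-schema child with the file child at the same index.
/// This is the positional behavior described as the bug.
fn match_by_position<'a>(
    file: &'a [Child],
    target: &'a [Child],
) -> Vec<(&'a Child, Option<&'a Child>)> {
    target.iter().enumerate().map(|(i, t)| (t, file.get(i))).collect()
}

/// Pair each table-schema child with the file child that has the same field id.
/// A target child with no match in the file gets `None` (to be filled with NULLs).
fn match_by_field_id<'a>(
    file: &'a [Child],
    target: &'a [Child],
) -> Vec<(&'a Child, Option<&'a Child>)> {
    target
        .iter()
        .map(|t| (t, file.iter().find(|f| f.field_id == t.field_id)))
        .collect()
}

/// Children of the struct as written in the old data file: a, c.
fn old_file_children() -> Vec<Child> {
    vec![
        Child { field_id: 1, name: "a", type_name: "int" },
        Child { field_id: 2, name: "c", type_name: "string" },
    ]
}

/// Children of the struct in the current table schema: a, b, c.
/// b was added later, so it carries a fresh field id.
fn table_children() -> Vec<Child> {
    vec![
        Child { field_id: 1, name: "a", type_name: "int" },
        Child { field_id: 3, name: "b", type_name: "struct" },
        Child { field_id: 2, name: "c", type_name: "string" },
    ]
}
```

With these inputs, `match_by_position` pairs the target child b with the file child c (a string headed for a struct slot) and leaves the target child c unmatched, while `match_by_field_id` leaves b unmatched and pairs c with c.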

This change fixes the bug by replacing the flat cast with promote_array_to_target, which walks the target type and matches nested struct children by PARQUET:field_id, filling fields absent from the file with typed NULLs and recursing through list/large-list/map. Primitives still use cast for valid Iceberg promotions. This mirrors iceberg-java's by-field-id nested readers.
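The recursion described above can be sketched on a small, stdlib-only model. This is not the PR's implementation and does not use arrow-rs: the `Type`, `Field`, and `Value` types, the `field` helper, and the `reconcile` function are hypothetical, the model is row-oriented rather than columnar, and map and large-list are omitted for brevity. It only illustrates the shape of the approach: walk the target type, match struct children by field id, fill missing children with typed nulls, recurse into lists, and promote primitives (here only the int to long promotion).

```rust
// Hypothetical toy types; NOT arrow-rs and NOT the PR's code.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Int,
    Long,
    Str,
    Struct(Vec<Field>),
    List(Box<Type>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    id: i32,
    name: String,
    ty: Type,
}

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null(Type),                // a null that remembers its type
    Int(i32),
    Long(i64),
    Str(String),
    Struct(Vec<(i32, Value)>), // children keyed by field id
    List(Vec<Value>),
}

/// Convenience constructor for a target-schema field.
fn field(id: i32, name: &str, ty: Type) -> Field {
    Field { id, name: name.to_string(), ty }
}

/// Reshape `value` (as read from the file) to the `target` type.
fn reconcile(value: &Value, target: &Type) -> Result<Value, String> {
    match (value, target) {
        // A null stays null but takes on the target type.
        (Value::Null(_), t) => Ok(Value::Null(t.clone())),
        // Structs: iterate over the TARGET fields and look each one up by id.
        (Value::Struct(children), Type::Struct(fields)) => {
            let mut out = Vec::with_capacity(fields.len());
            for f in fields {
                let child = match children.iter().find(|(id, _)| *id == f.id) {
                    Some((_, v)) => reconcile(v, &f.ty)?,
                    // Field added after the file was written: typed null.
                    None => Value::Null(f.ty.clone()),
                };
                out.push((f.id, child));
            }
            Ok(Value::Struct(out))
        }
        // Lists: recurse into every element.
        (Value::List(items), Type::List(elem)) => items
            .iter()
            .map(|v| reconcile(v, elem))
            .collect::<Result<Vec<_>, _>>()
            .map(Value::List),
        // Primitives: identity, plus the int -> long promotion.
        (Value::Int(v), Type::Int) => Ok(Value::Int(*v)),
        (Value::Int(v), Type::Long) => Ok(Value::Long(i64::from(*v))),
        (Value::Long(v), Type::Long) => Ok(Value::Long(*v)),
        (Value::Str(s), Type::Str) => Ok(Value::Str(s.clone())),
        (v, t) => Err(format!("cannot promote {:?} to {:?}", v, t)),
    }
}
```

Iterating over the target fields rather than the file's children is what keeps the output in table-schema order regardless of how the file laid out the struct. In this model, children present in the file but absent from the target are simply dropped.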

Are these changes tested?

Yes, unit tests are included to ensure that nested fields are now properly reconciled when present in the schema but not in the data file itself.

When a struct gains a field (schema evolution) after a data file was written, the file's struct has fewer children than the table schema. RecordBatchTransformer promoted such columns with arrow_cast::cast, which matches struct/list/map children positionally -- so every child after the gap shifts and types collide, e.g. 'Casting from Utf8 to Struct' when a scalar lands on an added list<struct>.

Replace the flat cast with cast_schema_to_target, which walks the target type and matches nested struct children by PARQUET:field_id, filling fields absent from the file with typed NULLs and recursing through list/large-list/map. Primitives still use cast for valid Iceberg promotions. Mirrors iceberg-java's by-field-id nested readers.

Closes apache#2617

jordepic force-pushed the fix/2617-nested-struct-field-id branch from 8bcdd6d to e33ca27 Compare June 14, 2026 21:20

jordepic force-pushed the fix/2617-nested-struct-field-id branch from e33ca27 to 5d42f1a Compare June 14, 2026 21:47

jordepic wants to merge 1 commit into apache:main from jordepic:fix/2617-nested-struct-field-id
