fix: fill schema-added nested columns with typed NULL arrays on read by viirya · Pull Request #2635 · apache/iceberg-rust

viirya · 2026-06-12T23:53:01Z

Which issue does this PR close?

Closes #2618: Data files persisted without complex columns in the table schema hit "unexpected target column" on read.

What changes are included in this PR?

When a column of a nested type (list, map, or a struct that itself contains nested children) is added to the table schema after data files were written, reading those older files fails with "unexpected target column type List(...)". The transformer correctly plans a ColumnSource::Add { value: None, .. } for the missing column, but the helpers that materialize the all-NULL array (create_primitive_array_repeated and create_primitive_array_single_element in arrow/value.rs) only covered primitive types plus structs with primitive-only children, each via a hand-written per-type NULL branch.

This PR replaces all of those NULL branches with a single early return using arrow's new_null_array, which constructs a typed all-NULL array for every Arrow type, including arbitrarily nested ones (the timezone of timestamps and precision/scale of decimals are part of the DataType, so they are preserved). The Some(literal) branches — used for initial_default values and partition constants — are unchanged. Net effect: the two functions shrink by ~180 lines and the unsupported-type failure mode for NULL filling disappears entirely.

Are these changes tested?

New regression test schema_evolution_adds_list_map_and_nested_struct_columns_with_nulls in record_batch_transformer.rs: a file batch containing only id is read against an evolved schema that added xs: list<int>, props: map<string, int>, and s: struct<a: string, ys: list<long>> (the struct's ys child also exercises the nested-children path that the old Struct branch couldn't handle). The test asserts the added columns come back with the evolved schema's Arrow types and null_count == num_rows.

The test fails on main with "unexpected target column type List(Int32, ...)", the exact error from the issue, and passes with this change. The full iceberg lib suite (1313 tests) passes; clippy and rustfmt are clean.

When a column of a nested type (list, map, or a struct with nested children) is added to the table schema after data files were written, reading those older files failed with "unexpected target column type": the helpers that materialize missing columns only handled primitive types plus structs with primitive-only children, via hand-written per-type NULL branches. Build the all-NULL column with arrow's new_null_array instead, which supports every Arrow type (including arbitrarily nested ones), and drop the per-type NULL branches from create_primitive_array_repeated and create_primitive_array_single_element. Closes apache#2618

advancedxy · 2026-06-15T08:37:31Z

+    // With no value, the single element is NULL. `new_null_array` supports every
+    // Arrow type, including nested ones (list/map/struct), which matters for
+    // columns added by schema evolution after a data file was written (#2618).
+    if prim_lit.is_none() {


Hi @viirya, thanks for the fix. We hit the same issue in a schema evolution case: a list-of-binary column was added, and the original code could not handle it. We fixed it internally and are about to contribute the fix back.

Our fix is similar to yours. However, during our internal code review we noticed that the name create_primitive_array_single_element is no longer accurate: the function no longer creates only primitive arrays, it also creates list/struct/map arrays. And with default values in Iceberg V3, a struct column could have a default value too. I think it would be best to rename it to create_array_single_element and pass prim_lit as &Option<Literal>. WDYT?

I think this is a valid short-term fix. We can get it merged first and refactor in a follow-up PR to address the naming issue and default-value support in Iceberg V3.

Thanks @advancedxy — agreed on both counts, and thanks for flagging the V3 angle.

You're right that create_primitive_* is now a misnomer since these functions materialize list/map/struct NULLs too. I dug into the V3 default-value direction a bit, and it's actually a slightly larger refactor than just the helper signatures: the value currently threaded into them is Option<PrimitiveLiteral> (via ColumnSource::Add), and generate_transform_operations deliberately drops any non-primitive initial_default today (if let Literal::Primitive(prim) = lit { .. } else { None } in record_batch_transformer.rs). So supporting a struct/nested default would mean widening ColumnSource::Add.value to Option<Literal> and updating the transformer's default extraction, alongside the rename to create_array_single_element.
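To make the scope of that follow-up concrete, here is a std-only sketch with hypothetical, heavily simplified stand-ins for PrimitiveLiteral, Literal, and ColumnSource::Add (the real iceberg-rust types have many more variants and fields, and ColumnSourceWidened is an invented name for the proposed shape). It shows how the quoted if let pattern turns a struct initial_default into value: None, which the reader would then silently NULL-fill, and how widening the field to Option<Literal> keeps the default intact.

```rust
/// Hypothetical, simplified stand-ins for iceberg-rust's literal types.
#[derive(Debug, Clone, PartialEq)]
enum PrimitiveLiteral {
    Int(i32),
    String(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Primitive(PrimitiveLiteral),
    /// A nested default, e.g. a V3 initial-default for a struct column.
    Struct(Vec<(String, Option<Literal>)>),
}

/// Hypothetical model of today's shape: Add carries only a primitive value.
#[derive(Debug, PartialEq)]
enum ColumnSourceToday {
    Add { value: Option<PrimitiveLiteral> },
}

/// Hypothetical model of the follow-up shape: the value widened to a Literal.
#[derive(Debug, PartialEq)]
enum ColumnSourceWidened {
    Add { value: Option<Literal> },
}

/// Mirrors the quoted pattern: any non-primitive default is dropped.
fn plan_add_today(initial_default: Option<&Literal>) -> ColumnSourceToday {
    let value = initial_default.and_then(|lit| {
        if let Literal::Primitive(prim) = lit {
            Some(prim.clone())
        } else {
            None
        }
    });
    ColumnSourceToday::Add { value }
}

/// With Option<Literal>, a struct default survives planning unchanged.
fn plan_add_widened(initial_default: Option<&Literal>) -> ColumnSourceWidened {
    ColumnSourceWidened::Add {
        value: initial_default.cloned(),
    }
}
```

In this model a struct default planned through plan_add_today ends up indistinguishable from "no default", which is why the type change has to land together with the helper rename rather than the rename alone.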

Given that, I'd prefer to keep this PR as the focused wraparound/nested-NULL fix and do the rename + Literal widening + V3 default support together in the follow-up, so the type change lands in one coherent step rather than renaming now and re-touching the signature later. Happy to open the follow-up issue/PR for that.

advancedxy reviewed Jun 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: fill schema-added nested columns with typed NULL arrays on read#2635

fix: fill schema-added nested columns with typed NULL arrays on read#2635

viirya wants to merge 1 commit into apache:main from viirya:fix/2618-null-fill-nested-columns