feat(datafusion): support PARTITIONED BY for identity-partitioned external tables by huan233usc · Pull Request #2575 · apache/iceberg-rust

huan233usc · 2026-06-03T06:19:49Z

Which issue does this PR close?

Partially addresses "Support CREATE EXTERNAL TABLE PARTITIONED BY syntax with DataFusion" (#2050).

What changes are included in this PR?

CREATE EXTERNAL TABLE ... STORED AS ICEBERG (via IcebergTableProviderFactory) previously rejected any PARTITIONED BY clause outright.

DataFusion's PARTITIONED BY grammar only accepts plain column names — it cannot express Iceberg transforms such as bucket(16, id) or days(ts) (unlike Spark's native DSv2 grammar). Given that constraint, this PR:

- Stops rejecting table_partition_cols in check_cmd.
- Adds validate_partition_columns, run after the table is loaded:
  - If the table's default partition spec uses any non-identity transform, returns a clear FeatureUnsupported error naming the offending field/transform.
  - Otherwise validates that the declared columns exactly match the identity partition columns in order (consistent with PartitionSpec::is_compatible_with and Java's PartitionSpec.compatibleWith, where field order is significant).
- Omitting PARTITIONED BY keeps the previous behavior: any table, including non-identity partitioned ones, can still be registered for read-only access.
- A TODO is left to support non-identity transforms once DataFusion's grammar can express them.
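
The validation rules above can be sketched roughly as follows. This is a minimal, standalone illustration and not the PR's actual code: `Transform` and `PartitionField` are hypothetical simplified stand-ins for the iceberg-rust types, the function takes the spec's fields directly instead of a `Table`, and errors are plain strings rather than a FeatureUnsupported error.

```rust
use std::fmt;

/// Hypothetical, simplified stand-in for an Iceberg partition transform.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Transform {
    Identity,
    Bucket(u32),
    Day,
}

impl fmt::Display for Transform {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Transform::Identity => write!(f, "identity"),
            Transform::Bucket(n) => write!(f, "bucket[{}]", n),
            Transform::Day => write!(f, "day"),
        }
    }
}

/// Hypothetical, simplified stand-in for one field of the default partition spec.
#[derive(Debug, Clone)]
struct PartitionField {
    source_column: String,
    transform: Transform,
}

/// Sketch of the checks described above (assumed shape, not the real signature).
#[allow(dead_code)]
fn validate_partition_columns(
    spec: &[PartitionField],
    declared: &[String],
) -> Result<(), String> {
    // No PARTITIONED BY clause: keep the previous behavior and accept any table.
    if declared.is_empty() {
        return Ok(());
    }

    // PARTITIONED BY cannot express transforms, so reject the first
    // non-identity field and name it in the error.
    if let Some(field) = spec.iter().find(|f| f.transform != Transform::Identity) {
        return Err(format!(
            "unsupported: partition field '{}' uses non-identity transform {}",
            field.source_column, field.transform
        ));
    }

    // Declared columns must match the identity columns exactly, in order.
    let expected: Vec<&str> = spec.iter().map(|f| f.source_column.as_str()).collect();
    let actual: Vec<&str> = declared.iter().map(String::as_str).collect();
    if expected != actual {
        return Err(format!(
            "partition columns mismatch: declared {:?}, table spec has {:?}",
            actual, expected
        ));
    }
    Ok(())
}
```

Comparing the two column lists as ordered sequences is what makes a reordered or partial declaration fail, which mirrors the order-sensitive compatibility check mentioned above.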

Example

CREATE EXTERNAL TABLE my_iceberg_table
STORED AS ICEBERG LOCATION '/path/to/metadata.json'
PARTITIONED BY (event_date);

Are these changes tested?

Yes. Added unit tests in table_provider_factory.rs plus two metadata fixtures (bucket-partitioned and multi-identity-partitioned):

- single identity column match / mismatch
- multiple identity columns match / wrong order / subset (count mismatch)
- non-identity (bucket[4]) transform rejected with a clear error
- non-identity partitioned table still registers when PARTITIONED BY is omitted

cargo test -p iceberg-datafusion and cargo clippy -p iceberg-datafusion --all-targets pass.

…ernal tables `CREATE EXTERNAL TABLE ... STORED AS ICEBERG` previously rejected any `PARTITIONED BY` clause. Since DataFusion's grammar only accepts plain column names (it cannot express transforms such as `bucket[N]` or `day`), allow the clause for identity-partitioned tables and validate that the declared columns match the table's default partition spec, in order. Tables partitioned with non-identity transforms can still be registered by omitting the clause; specifying it returns a clear error pointing at the offending transform. Closes apache#2050

huan233usc · 2026-06-03T06:23:04Z

+/// non-identity transforms, can still be registered for read-only access without declaring
+/// its partitioning.
+fn validate_partition_columns(table: &Table, declared_partition_cols: &[String]) -> Result<()> {
+    if declared_partition_cols.is_empty() {


The behavior here is open for discussion.

Alternatively, we could skip validating the declared columns against the partition spec. The upside is that it would unblock users creating an external table over a partitioned table (potentially including cases DataFusion does not support). The downside is that the SQL would not strictly reflect the table's actual partitioning.

huan233usc · 2026-06-05T00:59:43Z

Hi @CTTY, can I get some feedback and thoughts from you when you have a chance? Thanks

CTTY · 2026-06-11T22:13:19Z

Hi @huan233usc, thanks for the contribution. Throwing errors on partition transforms looks good to me. However, I'm not sure this is a problem that we want to solve at this point.

Currently we don't support CREATE EXTERNAL TABLE/register_table, and I think we should tackle #2021 first before coming to this. wdyt?

huan233usc · 2026-06-12T17:55:43Z

Hi @CTTY

Makes sense, thanks for the context.

Based on my observation, from a user perspective, the ideal priority would probably be:

1. CREATE TABLE
   -- this isn't really doable with stock DataFusion today unless we make
2. CREATE EXTERNAL TABLE ... LOCATION ... where LOCATION points to a storage path (creating a new table there)
   -- IIUC, this is what "Support CREATE EXTERNAL TABLE backed by a Catalog with DataFusion" (#2021) is mainly about.
3. CREATE EXTERNAL TABLE ... LOCATION ... where LOCATION points to an existing metadata JSON / snapshot (read-only registration)
   -- this is the case this PR handles; stepping back a bit, this PR seems somewhat redundant/unnecessary.

Let me know if there is anything I can help with on #2021.
Thanks

huan233usc mentioned this pull request Jun 3, 2026

Support CREATE EXTERNAL TABLE PARTITIONED BY syntax with DataFusion #2050

Open

huan233usc added 3 commits June 2, 2026 23:32

- 5cc0751 test(datafusion): dedupe metadata-location helpers and consolidate partition mismatch cases
- 729c86d chore(datafusion): drop self-referential issue link from TODO comment
- 3a7fc98 test(datafusion): add end-to-end SQL tests for PARTITIONED BY external tables

CTTY self-requested a review June 11, 2026 00:07

huan233usc wants to merge 4 commits into apache:main from huan233usc:feat/datafusion-external-table-partitioned-by
