[SPARK-56241][SQL] Derive outputOrdering from KeyedPartitioning key expressions #55036

Draft
peter-toth wants to merge 2 commits into apache:master from peter-toth:SPARK-56241-outputordering-from-keyedpartitioning

@peter-toth
Contributor

What changes were proposed in this pull request?

Within a KeyedPartitioning partition, all rows share the same key value, so the key expressions are trivially sorted (ascending) within each partition.
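The claim above can be checked on a toy partition. The following is a minimal sketch, not Spark code: `Row`, `isSorted`, and the orderings are hypothetical stand-ins introduced only for illustration. It shows that when every row shares one key, the rows are sorted by that key regardless of row order, and that putting the key in front of an ordering the rows already satisfy keeps them sorted.

```scala
object KeyOrderingSketch {
  // Hypothetical stand-in for a row: `key` plays the partition key,
  // `value` is any other column. This is not Spark's InternalRow.
  final case class Row(key: Int, value: Int)

  // True when `rows` is non-decreasing under `ord`.
  def isSorted(rows: Seq[Row], ord: Ordering[Row]): Boolean =
    rows.zip(rows.drop(1)).forall { case (a, b) => ord.lteq(a, b) }

  val byKey: Ordering[Row] = Ordering.by[Row, Int](_.key)
  val byValue: Ordering[Row] = Ordering.by[Row, Int](_.value)
  // Key first, then value: the shape obtained by prepending the key
  // to an ordering on `value`.
  val byKeyThenValue: Ordering[Row] =
    Ordering.by[Row, (Int, Int)](r => (r.key, r.value))

  // One partition: every row carries the same key.
  val sortedByValue: Seq[Row] = Seq(Row(7, 1), Row(7, 4), Row(7, 9))
  val unsortedByValue: Seq[Row] = Seq(Row(7, 9), Row(7, 1), Row(7, 4))
}
```

With these definitions, `isSorted(unsortedByValue, byKey)` holds even though the values are shuffled, while `isSorted(unsortedByValue, byKeyThenValue)` does not: the key contributes nothing beyond what the remaining ordering already guarantees.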

This PR makes two plan nodes expose that structural guarantee via outputOrdering:

  • DataSourceV2ScanExecBase: when outputPartitioning is a KeyedPartitioning, prepend one ascending SortOrder per key expression to whatever SupportsReportOrdering reports, merging overlapping sameOrderExpressions in a single pass.

  • GroupPartitionsExec:

    • Non-coalescing (every group has ≤ 1 input partition): pass through child.outputOrdering unchanged.
    • Coalescing without reducers: re-derive the ordering from the key expressions of the output KeyedPartitioning. Because a join may embed multiple KeyedPartitionings with different expressions, the equivalent expressions are exposed via sameOrderExpressions.
    • Coalescing with reducers: fall back to super.outputOrdering (empty), because merged partitions share only the reduced key.
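The branches listed above can be sketched roughly as follows. This is a simplified model, not the code in this PR: `Order` is a hypothetical stand-in for `SortOrder` that uses strings for expressions, and the parameters `groupSizes`, `keyedPartitionings`, and `hasReducers` are assumed names for information the real nodes get from their plan. The merging of overlapping `sameOrderExpressions` on the scan side is left out because its rules are not spelled out here.

```scala
object OutputOrderingSketch {
  // Hypothetical stand-in for SortOrder: an ascending primary expression
  // plus expressions known to sort identically.
  final case class Order(child: String, sameOrder: Seq[String] = Nil)

  // Scan side: one ascending order per key expression, followed by
  // whatever ordering the source reported (merging omitted in this sketch).
  def scanOrdering(keyExprs: Seq[String], reported: Seq[Order]): Seq[Order] =
    keyExprs.map(k => Order(k)) ++ reported

  // GroupPartitionsExec side, following the three cases described above.
  def groupedOrdering(
      childOrdering: Seq[Order],
      groupSizes: Seq[Int],                 // input partitions per output group
      keyedPartitionings: Seq[Seq[String]], // key expressions of each KeyedPartitioning
      hasReducers: Boolean): Seq[Order] = {
    if (groupSizes.forall(_ <= 1)) {
      // Non-coalescing: nothing is concatenated, so the child ordering survives.
      childOrdering
    } else if (hasReducers) {
      // Coalescing with reducers: merged partitions share only the reduced key.
      Nil
    } else {
      // Coalescing without reducers: the key stays constant in each group.
      // The first partitioning supplies the primary expressions; expressions
      // at the same position in the others become equivalents.
      keyedPartitionings.headOption.toSeq.flatMap { first =>
        first.zipWithIndex.map { case (expr, i) =>
          val equivalents = keyedPartitionings.tail
            .flatMap(_.lift(i))
            .filterNot(_ == expr)
            .distinct
          Order(expr, equivalents)
        }
      }
    }
  }
}
```

In this model, a coalescing node below a join on `l.a = r.a` would report `Order("l.a", Seq("r.a"))`, so a requirement on either side's key can be matched against the same entry.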

Why are the changes needed?

Before this change, outputOrdering on both nodes returned an empty sequence (unless SupportsReportOrdering was implemented), even though the within-partition ordering was structurally guaranteed by the partitioning itself. As a result, EnsureRequirements would insert a redundant SortExec before SortMergeJoin inputs that are already in key order.
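The effect on sort insertion can be illustrated with a small model. This is a sketch under assumptions, not the logic of `EnsureRequirements` itself: `Order`, `orderingSatisfies`, and `needsSort` are hypothetical helpers that mimic a prefix-style comparison between a child's reported ordering and the ordering a join requires.

```scala
object SortElisionSketch {
  // Hypothetical stand-in for SortOrder with equivalent expressions.
  final case class Order(child: String, sameOrder: Seq[String] = Nil) {
    def covers(required: String): Boolean =
      child == required || sameOrder.contains(required)
  }

  // The required expressions must be matched, position by position,
  // by the leading entries of the actual ordering.
  def orderingSatisfies(actual: Seq[Order], required: Seq[String]): Boolean =
    required.length <= actual.length &&
      required.zip(actual).forall { case (req, act) => act.covers(req) }

  // A sort is needed exactly when the child ordering does not satisfy
  // the requirement.
  def needsSort(actual: Seq[Order], required: Seq[String]): Boolean =
    !orderingSatisfies(actual, required)
}
```

With an empty reported ordering, `needsSort(Nil, Seq("a"))` is true, which corresponds to the redundant sort described above. Once the node reports `Seq(Order("a"))`, the same requirement is satisfied and no sort is needed in this model.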

Does this PR introduce any user-facing change?

Yes. Queries involving storage-partitioned joins (v2 bucketing) no longer add a redundant SortExec before SortMergeJoin when the join keys match the partition keys, reducing CPU and memory overhead.

How was this patch tested?

  • New unit test class GroupPartitionsExecSuite covering all four outputOrdering branches (non-coalescing, coalescing without reducers with single and multi-key, join sameOrderExpressions, coalescing with reducers).
  • New SQL integration tests in KeyGroupedPartitioningSuite:
    • Scan with KeyedPartitioning reports key-derived outputOrdering.
    • Non-coalescing GroupPartitionsExec (non-identical key sets) passes through child ordering — no pre-join SortExec.
    • Coalescing GroupPartitionsExec derives ordering from key expressions — no pre-join SortExec.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

Member

@dongjoon-hyun left a comment

It's a nice improvement. I expected many generated query plan changes in the test case, but there is no change from the existing generated plan. Is there any reason, @peter-toth?

@peter-toth
Contributor Author

peter-toth commented Mar 26, 2026

> It's a nice improvement. I expected many generated query plan changes in the test case, but there is no change from the existing generated plan. Is there any reason, @peter-toth?

We don't have any production-ready DSv2 file sources in Spark, so the generated test plans and expected outputs don't cover this feature either.

@dongjoon-hyun
Member

Got it~

dongjoon-hyun previously approved these changes Mar 26, 2026
Member

@dongjoon-hyun left a comment

+1, LGTM. Thank you, @peter-toth.

@dongjoon-hyun
Member

cc @cloud-fan, @szehon-ho, @aokolnychyi, @gengliangwang, too

@peter-toth
Contributor Author

peter-toth commented Mar 26, 2026

Iceberg can benefit from the change.
I will add a follow-up improvement in the scope of SPARK-55715 to keep ordering even when we coalesce partitions, and once @anuragmantri's apache/iceberg#14948 is also merged it will be a major improvement.

@peter-toth force-pushed the SPARK-56241-outputordering-from-keyedpartitioning branch from 7946dce to 4260f53 March 26, 2026 19:39
@peter-toth marked this pull request as draft March 26, 2026 20:21
@peter-toth
Contributor Author

peter-toth commented Mar 26, 2026

Marked as draft for now. Let me double-check a few edge cases: changing the reported ordering can be problematic without a concept of constant order, which would be safe to prepend to any ordering.
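One possible reading of this concern, offered as an interpretation rather than the reasoning stated in the comment: if ordering checks are prefix-based, a node that used to report `[x]` and now reports `[k, x]` no longer satisfies a requirement on `[x]` alone, even though `k` is constant within the partition. The sketch below uses hypothetical helpers (`satisfies`, `satisfiesIgnoringConstants`) and plain strings for expressions; it is not Spark's API and is not part of this PR.

```scala
object ConstantOrderSketch {
  // Prefix-based check: the required expressions must be the leading
  // expressions of the actual ordering.
  def satisfies(actual: Seq[String], required: Seq[String]): Boolean =
    actual.startsWith(required)

  // Hypothetical constant-aware check: expressions known to be constant
  // within a partition do not affect row order, so they are dropped from
  // both sides before comparing prefixes.
  def satisfiesIgnoringConstants(
      actual: Seq[String],
      constants: Set[String],
      required: Seq[String]): Boolean =
    actual.filterNot(constants).startsWith(required.filterNot(constants))
}
```

Under the prefix check, `satisfies(Seq("k", "x"), Seq("x"))` is false, whereas the constant-aware variant accepts the same pair when `k` is marked constant.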

@dongjoon-hyun dismissed their stale review March 26, 2026 21:07

Stale review.
