[pull] main from apache:main by pull[bot] · Pull Request #90 · buraksenn/datafusion

pull · 2026-04-09T00:33:25Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

…#21460)

## Which issue does this PR close?

- Closes #21459

## Rationale for this change

When a `ProjectionExec` sits on top of a `FilterExec` that already carries an explicit projection, the `ProjectionPushdown` optimizer attempts to swap them via `try_swapping_with_projection`. The swap replaces the `FilterExec`'s input with the narrower `ProjectionExec`, but `FilterExecBuilder::from(self)` carried over the old projection indices (e.g. `[0, 1, 2]`). After the swap the new input only has the columns selected by the `ProjectionExec` (e.g. 2 columns), so `.build()` tries to validate the stale projection against the narrower schema and panics with "project index 2 out of bounds, max field 2".

## What changes are included in this PR?

In `FilterExec::try_swapping_with_projection`, after replacing the input with the narrower `ProjectionExec`, clear the `FilterExec`'s own projection via `.apply_projection(None)`. The `ProjectionExec` that is now the input already handles column selection, so the `FilterExec` no longer needs its own projection.

## Are these changes tested?

Yes, a test case is added.

## Are there any user-facing changes?
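The failure mode described in the rationale above can be reproduced with a toy model. The sketch below is not DataFusion code: `ToyFilter`, `validate`, and `swap_with_narrower_input` are hypothetical names introduced only for illustration, and the error string simply mirrors the panic message quoted in the commit. It shows why carrying stale projection indices onto a narrower input fails validation, and why clearing the projection avoids the problem.

```rust
/// Toy stand-in for a filter node that may carry its own output projection.
/// (Hypothetical type for illustration; not DataFusion's `FilterExec`.)
#[derive(Debug, Clone, PartialEq)]
struct ToyFilter {
    /// Number of columns produced by the filter's input.
    input_width: usize,
    /// Optional indices into the input columns that the filter emits.
    projection: Option<Vec<usize>>,
}

/// Check that every projection index refers to an existing input column.
fn validate(filter: &ToyFilter) -> Result<(), String> {
    if let Some(indices) = &filter.projection {
        for &i in indices {
            if i >= filter.input_width {
                return Err(format!(
                    "project index {} out of bounds, max field {}",
                    i, filter.input_width
                ));
            }
        }
    }
    Ok(())
}

/// Re-point the filter at a narrower input.
/// `clear_projection = false` mimics the bug (stale indices are kept);
/// `clear_projection = true` mimics the fix (the projection is dropped).
fn swap_with_narrower_input(
    filter: &ToyFilter,
    new_input_width: usize,
    clear_projection: bool,
) -> Result<ToyFilter, String> {
    let swapped = ToyFilter {
        input_width: new_input_width,
        projection: if clear_projection {
            None
        } else {
            filter.projection.clone()
        },
    };
    validate(&swapped)?;
    Ok(swapped)
}
```

With a filter that projects `[0, 1, 2]` over a 3-column input, swapping onto a 2-column input fails when the indices are kept and succeeds once the projection is cleared.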

## Which issue does this PR close?

- Closes #21316.

## Rationale for this change

`GROUPING SETS` with duplicate grouping lists were incorrectly collapsed during execution. The internal grouping id only encoded the semantic null mask, so repeated grouping sets shared the same execution key and were merged, which caused rows to be lost compared with PostgreSQL behavior.

For example, with:

```sql
create table duplicate_grouping_sets(deptno int, job varchar, sal int, comm int);
insert into duplicate_grouping_sets values
  (10, 'CLERK', 1300, null),
  (20, 'MANAGER', 3000, null);

select deptno, job, sal, sum(comm), grouping(deptno), grouping(job), grouping(sal)
from duplicate_grouping_sets
group by grouping sets ((deptno, job), (deptno, sal), (deptno, job))
order by deptno, job, sal, grouping(deptno), grouping(job), grouping(sal);
```

PostgreSQL preserves the duplicate grouping set and returns:

```text
 deptno |   job   | sal  | sum | grouping | grouping | grouping
--------+---------+------+-----+----------+----------+----------
     10 | CLERK   |      |     |        0 |        0 |        1
     10 | CLERK   |      |     |        0 |        0 |        1
     10 |         | 1300 |     |        0 |        1 |        0
     20 | MANAGER |      |     |        0 |        0 |        1
     20 | MANAGER |      |     |        0 |        0 |        1
     20 |         | 3000 |     |        0 |        1 |        0
(6 rows)
```

Before this fix, DataFusion collapsed the duplicate `(deptno, job)` grouping set and returned only 4 rows for the same query shape.

```text
+--------+---------+------+-----------------------------------+------------------------------------------+---------------------------------------+---------------------------------------+
| deptno | job     | sal  | sum(duplicate_grouping_sets.comm) | grouping(duplicate_grouping_sets.deptno) | grouping(duplicate_grouping_sets.job) | grouping(duplicate_grouping_sets.sal) |
+--------+---------+------+-----------------------------------+------------------------------------------+---------------------------------------+---------------------------------------+
| 10     | CLERK   | NULL | NULL                              | 0                                        | 0                                     | 1                                     |
| 10     | NULL    | 1300 | NULL                              | 0                                        | 1                                     | 0                                     |
| 20     | MANAGER | NULL | NULL                              | 0                                        | 0                                     | 1                                     |
| 20     | NULL    | 3000 | NULL                              | 0                                        | 1                                     | 0                                     |
+--------+---------+------+-----------------------------------+------------------------------------------+---------------------------------------+---------------------------------------+
```

## What changes are included in this PR?

- Preserve duplicate grouping sets by packing a duplicate ordinal into the high bits of `__grouping_id`, so repeated occurrences of the same grouping set pattern produce distinct execution keys.
- `GROUPING()` now reads the actual `__grouping_id` column type directly from the schema (via `Aggregate::grouping_id_type`) rather than inferring bit width from the count of grouping expressions alone. This ensures bitmask literals are correctly sized when duplicate-ordinal bits widen the column type beyond what the expression count would imply.
- `GROUPING()` masks off the ordinal bits before returning the result, so the duplicate-ordinal encoding is invisible to user-facing SQL and semantics remain unchanged.
- Add regression coverage for the duplicate `GROUPING SETS` case in:
  - `datafusion/core/tests/sql/aggregates/basic.rs`
  - `datafusion/sqllogictest/test_files/group_by.slt`

## Are these changes tested?

- `cargo fmt --all`
- `cargo test -p datafusion duplicate_grouping_sets_are_preserved`
- `cargo test -p datafusion-physical-plan grouping_sets_preserve_duplicate_groups`
- `cargo test -p datafusion-physical-plan evaluate_group_by_supports_duplicate_grouping_sets_with_eight_columns`
- PostgreSQL validation against the same query/result shape

## Are there any user-facing changes?

- Yes. Queries that contain duplicate `GROUPING SETS` entries now return the correct duplicated result rows, matching PostgreSQL behavior.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
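The duplicate-ordinal encoding described in this commit can be sketched in isolation. The code below is a minimal illustration under stated assumptions, not the DataFusion implementation: the 8-bit mask width (`MASK_BITS`), the `u16` key type, the bit order of the null mask, and the function names are all hypothetical, since the commit text does not give the real layout. It shows how packing an ordinal into the high bits makes repeated grouping sets distinct, and how masking recovers the value that `GROUPING()` would expose.

```rust
use std::collections::HashMap;

/// Number of low bits reserved for the semantic null mask in this sketch.
/// (Assumed value for illustration; the real layout is not given in the PR text.)
const MASK_BITS: u32 = 8;

/// Pack the duplicate ordinal above the null mask to form a distinct execution key.
fn pack_grouping_id(null_mask: u16, ordinal: u16) -> u16 {
    debug_assert!(null_mask < (1 << MASK_BITS));
    (ordinal << MASK_BITS) | null_mask
}

/// Recover the user-visible mask by discarding the ordinal bits.
fn semantic_mask(grouping_id: u16) -> u16 {
    grouping_id & ((1 << MASK_BITS) - 1)
}

/// Assign a key to each grouping set, given as its null mask.
/// The n-th repeat of the same mask receives ordinal n (starting at 0).
fn assign_keys(masks: &[u16]) -> Vec<u16> {
    let mut seen: HashMap<u16, u16> = HashMap::new();
    masks
        .iter()
        .map(|&mask| {
            let ordinal = seen.entry(mask).or_insert(0);
            let key = pack_grouping_id(mask, *ordinal);
            *ordinal += 1;
            key
        })
        .collect()
}
```

For the three grouping sets in the example query, taking `0b001` for `(deptno, job)` and `0b010` for `(deptno, sal)` as assumed masks, `assign_keys(&[0b001, 0b010, 0b001])` yields three different keys, while `semantic_mask` maps the first and third back to the same value.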

…#21321)

## Which issue does this PR close?

- Closes #21320

### Rationale for this change

When one side of a LEFT/RIGHT/FULL outer join is an `EmptyRelation`, the current `PropagateEmptyRelation` optimizer rule leaves the join untouched. This means the engine still builds a hash table for the empty side, probes every row from the non-empty side, finds zero matches, and pads NULLs, all of which is wasted work.

The TODO at lines 76-80 of `propagate_empty_relation.rs` explicitly called out this gap:

```
// TODO: For LeftOut/Full Join, if the right side is empty, the Join can be eliminated
// with a Projection with left side columns + right side columns replaced with null values.
// For RightOut/Full Join, if the left side is empty, the Join can be eliminated
// with a Projection with right side columns + left side columns replaced with null values.
```

### What changes are included in this PR?

Extends the `PropagateEmptyRelation` rule to handle 4 previously unoptimized cases by replacing the join with a `Projection` that null-pads the empty side's columns.

### Are these changes tested?

Yes. 4 new unit tests added.

### Are there any user-facing changes?

No API changes.

---------

Co-authored-by: Subham Singhal <subhamsinghal@Subhams-MacBook-Air.local>
Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
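The equivalence behind this rewrite can be checked on a toy model. The sketch below does not use DataFusion's plan types: `Row`, `left_join`, and `null_padded_projection` are hypothetical helpers, and the join is a naive nested loop on the first column. It illustrates, for the LEFT join case only, that joining against an empty right side produces the same rows as projecting the left side and appending NULL columns.

```rust
/// A row is a list of nullable integer cells (toy model, not Arrow data).
type Row = Vec<Option<i64>>;

/// Naive left outer join on the first column of each side.
fn left_join(left: &[Row], right: &[Row], right_width: usize) -> Vec<Row> {
    let mut out: Vec<Row> = Vec::new();
    for l in left {
        let mut matched = false;
        for r in right {
            // SQL semantics: NULL keys never compare equal.
            if l[0].is_some() && l[0] == r[0] {
                matched = true;
                out.push(l.iter().chain(r.iter()).cloned().collect());
            }
        }
        if !matched {
            // No match: keep the left row and pad the right columns with NULLs.
            let mut row = l.clone();
            row.extend(std::iter::repeat(None).take(right_width));
            out.push(row);
        }
    }
    out
}

/// The rewrite for an empty right side: a projection that appends NULL columns.
fn null_padded_projection(left: &[Row], right_width: usize) -> Vec<Row> {
    left.iter()
        .map(|l| {
            let mut row = l.clone();
            row.extend(std::iter::repeat(None).take(right_width));
            row
        })
        .collect()
}
```

When `right` is empty, the inner loop in `left_join` never runs, so every left row takes the padding branch; that branch is exactly what `null_padded_projection` computes without building or probing anything.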

Bumps [cryptography](https://github.com/pyca/cryptography) from 46.0.6 to 46.0.7.

<details>
<summary>Changelog</summary>

Sourced from <a href="https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst">cryptography's changelog</a>.

<blockquote>
46.0.7 - 2026-04-07
<pre><code>
* **SECURITY ISSUE**: Fixed an issue where non-contiguous buffers could be passed to APIs that accept Python buffers, which could lead to buffer overflow. **CVE-2026-39892**
* Updated Windows, macOS, and Linux wheels to be compiled with OpenSSL 3.5.6.

.. _v46-0-6:
</code></pre>
</blockquote>
</details>

<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/pyca/cryptography/commit/622d672e429a7cff836a23c5903683dbec1901f5"><code>622d672</code></a> 46.0.7 release (<a href="https://redirect.github.com/pyca/cryptography/issues/14602">#14602</a>)</li>
<li>See full diff in <a href="https://github.com/pyca/cryptography/compare/46.0.6...46.0.7">compare view</a></li>
</ul>
</details>

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=cryptography&package-manager=uv&previous-version=46.0.6&new-version=46.0.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>

You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/apache/datafusion/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

haohuaijin and others added 4 commits April 8, 2026 18:41

pull bot locked and limited conversation to collaborators Apr 9, 2026

pull bot added the ⤵️ pull label Apr 9, 2026

pull bot merged commit 8f77a3b into buraksenn:main Apr 9, 2026

pull[bot] merged 4 commits into buraksenn:main from apache:main
