[pull] main from apache:main #90

Merged: pull[bot] merged 4 commits into buraksenn:main from apache:main on Apr 9, 2026
@pull pull bot commented Apr 9, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

haohuaijin and others added 4 commits April 8, 2026 18:41
…#21460)

## Which issue does this PR close?

- Closes #21459 

## Rationale for this change

When a `ProjectionExec` sits on top of a `FilterExec` that already
carries an explicit projection, the `ProjectionPushdown` optimizer
attempts to swap them via `try_swapping_with_projection`. The swap
replaces the `FilterExec`'s input with the narrower `ProjectionExec`,
but `FilterExecBuilder::from(self)` carries over the old projection
indices (e.g. `[0, 1, 2]`). After the swap the new input has only the
columns selected by the `ProjectionExec` (e.g. 2 columns), so `.build()`
validates the stale projection against the narrower schema and
panics with "project index 2 out of bounds, max field 2".

## What changes are included in this PR?

In `FilterExec::try_swapping_with_projection`, after replacing the input
with the narrower `ProjectionExec`, clear the `FilterExec`'s own projection
via `.apply_projection(None)`. The `ProjectionExec` that is now the input
already handles column selection, so the `FilterExec` no longer needs its
own projection.
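
The failure mode and the fix can be illustrated with a minimal, std-only sketch. `ToyFilter`, `with_narrower_input`, and `cols` are hypothetical names invented for this illustration; they are not DataFusion's `FilterExec` or `FilterExecBuilder`, and only the error text is taken from the description above.

```rust
/// Toy stand-in for a filter operator that may carry its own output projection.
/// (Hypothetical type for illustration; not DataFusion's `FilterExec`.)
#[derive(Debug, Clone, PartialEq)]
struct ToyFilter {
    /// Column names of the filter's input.
    input_schema: Vec<String>,
    /// Optional indices into `input_schema` selecting the output columns.
    projection: Option<Vec<usize>>,
}

impl ToyFilter {
    /// Validate the projection against the input schema, as a builder would.
    fn build(self) -> Result<ToyFilter, String> {
        if let Some(indices) = &self.projection {
            let width = self.input_schema.len();
            if let Some(bad) = indices.iter().find(|&&i| i >= width) {
                return Err(format!(
                    "project index {bad} out of bounds, max field {width}"
                ));
            }
        }
        Ok(self)
    }

    /// Swap in a narrower input. `clear_projection = true` models the fix;
    /// `false` models carrying the stale indices over.
    fn with_narrower_input(
        &self,
        new_schema: Vec<String>,
        clear_projection: bool,
    ) -> Result<ToyFilter, String> {
        ToyFilter {
            input_schema: new_schema,
            projection: if clear_projection {
                None
            } else {
                self.projection.clone()
            },
        }
        .build()
    }
}

/// Helper to build a schema from string literals.
fn cols(names: &[&str]) -> Vec<String> {
    names.iter().map(|s| s.to_string()).collect()
}
```

In this sketch, a filter over `cols(&["a", "b", "c"])` with projection `Some(vec![0, 1, 2])` fails validation when given the two-column input with `clear_projection = false`, and builds successfully with `clear_projection = true`.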



## Are these changes tested?

Yes, a test case was added.

## Are there any user-facing changes?

## Which issue does this PR close?

- Closes #21316.

## Rationale for this change

`GROUPING SETS` with duplicate grouping lists were incorrectly collapsed
during execution. The internal grouping id only encoded the semantic
null mask, so repeated grouping sets shared the same execution key and
were merged, which caused rows to be lost compared with PostgreSQL
behavior.

For example, with:

```sql
create table duplicate_grouping_sets(deptno int, job varchar, sal int, comm int);
insert into duplicate_grouping_sets values
(10, 'CLERK', 1300, null),
(20, 'MANAGER', 3000, null);

select deptno, job, sal, sum(comm), grouping(deptno), grouping(job), grouping(sal)
from duplicate_grouping_sets
group by grouping sets ((deptno, job), (deptno, sal), (deptno, job))
order by deptno, job, sal, grouping(deptno), grouping(job), grouping(sal);
```

PostgreSQL preserves the duplicate grouping set and returns:

```text
 deptno |   job   | sal  | sum | grouping | grouping | grouping
--------+---------+------+-----+----------+----------+----------
     10 | CLERK   |      |     |        0 |        0 |        1
     10 | CLERK   |      |     |        0 |        0 |        1
     10 |         | 1300 |     |        0 |        1 |        0
     20 | MANAGER |      |     |        0 |        0 |        1
     20 | MANAGER |      |     |        0 |        0 |        1
     20 |         | 3000 |     |        0 |        1 |        0
(6 rows)
```

Before this fix, DataFusion collapsed the duplicate `(deptno, job)`
grouping set and returned only 4 rows for the same query shape.

```text
+--------+---------+------+-----------------------------------+------------------------------------------+---------------------------------------+---------------------------------------+
| deptno | job     | sal  | sum(duplicate_grouping_sets.comm) | grouping(duplicate_grouping_sets.deptno) | grouping(duplicate_grouping_sets.job) | grouping(duplicate_grouping_sets.sal) |
+--------+---------+------+-----------------------------------+------------------------------------------+---------------------------------------+---------------------------------------+
| 10     | CLERK   | NULL | NULL                              | 0                                        | 0                                     | 1                                     |
| 10     | NULL    | 1300 | NULL                              | 0                                        | 1                                     | 0                                     |
| 20     | MANAGER | NULL | NULL                              | 0                                        | 0                                     | 1                                     |
| 20     | NULL    | 3000 | NULL                              | 0                                        | 1                                     | 0                                     |
+--------+---------+------+-----------------------------------+------------------------------------------+---------------------------------------+---------------------------------------+
```

## What changes are included in this PR?

- Preserve duplicate grouping sets by packing a duplicate ordinal into
the high bits of `__grouping_id`, so repeated occurrences of the same
grouping set pattern produce distinct execution keys.
- `GROUPING()` now reads the actual `__grouping_id` column type directly
from the schema (via `Aggregate::grouping_id_type`) rather than inferring
bit width from the count of grouping expressions alone. This ensures
bitmask literals are correctly sized when duplicate-ordinal bits widen
the column type beyond what the expression count would imply.
- `GROUPING()` masks off the ordinal bits before returning the result,
so the duplicate-ordinal encoding is invisible to user-facing SQL and
semantics remain unchanged.
- Add regression coverage for the duplicate `GROUPING SETS` case in:
  - `datafusion/core/tests/sql/aggregates/basic.rs`
  - `datafusion/sqllogictest/test_files/group_by.slt`
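
The idea of packing a duplicate ordinal above the null mask, and masking it off again for `GROUPING()`, can be sketched as follows. The bit layout, the `u32` width, and the function names here are assumptions made for illustration; the actual encoding in DataFusion may differ.

```rust
use std::collections::HashMap;

/// Number of grouping expressions in the example query: (deptno, job, sal).
const NUM_GROUP_EXPRS: u32 = 3;

/// Pack a duplicate ordinal above the null-mask bits.
/// (Hypothetical layout: the low NUM_GROUP_EXPRS bits hold the mask.)
fn pack_grouping_id(null_mask: u32, ordinal: u32) -> u32 {
    (ordinal << NUM_GROUP_EXPRS) | null_mask
}

/// Recover the user-visible mask by dropping the ordinal bits.
fn semantic_mask(grouping_id: u32) -> u32 {
    grouping_id & ((1u32 << NUM_GROUP_EXPRS) - 1)
}

/// Assign one execution key per grouping set; the n-th repeat of a mask
/// (counting from zero) gets ordinal n, so repeats no longer share a key.
fn grouping_ids(null_masks: &[u32]) -> Vec<u32> {
    let mut seen: HashMap<u32, u32> = HashMap::new();
    null_masks
        .iter()
        .map(|&mask| {
            let ordinal = seen.entry(mask).or_insert(0);
            let id = pack_grouping_id(mask, *ordinal);
            *ordinal += 1;
            id
        })
        .collect()
}
```

For the grouping sets `((deptno, job), (deptno, sal), (deptno, job))`, taking `deptno` as the highest mask bit gives null masks `[0b001, 0b010, 0b001]`. Under this sketch the two `(deptno, job)` sets receive different keys, while `semantic_mask` returns `0b001` for both.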

## Are these changes tested?

- `cargo fmt --all`
- `cargo test -p datafusion duplicate_grouping_sets_are_preserved`
- `cargo test -p datafusion-physical-plan grouping_sets_preserve_duplicate_groups`
- `cargo test -p datafusion-physical-plan evaluate_group_by_supports_duplicate_grouping_sets_with_eight_columns`
- PostgreSQL validation against the same query/result shape

## Are there any user-facing changes?

- Yes. Queries that contain duplicate `GROUPING SETS` entries now return
the correct duplicated result rows, matching PostgreSQL behavior.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…#21321)

## Which issue does this PR close?

- Closes #21320

### Rationale for this change

When one side of a LEFT/RIGHT/FULL outer join is an `EmptyRelation`, the
current `PropagateEmptyRelation` optimizer rule leaves the join untouched.
This means the engine still builds a hash table for the empty side,
probes every row from the non-empty side, finds zero matches, and pads
NULLs, all of which is wasted work.

The TODO at lines 76-80 of `propagate_empty_relation.rs` explicitly called
out this gap:
```
// TODO: For LeftOut/Full Join, if the right side is empty, the Join can be eliminated
// with a Projection with left side columns + right side columns replaced with null values.
// For RightOut/Full Join, if the left side is empty, the Join can be eliminated
// with a Projection with right side columns + left side columns replaced with null values.
```

### What changes are included in this PR?

Extends the `PropagateEmptyRelation` rule to handle 4 previously
unoptimized cases by replacing the join with a `Projection` that null-pads
the empty side's columns.
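
Why the rewrite is safe can be seen in a toy model of a left outer join whose right side is empty: no left row can match, so the result is exactly the left rows with NULLs appended. The `Row` alias and `left_join_empty_right` function below are hypothetical and only illustrate the equivalence; they are not the optimizer rule itself.

```rust
/// One row of nullable integer columns (toy stand-in for a record batch row).
type Row = Vec<Option<i64>>;

/// Left outer join against an empty right side that has `right_width` columns:
/// no left row can match, so each one is emitted with NULL padding.
/// This is the same result a projection of the left columns plus NULL
/// literals would produce, without building or probing a hash table.
fn left_join_empty_right(left: &[Row], right_width: usize) -> Vec<Row> {
    left.iter()
        .map(|row| {
            let mut out = row.clone();
            out.extend(std::iter::repeat(None).take(right_width));
            out
        })
        .collect()
}
```

The mirrored case (empty left side) pads NULLs in front of the right-side columns instead.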

### Are these changes tested?

Yes. 4 new unit tests were added.

### Are there any user-facing changes?

No API changes.

---------

Co-authored-by: Subham Singhal <subhamsinghal@Subhams-MacBook-Air.local>
Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
Bumps [cryptography](https://github.com/pyca/cryptography) from 46.0.6
to 46.0.7.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst">cryptography's
changelog</a>.</em></p>
<blockquote>
<p>46.0.7 - 2026-04-07</p>
<pre><code>
* **SECURITY ISSUE**: Fixed an issue where non-contiguous buffers could be
  passed to APIs that accept Python buffers, which could lead to buffer
  overflow. **CVE-2026-39892**
* Updated Windows, macOS, and Linux wheels to be compiled with OpenSSL 3.5.6.
</code></pre>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/pyca/cryptography/commit/622d672e429a7cff836a23c5903683dbec1901f5"><code>622d672</code></a>
46.0.7 release (<a
href="https://redirect.github.com/pyca/cryptography/issues/14602">#14602</a>)</li>
<li>See full diff in <a
href="https://github.com/pyca/cryptography/compare/46.0.6...46.0.7">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=cryptography&package-manager=uv&previous-version=46.0.6&new-version=46.0.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/apache/datafusion/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@pull pull bot locked and limited conversation to collaborators Apr 9, 2026
@pull pull bot added the ⤵️ pull label Apr 9, 2026
@pull pull bot merged commit 8f77a3b into buraksenn:main Apr 9, 2026