Skip to content

Unparser drops ORDER BY alias when flattening Projection through SubqueryAlias#21491

Open
yonatan-sevenai wants to merge 9 commits intoapache:mainfrom
yonatan-sevenai:bug/sort_push
Open

Unparser drops ORDER BY alias when flattening Projection through SubqueryAlias#21491
yonatan-sevenai wants to merge 9 commits intoapache:mainfrom
yonatan-sevenai:bug/sort_push

Conversation

@yonatan-sevenai
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

When plan_to_sql encounters a Projection → Sort → Projection → SubqueryAlias plan where the outer Projection excludes a Sort
column defined as an alias in the inner Projection (e.g. Z AS c), the function rewrite_plan_for_sort_on_non_projected_fields
flattens the two Projections into one but only keeps the outer Projection's columns. This drops the alias definition, leaving
ORDER BY c referencing a column that no longer exists in the generated SQL.

What changes are included in this PR?

In rewrite_plan_for_sort_on_non_projected_fields (datafusion/sql/src/unparser/rewrite.rs), when the inner Projection is
trimmed to only the outer Projection's expressions, sort expressions that reference dropped aliases are now inlined to the
underlying physical expression. For example, ORDER BY c becomes ORDER BY t."Z" when c was defined as Z AS c in the
dropped inner Projection.

Are these changes tested?

Yes. A new regression test test_sort_on_aliased_column_dropped_by_outer_projection in
datafusion/sql/tests/cases/plan_to_sql.rs constructs the exact plan shape that triggers the bug and asserts the correct SQL
output:

SELECT t."X" AS a, t."Y" AS b FROM phys_table AS t ORDER BY t."Z" DESC NULLS FIRST LIMIT 1

All existing plan_to_sql tests (118 tests) continue to pass.

Are there any user-facing changes?

No API changes. Previously invalid SQL is now generated correctly.

yonatan-sevenai and others added 9 commits March 22, 2026 00:06
…gregate

When the SQL unparser encountered a SubqueryAlias node whose direct
child was an Aggregate (or other clause-building plan like Window, Sort,
Limit, Union), it would flatten the subquery into a simple table alias,
losing the aggregate entirely.

For example, a plan representing:
  SELECT j1.col FROM j1 JOIN (SELECT max(id) AS m FROM j2) AS b ON j1.id = b.m

would unparse to:
  SELECT j1.col FROM j1 INNER JOIN j2 AS b ON j1.id = b.m

dropping the MAX aggregate and the subquery.

Root cause: the SubqueryAlias handler in select_to_sql_recursively would
call subquery_alias_inner_query_and_columns (which only unwraps
Projection children) and unparse_table_scan_pushdown (which only handles
TableScan/SubqueryAlias/Projection). When both returned nothing useful
for an Aggregate child, the code recursed directly into the Aggregate,
merging its GROUP BY into the outer SELECT instead of wrapping it in a
derived subquery.

The fix adds an early check: if the SubqueryAlias's direct child is a
plan type that builds its own SELECT clauses (Aggregate, Window, Sort,
Limit, Union), emit it as a derived subquery via self.derive() with the
alias always attached, rather than falling through to the recursive
path that would flatten it.
…gregate

When the SQL unparser encountered a SubqueryAlias node whose direct
child was an Aggregate (or other clause-building plan like Window, Sort,
Limit, Union), it would flatten the subquery into a simple table alias,
losing the aggregate entirely.

For example, a plan representing:
  SELECT j1.col FROM j1 JOIN (SELECT max(id) AS m FROM j2) AS b ON j1.id = b.m

would unparse to:
  SELECT j1.col FROM j1 INNER JOIN j2 AS b ON j1.id = b.m

dropping the MAX aggregate and the subquery.

Root cause: the SubqueryAlias handler in select_to_sql_recursively would
call subquery_alias_inner_query_and_columns (which only unwraps
Projection children) and unparse_table_scan_pushdown (which only handles
TableScan/SubqueryAlias/Projection). When both returned nothing useful
for an Aggregate child, the code recursed directly into the Aggregate,
merging its GROUP BY into the outer SELECT instead of wrapping it in a
derived subquery.

The fix adds an early check: if the SubqueryAlias's direct child is a
plan type that builds its own SELECT clauses (Aggregate, Window, Sort,
Limit, Union), emit it as a derived subquery via self.derive() with the
alias always attached, rather than falling through to the recursive
path that would flatten it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rojection

In `rewrite_plan_for_sort_on_non_projected_fields`, when the outer
Projection excludes a Sort column that was defined as an alias in the
inner Projection (e.g. `Z AS c`), the rewrite replaced the inner
Projection's expressions with only the outer Projection's mapped
expressions, dropping the alias definition. This left ORDER BY
referencing a non-existent column.

The fix inlines the underlying physical expression into the Sort when
an alias is dropped (e.g. `ORDER BY c` → `ORDER BY t."Z"`).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clarify what the algorithm does and why, remove example-specific
column names from the implementation comments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the sql SQL Planner label Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: Unparser drops column alias needed by ORDER BY when flattening Projection through SubqueryAlias

1 participant