-
Notifications
You must be signed in to change notification settings - Fork 181
Description
Description
The reverse command's sort-collation-reversal optimization does not work correctly when the reversed sort is placed directly on top of the original sort with no intermediate operator between them. Calcite's physical optimizer merges (or eliminates) consecutive LogicalSort nodes, causing the reversed direction to be lost.
Related PR: #4775
Root Cause
The reverse optimization in CalciteRelNodeVisitor.visitReverse() detects an existing sort collation and creates a new LogicalSort with the reversed direction on top of the original sort. This produces a plan like:
LogicalSort(sort0=[$1], dir0=[DESC]) <-- reverse adds this
LogicalSort(sort0=[$1], dir0=[ASC]) <-- original sort
...
While this logical plan is correct, Calcite's physical optimizer (e.g., SortRemoveRule) recognizes that two consecutive sorts on the same field are redundant and merges them — but it keeps the original (lower) sort direction instead of the top-level (reversed) one, effectively making the reverse a no-op.
When It Works vs. When It Fails
Works — when there is an intermediate operator (filter, project, eval) between the original sort and the reversed sort. The intermediate operator prevents Calcite from merging the two sorts:
-- WORKS: sort -> fields (project) -> reverse
source=accounts | sort account_number | fields account_number | reverse
Logical plan:
LogicalSort(sort0=[$0], dir0=[DESC]) <-- reversed sort
LogicalProject(account_number=[$0]) <-- intermediate operator prevents merging
LogicalSort(sort0=[$0], dir0=[ASC])
LogicalTableScan
-- WORKS: sort -> where (filter) -> reverse
source=accounts | sort account_number | where balance > 30000 | reverse
Logical plan:
LogicalSort(sort0=[$0], dir0=[DESC])
LogicalFilter(condition=[>($4, 30000)]) <-- intermediate operator prevents merging
LogicalSort(sort0=[$0], dir0=[ASC])
LogicalTableScan
Fails — when the reversed sort sits directly on top of the original sort with no intermediate operator:
-- FAILS: stats -> sort -> reverse (two consecutive sorts, no operator between them)
source=accounts | stats count() as c by gender | sort gender | reverse
Logical plan:
LogicalSort(sort0=[$1], dir0=[DESC]) <-- reversed sort
LogicalSort(sort0=[$1], dir0=[ASC]) <-- original sort (nothing between them)
LogicalProject(c=[$1], gender=[$0])
LogicalAggregate(...)
Expected: M, F (descending)
Actual: F, M (ascending — reverse was ignored)
-- FAILS: timechart -> reverse (timechart adds a sort, reverse adds another directly on top)
source=events | timechart span=1m count() | reverse
Logical plan:
LogicalSort(sort0=[$0], dir0=[DESC]) <-- reversed sort
LogicalSort(sort0=[$0], dir0=[ASC]) <-- timechart's sort (nothing between them)
LogicalAggregate(...)
Expected: 00:04, 00:03, 00:02, 00:01, 00:00 (descending)
Actual: 00:00, 00:01, 00:02, 00:03, 00:04 (ascending — reverse was ignored)
Affected Scenarios
| Query Pattern | Why It Fails |
|---|---|
stats ... | sort field | reverse |
Sort after aggregation + reverse = consecutive sorts |
timechart ... | reverse |
Timechart's built-in sort + reverse = consecutive sorts |
timechart timefield=X ... | reverse |
Same as above with custom time field |
timechart ... by field | reverse |
Same as above with group-by |
Expected Behavior
reverse should always flip the sort direction, regardless of whether the reversed sort is directly adjacent to the original sort.
Suggested Fix
Instead of creating a new sort on top of the existing sort, modify the existing sort node in-place to flip its direction. This avoids creating consecutive sorts entirely. For example:
// Instead of:
context.relBuilder.sort(reversedCollation); // adds a NEW sort on top
// Consider replacing the existing sort:
RelNode currentNode = context.relBuilder.build(); // pop current
// Replace the top-level Sort with one that has reversed collation
RelNode modifiedSort = LogicalSort.create(
((Sort) currentNode).getInput(), // keep the sort's child
reversedCollation, // use reversed direction
((Sort) currentNode).offset,
((Sort) currentNode).fetch);
context.relBuilder.push(modifiedSort);This would produce a single sort node with the correct (reversed) direction, eliminating the problem entirely.
Tests
The following integration tests in CalciteReverseCommandIT are affected and currently use verifyDataRows (unordered) as a workaround:
testReverseAfterAggregationWithSorttestTimechartWithReversetestTimechartWithCustomTimefieldAndReversetestTimechartWithGroupByAndReverse
The corresponding unit tests in CalcitePPLReverseTest verify the logical plan is correct — the issue is only in physical execution.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status