Skip to content

Reverse Optimization fails when creating consecutive sorts due to Calcite physical optimizer merging #5125

@ahkcs

Description

@ahkcs

Description

The reverse command's sort-collation-reversal optimization does not work correctly when the reversed sort is placed directly on top of the original sort with no intermediate operator between them. Calcite's physical optimizer merges (or eliminates) consecutive LogicalSort nodes, causing the reversed direction to be lost.

Related PR: #4775

Root Cause

The reverse optimization in CalciteRelNodeVisitor.visitReverse() detects an existing sort collation and creates a new LogicalSort with the reversed direction on top of the original sort. This produces a plan like:

LogicalSort(sort0=[$1], dir0=[DESC])      <-- reverse adds this
  LogicalSort(sort0=[$1], dir0=[ASC])     <-- original sort
    ...

While this logical plan is correct, Calcite's physical optimizer (e.g., SortRemoveRule) recognizes that two consecutive sorts on the same field are redundant and merges them — but it keeps the original (lower) sort direction instead of the top-level (reversed) one, effectively making the reverse a no-op.

When It Works vs. When It Fails

Works — when there is an intermediate operator (filter, project, eval) between the original sort and the reversed sort. The intermediate operator prevents Calcite from merging the two sorts:

-- WORKS: sort -> fields (project) -> reverse
source=accounts | sort account_number | fields account_number | reverse

Logical plan:
  LogicalSort(sort0=[$0], dir0=[DESC])     <-- reversed sort
    LogicalProject(account_number=[$0])    <-- intermediate operator prevents merging
      LogicalSort(sort0=[$0], dir0=[ASC])
        LogicalTableScan
-- WORKS: sort -> where (filter) -> reverse
source=accounts | sort account_number | where balance > 30000 | reverse

Logical plan:
  LogicalSort(sort0=[$0], dir0=[DESC])
    LogicalFilter(condition=[>($4, 30000)])  <-- intermediate operator prevents merging
      LogicalSort(sort0=[$0], dir0=[ASC])
        LogicalTableScan

Fails — when the reversed sort sits directly on top of the original sort with no intermediate operator:

-- FAILS: stats -> sort -> reverse (two consecutive sorts, no operator between them)
source=accounts | stats count() as c by gender | sort gender | reverse

Logical plan:
  LogicalSort(sort0=[$1], dir0=[DESC])     <-- reversed sort
    LogicalSort(sort0=[$1], dir0=[ASC])    <-- original sort (nothing between them)
      LogicalProject(c=[$1], gender=[$0])
        LogicalAggregate(...)

Expected: M, F (descending)
Actual:   F, M (ascending — reverse was ignored)
-- FAILS: timechart -> reverse (timechart adds a sort, reverse adds another directly on top)
source=events | timechart span=1m count() | reverse

Logical plan:
  LogicalSort(sort0=[$0], dir0=[DESC])     <-- reversed sort
    LogicalSort(sort0=[$0], dir0=[ASC])    <-- timechart's sort (nothing between them)
      LogicalAggregate(...)

Expected: 00:04, 00:03, 00:02, 00:01, 00:00 (descending)
Actual:   00:00, 00:01, 00:02, 00:03, 00:04 (ascending — reverse was ignored)

Affected Scenarios

Query Pattern Why It Fails
stats ... | sort field | reverse Sort after aggregation + reverse = consecutive sorts
timechart ... | reverse Timechart's built-in sort + reverse = consecutive sorts
timechart timefield=X ... | reverse Same as above with custom time field
timechart ... by field | reverse Same as above with group-by

Expected Behavior

reverse should always flip the sort direction, regardless of whether the reversed sort is directly adjacent to the original sort.

Suggested Fix

Instead of creating a new sort on top of the existing sort, modify the existing sort node in-place to flip its direction. This avoids creating consecutive sorts entirely. For example:

// Instead of:
context.relBuilder.sort(reversedCollation);  // adds a NEW sort on top

// Consider replacing the existing sort:
RelNode currentNode = context.relBuilder.build();  // pop current
// Replace the top-level Sort with one that has reversed collation
RelNode modifiedSort = LogicalSort.create(
    ((Sort) currentNode).getInput(),  // keep the sort's child
    reversedCollation,                // use reversed direction
    ((Sort) currentNode).offset,
    ((Sort) currentNode).fetch);
context.relBuilder.push(modifiedSort);

This would produce a single sort node with the correct (reversed) direction, eliminating the problem entirely.

Tests

The following integration tests in CalciteReverseCommandIT are affected and currently use verifyDataRows (unordered) as a workaround:

  • testReverseAfterAggregationWithSort
  • testTimechartWithReverse
  • testTimechartWithCustomTimefieldAndReverse
  • testTimechartWithGroupByAndReverse

The corresponding unit tests in CalcitePPLReverseTest verify the logical plan is correct — the issue is only in physical execution.

Metadata

Metadata

Assignees

Labels

PPLPiped processing languagebugSomething isn't workingcalcitecalcite migration releated

Type

No type

Projects

Status

Not Started

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions