Skip to content

Enable pushing aggregate past Join by default#6538

Draft
deniskuzZ wants to merge 1 commit into
apache:masterfrom
deniskuzZ:aggr_join
Draft

Enable pushing aggregate past Join by default#6538
deniskuzZ wants to merge 1 commit into
apache:masterfrom
deniskuzZ:aggr_join

Conversation

@deniskuzZ

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/HIVE-10785?focusedCommentId=14906571&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14906571

Why are the changes needed?

Perf optimization

Does this PR introduce any user-facing change?

How was this patch tested?

@deniskuzZ deniskuzZ marked this pull request as draft June 12, 2026 15:42
@sonarqubecloud

Copy link
Copy Markdown

@aturoczy

Copy link
Copy Markdown

Does it need anything else for this PR? It seems like to resolve the performance part just need to allow it. Is there any concern to keep as a draft pr?

@deniskuzZ

deniskuzZ commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

running with set hive.transpose.aggr.join=true (default is false).

It enables Hive's Calcite rule HiveAggregateJoinTransposeRule, which pushes an aggregation below a join — i.e., aggregate first, then join, instead of join then aggregate.

Concrete example (q4): the query joins store_sales/catalog_sales/web_sales to customer and then does SUM(...) GROUP BY c_customer_id, ….

  • transpose-OFF: join the full fact rows to customer first (~539M rows), then aggregate.
  • transpose-ON: push the per-channel SUM below the customer join, so the wide GROUP BY collapses ~539M → a few million rows before the join. The plan signature: the customer side picks up a count() and the join emits CAST(sum * count AS decimal).

TPCDS results:

  • Wins: q2 (139→20), q4, q22, q71, q86, q59 — when the pre-aggregation genuinely shrinks the join input.
  • Regressions (~14): q47 (89→25 OFF-is-better), q78, q98, q25, q51, q87… — when pushing the agg down doesn't shrink things but adds a wide-key shuffle.

The root cause of the regressions: Hive's default cost model is cardinality-only (cpu=io=0), so the rule fires on tiny rowcount differences and can't tell a beneficial transpose from a harmful one. That's why "best-of per-query" beats turning it globally ON or OFF.

Turning on hive.cbo.costmodel.extended makes cpu/io non-zero everywhere, which changes the entire CBO, not just the transpose rule — join ordering, join algorithm selection, etc. On the 5-query cherry-picked subset (q4/q14/q47/q78/q98) it won (458 vs 471 OFF vs 574 ON) because those were hand-picked transpose regressors.
But across all 99, the new regressions it introduces elsewhere outweigh the transpose decisions it fixes — most visibly it regressed q4 via join-order (~88→105s), nothing to do with transpose. Broad blast radius = net loss.

cc @kasakrisz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants