Enable pushing aggregate past Join by default by deniskuzZ · Pull Request #6538 · apache/hive

deniskuzZ · 2026-06-12T15:42:50Z

What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/HIVE-10785?focusedCommentId=14906571&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14906571

Why are the changes needed?

Perf optimization

Does this PR introduce any user-facing change?

How was this patch tested?

sonarqubecloud · 2026-06-12T16:56:28Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.4% Duplication on New Code

See analysis details on SonarQube Cloud

aturoczy · 2026-06-14T08:41:11Z

Does this PR need anything else? It seems that resolving the performance part only requires enabling the setting. Is there any reason to keep this as a draft PR?

deniskuzZ · 2026-06-15T12:53:03Z

Running with set hive.transpose.aggr.join=true (the default is false).

It enables Hive's Calcite rule HiveAggregateJoinTransposeRule, which pushes an aggregation below a join — i.e., aggregate first, then join, instead of join then aggregate.

Concrete example (q4): the query joins store_sales/catalog_sales/web_sales to customer and then does SUM(...) GROUP BY c_customer_id, ….

transpose-OFF: join the full fact rows to customer first (~539M rows), then aggregate.
transpose-ON: push the per-channel SUM below the customer join, so the wide GROUP BY collapses ~539M → a few million rows before the join. The plan signature: the customer side picks up a count() and the join emits CAST(sum * count AS decimal).
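To make the sum * count signature concrete, here is a minimal Python sketch of the rewrite on toy data. It is not Hive or Calcite code: the table contents, the function names, and the integer amounts are all made up for illustration. It only shows that pre-aggregating the fact side and multiplying by the row count from the dimension side gives the same totals as joining first, while feeding far fewer rows into the join.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (customer_sk, amount_in_cents) fact rows.
sales = [(1, 500), (1, 250), (2, 1000), (2, 100), (2, 50), (3, 700)]
# Hypothetical dimension rows: (customer_sk, customer_id). Key 2 appears
# twice on purpose, so the join fans out and the count() matters.
customers = [(1, "A"), (2, "B"), (2, "B"), (4, "D")]


def join_then_aggregate(sales, customers):
    """Transpose OFF: join every fact row first, then SUM ... GROUP BY."""
    totals = defaultdict(int)
    joined_rows = 0
    for sk, amount in sales:
        for cust_sk, cust_id in customers:
            if sk == cust_sk:
                totals[cust_id] += amount
                joined_rows += 1
    return dict(totals), joined_rows


def aggregate_then_join(sales, customers):
    """Transpose ON: pre-aggregate both sides, then join the small results."""
    partial_sum = defaultdict(int)  # SUM(amount) per join key
    for sk, amount in sales:
        partial_sum[sk] += amount
    row_count = Counter(customers)  # count() per (join key, group key)
    totals = defaultdict(int)
    joined_rows = 0
    for (cust_sk, cust_id), cnt in row_count.items():
        if cust_sk in partial_sum:
            totals[cust_id] += partial_sum[cust_sk] * cnt  # sum * count
            joined_rows += 1
    return dict(totals), joined_rows


off_totals, off_rows = join_then_aggregate(sales, customers)
on_totals, on_rows = aggregate_then_join(sales, customers)
print(off_totals == on_totals, off_rows, on_rows)
```

In this toy case the join-first variant produces 8 joined rows and the pre-aggregated variant produces 2, with identical totals. The win depends entirely on how much the pre-aggregation shrinks the join input, which is the same condition that separates the TPC-DS wins from the regressions.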

TPCDS results:

Wins: q2 (139→20), q4, q22, q71, q86, q59 — when the pre-aggregation genuinely shrinks the join input.
Regressions (~14): q47 (89 with transpose ON vs 25 OFF, so OFF is better), q78, q98, q25, q51, q87…, when pushing the aggregation down doesn't shrink the join input but adds a wide-key shuffle.

The root cause of the regressions: Hive's default cost model is cardinality-only (cpu=io=0), so the rule fires on tiny rowcount differences and can't tell a beneficial transpose from a harmful one. That's why "best-of per-query" beats turning it globally ON or OFF.
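A toy illustration of that blind spot, as a hedged sketch only: `PlanCost` and both comparison functions are hypothetical and are not Hive's cost classes, and all numbers are invented. The assumption is simply that a cardinality-only comparison looks at rowcount alone, while a resource-aware one compares cpu + io first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanCost:
    """Hypothetical plan cost; not Hive's actual cost class."""
    rows: float
    cpu: float = 0.0
    io: float = 0.0


def cheaper_cardinality_only(a, b):
    """Pick the plan with fewer rows, ignoring cpu and io entirely."""
    return a if a.rows < b.rows else b


def cheaper_with_resources(a, b):
    """Compare cpu + io first and fall back to rows only on a tie."""
    def key(c):
        return (c.cpu + c.io, c.rows)
    return a if key(a) < key(b) else b


# Made-up numbers: the transposed plan saves a few rows but pays for an
# extra wide-key shuffle, which only shows up in cpu/io.
join_then_agg = PlanCost(rows=1_000_000, cpu=5.0e6, io=2.0e6)
agg_then_join = PlanCost(rows=999_000, cpu=9.0e6, io=6.0e6)

# With cpu = io = 0 the two plans differ only by a tiny rowcount gap.
zeroed = [PlanCost(rows=p.rows) for p in (join_then_agg, agg_then_join)]
picked_by_rows = cheaper_cardinality_only(*zeroed)
picked_by_resources = cheaper_with_resources(join_then_agg, agg_then_join)
print(picked_by_rows.rows, picked_by_resources.rows)
```

With the zeroed costs, the 0.1% rowcount difference is enough to pick the transposed plan even though the invented cpu/io figures say it is more expensive. This only shows the shape of the problem and says nothing about whether a resource-aware model helps on the full benchmark.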

Turning on hive.cbo.costmodel.extended makes cpu/io non-zero everywhere, which changes the entire CBO, not just the transpose rule: join ordering, join algorithm selection, etc. On the 5-query cherry-picked subset (q4/q14/q47/q78/q98) it won (458 with the extended cost model vs 471 with transpose OFF vs 574 with transpose ON) because those were hand-picked transpose regressors.
But across all 99 queries, the new regressions it introduces elsewhere outweigh the transpose decisions it fixes. Most visibly, it regressed q4 through a join-order change (~88→105s) that has nothing to do with transpose. Broad blast radius = net loss.

cc @kasakrisz

Enable pushing aggregate past Join by default

c9c185a

deniskuzZ marked this pull request as draft June 12, 2026 15:42

asf-ci-hive added the tests pending label Jun 12, 2026

asf-ci-hive added tests failed and removed tests pending labels Jun 13, 2026

aturoczy approved these changes Jun 14, 2026

deniskuzZ wants to merge 1 commit into apache:master from deniskuzZ:aggr_join
