Collaborator

@revans2 revans2 commented Dec 17, 2025

This is a step towards more join optimizations. It makes it so that the AST we compile is not tied to a specific build side. This gives us the flexibility to switch the build side so that the smaller side wins, at least where the APIs allow it. It does not give a huge performance boost, because most of the places where we could take advantage of that were already in place. But it does allow us to get a more apples-to-apples comparison for some heuristics that are being added in.

Generally, for inner, left outer, and right outer joins with an AST condition, doing an inner join followed by post filtering is much faster, so this adds a new heuristic to allow that. Also, when no AST is involved, the smaller build side can be a big win, so for left outer or right outer joins we switch to an inner join plus post processing so that we can get the smaller build side.

The benefit of this second heuristic is less clear-cut, and I need to do some more benchmarking to hopefully make it better. But for now it provides some decent uplift.
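
As a rough illustration of the "inner + post" idea, here is a minimal sketch using plain Scala collections rather than cudf. The names `InnerWithPostSketch`, `innerPairs`, and `toLeftOuter` are hypothetical and do not come from this PR; row indices stand in for gather maps, and the condition is an ordinary function instead of a compiled AST.

```scala
object InnerWithPostSketch {
  // Inner equi-join on keys: every (leftIdx, rightIdx) pair whose keys match.
  // The lookup table is built on the right side here; building on the left
  // would produce the same set of pairs.
  def innerPairs(leftKeys: Seq[Int], rightKeys: Seq[Int]): Seq[(Int, Int)] = {
    val built: Map[Int, Seq[Int]] =
      rightKeys.zipWithIndex.groupBy(_._1).map { case (k, v) => k -> v.map(_._2) }
    for {
      (k, l) <- leftKeys.zipWithIndex
      r <- built.getOrElse(k, Seq.empty)
    } yield (l, r)
  }

  // Post step: keep the pairs that pass the condition, then add back every
  // left row that has no surviving match, paired with None (a null right row).
  def toLeftOuter(
      numLeftRows: Int,
      pairs: Seq[(Int, Int)],
      condition: (Int, Int) => Boolean): Seq[(Int, Option[Int])] = {
    val kept = pairs.filter { case (l, r) => condition(l, r) }
    val matched = kept.map(_._1).toSet
    val unmatched = (0 until numLeftRows).filterNot(matched)
    kept.map { case (l, r) => (l, Option(r)) } ++
      unmatched.map(l => (l, Option.empty[Int]))
  }
}
```

Because the inner step only produces index pairs, it is free to build on whichever side is smaller; the outer-join semantics are restored afterwards in `toLeftOuter`.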

Description

I ran two sets of benchmarks locally. In one case I did a join between 50 million rows on each side, with quite a bit of skew in the data.

In the second set I set it up to almost be distinct, but because the data is generated randomly it is not 100% distinct.

In the first case the heuristic with AUTO for JoinStrategy and AUTO for BuildSide selection ended up being about 10% faster than the HASH_ONLY (mostly FIXED) config that we have as the default today. Note that the SORT with post was way faster than the hash based joins, but I need to finish rapidsai/cudf#20826 before I can add in the heuristic for that.

In the other case it was about a 2% improvement. It wasn't huge, but it also was not a loss in performance on average. In that case the sort merge join was way slower than the hash based joins. Hence the need for the heuristic.

I do want to find some time to run NDS, but I think the results are compelling enough to get the PR up before I do that.

Checklists

  • This PR has added documentation for new or modified features or behaviors.
  • This PR has added new tests or modified existing tests to cover new code paths.
    (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
  • Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
Collaborator Author

revans2 commented Dec 17, 2025

build

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces AST join build side selection and heuristics to optimize GPU join operations. The key innovation is the ability to dynamically select the build side for joins at runtime based on row counts, rather than being locked into a fixed build side determined by the query plan.

Key changes:

  • New LazyCompiledCondition class that compiles AST expressions for either build side on-demand
  • Build side selection strategies (AUTO, FIXED, SMALLEST) configurable via spark.rapids.sql.join.buildSide
  • Heuristics that automatically switch to INNER_HASH_WITH_POST strategy for better performance with conditional joins and when the smaller side isn't the natural build side

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| LazyCompiledCondition.scala | New class providing lazy, cached AST compilation for both build sides by transforming bound reference ordinals |
| GpuHashJoin.scala | Adds JoinBuildSideSelection enum, selectStrategy heuristics, refactors join methods to support dynamic build side selection, updates iterators to use LazyCompiledCondition |
| RapidsConf.scala | Adds JOIN_BUILD_SIDE configuration option and updates getJoinOptions to include build side selection |
| GpuShuffledSizedHashJoinExec.scala | Updates BoundJoinExprs to bind conditions to leftOutput++rightOutput, adds createLazyCompiledCondition method, updates iterators |
| join_test.py | Adds comprehensive tests for build side selection with unconditional, conditional, and symmetric joins |
| GpuDeleteFilter.scala | Removes numFirstConditionColumns parameter now handled by LazyCompiledCondition, updates to pass GpuBuildRight explicitly |
Comments suppressed due to low confidence (2)

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuHashJoin.scala:1610

  • There is a mismatch between the build side selected by innerHashJoin and the build side used to get the compiled condition. The innerHashJoin method selects a build side based on buildSideSelection and row counts (which may differ from the buildSide parameter when buildSideSelection is AUTO or SMALLEST). However, the code then uses the original buildSide parameter to get the compiled condition and for AST filtering. This could result in using the wrong compiled AST condition. The selected build side should be computed once before both calls and used consistently, or innerHashJoin should return which build side it selected.
    val innerMaps = JoinImpl.innerHashJoin(leftKeys, rightKeys,
      nullEquality == NullEquality.EQUAL, joinOptions.buildSideSelection, buildSide)

    // Get the appropriate compiled condition based on the build side
    val compiledCondition = lazyCompiledCondition.getForBuildSide(buildSide)

    // Filter by AST condition
    val filteredMaps = withResource(innerMaps) { _ =>
      JoinImpl.filterInnerJoinWithASTSwapByJoinTypeAndBuildSide(innerMaps, leftTable, rightTable,
        compiledCondition, joinType, buildSide)

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuHashJoin.scala:1640

  • Similar to the issue in computeInnerHashWithPost, there is a mismatch between the build side selected by innerSortJoin and the build side used to get the compiled condition. The selected build side should be computed once and used consistently for both the join operation and getting the compiled condition.
    val innerMaps = JoinImpl.innerSortJoin(leftKeys, rightKeys,
      nullEquality == NullEquality.EQUAL, joinOptions.buildSideSelection, buildSide)

    // Get the appropriate compiled condition based on the build side
    val compiledCondition = lazyCompiledCondition.getForBuildSide(buildSide)

    // Filter by AST condition
    val filteredMaps = withResource(innerMaps) { _ =>
      JoinImpl.filterInnerJoinWithASTSwapByJoinTypeAndBuildSide(innerMaps, leftTable, rightTable,
        compiledCondition, joinType, buildSide)

Comment on lines 850 to 852
// Heuristic 2: For unconditional LEFT_OUTER/RIGHT_OUTER when build side selection is AUTO,
// if the smaller side is not the fixed build side, use INNER_HASH_WITH_POST to enable
// dynamic build side selection
Copilot AI Dec 17, 2025

The comment says "when build side selection is AUTO" but the condition checks for both AUTO and SMALLEST. The comment should be updated to reflect that this heuristic applies to both AUTO and SMALLEST build side selection strategies.

Suggested change
- // Heuristic 2: For unconditional LEFT_OUTER/RIGHT_OUTER when build side selection is AUTO,
- // if the smaller side is not the fixed build side, use INNER_HASH_WITH_POST to enable
- // dynamic build side selection
+ // Heuristic 2: For unconditional LEFT_OUTER/RIGHT_OUTER when build side selection is
+ // AUTO or SMALLEST, if the smaller side is not the fixed build side, use
+ // INNER_HASH_WITH_POST to enable dynamic build side selection

Collaborator

agree with copilot, we should make the comment match

Collaborator

This is an impressive catch.

Also, are we using Copilot or Greptile? Both?

Collaborator Author

I like to use both. They tend to catch different things

Collaborator

Looks like we are alternating, there might be an A/B test :)

Comment on lines 1952 to 1956
val innerMaps = JoinImpl.innerHashJoin(leftKeys, rightKeys, compareNullsEqual,
joinOptions.buildSideSelection, cudfBuildSide)

// Get the appropriate compiled condition based on the cudfBuildSide
val compiledCondition = lazyCondition.getForBuildSide(cudfBuildSide)
Copilot AI Dec 17, 2025

Similar build side mismatch issue: innerHashJoin is called with cudfBuildSide but may select a different build side when buildSideSelection is AUTO or SMALLEST. The code then uses cudfBuildSide to get the compiled condition, which could be incorrect. The selected build side should be determined before both the join and getting the compiled condition.

Collaborator Author

Actually, in cases where we do an INNER join with post processing to turn it into another join, this is not a problem. The AST filtering is totally separate from the join build side. The only time they have to match is when they are done together as a single cudf call. In response to the obvious confusion about this, I have updated the comments and the names of several variables to make clear the distinction between the build side used for data movement and the build side used at the physical layer when the join actually happens. The two are related and are likely to have a lot of interplay, especially when we start to do build object caching. But until then the code is actually correct.
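
The independence described here can be shown with a small sketch over plain Scala collections. It is an illustration under simplifying assumptions, not the plugin's code: `BuildSideIndependenceSketch`, `innerJoin`, and `postFilter` are hypothetical names, integer keys stand in for tables, and a plain function stands in for the compiled AST condition.

```scala
object BuildSideIndependenceSketch {
  // Inner equi-join that builds its lookup table on whichever side is requested.
  // The result is always expressed as (leftIdx, rightIdx), whatever the build side.
  def innerJoin(left: Seq[Int], right: Seq[Int], buildLeft: Boolean): Set[(Int, Int)] = {
    def probe(build: Seq[Int], stream: Seq[Int]): Seq[(Int, Int)] = {
      val table = build.zipWithIndex.groupBy(_._1)
      for {
        (k, s) <- stream.zipWithIndex
        (_, b) <- table.getOrElse(k, Seq.empty)
      } yield (b, s) // (buildIdx, streamIdx)
    }
    if (buildLeft) probe(left, right).toSet
    else probe(right, left).map(_.swap).toSet
  }

  // The post filter only sees (leftIdx, rightIdx) pairs, so it never needs to
  // know which side was used to build the lookup table.
  def postFilter(pairs: Set[(Int, Int)], cond: (Int, Int) => Boolean): Set[(Int, Int)] =
    pairs.filter { case (l, r) => cond(l, r) }
}
```

Both build sides yield the same pair set, so a condition written against left and right rows can be applied afterwards without caring how the inner join was executed.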

Comment on lines 1987 to 1988
// Get the appropriate compiled condition based on the cudfBuildSide
val compiledCondition = lazyCondition.getForBuildSide(cudfBuildSide)
Copilot AI Dec 17, 2025

Similar build side mismatch issue: innerSortJoin is called with cudfBuildSide but may select a different build side when buildSideSelection is AUTO or SMALLEST. The code then uses cudfBuildSide to get the compiled condition, which could be incorrect.

Suggested change
- // Get the appropriate compiled condition based on the cudfBuildSide
- val compiledCondition = lazyCondition.getForBuildSide(cudfBuildSide)
+ // Get the appropriate compiled condition based on the actual build side used
+ val compiledCondition = lazyCondition.getForBuildSide(innerMaps.buildSide)

Collaborator

it would be nice to add a clarifying comment here. It's not clear to me either why buildSideSelection is used in line 1984 and cudfBuildSide is used after to materialize a non-lazy compiled condition.

* Options to control join behavior.
* @param strategy the join strategy to use (AUTO, INNER_HASH_WITH_POST, INNER_SORT_WITH_POST,
* or HASH_ONLY)
* @param buildSideSelection the build side selection strategy (AUTO, DEFAULT, or SMALLEST)
Copilot AI Dec 17, 2025

The documentation comment refers to "DEFAULT" as a build side selection strategy, but the enumeration defines "FIXED" instead. The comment should say "FIXED" to match the actual enumeration value.

Suggested change
- * @param buildSideSelection the build side selection strategy (AUTO, DEFAULT, or SMALLEST)
+ * @param buildSideSelection the build side selection strategy (AUTO, FIXED, or SMALLEST)

rightRowCount: Long): JoinStrategy = {
if (configuredStrategy != AUTO) {
// If not AUTO, use the configured strategy as-is
return configuredStrategy
Collaborator

Nit: should we rework this without `return`? I know that's discouraged in Scala.
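
One possible shape for a `return`-free version is sketched below. It is a reconstruction from the comments quoted in this thread, not the PR's actual code: the enumerations are local stand-ins (`SelAuto`, `SelFixed`, `SelSmallest` and the `JoinType` objects are hypothetical names), and treating the fixed build side as the right side for LEFT_OUTER and the left side for RIGHT_OUTER is an assumption.

```scala
object SelectStrategySketch {
  sealed trait JoinStrategy
  case object AUTO extends JoinStrategy
  case object INNER_HASH_WITH_POST extends JoinStrategy
  case object INNER_SORT_WITH_POST extends JoinStrategy
  case object HASH_ONLY extends JoinStrategy

  sealed trait BuildSideSelection
  case object SelAuto extends BuildSideSelection
  case object SelFixed extends BuildSideSelection
  case object SelSmallest extends BuildSideSelection

  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType
  case object FullOuter extends JoinType

  def selectStrategy(
      configuredStrategy: JoinStrategy,
      joinType: JoinType,
      hasCondition: Boolean,
      buildSideSelection: BuildSideSelection,
      leftRowCount: Long,
      rightRowCount: Long): JoinStrategy = {
    if (configuredStrategy != AUTO) {
      // If not AUTO, use the configured strategy as-is
      configuredStrategy
    } else {
      val dynamic = buildSideSelection != SelFixed // AUTO or SMALLEST
      joinType match {
        // Heuristic 1: conditional inner/left outer/right outer joins
        case Inner | LeftOuter | RightOuter if hasCondition => INNER_HASH_WITH_POST
        // Heuristic 2: unconditional outer join whose smaller side is not
        // the (assumed) fixed build side
        case LeftOuter if dynamic && leftRowCount < rightRowCount => INNER_HASH_WITH_POST
        case RightOuter if dynamic && rightRowCount < leftRowCount => INNER_HASH_WITH_POST
        case _ => HASH_ONLY
      }
    }
  }
}
```

Making the whole body a single `if`/`else` expression keeps the early exit readable while avoiding a non-local control-flow keyword.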

Collaborator Author

revans2 commented Dec 31, 2025

build

Contributor

greptile-apps bot commented Dec 31, 2025

Greptile Summary

  • Implements AST join build side selection optimization allowing GPU joins to dynamically choose the build side independently of query plan decisions, enabling performance improvements through automatic selection of the smaller table as build side
  • Adds new join strategy heuristics that automatically select INNER_HASH_WITH_POST for conditional joins with specific join types, showing ~10% performance improvements in benchmarks
  • Introduces LazyCompiledCondition class to decouple AST compilation from build side selection, allowing runtime build side switching without recompiling the join condition AST
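
To make the "compile lazily, once per layout" idea concrete, here is a minimal sketch. It does not reproduce the real `LazyCompiledCondition`: `LazyConditionSketch`, `Compiled`, `forLeftFirst`, and `forRightFirst` are hypothetical names, the "compiled" form only records remapped ordinals, and the specific remapping shown is an assumption about what transforming bound reference ordinals could look like.

```scala
object LazyConditionSketch {
  // Stand-in for a compiled AST: it only records the input ordinals it reads.
  final case class Compiled(ordinals: Vector[Int])

  // The condition is assumed to be bound against leftOutput ++ rightOutput, so
  // ordinals [0, numLeft) are left columns and the rest are right columns.
  final class LazyCondition(boundOrdinals: Vector[Int], numLeft: Int, numRight: Int) {
    var compileCount = 0 // exposed only to show the caching in this sketch

    private def compile(ordinals: Vector[Int]): Compiled = {
      compileCount += 1
      Compiled(ordinals)
    }

    // Layout left ++ right: ordinals are used exactly as bound.
    lazy val forLeftFirst: Compiled = compile(boundOrdinals)

    // Layout right ++ left: left columns move past the right columns,
    // and right columns move to the front.
    lazy val forRightFirst: Compiled = compile(boundOrdinals.map { o =>
      if (o < numLeft) o + numRight else o - numLeft
    })
  }
}
```

Each `lazy val` is evaluated at most once, so a layout that is never requested is never compiled, and repeated requests reuse the cached result.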

Important Files Changed

| Filename | Overview |
| --- | --- |
| sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/LazyCompiledCondition.scala | New utility class providing lazy AST compilation for join conditions with dynamic build side transformation |
| sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuHashJoin.scala | Core join optimization implementation with build side selection framework and strategy heuristics |
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuShuffledSizedHashJoinExec.scala | Refactored to use LazyCompiledCondition and support dynamic build side selection throughout join iterators |
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala | Added new configuration option for join build side selection strategy (AUTO, FIXED, SMALLEST) |

Confidence score: 4/5

  • This PR introduces a complex but well-architected optimization with proper abstraction boundaries and comprehensive testing
  • Score reflects the sophisticated nature of join optimizations which could have subtle performance implications, though the implementation appears sound with proper resource management and thread safety
  • Pay close attention to LazyCompiledCondition for proper AST resource cleanup and the join strategy heuristics in GpuHashJoin.scala for correctness across different join types

Sequence Diagram

sequenceDiagram
    participant User
    participant GpuHashJoin as "GpuHashJoin"
    participant JoinStrategy as "JoinStrategy"
    participant LazyCompiledCondition as "LazyCompiledCondition"
    participant JoinImpl as "JoinImpl"
    participant BaseHashJoinIterator as "BaseHashJoinIterator"

    User->>GpuHashJoin: "Execute join query"
    GpuHashJoin->>GpuHashJoin: "Parse join configuration"
    GpuHashJoin->>JoinStrategy: "selectStrategy(configuredStrategy, joinType, hasCondition, buildSideSelection, leftRowCount, rightRowCount)"
    
    alt Strategy is AUTO
        JoinStrategy->>JoinStrategy: "Apply heuristics for conditional joins"
        JoinStrategy->>JoinStrategy: "Apply heuristics for build side selection"
        JoinStrategy-->>GpuHashJoin: "Return INNER_HASH_WITH_POST or HASH_ONLY"
    else Strategy is not AUTO
        JoinStrategy-->>GpuHashJoin: "Return configured strategy"
    end

    alt Has AST condition
        GpuHashJoin->>LazyCompiledCondition: "new LazyCompiledCondition(condition, numLeftColumns, numRightColumns)"
        LazyCompiledCondition-->>GpuHashJoin: "LazyCompiledCondition instance"
    end

    GpuHashJoin->>BaseHashJoinIterator: "Create join iterator with strategy and options"
    BaseHashJoinIterator->>BaseHashJoinIterator: "Begin processing stream batches"

    loop For each stream batch
        BaseHashJoinIterator->>JoinStrategy: "Apply effective strategy"
        
        alt Strategy is INNER_HASH_WITH_POST
            BaseHashJoinIterator->>JoinImpl: "innerHashJoin(leftKeys, rightKeys, buildSideSelection)"
            JoinImpl->>JoinImpl: "selectPhysicalBuildSide(buildSideSelection, leftRowCount, rightRowCount)"
            JoinImpl->>JoinImpl: "Perform inner hash join with selected build side"
            JoinImpl-->>BaseHashJoinIterator: "Return inner join gather maps"
            
            alt Has AST condition
                BaseHashJoinIterator->>LazyCompiledCondition: "getForBuildSide(selectedBuildSide)"
                LazyCompiledCondition->>LazyCompiledCondition: "Compile AST for selected build side"
                LazyCompiledCondition-->>BaseHashJoinIterator: "Return compiled condition"
                BaseHashJoinIterator->>JoinImpl: "filterInnerJoinWithAST(innerMaps, condition)"
                JoinImpl-->>BaseHashJoinIterator: "Return filtered gather maps"
            end
            
            BaseHashJoinIterator->>BaseHashJoinIterator: "convertInnerJoinMapsToTargetType(innerMaps, targetJoinType)"
        else Strategy is HASH_ONLY
            BaseHashJoinIterator->>JoinImpl: "Perform traditional hash join for target type"
            JoinImpl-->>BaseHashJoinIterator: "Return gather maps"
        end

        BaseHashJoinIterator->>BaseHashJoinIterator: "Create join gatherer with results"
        BaseHashJoinIterator-->>User: "Return columnar batch result"
    end

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (3)

  1. sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuHashJoin.scala, line 888-890 (link)

    logic: Logic assumes left < right means left should be build side, but for LEFT OUTER the comment correctly notes data movement build side must be right (GpuBuildRight). Verify this heuristic doesn't conflict with join semantics - LEFT OUTER needs right side preserved. Have you verified this heuristic works correctly for all LEFT OUTER join scenarios, particularly when the left side is much smaller but semantically must preserve all left rows?

  2. sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuHashJoin.scala, line 1697-1707 (link)

    logic: Dynamic build side selection for InnerLike conditional joins uses different LazyCompiledCondition methods (getForBuildLeft vs getForBuildRight) based on selected build side. The AST compilation depends on table order - verify the condition compilation matches the selected physical build side. How is it ensured that the LazyCompiledCondition provides the correctly compiled AST for the dynamically selected build side?

  3. sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuHashJoin.scala, line 2052-2064 (link)

    logic: This inner sub-join selection reuses the same dynamic build side logic but with cudfBuildSide as the plan build side. The cudfBuildSide is derived from subJoinType, not the original build side - ensure this mapping is correct for all join type transformations.

6 files reviewed, 3 comments


override def close(): Unit = {
synchronized {
if (compiledBuildRight != null) {
Collaborator

we could safeClose these.
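
For readers unfamiliar with the pattern, a safe-close helper closes every resource even when one of the `close()` calls throws. The plugin's own helper is not shown in this thread, so the function below is a local stand-in written for illustration; its name and signature are assumptions.

```scala
object SafeCloseSketch {
  // Close every non-null resource, even if an earlier close() throws.
  // The first failure is rethrown at the end, with any later failures
  // attached to it as suppressed exceptions.
  def safeClose(resources: Seq[AutoCloseable]): Unit = {
    var first: Throwable = null
    resources.filter(_ != null).foreach { r =>
      try r.close()
      catch {
        case t: Throwable =>
          if (first == null) first = t else first.addSuppressed(t)
      }
    }
    if (first != null) throw first
  }
}
```

Filtering out nulls inside the helper would replace the explicit `!= null` checks in a `close()` method like the one quoted above.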

Collaborator Author

revans2 commented Jan 2, 2026

I ran the benchmarks and it is showing a slight regression on NDS (about 2.4 seconds). I am going to put this back in draft until I can figure out a slightly better heuristic. There are clearly cases when this wins, and cases when it loses. The losses outweigh the wins by a little bit, so I need to spend some more time on it.

Comparing join_swapping_test-results/join_swapping_test test against join_swapping_baseline-results/join_swapping_baseline test
query1: Baseline (1580.00 ms) vs Test (1673.00 ms) Diff -93 E2E 0.94x
query2: Baseline (1398.20 ms) vs Test (1234.20 ms) Diff 164 E2E 1.13x
query3: Baseline (388.00 ms) vs Test (375.20 ms) Diff 12 E2E 1.03x
query4: Baseline (4459.00 ms) vs Test (4361.20 ms) Diff 97 E2E 1.02x
query5: Baseline (2123.00 ms) vs Test (2178.20 ms) Diff -55 E2E 0.97x
query6: Baseline (746.20 ms) vs Test (751.60 ms) Diff -5 E2E 0.99x
query7: Baseline (3142.40 ms) vs Test (2881.20 ms) Diff 261 E2E 1.09x
query8: Baseline (739.00 ms) vs Test (835.80 ms) Diff -96 E2E 0.88x
query9: Baseline (1791.20 ms) vs Test (1832.80 ms) Diff -41 E2E 0.98x
query10: Baseline (1441.60 ms) vs Test (1361.20 ms) Diff 80 E2E 1.06x
query11: Baseline (2864.20 ms) vs Test (2854.60 ms) Diff 9 E2E 1.00x
query12: Baseline (501.60 ms) vs Test (527.60 ms) Diff -26 E2E 0.95x
query13: Baseline (1160.20 ms) vs Test (1524.40 ms) Diff -364 E2E 0.76x
query14_part1: Baseline (5078.00 ms) vs Test (5543.80 ms) Diff -465 E2E 0.92x
query14_part2: Baseline (4407.20 ms) vs Test (4424.20 ms) Diff -17 E2E 1.00x
query15: Baseline (971.20 ms) vs Test (1211.40 ms) Diff -240 E2E 0.80x
query16: Baseline (3325.00 ms) vs Test (3339.00 ms) Diff -14 E2E 1.00x
query17: Baseline (1559.20 ms) vs Test (1550.40 ms) Diff 8 E2E 1.01x
query18: Baseline (1579.00 ms) vs Test (1768.60 ms) Diff -189 E2E 0.89x
query19: Baseline (1236.80 ms) vs Test (1167.60 ms) Diff 69 E2E 1.06x
query20: Baseline (481.00 ms) vs Test (555.60 ms) Diff -74 E2E 0.87x
query21: Baseline (541.60 ms) vs Test (592.00 ms) Diff -50 E2E 0.91x
query22: Baseline (1026.80 ms) vs Test (1061.00 ms) Diff -34 E2E 0.97x
query23_part1: Baseline (5268.80 ms) vs Test (5279.60 ms) Diff -10 E2E 1.00x
query23_part2: Baseline (6009.60 ms) vs Test (5744.80 ms) Diff 264 E2E 1.05x
query24_part1: Baseline (6368.00 ms) vs Test (6381.40 ms) Diff -13 E2E 1.00x
query24_part2: Baseline (6305.00 ms) vs Test (6576.00 ms) Diff -271 E2E 0.96x
query25: Baseline (1920.40 ms) vs Test (1940.40 ms) Diff -20 E2E 0.99x
query26: Baseline (850.60 ms) vs Test (685.40 ms) Diff 165 E2E 1.24x
query27: Baseline (987.60 ms) vs Test (1163.00 ms) Diff -175 E2E 0.85x
query28: Baseline (4042.60 ms) vs Test (4276.40 ms) Diff -233 E2E 0.95x
query29: Baseline (2595.00 ms) vs Test (2577.00 ms) Diff 18 E2E 1.01x
query30: Baseline (1680.80 ms) vs Test (1883.40 ms) Diff -202 E2E 0.89x
query31: Baseline (1585.40 ms) vs Test (1638.20 ms) Diff -52 E2E 0.97x
query32: Baseline (903.20 ms) vs Test (989.80 ms) Diff -86 E2E 0.91x
query33: Baseline (1063.20 ms) vs Test (1144.80 ms) Diff -81 E2E 0.93x
query34: Baseline (1886.60 ms) vs Test (1828.60 ms) Diff 58 E2E 1.03x
query35: Baseline (1639.60 ms) vs Test (1701.80 ms) Diff -62 E2E 0.96x
query36: Baseline (1016.00 ms) vs Test (1184.60 ms) Diff -168 E2E 0.86x
query37: Baseline (638.60 ms) vs Test (532.40 ms) Diff 106 E2E 1.20x
query38: Baseline (1859.20 ms) vs Test (1869.20 ms) Diff -10 E2E 0.99x
query39_part1: Baseline (1773.80 ms) vs Test (1863.40 ms) Diff -89 E2E 0.95x
query39_part2: Baseline (1268.80 ms) vs Test (1260.40 ms) Diff 8 E2E 1.01x
query40: Baseline (1122.80 ms) vs Test (1091.00 ms) Diff 31 E2E 1.03x
query41: Baseline (271.20 ms) vs Test (282.40 ms) Diff -11 E2E 0.96x
query42: Baseline (316.20 ms) vs Test (314.00 ms) Diff 2 E2E 1.01x
query43: Baseline (832.40 ms) vs Test (814.00 ms) Diff 18 E2E 1.02x
query44: Baseline (658.20 ms) vs Test (682.20 ms) Diff -24 E2E 0.96x
query45: Baseline (1071.60 ms) vs Test (1126.40 ms) Diff -54 E2E 0.95x
query46: Baseline (1288.20 ms) vs Test (1468.40 ms) Diff -180 E2E 0.88x
query47: Baseline (1498.00 ms) vs Test (1460.40 ms) Diff 37 E2E 1.03x
query48: Baseline (873.40 ms) vs Test (856.60 ms) Diff 16 E2E 1.02x
query49: Baseline (1791.20 ms) vs Test (1792.40 ms) Diff -1 E2E 1.00x
query50: Baseline (7568.00 ms) vs Test (7879.60 ms) Diff -311 E2E 0.96x
query51: Baseline (1550.20 ms) vs Test (1747.40 ms) Diff -197 E2E 0.89x
query52: Baseline (461.40 ms) vs Test (476.00 ms) Diff -14 E2E 0.97x
query53: Baseline (604.60 ms) vs Test (605.60 ms) Diff -1 E2E 1.00x
query54: Baseline (1336.00 ms) vs Test (1262.20 ms) Diff 73 E2E 1.06x
query55: Baseline (383.20 ms) vs Test (388.40 ms) Diff -5 E2E 0.99x
query56: Baseline (818.00 ms) vs Test (757.80 ms) Diff 60 E2E 1.08x
query57: Baseline (1227.00 ms) vs Test (1349.00 ms) Diff -122 E2E 0.91x
query58: Baseline (836.40 ms) vs Test (824.80 ms) Diff 11 E2E 1.01x
query59: Baseline (1880.00 ms) vs Test (1776.60 ms) Diff 103 E2E 1.06x
query60: Baseline (1178.20 ms) vs Test (1136.80 ms) Diff 41 E2E 1.04x
query61: Baseline (1222.20 ms) vs Test (1157.00 ms) Diff 65 E2E 1.06x
query62: Baseline (1086.00 ms) vs Test (1096.20 ms) Diff -10 E2E 0.99x
query63: Baseline (793.40 ms) vs Test (791.40 ms) Diff 2 E2E 1.00x
query64: Baseline (13639.60 ms) vs Test (13787.00 ms) Diff -147 E2E 0.99x
query65: Baseline (3066.00 ms) vs Test (3089.00 ms) Diff -23 E2E 0.99x
query66: Baseline (2238.80 ms) vs Test (2251.60 ms) Diff -12 E2E 0.99x
query67: Baseline (17925.40 ms) vs Test (17465.40 ms) Diff 460 E2E 1.03x
query68: Baseline (1115.40 ms) vs Test (1086.40 ms) Diff 29 E2E 1.03x
query69: Baseline (1161.80 ms) vs Test (1117.00 ms) Diff 44 E2E 1.04x
query70: Baseline (1471.20 ms) vs Test (1202.20 ms) Diff 269 E2E 1.22x
query71: Baseline (3123.60 ms) vs Test (3088.40 ms) Diff 35 E2E 1.01x
query72: Baseline (2549.80 ms) vs Test (2476.60 ms) Diff 73 E2E 1.03x
query73: Baseline (992.60 ms) vs Test (933.60 ms) Diff 59 E2E 1.06x
query74: Baseline (2261.40 ms) vs Test (2264.60 ms) Diff -3 E2E 1.00x
query75: Baseline (6718.40 ms) vs Test (6599.00 ms) Diff 119 E2E 1.02x
query76: Baseline (1655.60 ms) vs Test (1571.40 ms) Diff 84 E2E 1.05x
query77: Baseline (947.80 ms) vs Test (968.80 ms) Diff -21 E2E 0.98x
query78: Baseline (8303.20 ms) vs Test (8396.40 ms) Diff -93 E2E 0.99x
query79: Baseline (816.00 ms) vs Test (802.40 ms) Diff 13 E2E 1.02x
query80: Baseline (3833.80 ms) vs Test (3750.80 ms) Diff 83 E2E 1.02x
query81: Baseline (2065.80 ms) vs Test (2116.60 ms) Diff -50 E2E 0.98x
query82: Baseline (664.60 ms) vs Test (603.00 ms) Diff 61 E2E 1.10x
query83: Baseline (680.80 ms) vs Test (661.80 ms) Diff 19 E2E 1.03x
query84: Baseline (1102.60 ms) vs Test (1124.00 ms) Diff -21 E2E 0.98x
query85: Baseline (1506.00 ms) vs Test (1478.80 ms) Diff 27 E2E 1.02x
query86: Baseline (951.00 ms) vs Test (971.00 ms) Diff -20 E2E 0.98x
query87: Baseline (1784.60 ms) vs Test (1744.00 ms) Diff 40 E2E 1.02x
query88: Baseline (3237.80 ms) vs Test (3191.60 ms) Diff 46 E2E 1.01x
query89: Baseline (999.80 ms) vs Test (1036.60 ms) Diff -36 E2E 0.96x
query90: Baseline (572.80 ms) vs Test (600.00 ms) Diff -27 E2E 0.95x
query91: Baseline (1079.00 ms) vs Test (922.60 ms) Diff 156 E2E 1.17x
query92: Baseline (534.40 ms) vs Test (516.40 ms) Diff 18 E2E 1.03x
query93: Baseline (9811.80 ms) vs Test (9918.00 ms) Diff -106 E2E 0.99x
query94: Baseline (3879.00 ms) vs Test (3973.00 ms) Diff -94 E2E 0.98x
query95: Baseline (4931.80 ms) vs Test (5338.20 ms) Diff -406 E2E 0.92x
query96: Baseline (4804.60 ms) vs Test (4901.80 ms) Diff -97 E2E 0.98x
query97: Baseline (1733.20 ms) vs Test (1846.40 ms) Diff -113 E2E 0.94x
query98: Baseline (1238.20 ms) vs Test (1385.00 ms) Diff -146 E2E 0.89x
query99: Baseline (1349.20 ms) vs Test (1342.80 ms) Diff 6 E2E 1.00x
benchmark: Baseline (238400.00 ms) vs Test (240800.00 ms) Diff -2400 E2E 0.99x
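
The two derived columns can be reproduced from the raw timings. The sketch below assumes, based on the rows above, that Diff is baseline minus test in milliseconds and E2E is baseline divided by test; `BenchDiffSketch` and its functions are hypothetical names, and the report appears to truncate Diff to a whole number.

```scala
object BenchDiffSketch {
  // Diff: baseline minus test, in ms (positive means the test run was faster).
  def diffMs(baselineMs: Double, testMs: Double): Double = baselineMs - testMs

  // E2E: baseline divided by test (above 1.0 means a speedup).
  def e2e(baselineMs: Double, testMs: Double): Double = baselineMs / testMs
}
```

Applied to the final `benchmark` row, `diffMs(238400.0, 240800.0)` gives -2400 ms, which is the roughly 2.4 second regression mentioned above, and the ratio rounds to 0.99x.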

@revans2 revans2 marked this pull request as draft January 2, 2026 17:28

trait GpuHashJoin extends GpuJoinExec {
/**
* The build side determines which side of the join is materialized/buffered/broadcast.
Collaborator

this comment is helpful, thank you

Collaborator

@abellina abellina left a comment

This is looking good to me, pending your analysis of the performance.
