Conversation

@revans2 (Collaborator) commented Dec 12, 2025

This is step 3 in splitting #13368 into smaller pieces.

Description

This adds basic GPU/CPU bridge functionality, but it is off by default because performance would be poor without the thread pool and optimizer.

Checklists

  • This PR has added documentation for new or modified features or behaviors.
  • This PR has added new tests or modified existing tests to cover new code paths.
    (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
  • Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

Performance is expected to be poor, so the feature is off by default and performance was not tested. I did add some basic tests to verify that the code works.

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
@greptile-apps greptile-apps bot (Contributor) commented Dec 12, 2025

Greptile Overview

Greptile Summary

This PR implements basic GPU/CPU bridge functionality that enables CPU expression evaluation within GPU execution plans. The feature is disabled by default (spark.rapids.sql.expression.cpuBridge.enabled=false) as noted in the PR description, since full performance optimization will come in future PRs.

Key Changes:

  • New GpuCpuBridgeExpression that transfers data GPU→Host→CPU→Host→GPU for CPU expression evaluation
  • Code generation support via BridgeGenerateUnsafeProjection (~940 lines), with an interpreted fallback
  • Bridge optimizer in RapidsMeta that automatically wraps incompatible expressions (~300 lines of changes)
  • Comprehensive test coverage with GpuCpuBridgeSuite and BridgeUnsafeProjectionSuite (1500+ test lines)
  • New metrics for tracking bridge processing and wait times
  • Proper resource management with ThreadLocal projections and task completion cleanup

Architecture:
The bridge sits between GPU and CPU execution by: evaluating GPU input expressions → copying to host → running CPU expression → copying result back to GPU. The implementation includes deduplication of GPU inputs using semantic equality to minimize data transfers.
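The deduplication step described above could be sketched roughly like this. Note that `dedupInputs` and `semanticKey` are hypothetical names for illustration, not the PR's actual implementation; the real code would use Spark's semantic-equality machinery (e.g. `Expression.semanticEquals` / `canonicalized`) as the key.

```scala
import scala.collection.mutable

// Hypothetical sketch: deduplicate bridge inputs by a semantic key so each
// unique column is transferred to host only once. Returns the distinct inputs
// plus, for each original input, its index into the distinct list.
def dedupInputs[E, K](inputs: Seq[E])(semanticKey: E => K): (Seq[E], Seq[Int]) = {
  val seen = mutable.HashMap.empty[K, Int]
  val kept = mutable.ArrayBuffer.empty[E]
  val remap = inputs.map { e =>
    // First time we see this key, record the input and assign it a slot.
    seen.getOrElseUpdate(semanticKey(e), { kept += e; kept.size - 1 })
  }
  (kept.toSeq, remap)
}
```

With strings standing in for expressions, `dedupInputs(Seq("a", "b", "a"))(identity)` keeps two inputs and remaps the third back to slot 0, so the duplicate is never transferred twice.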

Safety Considerations:

  • Feature is off by default with explicit configuration required
  • Proper resource cleanup via task completion hooks
  • ThreadLocal usage prevents conflicts across threads
  • Comprehensive null handling and type support
  • Excludes non-deterministic and unevaluable expressions from bridge

Confidence Score: 4/5

  • Safe to merge - feature is disabled by default and has comprehensive test coverage
  • Strong implementation with proper resource management and extensive testing. Score is 4/5 rather than 5/5 due to: (1) large amount of new code (~3000 lines) requiring careful runtime validation, (2) complex ThreadLocal usage that needs production verification, (3) bridge optimizer modifying expression trees which could have edge cases, and (4) acknowledged performance concerns that will be addressed in future PRs
  • Pay close attention to RapidsMeta.scala (complex optimizer logic with AST interaction) and BridgeGenerateUnsafeProjection.scala (large codegen module)

Important Files Changed

File Analysis

Filename | Score | Overview
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCpuBridgeExpression.scala | 5/5 | New GPU/CPU bridge expression implementation that enables CPU expression evaluation within GPU plans, with proper resource management and metrics
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/BridgeGenerateUnsafeProjection.scala | 4/5 | Large code generation module (~940 lines) for optimized bridge projections, with codegen and interpreted fallback support
sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsMeta.scala | 4/5 | Extensive changes (~300 lines) adding bridge optimization logic, AST interaction handling, and expression tree conversion support
tests/src/test/scala/org/apache/spark/sql/rapids/BridgeUnsafeProjectionSuite.scala | 5/5 | Comprehensive tests (~1373 lines) for projection correctness across all data types

Sequence Diagram

sequenceDiagram
    participant Plan as SparkPlan
    participant Bridge as GpuCpuBridgeExpression
    participant GPU as GPU Memory
    participant Host as Host Memory
    participant CPU as CPU Expression
    participant Builder as RapidsHostColumnBuilder
    
    Plan->>Bridge: columnarEval(batch)
    Note over Bridge: Start wait time tracking
    
    Bridge->>Bridge: Evaluate GPU input expressions
    Bridge->>GPU: Get GPU column data
    GPU-->>Bridge: GPU columns
    
    Bridge->>Bridge: Create ColumnarBatch with GPU columns
    Note over Bridge: Start processing time tracking
    
    Bridge->>Host: ColumnarToRowIterator (GPU→Host)
    Note over Host: Data copied to host memory
    
    Host->>CPU: Iterate rows through projection
    loop For each row
        CPU->>CPU: Evaluate CPU expression
        CPU->>Builder: Append result to builder
    end
    
    Builder->>Host: Build host column
    Host->>GPU: buildAndPutOnDevice() (Host→GPU)
    Note over GPU: Result copied back to GPU
    
    GPU-->>Bridge: GPU result column
    Note over Bridge: End processing time
    Bridge-->>Plan: GpuColumnVector
    Note over Bridge: End wait time

@greptile-apps greptile-apps bot left a comment


Additional Comments (1)

  1. sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCpuBridgeExpression.scala, line 112-113 (link)

    style: Consider cleanup for thread-local state

    The ThreadLocal for projections is never explicitly removed. In long-running applications or thread pools, consider adding cleanup logic, though Spark's task model may naturally handle this through task completion.
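A minimal sketch of the suggested cleanup, with a plain callback standing in for Spark's `TaskContext.addTaskCompletionListener`; `CachedProjection` and the object name are illustrative, not the PR's code:

```scala
// Illustrative stand-in for a cached per-thread projection.
final class CachedProjection

object BridgeProjections {
  private val cached = new ThreadLocal[CachedProjection]()

  def get(): CachedProjection = cached.get()
  def set(p: CachedProjection): Unit = cached.set(p)

  // Register a cleanup hook so pooled threads do not retain the projection
  // after the task finishes. In Spark this would be wired through
  // TaskContext.get().addTaskCompletionListener.
  def installCleanup(registerCompletion: (() => Unit) => Unit): Unit =
    registerCompletion(() => cached.remove())
}
```

Running the registered hook at task completion clears the thread-local, so a reused executor thread starts the next task without a stale projection.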

11 files reviewed, 1 comment



Copilot AI left a comment


Pull request overview

This PR adds basic GPU/CPU bridge operation functionality that enables CPU expressions to run while keeping the overall execution plan on the GPU. The bridge is disabled by default since performance optimizations (thread pool, optimizer) are not yet implemented. This represents step 3 in splitting a larger feature into smaller, reviewable pieces.

Key changes include:

  • New GpuCpuBridgeExpression that wraps CPU expressions and manages data transfer between GPU and CPU
  • Code generation support via BridgeGenerateUnsafeProjection for efficient columnar-to-row conversion
  • Configuration options to enable/disable the bridge and maintain a disallow list
  • Comprehensive test coverage for various data types and nested structures

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

File | Description
BridgeUnsafeProjectionSuite.scala | Comprehensive test suite covering all data types, nested structures, and edge cases for the bridge projection functionality
GpuCpuBridgeSuite.scala | Unit tests for bridge expression properties, configuration, and compatibility checking
BridgeGenerateUnsafeProjection.scala | Code generation implementation for efficient bridge projections with fallback to interpreted mode
GpuCpuBridgeExpression.scala | Main bridge expression implementation handling GPU-to-CPU-to-GPU data flow with metrics
RapidsMeta.scala | Metadata support for bridge expressions, including compatibility checks and conversion logic
RapidsConf.scala | Configuration options for enabling the bridge and maintaining a disallow list
TypeConverter.java | Extracted public interface for type conversion (visibility change from Scala to Java)
GpuRowToColumnarExec.scala | Visibility changes to expose TypeConverter and related methods
GpuTypeShims.scala (both versions) | Import reorganization to use the new public TypeConverter interface
GpuMetrics.scala | New metrics for tracking CPU bridge processing and wait times


@revans2 revans2 changed the title from "Add in basic GPU/CPU bridge operation" to "Add in basic GPU/CPU bridge operation [databricks]" Dec 12, 2025
revans2 commented Dec 12, 2025

build


@greptile-apps greptile-apps bot left a comment


Additional Comments (1)

  1. sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCpuBridgeExpression.scala, line 54 (link)

    syntax: Comment says "Only GPU inputs are children" but the code includes cpuExpression in children. This creates inconsistency with the comment on line 42 that says cpuExpression is "not included as children"

11 files reviewed, 1 comment



revans2 commented Dec 15, 2025

CI failed because of #14009, but our CI currently has no way to skip the Databricks tests when a change touches anything even remotely related to Databricks.


revans2 commented Dec 15, 2025

build


revans2 commented Dec 15, 2025

I upmerged to make sure I had the latest fixes for what caused CI to fail.


@greptile-apps greptile-apps bot left a comment


11 files reviewed, no comments


Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

revans2 commented Dec 15, 2025

build


@greptile-apps greptile-apps bot left a comment


12 files reviewed, no comments


Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

@greptile-apps greptile-apps bot left a comment


12 files reviewed, no comments



revans2 commented Dec 15, 2025

build

@sameerz sameerz added the task Work required that improves the product but is not user facing label Dec 19, 2025
/**
* Append row value to the column builder and return the number of data bytes written
*/
public abstract double append(SpecializedGetters row, int column, RapidsHostColumnBuilder builder);

it would be nice to add a comment about the return type here (double). Why can't it be a whole-number type?

@@ -0,0 +1,43 @@
/*
* Copyright (c) 2025, NVIDIA CORPORATION.

copyrights need updating.

* ahead of time. Also because structs push nulls down to the children this size should
* assume a validity even if the schema says it cannot be null.
*/
public abstract double getNullSize();

same question, why double?

val gpuInputColumns = gpuInputs.safeMap(_.columnarEval(batch))

// Time the CPU processing (columnar->row->CPU expr->columnar)
val processingStartTime = System.nanoTime()

nit, we could use GpuMetric.ns(metric) { }, but it doesn't take an optional metric.
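One way to keep the `ns`-style ergonomics while supporting an optional metric is a small wrapper. `NsMetric` below is a self-contained stand-in for `GpuMetric`, and `timeNsOpt` is a hypothetical helper sketched for this comment, not an existing spark-rapids API:

```scala
// Stand-in for GpuMetric: accumulates elapsed nanoseconds.
final class NsMetric {
  private var total = 0L
  def add(ns: Long): Unit = total += ns
  def value: Long = total
}

// Hypothetical helper: time the block only when a metric is present,
// otherwise just run it with no timing overhead.
def timeNsOpt[T](metric: Option[NsMetric])(block: => T): T = metric match {
  case Some(m) =>
    val start = System.nanoTime()
    try block finally m.add(System.nanoTime() - start)
  case None => block
}
```

The `finally` ensures the metric is updated even if the block throws, mirroring the behavior one would expect from `GpuMetric.ns`.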

// Configuration Tests
// ============================================================================

test("Bridge config controls feature enablement") {

this test is probably not necessary. What I think we actually need is a test of the fallback, either in a suite or an integration test.

(Float.MaxValue, false),
(0.0f, true), // null
(3.14159f, false),
(Float.NaN, false),

should we also test negative floats and NaN ranges? We have tested this in the past; SparkQueryCompareTestSuite has float positive/negative NaN lower/upper values.
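For reference, the extra float edge cases this comment asks about could look like the following; the bit patterns are chosen here for illustration, and the exact values used in SparkQueryCompareTestSuite may differ:

```scala
// NaN is any float with an all-ones exponent and a nonzero mantissa, so there
// is a whole range of NaN bit patterns in both sign halves of the encoding.
val posNanLower = java.lang.Float.intBitsToFloat(0x7f800001) // smallest positive-sign NaN payload
val posNanUpper = java.lang.Float.intBitsToFloat(0x7fffffff) // largest positive-sign NaN payload
val negNanLower = java.lang.Float.intBitsToFloat(0xff800001) // smallest negative-sign NaN payload
val negNanUpper = java.lang.Float.intBitsToFloat(0xffffffff) // largest negative-sign NaN payload

// Candidate additions: negative ordinary values plus the NaN range endpoints.
val extraFloatCases =
  Seq(-3.14159f, -Float.MaxValue, posNanLower, posNanUpper, negNanLower, negNanUpper)
```

Covering both sign halves of the NaN range catches bugs where a conversion path preserves only the canonical quiet NaN.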

(Double.MaxValue, false),
(0.0, true), // null
(3.141592653589793, false),
(Double.NaN, false),

same question about NaN ranges and sign.

(0L, false), // 1970-01-01 00:00:00 UTC
(-1000000L, false), // Before epoch
(0L, true), // null
(1672531200000000L, false) // 2023-01-01 00:00:00 UTC

how about h/m/s and ms?

timezoneCheck()
}
// Update our expressions to allow them to run with the bridge if possible


nit: extra blank line

)
}

test("multiple expressions") {

should we add DayTimeIntervalType and YearMonthIntervalType and possibly an unsupported type?


val numVarLenFields = exprSchemas.count {
case Schema(dt, _) => !UnsafeRow.isFixedLength(dt)
// TODO: consider large decimal and interval type

is there a follow-on issue for this TODO?

}

val writeFieldsCode = if (isTopLevel && (row == null || ctx.currentVars != null)) {
// TODO: support whole stage codegen

follow-on issue for this TODO?
