
Conversation

@thirtiseven (Collaborator) commented Dec 31, 2025

Fixes #14018

Description

#13842 added retry to the R2C conversion to prevent host OOM. Following on from that, this PR adds split-and-retry for GPU OOM when copying the converted results to the GPU.

This PR:

  • Introduced HostColumnarBatchWithRowRange, a wrapper for host columns that supports logical slicing without copying underlying host memory. This allows splitting a large host batch into smaller chunks for transfer.
  • Updated GpuRowToColumnarExec to use split and retry when a GPU OOM occurs during transfer.

Note that when we split a batch into two halves, the underlying host columns cannot be freed until both halves have been processed. So the first half only borrows the data, while ownership of the host columns passes to the second half. This ensures the host memory stays alive exactly as long as needed and is freed when the last split is closed.
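
To make this concrete, here is a minimal sketch of the idea using reference counting (the approach suggested later in review) instead of explicit ownership flags. The class name mirrors the PR, but the fields and bodies are illustrative, assume cudf's HostColumnVector.incRefCount()/close() semantics, and are not the actual implementation.

```scala
// Illustrative sketch only; the real HostColumnarBatchWithRowRange differs in detail.
import ai.rapids.cudf.HostColumnVector

class HostColumnarBatchWithRowRange(
    val hostColumns: Array[HostColumnVector],
    val startRow: Int,
    val numRows: Int) extends AutoCloseable {

  // Each wrapper shares the same host columns; bumping the ref count keeps the
  // backing host memory alive until the last wrapper is closed.
  hostColumns.foreach(_.incRefCount())

  override def close(): Unit = hostColumns.foreach(_.close())
}

object HostColumnarBatchWithRowRange {
  // Split into two logical halves without copying any host memory.
  def splitInHalf(b: HostColumnarBatchWithRowRange): Seq[HostColumnarBatchWithRowRange] = {
    require(b.numRows > 1, s"cannot split a batch with ${b.numRows} row(s)")
    val half = b.numRows / 2
    Seq(
      new HostColumnarBatchWithRowRange(b.hostColumns, b.startRow, half),
      new HostColumnarBatchWithRowRange(b.hostColumns, b.startRow + half, b.numRows - half))
  }
}
```

With this shape, closing the original wrapper after a split only drops one reference, so the shared host buffers are released exactly when the last half is closed.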

Checklists

  • This PR has added documentation for new or modified features or behaviors.
  • This PR has added new tests or modified existing tests to cover new code paths.
    (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
  • Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven (Collaborator Author)

@greptile full review

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds split-retry functionality to Row-to-Columnar (R2C) conversion to prevent GPU out-of-memory (OOM) errors: when a GPU OOM occurs during the host-to-GPU data transfer, the batch is split in half and retried.

Key Changes

  • Introduced a new HostColumnarBatchWithRowRange class that wraps host columns with row range tracking and supports splitting
  • Modified RowToColumnarIterator to use split retry logic for GPU OOM scenarios, allowing single input batches to produce multiple output batches
  • Added buildHostColumnsWithoutOwnership() method to transfer ownership of host columns to the retry framework

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

  • sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala: New class that wraps host columns with row range support, implements splitting logic for GPU OOM retry, and handles slicing of various column types (LIST, STRUCT, STRING, fixed-width)
  • sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuRowToColumnarExec.scala: Updated RowToColumnarIterator to build host columns separately, use split retry for TargetSize goals, and maintain a pending batch iterator for split outputs
  • sql-plugin/src/main/java/com/nvidia/spark/rapids/GpuColumnVector.java: Added buildHostColumnsWithoutOwnership() method to transfer host column ownership to the caller
  • tests/src/test/scala/com/nvidia/spark/rapids/RowToColumnarIteratorRetrySuite.scala: Added comprehensive test coverage for GPU OOM split retry scenarios, including single batch requirement enforcement, multiple batch production, multiple consecutive splits, and the single row edge case


greptile-apps bot (Contributor) commented Dec 31, 2025

Greptile Summary

This PR adds GPU OOM handling with split-and-retry to Row-to-Columnar conversion, building on previous work that addressed Host OOM. The implementation introduces HostColumnarBatchWithRowRange, a wrapper that enables logical slicing of host column data without copying underlying memory, allowing large batches to be split into smaller chunks for GPU transfer when OOM occurs.

Key changes:

  • New HostColumnarBatchWithRowRange class supports logical row-range slicing with reference counting to manage memory lifecycle across splits
  • Modified RowToColumnarIterator to handle split outputs via pendingBatchIter, ensuring only one split transfers to GPU at a time
  • Integrated with existing RmmRapidsRetryIterator framework for split-and-retry logic
  • RequireSingleBatch mode correctly throws when splits are needed (cannot satisfy constraint)
  • Comprehensive test coverage including single/multiple splits, edge cases, and empty schemas

Implementation highlights:

  • Reference counting ensures host columns remain valid until all splits are processed
  • Complex slicing logic handles LIST, STRUCT, STRING, and fixed-width types with proper offset normalization
  • Split policy divides batches in half recursively until GPU transfer succeeds or reaches single-row limit
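
A hedged sketch of how these pieces might fit together (not the actual code in GpuRowToColumnarExec), using the RmmRapidsRetryIterator.withRetry and withRetryNoSplit entry points referenced in the sequence diagram below; the copy function is passed in as a parameter because its real implementation lives in the PR.

```scala
// Sketch only; the real wiring also handles metrics, batch-size goals, and the
// pending-batch iterator that hands out later splits on subsequent next() calls.
import com.nvidia.spark.rapids.RmmRapidsRetryIterator.{withRetry, withRetryNoSplit}
import org.apache.spark.sql.vectorized.ColumnarBatch

def copyToGpuWithRetry(
    hostBatch: HostColumnarBatchWithRowRange,
    requireSingleBatch: Boolean)
    (copyToGpu: HostColumnarBatchWithRowRange => ColumnarBatch): Iterator[ColumnarBatch] = {
  if (requireSingleBatch) {
    // Splitting would violate the single-batch requirement, so only retry in
    // place; a split-and-retry OOM surfaces as an error to the caller.
    Iterator.single(withRetryNoSplit(hostBatch)(copyToGpu))
  } else {
    // On a split-and-retry OOM the framework calls splitInHalf and retries each
    // half, returning the resulting batches lazily through the iterator.
    withRetry(hostBatch, HostColumnarBatchWithRowRange.splitInHalf _)(copyToGpu)
  }
}
```

The RequireSingleBatch branch can still retry once memory is freed, but a split-and-retry OOM is rethrown rather than split, matching the constraint described above.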

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk
  • The implementation properly handles GPU OOM with split-and-retry logic, uses correct reference counting for memory management, includes comprehensive test coverage for edge cases, and follows existing retry framework patterns. The code is well-structured with clear separation of concerns. Minor deduction for the complexity of manual reference counting, which requires careful review to ensure correctness.
  • Pay close attention to HostColumnarBatchWithRowRange.scala for reference counting correctness and sliceHostColumn implementation for nested types (LIST, STRUCT)

Important Files Changed

  • sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala: New class for logical slicing of host columns with row ranges; implements complex slicing logic for LIST, STRUCT, STRING, and fixed-width types; uses reference counting for memory management across splits
  • sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuRowToColumnarExec.scala: Modified RowToColumnarIterator to support GPU OOM split-and-retry; added pendingBatchIter to handle multiple output batches from splits; integrated with HostColumnarBatchWithRowRange for splitting host-to-GPU transfers
  • tests/src/test/scala/com/nvidia/spark/rapids/RowToColumnarIteratorRetrySuite.scala: Added comprehensive test coverage for GPU OOM split-and-retry scenarios including single split, multiple splits, single-row edge case, empty schema, and RequireSingleBatch constraint behavior

Sequence Diagram

sequenceDiagram
    participant RI as RowToColumnarIterator
    participant Builder as GpuColumnarBatchBuilder
    participant HB as HostColumnarBatchWithRowRange
    participant Retry as RmmRapidsRetryIterator
    participant GPU as GPU Memory
    
    Note over RI: next() called
    RI->>Builder: buildHostColumns()
    Builder-->>RI: hostColumns (refCount=1)
    RI->>HB: new (incRefCount)
    Note over HB: refCount=2
    RI->>RI: withResource closes hostColumns
    Note over HB: refCount=1
    
    alt RequireSingleBatch
        RI->>Retry: withRetryNoSplit(hostBatch)
        Retry->>HB: copyToGpu()
        HB->>GPU: Allocate & Transfer
        alt GPU OOM
            GPU-->>HB: GpuSplitAndRetryOOM
            Note over Retry: Cannot split with RequireSingleBatch
            Retry-->>RI: Throw exception
        else Success
            GPU-->>HB: ColumnarBatch
            HB-->>Retry: ColumnarBatch
            Retry-->>RI: ColumnarBatch
        end
    else TargetSize (allows splits)
        RI->>Retry: withRetry(hostBatch, splitPolicy)
        Retry->>HB: copyToGpu()
        HB->>GPU: Allocate & Transfer
        alt GPU OOM - Split Required
            GPU-->>HB: GpuSplitAndRetryOOM
            Retry->>HB: splitInHalf(batch)
            HB->>HB: withResource closes input batch
            Note over HB: refCount: 1 -> 0 (for old) + 2 (for splits) -> 2
            HB-->>Retry: Seq[split1, split2]
            Note over Retry: Process split1 first
            Retry->>HB: split1.copyToGpu()
            HB->>GPU: Allocate smaller chunk
            GPU-->>HB: ColumnarBatch1
            HB-->>Retry: ColumnarBatch1
            Retry-->>RI: Iterator with ColumnarBatch1
            RI->>RI: Store remaining splits in pendingBatchIter
            Note over RI: Next next() will process split2
        else Success
            GPU-->>HB: ColumnarBatch
            HB-->>Retry: ColumnarBatch
            Retry-->>RI: Iterator with ColumnarBatch
        end
    end
    
    RI->>RI: recordOutput(batch)
    RI-->>RI: Update metrics & refine targetRows

greptile-apps bot (Contributor) left a comment

Additional Comments (1)

  1. sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala, line 176-184 (link)

    logic: Workaround condition may not cover all cases. If dataLen == 0 but nullCount > 0 (empty strings with some null values), the code goes to the else branch and creates a 0-length slice, which could trigger the same copyToDevice issue mentioned in the comment.

4 files reviewed, 1 comment


Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven (Collaborator Author)

build

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven (Collaborator Author)

build

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.



@sameerz added the bug (Something isn't working) label on Jan 4, 2026
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven self-assigned this on Jan 5, 2026
@thirtiseven marked this pull request as ready for review on January 5, 2026 at 07:28
greptile-apps bot (Contributor) left a comment

Additional Comments (1)

  1. sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala, line 178-188 (link)

    style: Check that this empty string workaround handles all edge cases correctly. The 1-byte allocation when dataLen == 0 && nullCount == 0 works around copyToDevice behavior, but verify it handles: (1) all strings in slice are empty (not null), (2) mixed empty strings at different positions. This edge case could cause issues with certain data distributions.

4 files reviewed, 1 comment


@wjxiz1992 changed the title from "Prevent potiential GPU OOM in R2C with split retry" to "Prevent potential GPU OOM in R2C with split retry" on Jan 5, 2026
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
greptile-apps bot (Contributor) left a comment

Additional Comments (1)

  1. sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala, line 180 (link)

    syntax: typo: "Existsing" should be "Existing"

4 files reviewed, 1 comment


Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
greptile-apps bot (Contributor) left a comment

Additional Comments (1)

  1. sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala, line 178-186 (link)

    style: Consider if allocating a 1-byte dummy buffer for empty strings when dataLen == 0 but numRows > nullCount is necessary, or if cudf would accept null here to save allocations.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

4 files reviewed, 1 comment


* Build host columns and transfer ownership to the caller. The builder will not close
* the returned host columns.
*/
public HostColumnVector[] buildHostColumnsWithoutOwnership() {
Collaborator

It seems buildHostColumns already has the same behavior, so is this duplicated?

Collaborator Author

Yes this can be removed if we use the refCount approach.

hostColumns = buildHostColumns();
}
HostColumnVector[] ret = hostColumns;
hostColumns = null;
Collaborator

Setting hostColumns to null will trigger the call to "buildHostColumns" again when this "buildHostColumnsWithoutOwnership" API is called again, and then an exception will be thrown.

Collaborator

You can increase the refCount for buffer lifecycle management; then there is no need for things like WithoutOwnership.

val startRow: Int,
val numRows: Int,
val dataTypes: Array[DataType],
private var ownsHostColumns: Boolean) extends AutoCloseable {
Collaborator

This is a little confusing. I would suggest leveraging refCount instead of this boolean for buffer lifecycle management.

Collaborator Author

👍 Good idea, thanks. Updated to refCount.

* the last returned split. After calling this method, this instance no longer owns
* the columns and closing it will not free them.
*/
def splitInHalf(): Seq[HostColumnarBatchWithRowRange] = {
@firestarman (Collaborator) commented Jan 5, 2026

This would be better as a static method, i.e. moving it to the object HostColumnarBatchWithRowRange. Along with refCount support, you can then do the split like:

val hcb = XXX // hcb is a HostColumnarBatchWithRowRange (maybe we can give it a shorter name like "SlicedHostColumnarBatch")
withResource(hcb) { _ =>
  HostColumnarBatchWithRowRange.splitInHalf(hcb)
}

or even more simply:

withResource(hcb)(HostColumnarBatchWithRowRange.splitInHalf)

Collaborator Author

updated.

@firestarman (Collaborator)

@greptileai can you review it again?

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
greptile-apps bot (Contributor) left a comment

Additional Comments (1)

  1. sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala, line 180-188 (link)

    style: Edge case handling for empty strings is correct: when dataLen == 0 but there are non-null rows (numRows > nullCount), allocates 1-byte buffer to satisfy cuDF requirements. However, consider validating that this 1-byte allocation doesn't cause issues if we're in a tight memory situation (though this is unlikely given the small size).

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

4 files reviewed, 1 comment



Labels

bug (Something isn't working)


Development

Successfully merging this pull request may close these issues.

[Follow up] Fix potiential GPU OOM in R2C

3 participants