Prevent potential GPU OOM in R2C with split retry #14073
Conversation
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@greptile full review
Pull request overview
This PR adds split-retry functionality to Row-to-Columnar (R2C) conversion to prevent GPU out-of-memory (OOM) errors, by splitting batches in half when a GPU OOM occurs during the host-to-GPU data transfer.
Key Changes
- Introduced a new `HostColumnarBatchWithRowRange` class that wraps host columns with row-range tracking and supports splitting
- Modified `RowToColumnarIterator` to use split-retry logic for GPU OOM scenarios, allowing a single input batch to produce multiple output batches
- Added a `buildHostColumnsWithoutOwnership()` method to transfer ownership of host columns to the retry framework
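To make the splitting idea concrete, here is a minimal, hypothetical sketch of the row-range bookkeeping such a wrapper might use (the `RowRange` name and fields are illustrative, not the plugin's actual API): two splits share the same backing host columns and differ only in their row window.

```java
// Hypothetical sketch of row-range splitting; not the plugin's actual code.
final class RowRange {
    final int startRow;
    final int numRows;

    RowRange(int startRow, int numRows) {
        this.startRow = startRow;
        this.numRows = numRows;
    }

    // Split this range into two halves over the same backing columns.
    // A single-row range cannot be split further, mirroring the failure
    // case when even one row does not fit on the GPU.
    RowRange[] splitInHalf() {
        if (numRows < 2) {
            throw new IllegalStateException(
                "cannot split a range of " + numRows + " row(s)");
        }
        int firstHalf = numRows / 2;
        return new RowRange[] {
            new RowRange(startRow, firstHalf),
            new RowRange(startRow + firstHalf, numRows - firstHalf)
        };
    }
}
```

Because only the window is halved, no host memory is copied when a batch is split.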
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala | New class that wraps host columns with row-range support, implements splitting logic for GPU OOM retry, and handles slicing of various column types (LIST, STRUCT, STRING, fixed-width) |
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuRowToColumnarExec.scala | Updated RowToColumnarIterator to build host columns separately, use split retry for TargetSize goals, and maintain a pending batch iterator for split outputs |
| sql-plugin/src/main/java/com/nvidia/spark/rapids/GpuColumnVector.java | Added buildHostColumnsWithoutOwnership() method to transfer host column ownership to the caller |
| tests/src/test/scala/com/nvidia/spark/rapids/RowToColumnarIteratorRetrySuite.scala | Added comprehensive test coverage for GPU OOM split-retry scenarios, including single-batch requirement enforcement, multiple batch production, multiple consecutive splits, and the single-row edge case |
Greptile Summary
This PR adds GPU OOM handling with split-and-retry to Row-to-Columnar conversion, building on previous work that addressed host OOM. The implementation introduces `HostColumnarBatchWithRowRange` to split a host batch without copying its memory.
Key changes:
Implementation highlights:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant RI as RowToColumnarIterator
    participant Builder as GpuColumnarBatchBuilder
    participant HB as HostColumnarBatchWithRowRange
    participant Retry as RmmRapidsRetryIterator
    participant GPU as GPU Memory
    Note over RI: next() called
    RI->>Builder: buildHostColumns()
    Builder-->>RI: hostColumns (refCount=1)
    RI->>HB: new (incRefCount)
    Note over HB: refCount=2
    RI->>RI: withResource closes hostColumns
    Note over HB: refCount=1
    alt RequireSingleBatch
        RI->>Retry: withRetryNoSplit(hostBatch)
        Retry->>HB: copyToGpu()
        HB->>GPU: Allocate & Transfer
        alt GPU OOM
            GPU-->>HB: GpuSplitAndRetryOOM
            Note over Retry: Cannot split with RequireSingleBatch
            Retry-->>RI: Throw exception
        else Success
            GPU-->>HB: ColumnarBatch
            HB-->>Retry: ColumnarBatch
            Retry-->>RI: ColumnarBatch
        end
    else TargetSize (allows splits)
        RI->>Retry: withRetry(hostBatch, splitPolicy)
        Retry->>HB: copyToGpu()
        HB->>GPU: Allocate & Transfer
        alt GPU OOM - Split Required
            GPU-->>HB: GpuSplitAndRetryOOM
            Retry->>HB: splitInHalf(batch)
            HB->>HB: withResource closes input batch
            Note over HB: refCount: 1 -> 0 (for old) + 2 (for splits) -> 2
            HB-->>Retry: Seq[split1, split2]
            Note over Retry: Process split1 first
            Retry->>HB: split1.copyToGpu()
            HB->>GPU: Allocate smaller chunk
            GPU-->>HB: ColumnarBatch1
            HB-->>Retry: ColumnarBatch1
            Retry-->>RI: Iterator with ColumnarBatch1
            RI->>RI: Store remaining splits in pendingBatchIter
            Note over RI: Next next() will process split2
        else Success
            GPU-->>HB: ColumnarBatch
            HB-->>Retry: ColumnarBatch
            Retry-->>RI: Iterator with ColumnarBatch
        end
    end
    RI->>RI: recordOutput(batch)
    RI-->>RI: Update metrics & refine targetRows
```
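The split-and-retry control flow in the diagram above can be sketched as a small driver loop. This is a hedged illustration only: "GPU capacity" is simulated as a row count, and the class and method names are invented for the example (real code catches `GpuSplitAndRetryOOM` instead of comparing sizes).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative driver for split-and-retry; not the plugin's actual code.
final class SplitRetryDemo {
    // Returns the row counts of the output batches produced from one input
    // batch, splitting in half whenever a batch exceeds the simulated capacity.
    static List<Integer> copyWithSplitRetry(int rows, int capacityRows) {
        List<Integer> outputs = new ArrayList<>();
        Deque<Integer> pending = new ArrayDeque<>(); // like pendingBatchIter
        pending.push(rows);
        while (!pending.isEmpty()) {
            int batch = pending.pop();
            if (batch <= capacityRows) {
                outputs.add(batch); // simulated successful transfer
            } else if (batch < 2) {
                // even one row does not fit: nothing left to split
                throw new IllegalStateException("GpuSplitAndRetryOOM");
            } else {
                int first = batch / 2;
                pending.push(batch - first); // second half waits its turn
                pending.push(first);         // first half is retried next
            }
        }
        return outputs;
    }
}
```

For example, an 8-row batch against a 3-row capacity is split twice and yields four 2-row output batches, matching the "single input batch produces multiple output batches" behavior described in the review.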
Additional Comments (1)
- sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala, lines 176-184: logic: The workaround condition may not cover all cases. If `dataLen == 0` but `nullCount > 0` (empty strings with some null values), the code goes to the else branch and creates a 0-length slice, which could trigger the same `copyToDevice` issue mentioned in the comment.
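The case split the reviewer describes can be made explicit with a small sketch. The `dataLen`/`nullCount` names come from the review comment; the rest of the shape is an assumption about the code, not the plugin's actual implementation.

```java
// Illustrative decision logic for the string-slice workaround discussed
// above; an assumption about the code's shape, not the actual source.
final class StringSliceStrategy {
    static String choose(long dataLen, int nullCount) {
        if (dataLen == 0 && nullCount == 0) {
            // All strings empty and valid: allocate a 1-byte dummy buffer
            // so copyToDevice accepts the column.
            return "dummy-1-byte-buffer";
        }
        // Includes the dataLen == 0 && nullCount > 0 case the reviewer
        // flags, which falls through to a 0-length slice.
        return "zero-length-slice";
    }
}
```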
4 files reviewed, 1 comment
build

build
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
Resolved (outdated) review threads:
- sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala
- sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala
- tests/src/test/scala/com/nvidia/spark/rapids/RowToColumnarIteratorRetrySuite.scala
- sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuRowToColumnarExec.scala
- sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Additional Comments (1)
- sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala, lines 178-188: style: Check that this empty-string workaround handles all edge cases correctly. The 1-byte allocation when `dataLen == 0 && nullCount == 0` works around `copyToDevice` behavior, but verify it handles: (1) all strings in the slice being empty (not null), and (2) mixed empty strings at different positions. This edge case could cause issues with certain data distributions.
4 files reviewed, 1 comment
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Additional Comments (1)
- sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala, line 180: syntax: typo: "Existsing" should be "Existing"
4 files reviewed, 1 comment
Additional Comments (1)
- sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala, lines 178-186: style: Consider whether allocating a 1-byte dummy buffer for empty strings when `dataLen == 0` but `numRows > nullCount` is necessary, or whether cudf would accept `null` here to save allocations.
4 files reviewed, 1 comment
```java
/**
 * Build host columns and transfer ownership to the caller. The builder will not close
 * the returned host columns.
 */
public HostColumnVector[] buildHostColumnsWithoutOwnership() {
```
It seems buildHostColumns already has the same behavior, so is this duplicated?
Yes this can be removed if we use the refCount approach.
```java
    hostColumns = buildHostColumns();
}
HostColumnVector[] ret = hostColumns;
hostColumns = null;
```
Setting hostColumns to null will trigger the call to "buildHostColumns" again when this "buildHostColumnsWithoutOwnership" API is called a second time, and then an exception will be thrown.
You can increase the refCount for buffer lifecycle management; then there is no need for things like WithoutOwnership.
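A minimal sketch of the suggested refCount pattern, modeled on cudf's `incRefCount()`/`close()` convention (the class here is illustrative; real code would wrap the `HostColumnVector[]`): each additional holder bumps the count, and the host buffers are released only when the last holder closes.

```java
// Illustrative ref-counted holder; an assumption, not the plugin's code.
final class RefCountedColumns implements AutoCloseable {
    private int refCount = 1;
    private boolean freed = false;

    synchronized RefCountedColumns incRefCount() {
        if (freed) {
            throw new IllegalStateException("already freed");
        }
        refCount++;
        return this;
    }

    synchronized boolean isFreed() {
        return freed;
    }

    @Override
    public synchronized void close() {
        refCount--;
        if (refCount == 0) {
            freed = true; // real code would close the host columns here
        }
    }
}
```

With this pattern, a split simply calls `incRefCount()` instead of negotiating ownership flags, which is why the boolean-ownership design below was reworked.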
```scala
    val startRow: Int,
    val numRows: Int,
    val dataTypes: Array[DataType],
    private var ownsHostColumns: Boolean) extends AutoCloseable {
```
This is a little confusing. I would suggest leveraging a refCount instead of this boolean for buffer lifecycle management.
👍 Good idea, thanks. Updated to refCount.
```scala
 * the last returned split. After calling this method, this instance no longer owns
 * the columns and closing it will not free them.
 */
def splitInHalf(): Seq[HostColumnarBatchWithRowRange] = {
```
This would be better as a static method, i.e. moving it to the object HostColumnarBatchWithRowRange. Along with refCount support, you could then do a split like:

```scala
val hcb = XXX // hcb is a HostColumnarBatchWithRowRange (maybe we can name it shorter, e.g. "SlicedHostColumnarBatch")
withResource(hcb) { _ =>
  HostColumnarBatchWithRowRange.splitInHalf(hcb)
}
```

or even more simply:

```scala
withResource(hcb)(HostColumnarBatchWithRowRange.splitInHalf)
```
updated.
@greptileai can you review it again?
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Additional Comments (1)
- sql-plugin/src/main/scala/com/nvidia/spark/rapids/HostColumnarBatchWithRowRange.scala, lines 180-188: style: Edge case handling for empty strings is correct: when `dataLen == 0` but there are non-null rows (`numRows > nullCount`), it allocates a 1-byte buffer to satisfy cuDF requirements. However, consider validating that this 1-byte allocation doesn't cause issues in a tight memory situation (though this is unlikely given the small size).
4 files reviewed, 1 comment
Fixes #14018
Description
#13842 added retry for the R2C `convert` to prevent host OOM. Following on from that, this PR aims to add split-and-retry for GPU OOM when copying the converted results to the GPU.

This PR:
- Adds `HostColumnarBatchWithRowRange`, a wrapper for host columns that supports logical slicing without copying the underlying host memory. This allows splitting a large host batch into smaller chunks for transfer.
- Updates `GpuRowToColumnarExec` to use split-and-retry when a GPU OOM occurs during transfer.

Note that when we split a batch into two halves, we can't free them until both halves are processed. So the first half just borrows the data, while we pass ownership of the host columns to the second half. This ensures the host memory stays alive exactly as long as needed and is freed when the last split is closed.
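The ownership hand-off described above can be sketched as follows. The names are hypothetical, and a shared boolean stands in for the host buffers; review comments on this PR later suggested replacing this boolean-ownership scheme with a refCount.

```java
// Illustrative sketch of the ownership rule: both halves read the same
// host columns, but only the last split owns (and frees) them.
final class OwnedSlice implements AutoCloseable {
    private final boolean[] hostMemoryFreed; // shared flag standing in for buffers
    private final boolean ownsColumns;

    OwnedSlice(boolean[] hostMemoryFreed, boolean ownsColumns) {
        this.hostMemoryFreed = hostMemoryFreed;
        this.ownsColumns = ownsColumns;
    }

    @Override
    public void close() {
        if (ownsColumns) {
            hostMemoryFreed[0] = true; // real code would free the host columns
        }
    }
}
```

Closing the borrowing first half leaves the memory alive for the second half; only closing the owning last split releases it.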
Checklists