support withRetry with split for shuffle exchange exec base #13975

zpuller merged 19 commits into NVIDIA:main
Conversation
Signed-off-by: Zach Puller <zpuller@nvidia.com>
Greptile Summary
Important Files Changed
Confidence score: 4/5
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant GpuShuffleExchangeExecBase as "GpuShuffleExchangeExecBase"
    participant prepareBatchShuffleDependency as "prepareBatchShuffleDependency"
    participant RddWithPartitionIds as "rddWithPartitionIds"
    participant withRetry as "withRetry"
    participant GpuPartitioning as "GpuPartitioning"
    participant RmmSpark as "RmmSpark"
    User->>GpuShuffleExchangeExecBase: "execute shuffle"
    GpuShuffleExchangeExecBase->>prepareBatchShuffleDependency: "create shuffle dependency"
    prepareBatchShuffleDependency->>RddWithPartitionIds: "create partitioned RDD"
    RddWithPartitionIds->>RddWithPartitionIds: "iterate batches"
    RddWithPartitionIds->>withRetry: "withRetry(spillableBatch, splitSpillableInHalfByRows)"
    withRetry->>GpuPartitioning: "columnarEvalAny(batch)"
    alt Success Case
        GpuPartitioning-->>withRetry: "partitioned data"
        withRetry-->>RddWithPartitionIds: "Array[(ColumnarBatch, Int)]"
    else OOM Exception
        GpuPartitioning->>RmmSpark: "throw GpuSplitAndRetryOOM"
        RmmSpark-->>withRetry: "OOM exception"
        withRetry->>withRetry: "splitSpillableInHalfByRows(spillableBatch)"
        withRetry->>GpuPartitioning: "columnarEvalAny(splitBatch1)"
        GpuPartitioning-->>withRetry: "partitioned data 1"
        withRetry->>GpuPartitioning: "columnarEvalAny(splitBatch2)"
        GpuPartitioning-->>withRetry: "partitioned data 2"
        withRetry-->>RddWithPartitionIds: "Array[(ColumnarBatch, Int)] (split results)"
    end
    RddWithPartitionIds-->>prepareBatchShuffleDependency: "Product2[Int, ColumnarBatch]"
    prepareBatchShuffleDependency-->>GpuShuffleExchangeExecBase: "ShuffleDependency"
    GpuShuffleExchangeExecBase-->>User: "shuffled data"
```
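The retry-with-split control flow in the diagram can be modeled in a few lines. This is a hedged Python sketch, not the spark-rapids implementation (which is Scala, using `withRetry` and `splitSpillableInHalfByRows`); the function names, the list-of-rows stand-in for a `ColumnarBatch`, and the exception class here are illustrative only.

```python
class GpuSplitAndRetryOOM(Exception):
    """Stand-in for the OOM signal raised by the memory framework."""

def split_in_half_by_rows(batch):
    """Split a batch (modeled as a list of rows) into two halves."""
    mid = len(batch) // 2
    return [batch[:mid], batch[mid:]]

def with_retry(batch, split_policy, fn):
    """Run fn over the batch; on OOM, split and retry each piece.

    Yields one result per (possibly split) batch, mirroring how the
    partitioner emits one result array per successful attempt.
    """
    pending = [batch]
    while pending:
        current = pending.pop(0)
        try:
            yield fn(current)
        except GpuSplitAndRetryOOM:
            if len(current) <= 1:
                raise  # cannot split a single-row batch any further
            # retry the halves in order, preserving row order
            pending = split_policy(current) + pending

# Simulate one OOM on the first attempt, then success on the halves.
attempts = {"n": 0}
def partition(batch):
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise GpuSplitAndRetryOOM()
    return len(batch)  # stand-in for "partitioned data"

results = list(with_retry(list(range(8)), split_in_half_by_rows, partition))
print(results)  # [4, 4] — two half-batches of 4 rows after the split retry
```

The key property, discussed below in the review, is that a split turns one input batch into two output batches: the downstream consumer sees more, smaller results rather than a failure.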
abellina left a comment:

I'd like us to add a bit to the test, unless I've missed it.
    s"Expected at least one split retry, but saw $retryCount retries")

    // Verify batch contents match expected data (even after split retry)
    verifyBatchContents(allPartitionedBatches, totalRowsSeen, serializer,
I think we should verify that the number of partitioned arrays seen is double, given the split, compared to without the split.

To me, trying to verify this behavior in an integration test that is not overloading the memory system is not going to work. We need a unit test of some kind, and we can use the RMM injection code to verify that we are doing the right thing.

Added a comment at the site where we could be doing verification, without looking at the metrics: https://github.com/NVIDIA/spark-rapids/pull/13975/files#r2631405357
abellina left a comment:

I don't really think we need a new metric: NUM_PARTITIONED_ARRAYS

We already have NUM_OUTPUT_BATCHES: https://github.com/NVIDIA/spark-rapids/pull/13975/files#diff-2519047533f3504238e111c11b9d2a55903af1dfd89ea01be31233f15315dc01R445

In this case we'll output more batches, which would be a way to check in the tests.
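The arithmetic behind this suggestion is simple: each single split turns one output batch into two, so the existing NUM_OUTPUT_BATCHES metric already exposes split activity. A minimal sketch (illustrative only, not spark-rapids code):

```python
def expected_output_batches(num_input_batches, num_single_splits):
    """Each single (in-half) split adds exactly one extra output batch."""
    return num_input_batches + num_single_splits

# e.g. 2 input batches, one of which hits an OOM and splits once:
print(expected_output_batches(2, 1))  # 3
```

A test can therefore assert that NUM_OUTPUT_BATCHES exceeds the input batch count by the number of injected splits, with no new metric needed.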
Additional Comments (1)

- tests/src/test/scala/com/nvidia/spark/rapids/GpuKudoWritePartitioningSuite.scala, line 314 (link): style: Redundant cast - `intVal` is already of type `Integer`. Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

6 files reviewed, 1 comment
Additional Comments (1)

- tests/src/test/scala/com/nvidia/spark/rapids/GpuKudoWritePartitioningSuite.scala, lines 467-470 (link): style: Verify that the test actually triggers the split retry path as intended. The test injects OOM on the first `next()` call and validates `numNextCalls == 3` (one batch splits into 2, plus 1 unsplit batch). Check that the retry count assertion on line 476 consistently passes across different environments.

6 files reviewed, 1 comment
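The expected count in that test can be reasoned through with a toy model. This is a hedged Python sketch of the scenario, not the Scala suite: the counting convention (only successfully emitted batches count toward `numNextCalls`, not the failed first attempt) is my assumption, and the fault injector here is a stand-in for the RMM injection code.

```python
class InjectedOOM(Exception):
    """Stand-in for an injected GpuSplitAndRetryOOM."""

def run_with_injection(batches):
    attempts = {"n": 0}
    emitted = []  # batches successfully handed downstream
    def process(batch):
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise InjectedOOM()  # fault injected on the very first attempt
        return batch
    for batch in batches:
        pending = [batch]
        while pending:
            current = pending.pop(0)
            try:
                emitted.append(process(current))
            except InjectedOOM:
                mid = len(current) // 2
                # split in half by rows and retry both halves in order
                pending = [current[:mid], current[mid:]] + pending
    return emitted

emitted = run_with_injection([[1, 2, 3, 4], [5, 6, 7, 8]])
# one batch splits into 2 halves, plus 1 unsplit batch => 3 emissions
print(len(emitted))  # 3
```

Under this model the first batch is emitted as two halves and the second passes through untouched, which matches the `numNextCalls == 3` expectation in the review comment.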
abellina left a comment:

I would like us to improve the suite further.
Fixes #13951 (along with #13975)

### Description

Use `AutoCloseableTargetSize` to create a split policy which would halve the target size of an incoming batch when doing a split retry. Performance testing so far shows no significant change for low-memory scenarios in NDS. I'll continue with perf testing while the PR is under review.

### Checklists

- [ ] This PR has added documentation for new or modified features or behaviors.
- [x] This PR has added new tests or modified existing tests to cover new code paths. (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
- [x] Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

---------

Signed-off-by: Zach Puller <zpuller@nvidia.com>
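The halving split policy described above can be sketched as follows. This is an illustrative Python model, not the `AutoCloseableTargetSize`-based Scala code; the function name and the minimum-size floor are assumptions for the sketch.

```python
def halve_target_size(target_size, min_size=1):
    """Return the next target size after a split retry: half the
    previous target, bounded below by a minimum size."""
    return max(target_size // 2, min_size)

# Four consecutive split retries starting from a 1024-unit target:
sizes = []
size = 1024
for _ in range(4):
    size = halve_target_size(size)
    sizes.append(size)
print(sizes)  # [512, 256, 128, 64]
```

Halving the target on each retry means repeated OOMs converge geometrically toward batches small enough to fit, while successful runs pay no overhead.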
Fixes #13951 (along with #14010)

### Description

We convert the internal state of `prepareBatchShuffleDependency` in `rddWithPartitionIds` into a spliterator to support split retries. This happens a level above where we call into the partitioner, which does the GPU kudo serialization. Performance testing so far shows no significant change for low-memory scenarios in NDS. I'll continue with perf testing while the PR is under review.

### Checklists

(Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
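The structural change in the description above — one input batch possibly producing several partitioned outputs after a split, flattened back into a single stream — can be sketched with a generator. This is a hedged Python model of the shape of `rddWithPartitionIds`, not the Scala code; `partition_fn` and the demo splitting rule are hypothetical.

```python
def rdd_with_partition_ids(batches, partition_fn):
    """Model of the flattening step: each input batch may yield one or
    several (partition_id, batch) pairs after a split retry; flatten
    them all into a single downstream stream."""
    for batch in batches:
        for pid, out in partition_fn(batch):  # 1 or many pairs per input
            yield (pid, out)

def demo_partition(batch):
    # Pretend batches larger than 2 rows were split once by a retry.
    if len(batch) > 2:
        mid = len(batch) // 2
        return [(0, batch[:mid]), (0, batch[mid:])]
    return [(0, batch)]

out = list(rdd_with_partition_ids([[1, 2, 3, 4], [5, 6]], demo_partition))
print(out)  # [(0, [1, 2]), (0, [3, 4]), (0, [5, 6])]
```

Because the flattening sits a level above the partitioner call, the shuffle dependency downstream is unaware of splits: it simply consumes a stream of `(partitionId, batch)` pairs, whether or not any batch was split on the way.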