[SPARK-56199][CORE] Read fallback storage blocks asynchronously and multithread #55003
Open
EnricoMi wants to merge 8 commits into apache:master
Removes "Fast fail when failed to get fallback storage blocks" as fallback storage blocks are fetched concurrently and fast fail is not guaranteed any more.
This blocks cleanup() from calling fallbackStorageReadPool.shutdownNow() while futures hold the lock on results to put a FailureFetchResult. Otherwise, the put in the catch clause would be interrupted, and the resulting exception would kill the executor. Errors are logged only if !isZombie, meaning the iterator is not yet cleaning up. Further, FailureFetchResults are added with putFirst to stop iteration as quickly as possible. Finally, fallbackStorageReadPool.shutdownNow() is only called in cleanup().
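The locking described in this commit note can be illustrated with a minimal Java sketch. It is not the code from this PR: the class FallbackReadSketch, its string results, and the demo() helper are hypothetical, and only the names results, isZombie, fallbackStorageReadPool, putFirst, and shutdownNow() come from the note. The sketch assumes that a plain monitor on the results queue serializes the failure put against cleanup().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical stand-in for the iterator; not Spark's actual implementation.
final class FallbackReadSketch {
    // Consumed by the iterating thread; failures are inserted at the front.
    private final LinkedBlockingDeque<String> results = new LinkedBlockingDeque<>();
    private final ExecutorService fallbackStorageReadPool = Executors.newFixedThreadPool(2);
    private boolean isZombie = false; // guarded by the monitor of `results`

    Future<?> submitRead(String blockId, boolean simulateFailure) {
        return fallbackStorageReadPool.submit(() -> {
            try {
                if (simulateFailure) {
                    throw new IllegalStateException("cannot read " + blockId);
                }
                results.put("success:" + blockId);
            } catch (Exception e) {
                // Holding the lock keeps cleanup() from calling shutdownNow()
                // (and interrupting this thread) in the middle of the put.
                synchronized (results) {
                    if (!isZombie) {
                        System.err.println("Failed to read " + blockId + ": " + e.getMessage());
                    }
                    try {
                        results.putFirst("failure:" + blockId); // seen before queued successes
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        });
    }

    void cleanup() {
        synchronized (results) {
            isZombie = true;
            fallbackStorageReadPool.shutdownNow(); // the only place the pool is shut down
        }
    }

    // Runs one successful and one failing read, then returns the queue order.
    static List<String> demo() {
        FallbackReadSketch sketch = new FallbackReadSketch();
        try {
            Future<?> ok = sketch.submitRead("b1", false);
            Future<?> failed = sketch.submitRead("b2", true);
            ok.get();
            failed.get();
            List<String> seen = new ArrayList<>();
            sketch.results.drainTo(seen);
            return seen;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            sketch.cleanup();
        }
    }
}
```

Whichever read finishes first, the failure ends up at the head of the queue, so a consumer would see it before any success that is already queued.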
@attilapiros you might be interested in this given your comment #45228 (comment)
What changes were proposed in this pull request?
This changes how ShuffleBlockFetcherIterator fetches blocks from fallback storage. It now treats fallback storage blocks separately from local blocks and more like remote blocks. It considers these blocks when updating requestsInFlight and bytesInFlight, so that not too many bytes are held in memory while iterating over fallback storage blocks. Blocks are read asynchronously and multi-threaded.
Why are the changes needed?
Currently, ShuffleBlockFetcherIterator treats fallback storage blocks as local blocks, and a block is read from fallback storage only on an exception code path (when it cannot be found as a local block). Fallback storage blocks are read one by one (single-threaded), and they block the reading of local blocks (synchronously).
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing tests are updated accordingly.
Was this patch authored or co-authored using generative AI tooling?
No.
Note: This includes #54268
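For readers unfamiliar with in-flight accounting, here is a minimal Java sketch of the idea of counting fallback storage reads against requestsInFlight and bytesInFlight, as described under "What changes were proposed in this pull request?". It is an illustration under stated assumptions, not the code in this PR. The class InFlightSketch, the limits maxBytesInFlight and maxRequestsInFlight, and the admission rule (submit a read only while both limits hold, or when nothing is in flight) are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; counters are only touched by the consuming thread.
final class InFlightSketch {
    private final long maxBytesInFlight;      // assumed limit
    private final int maxRequestsInFlight;    // assumed limit
    private long bytesInFlight = 0;
    private int requestsInFlight = 0;
    private long peakBytesInFlight = 0;
    private final ArrayDeque<Long> pending = new ArrayDeque<>(); // block sizes
    private final LinkedBlockingQueue<Long> results = new LinkedBlockingQueue<>();
    private final ExecutorService readPool = Executors.newFixedThreadPool(4);

    InFlightSketch(long maxBytes, int maxRequests, List<Long> blockSizes) {
        this.maxBytesInFlight = maxBytes;
        this.maxRequestsInFlight = maxRequests;
        pending.addAll(blockSizes);
    }

    // Assumed rule: admit a read if nothing is in flight (so an oversized
    // block can still proceed) or if both limits would still be respected.
    private void submitWhilePossible() {
        while (!pending.isEmpty()
                && (bytesInFlight == 0
                    || (requestsInFlight + 1 <= maxRequestsInFlight
                        && bytesInFlight + pending.peek() <= maxBytesInFlight))) {
            long size = pending.poll();
            bytesInFlight += size;
            requestsInFlight += 1;
            peakBytesInFlight = Math.max(peakBytesInFlight, bytesInFlight);
            readPool.execute(() -> results.add(size)); // stands in for reading the block
        }
    }

    // Consumes every block; returns {blocks consumed, peak bytes in flight}.
    long[] consumeAll() {
        long consumed = 0;
        try {
            submitWhilePossible();
            while (requestsInFlight > 0) {
                long size = results.take();   // wait for any finished read
                bytesInFlight -= size;
                requestsInFlight -= 1;
                consumed++;
                submitWhilePossible();        // freed budget admits more reads
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            readPool.shutdownNow();
        }
        return new long[] {consumed, peakBytesInFlight};
    }

    static long[] demo() {
        // Five 40-byte blocks, at most 100 bytes and 3 requests in flight.
        List<Long> sizes = List.of(40L, 40L, 40L, 40L, 40L);
        return new InFlightSketch(100L, 3, sizes).consumeAll();
    }
}
```

With these numbers, at most two 40-byte reads run at once, because a third would exceed the 100-byte budget. Reads still overlap, but memory use stays bounded.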