-
Notifications
You must be signed in to change notification settings - Fork 296
Description
What is the problem the feature request solves?
Following on from the refactors to move shuffle code into a separate crate in #3749 and #3772, and the new standalone shuffle benchmark binary in #3752, I propose that we start evaluating some different shuffle approaches to better understand the trade offs in terms of throughput and memory usage of different approaches.
This is a very open-ended epic. There are some ideas that I am planning on experimenting with, such as moving away from the current interleave batches approach, which keeps batches buffered for a long time, and moving to a more immediate scatter approach that removes the need to buffer the input batches for such a long time. I expect other contributors may also have ideas that they would like to explore.
There is a already a trait for the shuffle partition-and-write operation, so it should be trivial to support different implementations behind a config for testing and evaluation, and allow the standalone benchmark to compare performance characteristics across implementations.
Describe the potential solution
No response
Additional context
No response