[EPIC] Experiment with some different shuffle implementations

### What is the problem the feature request solves?

Following on from the refactors that moved the shuffle code into a separate crate (https://github.com/apache/datafusion-comet/pull/3749 and https://github.com/apache/datafusion-comet/pull/3772) and the new standalone shuffle benchmark binary (https://github.com/apache/datafusion-comet/pull/3752), I propose that we start evaluating different shuffle approaches to better understand their trade-offs in throughput and memory usage.

This is a very open-ended epic. There are some ideas that I plan to experiment with, such as moving away from the current interleave-batches approach, which keeps input batches buffered for a long time, toward a more immediate scatter approach that removes the need for that buffering. I expect other contributors may also have ideas that they would like to explore.
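To make the contrast concrete, here is a minimal sketch of the two strategies. It is not the Comet implementation: `Batch`, `partition_of`, `shuffle_buffered`, and `shuffle_scatter` are hypothetical names, and a batch is simplified to a single column of `i64` keys rather than an Arrow `RecordBatch`. The buffered variant must keep every input batch alive until the end because it only records `(batch, row)` indices, while the scatter variant copies rows out immediately and can release each input batch as soon as it has been processed.

```rust
/// Toy stand-in for a columnar batch: a single column of i64 keys.
/// (Hypothetical type; real batches would be Arrow RecordBatches.)
type Batch = Vec<i64>;

/// Hypothetical partitioner: maps a key to one of `n` output partitions.
/// Assumes `n > 0`.
fn partition_of(key: i64, n: usize) -> usize {
    key.rem_euclid(n as i64) as usize
}

/// Buffered strategy: record (batch index, row index) pairs per partition and
/// only materialize the output once all input has been seen. Every input
/// batch has to stay in memory until the final gather step.
fn shuffle_buffered(input: &[Batch], n: usize) -> Vec<Vec<i64>> {
    let mut indices: Vec<Vec<(usize, usize)>> = vec![Vec::new(); n];
    for (b, batch) in input.iter().enumerate() {
        for (r, &key) in batch.iter().enumerate() {
            indices[partition_of(key, n)].push((b, r));
        }
    }
    // Gather step: look each row up in the still-buffered input batches.
    indices
        .into_iter()
        .map(|idx| idx.into_iter().map(|(b, r)| input[b][r]).collect())
        .collect()
}

/// Immediate scatter: copy each row into its partition's buffer as soon as
/// the batch arrives, so the input batch does not need to be retained.
fn shuffle_scatter(input: impl IntoIterator<Item = Batch>, n: usize) -> Vec<Vec<i64>> {
    let mut out: Vec<Vec<i64>> = vec![Vec::new(); n];
    for batch in input {
        // `batch` is consumed by this loop and freed when it finishes.
        for key in batch {
            out[partition_of(key, n)].push(key);
        }
    }
    out
}
```

Both functions produce the same per-partition output for the same input; the difference is how long input memory is held, which is the kind of trade-off the benchmark would need to measure on real data.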

There is already a trait for the shuffle partition-and-write operation, so it should be trivial to support different implementations behind a config option for testing and evaluation, and to let the standalone benchmark compare performance characteristics across implementations.
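As a rough illustration of what selecting an implementation behind a config could look like, the sketch below uses a hypothetical `ShufflePartitioner` trait and a hypothetical string-valued setting. None of these names (`ShufflePartitioner`, `Buffered`, `Scatter`, `make_partitioner`, `run`, or the values `"buffered"` and `"scatter"`) come from the Comet codebase, and the existing trait may well have a different shape. Batches are again simplified to a single column of `i64` keys.

```rust
/// Hypothetical trait for the partition-and-write step (NOT the Comet trait).
trait ShufflePartitioner {
    /// Accept one input batch.
    fn insert_batch(&mut self, batch: Vec<i64>);
    /// Flush and return the rows grouped by output partition.
    fn finish(self: Box<Self>) -> Vec<Vec<i64>>;
}

/// Hypothetical key-to-partition mapping; assumes `n > 0`.
fn bucket(key: i64, n: usize) -> usize {
    key.rem_euclid(n as i64) as usize
}

/// Keeps all input batches and partitions them only in `finish`.
struct Buffered {
    n: usize,
    batches: Vec<Vec<i64>>,
}

impl ShufflePartitioner for Buffered {
    fn insert_batch(&mut self, batch: Vec<i64>) {
        self.batches.push(batch);
    }
    fn finish(self: Box<Self>) -> Vec<Vec<i64>> {
        let mut out = vec![Vec::new(); self.n];
        for batch in &self.batches {
            for &key in batch {
                out[bucket(key, self.n)].push(key);
            }
        }
        out
    }
}

/// Scatters rows into per-partition buffers as each batch arrives.
struct Scatter {
    out: Vec<Vec<i64>>,
}

impl ShufflePartitioner for Scatter {
    fn insert_batch(&mut self, batch: Vec<i64>) {
        let n = self.out.len();
        for key in batch {
            self.out[bucket(key, n)].push(key);
        }
    }
    fn finish(self: Box<Self>) -> Vec<Vec<i64>> {
        let this = *self;
        this.out
    }
}

/// Pick an implementation from a (hypothetical) config value.
fn make_partitioner(name: &str, n: usize) -> Result<Box<dyn ShufflePartitioner>, String> {
    let p: Box<dyn ShufflePartitioner> = match name {
        "buffered" => Box::new(Buffered { n, batches: Vec::new() }),
        "scatter" => Box::new(Scatter { out: vec![Vec::new(); n] }),
        other => return Err(format!("unknown shuffle implementation: {other}")),
    };
    Ok(p)
}

/// Drive any implementation the same way, as a benchmark harness might.
fn run(name: &str, input: Vec<Vec<i64>>, n: usize) -> Result<Vec<Vec<i64>>, String> {
    let mut p = make_partitioner(name, n)?;
    for batch in input {
        p.insert_batch(batch);
    }
    Ok(p.finish())
}
```

Because the harness only depends on the trait, the same input can be replayed through each implementation and the results compared for both correctness and performance.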

### Describe the potential solution

_No response_

### Additional context

_No response_

[EPIC] Experiment with some different shuffle implementations #3778
