Description
In MultiPartitionShuffleRepartitioner::shuffle_write(), spill files are copied to the final shuffle output using BufReader:
let mut spill_file = BufReader::new(File::open(spill_path)?);
std::io::copy(&mut spill_file, &mut output_data)?;

The BufReader wrapper is counterproductive here because:
- std::io::copy already uses an internal buffer for the copy
- On Linux, std::io::copy with raw File-to-File can use copy_file_range/sendfile for kernel zero-copy, but wrapping in BufReader defeats this specialization since the source is no longer a File
Proposed Change
Remove the BufReader wrapper and pass the raw File handle directly:
let mut spill_file = File::open(spill_path)?;
std::io::copy(&mut spill_file, &mut output_data)?;

This is a one-line change in native/shuffle/src/partitioners/multi_partition.rs (in the shuffle_write method).
Note: the same optimization was already applied to the new ImmediateShufflePartitioner.
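
For illustration, here is a minimal standalone sketch of the proposed pattern. It is not the actual code in multi_partition.rs: the helper name `append_spill` is hypothetical, and it assumes the output is a plain `File`, since the real type of `output_data` is not shown in this issue. Whether the kernel fast path is actually taken depends on the platform and the Rust standard library version; the sketch only shows that the raw `File` can be handed to `std::io::copy` directly.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Hypothetical helper: appends the contents of one spill file to the
/// shuffle output at the output's current position.
fn append_spill(spill_path: &Path, output: &mut File) -> io::Result<u64> {
    // Open the spill file and pass the raw File as the reader,
    // with no BufReader wrapper around it.
    let mut spill_file = File::open(spill_path)?;
    // io::copy reads until EOF and returns the number of bytes copied.
    io::copy(&mut spill_file, output)
}
```

Calling the helper once per spill file concatenates the spill files into the output in call order, and the returned byte count can be used to track partition offsets.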