Remove BufReader wrapper when copying spill files to final shuffle output #3834

@andygrove

Description

In MultiPartitionShuffleRepartitioner::shuffle_write(), each spill file is wrapped in a BufReader before being copied to the final shuffle output:

let mut spill_file = BufReader::new(File::open(spill_path)?);
std::io::copy(&mut spill_file, &mut output_data)?;

The BufReader wrapper is counterproductive here because:

  1. std::io::copy already uses an internal buffer for the copy
  2. On Linux, std::io::copy from a raw File to a File can use copy_file_range / sendfile for an in-kernel copy, but wrapping the source in BufReader can defeat this specialization because the source is no longer a plain File

Proposed Change

Remove the BufReader wrapper and pass the raw File handle directly:

let mut spill_file = File::open(spill_path)?;
std::io::copy(&mut spill_file, &mut output_data)?;

This is a one-line change in native/shuffle/src/partitioners/multi_partition.rs (in the shuffle_write method).
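
For illustration, here is a minimal, self-contained sketch of the proposed pattern. The helper name `append_spill` is hypothetical and not taken from the codebase, and the generic `W: Write` parameter stands in for `output_data`, whose concrete type is not shown in this issue. Whether a kernel-assisted copy is actually used depends on the platform and on the concrete reader and writer types; the sketch only shows that the copy works without a BufReader.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper (an assumption for illustration, not the project's API):
/// appends the contents of one spill file to an already-open output and
/// returns the number of bytes copied.
fn append_spill<W: Write>(spill_path: &Path, output: &mut W) -> io::Result<u64> {
    // Open the spill file as a plain File, with no BufReader wrapper.
    // std::io::copy performs its own buffering when it falls back to
    // a read/write loop.
    let mut spill_file = File::open(spill_path)?;
    io::copy(&mut spill_file, output)
}
```

A caller would invoke this once per spill file, for example `append_spill(&spill_path, &mut output_data)?`, and could use the returned byte count to track output offsets.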

Note: the same optimization was already applied to the new ImmediateShufflePartitioner.
