
perf(arrow-ipc): Avoid copies and write dictionary batches directly to writers when possible#10128

Draft
JakeDern wants to merge 19 commits into apache:main from JakeDern:ipc-writer-collect-dicts

@JakeDern

Contributor

Which issue does this PR close?

Rationale for this change

This is a follow-up to #10044, applying essentially the same optimization to dictionary batches.

This needs to wait for #10122 to merge first.

What changes are included in this PR?

  • Add a gather step for collecting dictionary buffers, then write them directly to the final writer
  • Unify the write code path for dictionaries with the one for record batches as they're basically the same
  • Update a couple of function names to more accurately reflect their purpose
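As an illustration of the first two bullets, here is a minimal sketch of the general "gather, then write directly" idea, using only the standard library. It is not the code from this PR: `GatheredMessage`, `gather`, `write_with_copy`, and `write_gathered` are hypothetical names, and the sketch ignores IPC details such as padding, alignment, and flatbuffer metadata. It only shows how borrowing buffers and writing them straight to the destination avoids staging a second copy, and how one write function can serve both dictionary and record batch messages.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for an encoded message: an owned header plus
/// body buffers that are borrowed from the source arrays, not copied.
struct GatheredMessage<'a> {
    header: Vec<u8>,
    buffers: Vec<&'a [u8]>,
}

/// Copying approach: stage everything in one intermediate Vec, then write it.
fn write_with_copy<W: Write>(w: &mut W, header: &[u8], buffers: &[&[u8]]) -> io::Result<usize> {
    let mut staged = Vec::new();
    staged.extend_from_slice(header);
    for b in buffers {
        staged.extend_from_slice(b); // extra copy of every body buffer
    }
    w.write_all(&staged)?;
    Ok(staged.len())
}

/// Gather step: collect references to the buffers without copying their bytes.
fn gather<'a>(header: Vec<u8>, columns: &'a [Vec<u8>]) -> GatheredMessage<'a> {
    GatheredMessage {
        header,
        buffers: columns.iter().map(|c| c.as_slice()).collect(),
    }
}

/// Single write path: usable for any gathered message, whether it came from
/// a dictionary or a record batch. Each buffer goes straight to the writer.
fn write_gathered<W: Write>(w: &mut W, msg: &GatheredMessage<'_>) -> io::Result<usize> {
    w.write_all(&msg.header)?;
    let mut written = msg.header.len();
    for b in &msg.buffers {
        w.write_all(b)?;
        written += b.len();
    }
    Ok(written)
}

fn main() -> io::Result<()> {
    // Pretend these are the value buffers of a dictionary array.
    let dictionary_values = vec![b"alpha".to_vec(), b"beta".to_vec()];
    let header = b"HDR".to_vec();

    let slices: Vec<&[u8]> = dictionary_values.iter().map(|v| v.as_slice()).collect();
    let mut copied = Vec::new();
    write_with_copy(&mut copied, &header, &slices)?;

    let msg = gather(header, &dictionary_values);
    let mut direct = Vec::new();
    write_gathered(&mut direct, &msg)?;

    // Both paths produce the same bytes; only the intermediate copy differs.
    assert_eq!(copied, direct);
    Ok(())
}
```

In a sketch like this the saving comes from skipping the `staged` buffer, so it would be expected to matter most when the body buffers are large relative to the header.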

Are these changes tested?

Yes, existing unit tests should cover the change.

Are there any user-facing changes?

No.

@github-actions (bot) added the arrow label (Changes to the arrow crate) on Jun 11, 2026
@JakeDern

Contributor Author

Pretty good improvement: ~42% for the dictionary case and ~20% for the delta dictionary cases. I'm not yet sure why the delta cases improve less, but I think this is worth taking on its own, and I can investigate further later.

Perf results from #10122:

➜  arrow-ipc git:(ipc-writer-dict-benches) cargo bench (StreamWriter|FileWriter)/write_10 --features zstd
zsh: no matches found: (StreamWriter|FileWriter)/write_10
➜  arrow-ipc git:(ipc-writer-dict-benches) cargo bench "(StreamWriter|FileWriter)/write_10" --features zstd
    Finished `bench` profile [optimized] target(s) in 0.07s
     Running benches/ipc_reader.rs (/home/jakedern/repos/arrow-rs/target/release/deps/ipc_reader-a1b491f58c77bb6a)
     Running benches/ipc_writer.rs (/home/jakedern/repos/arrow-rs/target/release/deps/ipc_writer-6612be2d7eba35b1)
arrow_ipc_stream_writer/StreamWriter/write_10
                        time:   [107.53 µs 108.06 µs 108.61 µs]
                        change: [−2.9828% −0.9112% +0.7341%] (p = 0.39 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
arrow_ipc_stream_writer/StreamWriter/write_10/zstd
                        time:   [4.5765 ms 4.6054 ms 4.6355 ms]
                        change: [−0.7831% +0.1488% +1.0639%] (p = 0.75 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
arrow_ipc_stream_writer/FileWriter/write_10
                        time:   [106.14 µs 106.82 µs 107.54 µs]
                        change: [+1.1887% +2.7126% +4.6164%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe
arrow_ipc_stream_writer/StreamWriter/write_10/dict
                        time:   [60.775 µs 62.004 µs 63.440 µs]
                        change: [−6.4822% −3.5063% −0.6010%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
arrow_ipc_stream_writer/StreamWriter/write_10/dict/delta
                        time:   [128.47 µs 129.73 µs 130.88 µs]
                        change: [−1.8693% −0.0642% +1.7216%] (p = 0.95 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
arrow_ipc_stream_writer/FileWriter/write_10/dict/delta
                        time:   [130.29 µs 131.33 µs 132.26 µs]
                        change: [+1.8877% +2.8406% +3.8001%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

➜  arrow-ipc git:(ipc-writer-dict-benches)

Perf results from this branch:

➜  arrow-ipc git:(ipc-writer-collect-dicts) ✗ cargo bench "(StreamWriter|FileWriter)/write_10" --features zstd
   Compiling arrow-ipc v59.0.0 (/home/jakedern/repos/arrow-rs/arrow-ipc)
    Finished `bench` profile [optimized] target(s) in 2.55s
     Running benches/ipc_reader.rs (/home/jakedern/repos/arrow-rs/target/release/deps/ipc_reader-a1b491f58c77bb6a)
     Running benches/ipc_writer.rs (/home/jakedern/repos/arrow-rs/target/release/deps/ipc_writer-6612be2d7eba35b1)
arrow_ipc_stream_writer/StreamWriter/write_10
                        time:   [106.95 µs 107.85 µs 108.76 µs]
                        change: [−2.2269% −1.1032% −0.0394%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
arrow_ipc_stream_writer/StreamWriter/write_10/zstd
                        time:   [4.5629 ms 4.5901 ms 4.6184 ms]
                        change: [−1.1939% −0.3327% +0.5704%] (p = 0.47 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
arrow_ipc_stream_writer/FileWriter/write_10
                        time:   [109.86 µs 110.45 µs 111.11 µs]
                        change: [+0.3417% +2.0979% +3.7750%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
arrow_ipc_stream_writer/StreamWriter/write_10/dict
                        time:   [37.300 µs 37.543 µs 37.807 µs]
                        change: [−43.963% −42.283% −40.548%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
arrow_ipc_stream_writer/StreamWriter/write_10/dict/delta
                        time:   [103.36 µs 104.28 µs 105.18 µs]
                        change: [−19.764% −18.621% −17.508%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
arrow_ipc_stream_writer/FileWriter/write_10/dict/delta
                        time:   [104.84 µs 105.65 µs 106.44 µs]
                        change: [−20.651% −20.021% −19.377%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
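
As a cross-check on the two pasted runs, the relative change can also be computed by hand from the middle `time:` estimates of each run. This is only a rough sketch: `relative_change` is a helper name chosen here, and the result is not expected to match Criterion's own `change:` line exactly, since (as far as I understand) Criterion compares against the samples of whatever baseline it last stored rather than against the point estimates printed above.

```rust
/// Relative change from `before` to `after`; a negative value means faster.
fn relative_change(before: f64, after: f64) -> f64 {
    (after - before) / before
}

fn main() {
    // (benchmark, middle estimate on #10122 in µs, middle estimate on this branch in µs)
    let runs = [
        ("StreamWriter/write_10/dict", 62.004, 37.543),
        ("StreamWriter/write_10/dict/delta", 129.73, 104.28),
        ("FileWriter/write_10/dict/delta", 131.33, 105.65),
    ];
    for (name, before, after) in runs {
        println!("{name}: {:+.1}%", relative_change(before, after) * 100.0);
    }
}
```

Computed this way, the dictionary case comes out a little under 40% faster and both delta cases a little under 20% faster, which is in the same range as the figures Criterion reports.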

@JakeDern

Contributor Author

CC: @alamb and @Rich-T-kid - I think we got pretty good results here! I also tried to clean up a few things where I could, such as removing some unnecessary parameter drilling.

This branch also includes the benchmarks from #10122; I'll rebase once that goes in.

@Rich-T-kid

Contributor

I can take a look at this early next week.


Labels

arrow Changes to the arrow crate

Development

Successfully merging this pull request may close these issues.

arrow-ipc: Reduce writer allocations for dictionary batches

2 participants