Reduce copies in Arrow IPC writer by Rich-T-kid · Pull Request #10044 · apache/arrow-rs

Rich-T-kid · 2026-06-01T19:27:04Z

Which issue does this PR close?

Closes Optimize arrow-ipc #10029.
A document that provides a bit of context

Rationale for this change

Compression is the most compute- and memory-intensive part of the arrow-ipc encoding pipeline, and it runs per buffer, not per record batch. For a Flight stream of 10 batches with 5 primitive arrays each, that is at least 100 compression calls (each primitive array contributes a validity buffer and a values buffer), and more for string and struct arrays. Each of those calls produced an owned compressed Vec that was then copied a second time into a flat arrow_data accumulator before being written to the output. The uncompressed path had the same problem: Arc-backed buffer slices that needed no compression were still copied into that accumulator.
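To make the double copy concrete, here is a simplified, standard-library-only model of the pre-PR compressed path. `compress` is a hypothetical stand-in for a real codec such as LZ4 or ZSTD (it just copies its input), and `encode_body_with_accumulator` is an illustrative name, not the arrow-rs function. Every buffer is first materialized into an owned `Vec` and then copied again into the flat accumulator, so each byte is copied twice.

```rust
/// Hypothetical stand-in for a real codec (LZ4, ZSTD): it only copies the
/// input, which is enough to model the allocation and copy pattern.
fn compress(input: &[u8]) -> Vec<u8> {
    input.to_vec() // copy #1: owned per-buffer output
}

/// Simplified model of the pre-PR path (illustrative, not the arrow-rs code).
/// Returns the flat body plus the total number of bytes copied along the way.
fn encode_body_with_accumulator(buffers: &[&[u8]]) -> (Vec<u8>, usize) {
    let mut arrow_data: Vec<u8> = Vec::new();
    let mut bytes_copied = 0;
    for buf in buffers {
        let compressed = compress(buf);
        bytes_copied += compressed.len();
        // copy #2: the owned Vec is copied again into the flat accumulator
        arrow_data.extend_from_slice(&compressed);
        bytes_copied += compressed.len();
    }
    (arrow_data, bytes_copied)
}
```

With five input bytes the model copies ten, which is the overhead the PR removes by writing each owned or shared buffer directly.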

Separately, the original write_message() function flushed after every dictionary and every record batch, causing repeated small OS write calls per batch (for writer implementations that are not backed by a Vec).
The goal was to eliminate both problems: stop copying buffers that do not need to be copied, and stop flushing on every message.
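The cost of flushing per message can be sketched with a `BufWriter` in front of a counting sink. `CountingWriter` and `inner_writes` are hypothetical test helpers, not arrow-rs types; the sink stands in for a file or socket where each `write` call reaching it would be an OS write. Flushing after every message pushes each small message down separately, while flushing once lets the buffer coalesce them.

```rust
use std::io::{self, BufWriter, Write};

/// Illustrative test double (not part of arrow-rs): records every byte and
/// counts how many `write` calls reach it, standing in for an OS-level sink.
#[derive(Default)]
struct CountingWriter {
    bytes: Vec<u8>,
    writes: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        self.bytes.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes each message through a `BufWriter`, optionally flushing after every
/// message (the old behavior), then flushes once at the end. Returns how many
/// `write` calls reached the underlying sink, plus the bytes it received.
fn inner_writes(messages: &[&[u8]], flush_each: bool) -> io::Result<(usize, Vec<u8>)> {
    let mut w = BufWriter::new(CountingWriter::default());
    for m in messages {
        w.write_all(m)?;
        if flush_each {
            w.flush()?; // forces the buffered bytes down to the sink now
        }
    }
    w.flush()?;
    let sink = w.into_inner().map_err(|e| e.into_error())?;
    Ok((sink.writes, sink.bytes))
}
```

For three small messages the per-message variant issues three sink writes and the batched variant issues one, with identical output bytes.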

What changes are included in this PR?

Introduced EncodedBuffer, an enum that wraps either a raw Arc-backed Buffer for the uncompressed path or an owned Vec for the compressed path, so both can be held in a uniform collection without an extra copy into a flat accumulator
Changed write_array_data to push EncodedBuffer segments instead of copying bytes into arrow_data
FileWriter and StreamWriter both now call write_batch_direct(), eliminating the flush-per-message behavior and the intermediate copy on the hot path
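The `EncodedBuffer` idea can be sketched as follows. This is a simplified model, not the PR's code: `Arc<[u8]>` stands in for arrow's reference-counted `Buffer`, and the variant names and `write_body` helper are hypothetical. Each segment is written straight to the output and zero-padded to an alignment boundary (the Arrow IPC format requires at least 8-byte alignment), so no flat intermediate `Vec<u8>` is built.

```rust
use std::io::{self, Write};
use std::sync::Arc;

/// Zero bytes used for padding; supports alignments up to 64 bytes.
const ZEROS: [u8; 64] = [0; 64];

/// Simplified model of the PR's `EncodedBuffer` (names are illustrative).
enum EncodedBuffer {
    /// Uncompressed path: shares the existing allocation, no copy.
    Shared(Arc<[u8]>),
    /// Compressed path: the codec's owned output.
    Owned(Vec<u8>),
}

impl EncodedBuffer {
    fn as_slice(&self) -> &[u8] {
        match self {
            EncodedBuffer::Shared(bytes) => &bytes[..],
            EncodedBuffer::Owned(bytes) => bytes.as_slice(),
        }
    }
}

/// Writes each segment directly to `out`, zero-padding it to `alignment`
/// bytes, and returns the total body length written (including padding).
fn write_body<W: Write>(
    out: &mut W,
    segments: &[EncodedBuffer],
    alignment: usize,
) -> io::Result<usize> {
    assert!(alignment > 0 && alignment <= ZEROS.len(), "unsupported alignment");
    let mut written = 0;
    for segment in segments {
        let bytes = segment.as_slice();
        out.write_all(bytes)?;
        let pad = (alignment - bytes.len() % alignment) % alignment;
        out.write_all(&ZEROS[..pad])?;
        written += bytes.len() + pad;
    }
    Ok(written)
}
```

Holding both variants in one collection keeps the body-writing loop uniform, while the shared variant only bumps a reference count instead of copying.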

Are these changes tested?

These changes are intended to be completely transparent. I didn't write new unit tests because nothing externally visible changed; all existing tests still pass.

benchmarks

[main -> cargo bench --bench ipc_writer -- "StreamWriter/write_10$" --sample-size 100]
[my branch -> cargo bench --bench ipc_writer -- "StreamWriter/write_10$" --sample-size 100 ]

[main -> cargo bench --bench ipc_writer -- --sample-size 1000]
[my branch -> cargo bench --bench ipc_writer -- --sample-size 1000]

Are there any user-facing changes?

no

gabotechs

This is looking pretty good. Good job! I left some comments, mainly aimed at exploring more reuse and bringing a bit more clarity to this file; let me know if you have other ideas.

gabotechs · 2026-06-03T14:20:29Z

        }

-        let (encoded_dictionaries, encoded_message) = self.data_gen.encode(
+        let (dict_sizes, (meta, data)) = self.data_gen.write_batch_direct(


The last two fields, (meta, data), returned by write_batch_direct are named (aligned_size, body_len) inside that function.

Is this correct? I'm not sure if this is just a naming issue, but the different names make it hard to tell. Is meta == aligned_size and data == body_len? They sound like completely different things.

I updated the struct & variable names to hopefully make this clearer.
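One way to remove this kind of ambiguity is to return a small named struct instead of a positional tuple. The struct and field names below are hypothetical, not the ones the PR settled on; the point is that callers read `metadata_len` and `body_len` by name and cannot swap them silently.

```rust
/// Hypothetical return type for a batch-writing function: naming the fields
/// makes it hard to confuse the metadata length with the body length.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct WrittenMessage {
    /// Bytes of the encapsulated message metadata, including any length
    /// prefix and padding.
    metadata_len: usize,
    /// Bytes of the message body (all data buffers plus padding).
    body_len: usize,
}

impl WrittenMessage {
    /// Total bytes this message occupies in the output.
    fn total_len(&self) -> usize {
        self.metadata_len + self.body_len
    }
}
```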

gabotechs · 2026-06-03T16:23:59Z

+    /// each buffer is compressed into a per-buffer scratch `Vec<u8>` and written from
+    /// there, eliminating the extra copy that `write_buffer` -> `arrow_data` ->
+    /// `write_body_buffers` would otherwise incur.
+    fn write_batch_direct<W: Write>(


Most of this function appears to be copy-pasted from record_batch_to_bytes; that is too much duplication. Is there any chance to:

Completely replace record_batch_to_bytes and keep just a single function for writing batches in IPC format, or

Factor out some ergonomic helpers that could be reused in both functions?

Also, it seems like the .encode() method and the new .write_batch_direct() are both doing the same thing with slightly different ergonomics. Do you see any opportunity to collapse them into just 1 method?

This file is overall pretty bloated with complex logic and a relatively arbitrary separation of concerns between methods, the more we can do for debloating it the better it will be for future maintainers.

This makes sense to me. The main issue is that FileWriter needs metadata while StreamWriter and arrow-flight do not. It's better not to compute metadata the caller will not use, but the slowdown should be negligible.
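A sketch of how a single write path could serve both writers: it always reports where the message landed (which an IPC file footer needs), and stream-style callers simply discard that value. All names here are hypothetical; `Block` mirrors the shape of a footer entry in the Arrow IPC file format (offset, metadata length, body length) but is not the arrow-rs type. Tracking the running offset is a couple of additions per message, in line with the "negligible" estimate above.

```rust
use std::io::{self, Write};

/// Mirrors the per-message information an IPC file footer records.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Block {
    offset: u64,
    metadata_len: u32,
    body_len: u64,
}

/// Hypothetical single write path shared by file- and stream-style writers.
struct MessageWriter<W: Write> {
    out: W,
    /// Running byte offset, so every write can report its location.
    offset: u64,
}

impl<W: Write> MessageWriter<W> {
    fn new(out: W) -> Self {
        Self { out, offset: 0 }
    }

    /// Writes pre-encoded metadata and body, returning where they were written.
    fn write_message(&mut self, metadata: &[u8], body: &[u8]) -> io::Result<Block> {
        let block = Block {
            offset: self.offset,
            metadata_len: metadata.len() as u32,
            body_len: body.len() as u64,
        };
        self.out.write_all(metadata)?;
        self.out.write_all(body)?;
        self.offset += (metadata.len() + body.len()) as u64;
        Ok(block)
    }
}
```

A file writer would push each returned `Block` into its footer list; a stream writer would call the same method and ignore the result.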

Rich-T-kid · 2026-06-03T17:50:51Z

Benchmark results from #10031

Ran cargo bench --bench flight encode -- --sample-size 100

Rich-T-kid · 2026-06-03T17:53:05Z

Going to look into why fixed/8192x1 regressed; it might just be noise.

Rich-T-kid · 2026-06-03T18:49:45Z

I think it's worth mentioning that no dictionary optimizations were made in this PR; that could be a follow-up ticket.

Rich-T-kid · 2026-06-03T19:03:14Z

Ran the benchmarks again and the results still look good. Tests are passing for arrow-flight and arrow-ipc.
@alamb could you run the benchmarks for this PR on the CI bot when you get a chance? Thank you!

alamb · 2026-06-04T10:11:54Z

run benchmark flight

adriangbot · 2026-06-04T10:14:11Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4621154654-431-g2mb7 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing rich-T-kid/optimize-arrow-ipc-copies (04e7992) to 97f4b14 (merge-base) diff
BENCH_NAME=flight
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench flight
BENCH_FILTER=
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-06-04T10:26:39Z

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                         main                                   rich-T-kid_optimize-arrow-ipc-copies
-----                         ----                                   ------------------------------------
encode/dict/65536x1           1.02    274.7±1.07µs   915.2 MB/sec    1.00    270.3±1.55µs   930.2 MB/sec
encode/dict/65536x8           1.30      5.5±0.04ms   365.9 MB/sec    1.00      4.2±0.06ms   474.4 MB/sec
encode/dict/8192x1            1.01     35.3±0.04µs   924.9 MB/sec    1.00     35.1±0.05µs   930.1 MB/sec
encode/dict/8192x8            1.10    315.3±1.66µs   828.7 MB/sec    1.00    286.1±2.01µs   913.3 MB/sec
encode/fixed/65536x1          1.00     10.0±0.02µs    48.8 GB/sec    1.00     10.0±0.04µs    49.1 GB/sec
encode/fixed/65536x8          1.01   1092.1±2.65µs     3.6 GB/sec    1.00   1076.9±5.83µs     3.6 GB/sec
encode/fixed/8192x1           1.03      3.2±0.01µs    19.3 GB/sec    1.00      3.1±0.02µs    19.8 GB/sec
encode/fixed/8192x8           1.07     17.5±0.04µs    27.9 GB/sec    1.00     16.4±0.07µs    29.8 GB/sec
encode/nested/65536x1         1.00     29.0±0.24µs    42.1 GB/sec    1.04     30.2±0.30µs    40.4 GB/sec
encode/nested/65536x8         1.12      2.5±0.08ms     3.9 GB/sec    1.00      2.2±0.13ms     4.4 GB/sec
encode/nested/8192x1          1.00      5.8±0.01µs    26.5 GB/sec    1.00      5.8±0.01µs    26.6 GB/sec
encode/nested/8192x8          1.01     46.3±0.10µs    26.4 GB/sec    1.00     45.9±0.56µs    26.6 GB/sec
encode/variable/65536x1       1.04     51.5±0.55µs    42.7 GB/sec    1.00     49.3±0.36µs    44.6 GB/sec
encode/variable/65536x8       1.24      5.7±0.09ms     3.1 GB/sec    1.00      4.6±0.06ms     3.8 GB/sec
encode/variable/8192x1        1.18      7.1±0.01µs    38.8 GB/sec    1.00      6.0±0.01µs    45.8 GB/sec
encode/variable/8192x8        1.32     83.3±1.96µs    26.4 GB/sec    1.00     63.3±0.20µs    34.7 GB/sec
roundtrip/dict/65536x1        1.01  1288.9±47.00µs   195.1 MB/sec    1.00  1279.9±51.20µs   196.4 MB/sec
roundtrip/dict/65536x8        1.00     14.4±0.55ms   139.8 MB/sec    1.01     14.5±0.53ms   138.4 MB/sec
roundtrip/dict/8192x1         1.00    206.6±5.60µs   158.1 MB/sec    1.00    206.0±5.73µs   158.6 MB/sec
roundtrip/dict/8192x8         1.00  1322.8±44.85µs   197.5 MB/sec    1.00  1324.0±44.79µs   197.3 MB/sec
roundtrip/fixed/65536x1       1.02    314.2±4.02µs  1591.6 MB/sec    1.00    307.7±4.21µs  1625.2 MB/sec
roundtrip/fixed/65536x8       1.00      2.2±0.03ms  1856.3 MB/sec    1.15      2.5±0.13ms  1608.9 MB/sec
roundtrip/fixed/8192x1        1.00     89.9±0.94µs   696.0 MB/sec    1.00     89.6±1.04µs   698.6 MB/sec
roundtrip/fixed/8192x8        1.01    333.8±3.84µs  1499.9 MB/sec    1.00    330.7±3.14µs  1514.2 MB/sec
roundtrip/nested/65536x1      1.00   859.6±40.22µs  1454.4 MB/sec    1.00   862.4±44.79µs  1449.6 MB/sec
roundtrip/nested/65536x8      1.01      8.6±0.36ms  1156.8 MB/sec    1.00      8.6±0.36ms  1168.2 MB/sec
roundtrip/nested/8192x1       1.01    159.3±5.82µs   981.9 MB/sec    1.00    157.6±5.86µs   993.0 MB/sec
roundtrip/nested/8192x8       1.02   925.9±41.73µs  1351.8 MB/sec    1.00   910.3±44.09µs  1374.9 MB/sec
roundtrip/variable/65536x1    1.00  1246.2±34.07µs  1805.6 MB/sec    1.54  1922.3±132.51µs  1170.5 MB/sec
roundtrip/variable/65536x8    1.17     16.1±0.50ms  1119.3 MB/sec    1.00     13.7±0.50ms  1313.5 MB/sec
roundtrip/variable/8192x1     1.00    204.0±5.40µs  1379.8 MB/sec    1.00    203.1±5.65µs  1385.4 MB/sec
roundtrip/variable/8192x8     1.00  1211.3±26.93µs  1858.7 MB/sec    1.57  1897.2±120.59µs  1186.7 MB/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	345.1s
Peak memory	3.4 GiB
Avg memory	3.4 GiB
CPU user	350.2s
CPU sys	74.8s
Peak spill	0 B

branch

Metric	Value
Wall time	335.1s
Peak memory	3.4 GiB
Avg memory	3.4 GiB
CPU user	335.9s
CPU sys	78.4s
Peak spill	0 B

gabotechs · 2026-06-04T11:41:52Z

run benchmarks ipc_writer

adriangbot · 2026-06-04T11:45:38Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4621805130-432-xw4gv 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing rich-T-kid/optimize-arrow-ipc-copies (04e7992) to 97f4b14 (merge-base) diff
BENCH_NAME=ipc_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench ipc_writer
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-06-04T11:46:59Z

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                 main                                   rich-T-kid_optimize-arrow-ipc-copies
-----                                                 ----                                   ------------------------------------
arrow_ipc_stream_writer/FileWriter/write_10           1.90    185.2±2.17µs        ? ?/sec    1.00     97.3±4.50µs        ? ?/sec
arrow_ipc_stream_writer/StreamWriter/write_10         1.94    185.1±1.96µs        ? ?/sec    1.00     95.4±4.87µs        ? ?/sec
arrow_ipc_stream_writer/StreamWriter/write_10/zstd    1.01      7.3±0.02ms        ? ?/sec    1.00      7.2±0.03ms        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	30.0s
Peak memory	2.7 GiB
Avg memory	2.6 GiB
CPU user	27.5s
CPU sys	0.6s
Peak spill	0 B

branch

Metric	Value
Wall time	30.0s
Peak memory	2.7 GiB
Avg memory	2.6 GiB
CPU user	29.7s
CPU sys	0.1s
Peak spill	0 B

Rich-T-kid · 2026-06-04T13:43:36Z

🤔 Results look good. I'm curious why two of the roundtrip benchmarks were slightly slower even though encode() is faster across the board.

gabotechs · 2026-06-04T13:53:21Z

run benchmark flight

adriangbot · 2026-06-04T13:56:57Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4622787465-441-9sh6d 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing rich-T-kid/optimize-arrow-ipc-copies (04e7992) to 97f4b14 (merge-base) diff
BENCH_NAME=flight
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench flight
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-06-04T14:09:30Z

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                         main                                   rich-T-kid_optimize-arrow-ipc-copies
-----                         ----                                   ------------------------------------
encode/dict/65536x1           1.01    271.4±1.78µs   926.3 MB/sec    1.00    267.9±0.66µs   938.6 MB/sec
encode/dict/65536x8           1.00      5.1±0.12ms   395.2 MB/sec    1.11      5.6±0.21ms   357.0 MB/sec
encode/dict/8192x1            1.04     36.4±0.04µs   896.4 MB/sec    1.00     35.0±0.03µs   932.9 MB/sec
encode/dict/8192x8            1.06    305.3±1.34µs   855.9 MB/sec    1.00    287.6±1.80µs   908.7 MB/sec
encode/fixed/65536x1          1.04     10.3±0.02µs    47.6 GB/sec    1.00      9.9±0.02µs    49.3 GB/sec
encode/fixed/65536x8          1.00   1084.9±6.70µs     3.6 GB/sec    1.01   1096.8±4.18µs     3.6 GB/sec
encode/fixed/8192x1           1.00      3.0±0.01µs    20.3 GB/sec    1.03      3.1±0.01µs    19.7 GB/sec
encode/fixed/8192x8           1.06     17.4±0.02µs    28.0 GB/sec    1.00     16.4±0.06µs    29.8 GB/sec
encode/nested/65536x1         1.29     38.5±0.19µs    31.7 GB/sec    1.00     29.8±0.58µs    41.0 GB/sec
encode/nested/65536x8         1.32      2.7±0.04ms     3.6 GB/sec    1.00      2.1±0.20ms     4.8 GB/sec
encode/nested/8192x1          1.01      5.8±0.01µs    26.2 GB/sec    1.00      5.8±0.02µs    26.4 GB/sec
encode/nested/8192x8          1.04     47.3±0.08µs    25.8 GB/sec    1.00     45.5±0.14µs    26.9 GB/sec
encode/variable/65536x1       1.63     80.5±0.47µs    27.3 GB/sec    1.00     49.3±0.34µs    44.6 GB/sec
encode/variable/65536x8       1.10      5.4±0.19ms     3.2 GB/sec    1.00      4.9±0.11ms     3.6 GB/sec
encode/variable/8192x1        1.81     10.7±0.01µs    25.7 GB/sec    1.00      5.9±0.01µs    46.6 GB/sec
encode/variable/8192x8        1.37     87.5±0.23µs    25.1 GB/sec    1.00     63.9±0.26µs    34.4 GB/sec
roundtrip/dict/65536x1        1.00  1319.9±43.70µs   190.5 MB/sec    1.00  1316.4±41.25µs   191.0 MB/sec
roundtrip/dict/65536x8        1.00     14.4±0.59ms   139.9 MB/sec    1.02     14.6±0.57ms   137.9 MB/sec
roundtrip/dict/8192x1         1.01    213.3±5.87µs   153.1 MB/sec    1.00    211.4±6.11µs   154.5 MB/sec
roundtrip/dict/8192x8         1.01  1346.6±44.16µs   194.0 MB/sec    1.00  1338.9±42.85µs   195.1 MB/sec
roundtrip/fixed/65536x1       1.00    318.9±4.39µs  1568.0 MB/sec    1.00    319.6±4.24µs  1564.8 MB/sec
roundtrip/fixed/65536x8       1.00      2.2±0.03ms  1821.4 MB/sec    1.19      2.6±0.16ms  1532.6 MB/sec
roundtrip/fixed/8192x1        1.00     93.5±1.03µs   669.7 MB/sec    1.01     94.6±1.56µs   661.8 MB/sec
roundtrip/fixed/8192x8        1.01    342.0±6.09µs  1464.3 MB/sec    1.00    338.7±4.37µs  1478.2 MB/sec
roundtrip/nested/65536x1      1.01   885.1±37.56µs  1412.4 MB/sec    1.00   879.6±39.35µs  1421.4 MB/sec
roundtrip/nested/65536x8      1.00     10.0±0.39ms  1002.2 MB/sec    1.02     10.2±0.43ms   979.4 MB/sec
roundtrip/nested/8192x1       1.01    163.0±5.56µs   960.0 MB/sec    1.00    161.1±5.13µs   971.1 MB/sec
roundtrip/nested/8192x8       1.00   927.1±39.57µs  1350.1 MB/sec    1.02   941.8±46.61µs  1329.0 MB/sec
roundtrip/variable/65536x1    1.00  1285.3±51.72µs  1750.7 MB/sec    1.49  1910.5±128.29µs  1177.8 MB/sec
roundtrip/variable/65536x8    1.00     14.8±0.74ms  1217.3 MB/sec    1.01     15.0±0.53ms  1199.3 MB/sec
roundtrip/variable/8192x1     1.00    210.4±5.49µs  1337.6 MB/sec    1.00    209.4±6.81µs  1344.0 MB/sec
roundtrip/variable/8192x8     1.00  1251.7±29.66µs  1798.6 MB/sec    1.53  1914.9±122.75µs  1175.7 MB/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	345.1s
Peak memory	3.4 GiB
Avg memory	3.4 GiB
CPU user	352.2s
CPU sys	71.6s
Peak spill	0 B

branch

Metric	Value
Wall time	340.1s
Peak memory	3.4 GiB
Avg memory	3.4 GiB
CPU user	333.1s
CPU sys	84.7s
Peak spill	0 B

alamb · 2026-06-04T14:58:59Z

🤔 Results look good. I'm curious why two of the roundtrip benchmarks were slightly slower even though encode() is faster across the board.

Looks like the results are reproducible -- next step would be to profile it to see if you can find the answer

I wonder if we are missing a Vec::with_capacity or Vec::reserve to avoid extra allocations / copies 🤔
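A minimal sketch of the kind of fix being suggested: when the final size is known up front (here, the sum of the segment lengths), reserving it once means the `extend_from_slice` calls never reallocate, so earlier bytes are never moved and re-copied. The function name is illustrative, not an arrow-flight API.

```rust
/// Concatenates segments into one Vec, reserving the exact total first so the
/// copy loop never triggers a reallocation.
fn concat_preallocated(segments: &[&[u8]]) -> Vec<u8> {
    let total: usize = segments.iter().map(|s| s.len()).sum();
    let mut out = Vec::with_capacity(total);
    let start = out.as_ptr();
    for s in segments {
        out.extend_from_slice(s);
    }
    // Staying within the reserved capacity means the buffer never moved.
    debug_assert_eq!(start, out.as_ptr(), "reserved capacity should prevent reallocation");
    out
}
```

Without the `with_capacity` call, a growing `Vec` may reallocate several times as data is appended, each time copying everything written so far.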

Rich-T-kid · 2026-06-04T15:41:05Z

I suspect the issue has to do with:
let (client, server) = tokio::io::duplex(1024 * 1024);
Per the docs: "The max_buf_size argument is the maximum amount of bytes that can be written to a side before the write returns Poll::Pending." So once 1 MB of encoded data is sitting in the pipe, the writer has to yield until the reader drains it.

The two regression cases both involve large variable-length data where the encoded payload can be huge:
roundtrip/variable/8192x8 — 8 columns × 8192 rows
roundtrip/variable/65536x1 — 65536 rows, large values buffer

This also shows up in the throughput of the regression cases:
roundtrip/variable/8192x8 1.00 1251.7±29.66µs 1798.6 MB/sec 1.53 1914.9±122.75µs 1175.7 MB/sec
roundtrip/variable/65536x1 1.00 1285.3±51.72µs 1750.7 MB/sec 1.49 1910.5±128.29µs 1177.8 MB/sec
Throughput falls flat: both cases land at about 1176 to 1178 MB/sec on the branch, as if they hit the same ceiling.

Looking at the other benchmark results, this seems consistent:
roundtrip/fixed/65536x8 1.00 2.2±0.03ms 1821.4 MB/sec 1.19 2.6±0.16ms 1532.6 MB/sec
Throughput shrinks here too, which would fit the writer spending more time blocked on the duplex.

Even if this isn't the reason for the slowdown, I think 1 MB is still too small for measuring realistic maximum throughput.
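The back-pressure argument can be put into a rough formula. This is a simplified model and an assumption, not tokio's actual scheduling: the pipe holds at most `max_buf_size` bytes and the reader drains it completely each time the writer blocks. Under that model, pushing one message of `message_len` bytes forces the writer to wait (return `Poll::Pending`) at least `ceil(message_len / max_buf_size) - 1` times. The message sizes in the usage lines are illustrative, not measured from the benchmarks.

```rust
/// Simplified model (an assumption, not tokio's scheduler): a pipe that holds
/// at most `max_buf_size` bytes and is fully drained by the reader each time
/// the writer blocks. Returns the minimum number of times the writer must
/// wait to push `message_len` bytes through.
fn min_writer_stalls(message_len: usize, max_buf_size: usize) -> usize {
    assert!(max_buf_size > 0, "buffer size must be non-zero");
    if message_len == 0 {
        return 0;
    }
    // Buffer-fulls needed (ceiling division), minus the first one, which
    // fits without waiting.
    (message_len + max_buf_size - 1) / max_buf_size - 1
}
```

In this model a 5 MiB message stalls the writer 4 times with a 1 MiB pipe and not at all with a 64 MiB pipe, which is the effect the buffer bump is meant to remove from the benchmark.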

Rich-T-kid · 2026-06-04T15:54:17Z

I wonder if we are missing a Vec::with_capacity or Vec::reserve to avoid extra allocations / copies

After looking at the profile, I think this is a strong possibility. From my understanding, it is mostly in arrow-flight itself rather than arrow-ipc. Since arrow-flight depends heavily on arrow-ipc, it makes sense to start from the ground up with these optimizations.

Which issue does this PR close?

Closes #10029.

Rationale for this change

Increase the duplex buffer from 1 MB to 64 MB to eliminate artificial back-pressure in the roundtrip benchmarks. See the rationale in this comment on #10044.

What changes are included in this PR?

Bumps `max_buf_size` to 64 MB.

Are these changes tested?

n/a

Are there any user-facing changes?

n/a

gabotechs

Just left some comments, mostly about naming and some other nits, but overall it looks good to me!

It was pretty hard navigating through this file, and you did a very good job of shipping this performance optimization without making it worse, nice job @Rich-T-kid!

alamb

Thank you so much @Rich-T-kid -- this looks great

> It was pretty hard navigating through this file, and you did a very good job of shipping this performance optimization without making it worse, nice job @Rich-T-kid!

I actually think it is slightly better now (though I have ideas on how to make it even better :) -- thank you @gabotechs for the reviews

I think it would be good to try to avoid the extra copy (comments inline), but we could also do that as a follow-on if you prefer.

While reviewing this, Claude and I found some missing coverage, which I have proposed adding here:

#10097

alamb · 2026-06-09T15:12:53Z

-    buffers: &mut Vec<crate::Buffer>, // output buffer descriptors
-    arrow_data: &mut Vec<u8>,         // output stream
-    offset: i64,                      // current output stream offset
+fn encode_sink_buffer(


Similarly to @gabotechs above, I found the relationship between buffer, buffers and sink confusing.

Could you perhaps (this can be a follow-on PR) add documentation here explaining what this is doing? Specifically, that the IpcBodySink is used for the actual Arrow data, and buffers is an in-progress list of what will eventually become the IPC metadata.
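A minimal sketch of the two-channel model described here, using hypothetical names (`Segment`, `BodySink`, `BufferSpec`) rather than the PR's actual `EncodedBuffer`/`IpcBodySink` types: uncompressed buffers are held as shared `Arc<[u8]>` slices and compressed ones as owned `Vec<u8>`, each push records an (offset, length) entry for the metadata, and the body is copied exactly once, straight into the output writer. The 8-byte alignment is an assumption chosen for the example.

```rust
use std::io::{self, Write};
use std::sync::Arc;

/// Either a shared, uncompressed buffer (no copy) or an owned, compressed one.
enum Segment {
    Shared(Arc<[u8]>),
    Owned(Vec<u8>),
}

impl Segment {
    fn bytes(&self) -> &[u8] {
        match self {
            Segment::Shared(b) => &b[..],
            Segment::Owned(v) => &v[..],
        }
    }
}

/// Metadata channel: where a buffer lives inside the message body.
#[derive(Debug, PartialEq)]
struct BufferSpec {
    offset: u64,
    length: u64,
}

/// Data channel: segments are kept as-is until they are written out.
struct BodySink {
    segments: Vec<Segment>,
    len: u64,
    alignment: u64,
}

impl BodySink {
    fn new(alignment: u64) -> Self {
        assert!(alignment > 0, "alignment must be non-zero");
        BodySink { segments: Vec::new(), len: 0, alignment }
    }

    /// Queues a segment, pads the body to `alignment`, and returns the
    /// metadata entry for the segment (padding is not counted in `length`).
    fn push(&mut self, seg: Segment) -> BufferSpec {
        let length = seg.bytes().len() as u64;
        let spec = BufferSpec { offset: self.len, length };
        self.segments.push(seg);
        self.len += length;
        let pad = (self.alignment - self.len % self.alignment) % self.alignment;
        if pad > 0 {
            self.segments.push(Segment::Owned(vec![0u8; pad as usize]));
            self.len += pad;
        }
        spec
    }

    /// Writes each segment directly to `w`; there is no flat intermediate Vec<u8>.
    fn write_to<W: Write>(&self, w: &mut W) -> io::Result<u64> {
        for seg in &self.segments {
            w.write_all(seg.bytes())?;
        }
        Ok(self.len)
    }
}

fn main() -> io::Result<()> {
    let mut sink = BodySink::new(8);
    let values: Arc<[u8]> = Arc::from(vec![1u8, 2, 3, 4, 5]);
    let a = sink.push(Segment::Shared(Arc::clone(&values))); // shares the Arc, no copy
    let b = sink.push(Segment::Owned(vec![9u8; 3])); // e.g. compressed output
    let mut out = Vec::new();
    let n = sink.write_to(&mut out)?;
    println!("{:?} {:?} body={} bytes", a, b, n);
    Ok(())
}
```

Keeping segments in a list instead of a flat `Vec<u8>` is what removes the intermediate copy; the cost is one `write_all` per segment, which a buffered writer should absorb.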

As a follow-on, we might even be able to make this clearer by encapsulating buffers in some other struct:

struct IpcMetadataBuilder {
    buffers: Vec<crate::Buffer>,
    nodes: Vec<crate::FieldNode>,
}

And then pass that through 🤔

We could probably also make the code less verbose by making buffers, sink, and field, fields on the IPC writer 🤔
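A sketch of what the suggested `IpcMetadataBuilder` could look like, assuming simplified local `FieldNode`/`BufferSpec` structs in place of the real flatbuffer types and a hypothetical minimal `Array` model: the recursive writer takes one `&mut IpcMetadataBuilder` instead of separate `nodes` and `buffers` vectors, and reads row and null counts from the array itself rather than from extra parameters. Nodes and buffers are pushed in depth-first pre-order; padding is omitted.

```rust
/// Simplified stand-ins for the flatbuffer FieldNode / Buffer entries.
#[derive(Debug, Clone, Copy, PartialEq)]
struct FieldNode {
    length: i64,
    null_count: i64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct BufferSpec {
    offset: i64,
    length: i64,
}

/// Everything destined for the message header, grouped in one place.
#[derive(Debug, Default)]
struct IpcMetadataBuilder {
    nodes: Vec<FieldNode>,
    buffers: Vec<BufferSpec>,
}

impl IpcMetadataBuilder {
    fn push_node(&mut self, length: usize, null_count: usize) {
        self.nodes.push(FieldNode { length: length as i64, null_count: null_count as i64 });
    }

    fn push_buffer(&mut self, offset: i64, length: i64) {
        self.buffers.push(BufferSpec { offset, length });
    }
}

/// Hypothetical minimal array: row count, nulls, buffer byte lengths, children.
struct Array {
    len: usize,
    null_count: usize,
    buffer_lens: Vec<i64>,
    children: Vec<Array>,
}

/// Walks the array depth-first, deriving length and null_count from the array
/// instead of separate parameters. Returns the next body offset.
fn write_array_data(array: &Array, meta: &mut IpcMetadataBuilder, mut offset: i64) -> i64 {
    meta.push_node(array.len, array.null_count);
    for &len in &array.buffer_lens {
        meta.push_buffer(offset, len);
        offset += len; // alignment padding omitted in this sketch
    }
    for child in &array.children {
        offset = write_array_data(child, meta, offset);
    }
    offset
}

fn main() {
    let child = Array { len: 3, null_count: 0, buffer_lens: vec![0, 12], children: vec![] };
    let parent = Array { len: 3, null_count: 1, buffer_lens: vec![1], children: vec![child] };
    let mut meta = IpcMetadataBuilder::default();
    let end = write_array_data(&parent, &mut meta, 0);
    println!("{} nodes, {} buffers, body ends at {}", meta.nodes.len(), meta.buffers.len(), end);
}
```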

Added some doc comments above both write_array_data() and encode_sink_buffer(). I think encapsulating buffers into a struct would be worth placing in a separate small PR. That PR should also include some other minor tweaks, like removing the allow(clippy::too_many_arguments) on write_array_data().

Rich-T-kid · 2026-06-09T18:21:55Z

@alamb introduced the changes you mentioned. I'll open up a small PR tomorrow that refactors this file to be easier to understand. Thank you!

alamb · 2026-06-09T20:39:35Z

run benchmarks ipc_writer

adriangbot · 2026-06-09T20:42:27Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4663815058-525-bgdlw 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing rich-T-kid/optimize-arrow-ipc-copies (70d5802) to d7ef673 (merge-base) diff
BENCH_NAME=ipc_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench ipc_writer
BENCH_FILTER=
Results will be posted here when complete


adriangbot · 2026-06-09T20:44:03Z

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                                                 main                                   rich-T-kid_optimize-arrow-ipc-copies
-----                                                 ----                                   ------------------------------------
arrow_ipc_stream_writer/FileWriter/write_10           2.01    186.5±1.56µs        ? ?/sec    1.00     92.9±0.41µs        ? ?/sec
arrow_ipc_stream_writer/StreamWriter/write_10         2.06    187.0±1.58µs        ? ?/sec    1.00     90.8±0.28µs        ? ?/sec
arrow_ipc_stream_writer/StreamWriter/write_10/zstd    1.00      7.2±0.03ms        ? ?/sec    1.01      7.3±0.03ms        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	30.0s
Peak memory	11.6 MiB
Avg memory	10.0 MiB
CPU user	26.5s
CPU sys	0.0s
Peak spill	0 B

branch

Metric	Value
Wall time	30.0s
Peak memory	10.1 MiB
Avg memory	9.0 MiB
CPU user	28.0s
CPU sys	0.0s
Peak spill	0 B


Rich-T-kid · 2026-06-10T15:49:58Z

I'll push the PR after this is merged so that there won't be merge conflicts.

alamb · 2026-06-10T18:54:47Z

2x faster is pretty sweet. Thank you again @Rich-T-kid and @gabotechs

JakeDern · 2026-06-10T23:56:23Z

I think it's worth mentioning that no dictionary optimizations were made in this PR; it might make sense to make that a follow-up ticket.

Looks like I'm a little late to the IPC writer performance party! I was just profiling this yesterday and then I stumbled on this PR today. Awesome stuff, this will really help us over in otel-arrow.

@Rich-T-kid I'm motivated to leverage this optimization for dictionaries and can open an issue + push that forward if you're not already looking at it. Let me know what you think

Rich-T-kid · 2026-06-11T00:30:09Z

@JakeDern please feel free! I'm going to be looking into arrow-flight specifically, so I won't have time to look at dictionary optimization at the moment. Feel free to tag me in the issue, and if I get a chance I'll try to help out 🚀

alamb · 2026-06-11T00:47:47Z

@JakeDern anything you can do to make the code easier to understand and avoid copies would be great.

alamb · 2026-06-11T00:48:09Z

A good first step might be to extend the benchmarks to include dictionaries.

# Which issue does this PR close?
Resolves this #10044 (comment) from #10044.
# Rationale for this change
Code in this file is hard to navigate, and it's unclear what is happening.
# What changes are included in this PR?
This PR introduces `IpcMetadataBuilder`, a struct that groups the `nodes` and `buffers` Vecs previously passed separately into `write_array_data()`, and removes the redundant `num_rows`/`null_count` parameters by deriving them from `array_data` directly. Together these reduce `write_array_data()` from 10 arguments to 7, eliminating the `#[allow(clippy::too_many_arguments)]` suppression. Doc comments are also added to clarify the two-channel output model between `IpcMetadataBuilder` (flatbuffer header metadata) and `IpcBodySink` (raw Arrow data bytes).
# Are these changes tested?
Yes
# Are there any user-facing changes?
No

github-actions bot added the arrow label Jun 1, 2026

Rich-T-kid commented Jun 1, 2026


Rich-T-kid mentioned this pull request Jun 2, 2026

[#10029][benchmarks] arrow-flight roundtrip as well as encode/decode #10031

Merged

gabotechs reviewed Jun 3, 2026


Rich-T-kid force-pushed the rich-T-kid/optimize-arrow-ipc-copies branch from fc1dbb8 to ecb37f2 June 3, 2026 18:46

Rich-T-kid mentioned this pull request Jun 4, 2026

Bump max throughput in flight benchmark before blocking #10070

Merged

Rich-T-kid added 6 commits June 4, 2026 14:05

remove repeated buffer copies in encoding pipeline

c3c87a8

fixed some linting issues

fd3a475

minor clean up. good to push

213c8e7

fix CI

6df331c

refactored PR to be easier to read/ dedupe logic

781c75c

linting fix

7acd810

gabotechs approved these changes Jun 9, 2026


PR revision & make file easier to understand

ae03023

Rich-T-kid force-pushed the rich-T-kid/optimize-arrow-ipc-copies branch from ca764ea to ae03023 June 9, 2026 13:53

alamb mentioned this pull request Jun 9, 2026

Add arrow-flight test coverage for IPC compression #10097

Open

alamb changed the title ~~Avoid some copies in Arrow IPC~~ Reduce copies in Arrow IPC writer Jun 9, 2026

alamb approved these changes Jun 9, 2026


Rich-T-kid added 2 commits June 9, 2026 13:47

add doc comments and add standard interface for IpcBodySink

fdc6ba0

fix clippy

a674bed

Rich-T-kid force-pushed the rich-T-kid/optimize-arrow-ipc-copies branch from 3bfccfd to a674bed June 9, 2026 17:51

Rich-T-kid added 2 commits June 9, 2026 14:00

run ci

6d178ce

forgot to add padding

70d5802

alamb approved these changes Jun 9, 2026


alamb mentioned this pull request Jun 10, 2026

[DISCUSSION] 2026 Q3-Q4 Roadmap Discussion apache/datafusion#22882

Open

alamb merged commit 301eb26 into apache:main Jun 10, 2026
27 checks passed

Rich-T-kid mentioned this pull request Jun 10, 2026

removed clippy ignore statment #10111

Merged

JakeDern mentioned this pull request Jun 11, 2026

arrow-ipc: Extend writer benchmarks to include dictionaries #10119

Open

Rich-T-kid mentioned this pull request Jun 11, 2026

[arrow-flight][#10125] add dictionary focused benchmarks that hydrate/resend #10126

Open

This was referenced Jun 11, 2026

arrow-ipc: Reduce writer allocations for dictionary batches #10127

Open

perf(arrow-ipc): Avoid copies and write dictionary batches directly to writers when possible #10128

Draft

Rich-T-kid commented Jun 1, 2026 • edited

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

benchmarks

Are there any user-facing changes?

gabotechs left a comment

Rich-T-kid commented Jun 3, 2026 • edited

Benchmark results from #10031

Rich-T-kid commented Jun 3, 2026

Rich-T-kid commented Jun 3, 2026

Rich-T-kid commented Jun 3, 2026

alamb commented Jun 4, 2026

adriangbot commented Jun 4, 2026

adriangbot commented Jun 4, 2026

gabotechs commented Jun 4, 2026

adriangbot commented Jun 4, 2026

adriangbot commented Jun 4, 2026

Rich-T-kid commented Jun 4, 2026 • edited

gabotechs commented Jun 4, 2026

adriangbot commented Jun 4, 2026

adriangbot commented Jun 4, 2026

alamb commented Jun 4, 2026

Rich-T-kid commented Jun 4, 2026 • edited

Rich-T-kid commented Jun 4, 2026

gabotechs left a comment

alamb left a comment

Rich-T-kid commented Jun 9, 2026

alamb commented Jun 9, 2026
