
[10125] [encode path] Minor optimizations to arrow-flight#10137

Open
Rich-T-kid wants to merge 6 commits into apache:main from Rich-T-kid:rich-T-kid/minor-arrow-flight-opt

Rich-T-kid (Contributor) commented Jun 12, 2026

Which issue does this PR close?

starting small 😄

Rationale for this change

The arrow-flight encode path allocated intermediate Vecs to hold data that was immediately iterated over and discarded. Replacing them with lazy iterators, and inlining the one helper whose only job was to loop over its input, removes allocations that served no purpose beyond bridging two adjacent lines of code.

What changes are included in this PR?

[commit #1]

  • Remove intermediate Vec allocations in the encode path, replacing them with impl Iterator.
  • Cache num_rows before the split closure.
  • Remove queue_messages, inline its call site, and mark queue_message #[inline].
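A minimal sketch of the commit #1 pattern, assuming simplified hypothetical types (Batch, Message, split_eager, split_lazy and queue_all are illustrative, not the arrow-flight API). The eager version collects every slice into a Vec that is walked once and dropped; the lazy version returns impl Iterator, reads num_rows once before the closure, and lets the call site push each item straight into the queue through a small #[inline] helper.

```rust
/// Hypothetical stand-in for a record batch; only the row count matters here.
pub struct Batch {
    pub rows: Vec<u32>,
}

impl Batch {
    pub fn num_rows(&self) -> usize {
        self.rows.len()
    }
}

/// One slice of the batch to encode (offset and length in rows).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub offset: usize,
    pub len: usize,
}

/// Before: every slice is collected into a Vec that is iterated once and dropped.
pub fn split_eager(batch: &Batch, chunk: usize) -> Vec<Message> {
    assert!(chunk > 0, "chunk size must be non-zero");
    let mut out = Vec::new();
    let mut offset = 0;
    while offset < batch.num_rows() {
        let len = chunk.min(batch.num_rows() - offset);
        out.push(Message { offset, len });
        offset += len;
    }
    out
}

/// After: num_rows is read once and moved into the closure; slices are
/// produced lazily, so no intermediate Vec is allocated.
pub fn split_lazy(batch: &Batch, chunk: usize) -> impl Iterator<Item = Message> {
    assert!(chunk > 0, "chunk size must be non-zero");
    let num_rows = batch.num_rows();
    (0..num_rows).step_by(chunk).map(move |offset| Message {
        offset,
        len: chunk.min(num_rows - offset),
    })
}

/// Hypothetical single-message helper, marked #[inline] as in the PR.
#[inline]
fn queue_message(queue: &mut Vec<Message>, message: Message) {
    queue.push(message);
}

/// Call site loops directly over the iterator instead of calling a
/// separate "queue all messages" helper.
pub fn queue_all(queue: &mut Vec<Message>, messages: impl Iterator<Item = Message>) {
    for m in messages {
        queue_message(queue, m);
    }
}
```

Both versions yield the same slices; the lazy one simply never materialises them before they are queued.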

[commit #2]

  • Pre-allocate the vector used to hold uncompressed data.
    • Avoids repeated reallocation as the buffer grows through [64k, 512k, 4MB, 12MB, ...].
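To illustrate why pre-sizing helps, the sketch below counts how often a Vec<u8> changes capacity while bytes are appended. count_regrowths is a hypothetical helper; the exact growth sequence is a standard-library implementation detail, so only the zero-versus-nonzero contrast is meaningful here.

```rust
/// Counts how many times `buf`'s capacity changes while `total` bytes are
/// appended in `chunk`-sized pieces. Each change is a reallocation, which
/// may also copy the bytes written so far.
pub fn count_regrowths(mut buf: Vec<u8>, total: usize, chunk: usize) -> usize {
    assert!(chunk > 0, "chunk size must be non-zero");
    let piece = vec![0u8; chunk];
    let mut cap = buf.capacity();
    let mut regrowths = 0;
    let mut written = 0;
    while written < total {
        let n = chunk.min(total - written);
        buf.extend_from_slice(&piece[..n]);
        written += n;
        if buf.capacity() != cap {
            regrowths += 1;
            cap = buf.capacity();
        }
    }
    regrowths
}
```

Vec::with_capacity(total) guarantees room for total bytes up front, so the pre-sized buffer never regrows, while Vec::new() starts at zero capacity and must grow several times.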

[commit #3]

  • Renamed CompressionContext to IpcWriteContext and added an fbb: FlatBufferBuilder<'static> field to it.
  • This avoids repeated heap allocations by reusing the same FlatBufferBuilder across writes; its reset() method clears state without deallocating.
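The builder-reuse pattern, sketched without the flatbuffers dependency: WriteContext is a hypothetical stand-in that keeps a Vec<u8> where IpcWriteContext keeps its fbb field, and Vec::clear plays the role of FlatBufferBuilder::reset (drop the contents, keep the allocation). The length-prefixed "message" format is invented purely for illustration.

```rust
/// Hypothetical context owning a reusable builder buffer. Real code would
/// hold a flatbuffers::FlatBufferBuilder<'static> and call reset().
#[derive(Default)]
pub struct WriteContext {
    builder: Vec<u8>,
}

impl WriteContext {
    pub fn new() -> Self {
        Self::default()
    }

    /// Encode one (made-up) length-prefixed message, reusing the builder's
    /// existing allocation, and return a view of the finished bytes.
    pub fn write_message(&mut self, payload: &[u8]) -> &[u8] {
        // clear() drops the contents but keeps the capacity, like reset().
        self.builder.clear();
        self.builder
            .extend_from_slice(&(payload.len() as u32).to_le_bytes());
        self.builder.extend_from_slice(payload);
        &self.builder
    }

    pub fn builder_capacity(&self) -> usize {
        self.builder.capacity()
    }
}
```

Because the buffer lives in the context rather than in each write call, a second, smaller message is written into the same allocation.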

[commit #4]

  • IpcWriteContext gains a scratch: Vec<u8> field. When it is set before a call to IpcDataGenerator::encode(), the existing allocation is reused instead of allocating a fresh buffer for each batch's Arrow data body.
  • arrow-flight's FlightIpcEncoder maintains an ArrowDataPool, a small pool of buffers held in an Arc<Mutex<Vec<Vec<u8>>>> and pre-sized to the target gRPC message size (2 MiB). Before each encode() call, a buffer is acquired from the pool and placed in IpcWriteContext::scratch. After encoding, the buffer is wrapped in PooledBuf and handed to Bytes::from_owner; when the Bytes is dropped (after the gRPC frame is sent), the buffer is returned to the pool automatically instead of being freed.
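A rough sketch of the pool-and-return-on-drop idea using only std. BufPool, PooledBuf, acquire and wrap are hypothetical names, not ArrowDataPool's actual API. In the PR the owner is handed to Bytes::from_owner (bytes 1.9+), which accepts any T: AsRef<[u8]> + Send + 'static; here the drop is triggered explicitly instead.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical shared free list of byte buffers.
#[derive(Clone, Default)]
pub struct BufPool {
    free: Arc<Mutex<Vec<Vec<u8>>>>,
}

impl BufPool {
    /// Take a free buffer, or a new empty one if the pool is empty.
    pub fn acquire(&self) -> Vec<u8> {
        self.free.lock().unwrap().pop().unwrap_or_default()
    }

    /// Wrap a filled buffer so that it returns to this pool when dropped.
    pub fn wrap(&self, buf: Vec<u8>) -> PooledBuf {
        PooledBuf { buf: Some(buf), pool: self.clone() }
    }

    pub fn free_count(&self) -> usize {
        self.free.lock().unwrap().len()
    }
}

/// Owner type: exposes the bytes via AsRef; on drop, clears the Vec
/// (keeping its capacity) and pushes it back instead of freeing it.
pub struct PooledBuf {
    buf: Option<Vec<u8>>,
    pool: BufPool,
}

impl AsRef<[u8]> for PooledBuf {
    fn as_ref(&self) -> &[u8] {
        self.buf.as_deref().unwrap_or(&[])
    }
}

impl Drop for PooledBuf {
    fn drop(&mut self) {
        if let Some(mut buf) = self.buf.take() {
            buf.clear();
            // If the mutex is poisoned, the buffer is simply freed.
            if let Ok(mut free) = self.pool.free.lock() {
                free.push(buf);
            }
        }
    }
}
```

Dropping the wrapper puts the same allocation back on the free list, so the next acquire gets an empty buffer that already has the previous capacity.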

[commit #5+]

  • Tuned the buffer pool: the acquire method now also pre-allocates 2 MB of capacity in the vector.
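A sketch of an acquire that always hands out a buffer with at least 2 MiB of capacity, assuming a simple Mutex-guarded free list (Pool, acquire, release and TARGET_CAPACITY are hypothetical). Vec::reserve only reallocates when the current capacity is insufficient, so recycled buffers keep their allocation.

```rust
use std::sync::Mutex;

/// 2 MiB, the buffer size quoted in the PR description.
pub const TARGET_CAPACITY: usize = 2 * 1024 * 1024;

/// Hypothetical free list of byte buffers.
#[derive(Default)]
pub struct Pool {
    free: Mutex<Vec<Vec<u8>>>,
}

impl Pool {
    /// Hand out an empty buffer with at least TARGET_CAPACITY bytes of capacity.
    pub fn acquire(&self) -> Vec<u8> {
        let mut buf = self.free.lock().unwrap().pop().unwrap_or_default();
        buf.clear();
        // reserve() reallocates only when capacity < len + TARGET_CAPACITY,
        // so a recycled 2 MiB buffer keeps its existing allocation.
        buf.reserve(TARGET_CAPACITY);
        buf
    }

    /// Return a buffer to the free list for later reuse.
    pub fn release(&self, buf: Vec<u8>) {
        self.free.lock().unwrap().push(buf);
    }
}
```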

Are these changes tested?

yes

Are there any user-facing changes?

no

github-actions (bot) added the arrow (Changes to the arrow crate) and arrow-flight (Changes to the arrow-flight crate) labels Jun 12, 2026
}

/// Place the `FlightData` in the queue to send
#[inline]

Rich-T-kid (Contributor, Author) Jun 12, 2026

The compiler would very likely have inlined this anyway, but I think it's worth adding explicitly.

gabotechs (Contributor)

run benchmarks flight

adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4691665801-559-vg5z6 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing rich-T-kid/minor-arrow-flight-opt (d02e297) to 826b808 (merge-base) diff
BENCH_NAME=flight
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench flight
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                         main                                   rich-T-kid_minor-arrow-flight-opt
-----                         ----                                   ---------------------------------
encode/dict/65536x1           1.02    283.2±1.04µs   887.9 MB/sec    1.00    278.5±1.31µs   902.7 MB/sec
encode/dict/65536x8           1.01      8.7±0.07ms   232.0 MB/sec    1.00      8.5±0.18ms   235.3 MB/sec
encode/dict/8192x1            1.00     35.2±0.02µs   928.0 MB/sec    1.02     35.8±0.03µs   913.1 MB/sec
encode/dict/8192x8            1.02    301.6±1.68µs   866.3 MB/sec    1.00    296.2±1.29µs   882.1 MB/sec
encode/fixed/65536x1          1.03     10.2±0.02µs    47.8 GB/sec    1.00      9.9±0.02µs    49.2 GB/sec
encode/fixed/65536x8          1.02   1121.6±1.92µs     3.5 GB/sec    1.00   1099.7±2.33µs     3.6 GB/sec
encode/fixed/8192x1           1.01      3.2±0.01µs    19.2 GB/sec    1.00      3.1±0.01µs    19.5 GB/sec
encode/fixed/8192x8           1.00     17.7±0.04µs    27.6 GB/sec    1.03     18.2±0.02µs    26.8 GB/sec
encode/nested/65536x1         1.01     38.9±0.29µs    31.4 GB/sec    1.00     38.4±0.17µs    31.8 GB/sec
encode/nested/65536x8         1.03      3.1±0.01ms     3.2 GB/sec    1.00      3.0±0.01ms     3.3 GB/sec
encode/nested/8192x1          1.00      5.7±0.01µs    26.9 GB/sec    1.01      5.8±0.01µs    26.5 GB/sec
encode/nested/8192x8          1.00     48.9±0.13µs    25.0 GB/sec    1.00     48.8±0.08µs    25.0 GB/sec
encode/variable/65536x1       1.00     73.4±0.26µs    29.9 GB/sec    1.01     73.9±0.31µs    29.7 GB/sec
encode/variable/65536x8       1.00      5.2±0.06ms     3.4 GB/sec    1.00      5.2±0.07ms     3.4 GB/sec
encode/variable/8192x1        1.00      6.9±0.01µs    40.1 GB/sec    1.02      7.0±0.01µs    39.1 GB/sec
encode/variable/8192x8        1.01     89.4±0.15µs    24.6 GB/sec    1.00     88.9±0.22µs    24.7 GB/sec
roundtrip/dict/65536x1        1.00  1275.9±46.22µs   197.0 MB/sec    1.01  1284.9±45.94µs   195.7 MB/sec
roundtrip/dict/65536x8        1.00     14.4±0.63ms   140.0 MB/sec    1.14     16.3±0.56ms   123.2 MB/sec
roundtrip/dict/8192x1         1.00    205.6±5.43µs   158.8 MB/sec    1.01    208.7±5.77µs   156.5 MB/sec
roundtrip/dict/8192x8         1.00  1313.8±42.83µs   198.9 MB/sec    1.00  1315.5±50.14µs   198.6 MB/sec
roundtrip/fixed/65536x1       1.00    305.2±3.84µs  1638.6 MB/sec    1.02    310.5±4.65µs  1610.4 MB/sec
roundtrip/fixed/65536x8       1.01      2.2±0.07ms  1855.0 MB/sec    1.00      2.1±0.04ms  1870.2 MB/sec
roundtrip/fixed/8192x1        1.02     90.3±1.35µs   693.3 MB/sec    1.00     88.9±1.07µs   703.7 MB/sec
roundtrip/fixed/8192x8        1.00    323.9±3.75µs  1545.8 MB/sec    1.02    330.9±5.18µs  1513.4 MB/sec
roundtrip/nested/65536x1      1.00   843.8±41.42µs  1481.6 MB/sec    1.00   841.6±41.74µs  1485.6 MB/sec
roundtrip/nested/65536x8      1.00      9.4±0.67ms  1066.8 MB/sec    1.12     10.5±0.37ms   949.0 MB/sec
roundtrip/nested/8192x1       1.00    156.6±5.36µs   999.1 MB/sec    1.01    157.9±4.96µs   990.6 MB/sec
roundtrip/nested/8192x8       1.00   889.4±42.46µs  1407.3 MB/sec    1.01   896.2±45.08µs  1396.6 MB/sec
roundtrip/variable/65536x1    1.00  1203.2±34.81µs  1870.1 MB/sec    1.04  1254.1±70.01µs  1794.3 MB/sec
roundtrip/variable/65536x8    1.03     16.4±0.51ms  1094.7 MB/sec    1.00     16.0±0.43ms  1124.1 MB/sec
roundtrip/variable/8192x1     1.00    204.6±5.86µs  1375.8 MB/sec    1.01    206.5±5.97µs  1362.6 MB/sec
roundtrip/variable/8192x8     1.00  1204.0±33.06µs  1869.9 MB/sec    1.01  1217.2±28.50µs  1849.6 MB/sec

Resource Usage

Metric          base (merge-base)   branch
Wall time       340.1s              335.1s
Peak memory     98.5 MiB            99.5 MiB
Avg memory      36.7 MiB            36.6 MiB
CPU user        345.0s              339.9s
CPU sys         73.4s               76.9s
Peak spill      0 B                 0 B


Rich-T-kid (Contributor, Author)

Seems like it's mostly noise.

Rich-T-kid (Contributor, Author)
roundtrip/nested/65536x8      1.00      9.4±0.67ms  1066.8 MB/sec    1.12     10.5±0.37ms   949.0 MB/sec

It's interesting that this one always seems to regress.
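For reading the ratio columns: they appear to divide each side's time by the faster of the two, so the faster side shows 1.00. The small sketch below (ratios and fmt_ratio are hypothetical helpers) reproduces the quoted row from its rounded times; 1.12 means the branch took roughly 12% longer in that run. Because the displayed times are rounded, recomputed ratios can differ from the table in the last digit.

```rust
/// Ratio of each side's mean time to the faster side's time,
/// so the faster side reads 1.00.
pub fn ratios(base: f64, branch: f64) -> (f64, f64) {
    let best = base.min(branch);
    (base / best, branch / best)
}

/// Format a ratio the way the table prints it (two decimal places).
pub fn fmt_ratio(r: f64) -> String {
    format!("{r:.2}")
}
```

For roundtrip/nested/65536x8 above, ratios(9.4, 10.5) gives 1.00 for main and 10.5 / 9.4 ≈ 1.117, printed as 1.12, for the branch.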

Rich-T-kid changed the title "[10125] Minor optimizations to arrow-flight" to "[10125] [encode path] Minor optimizations to arrow-flight" Jun 12, 2026
Rich-T-kid force-pushed the rich-T-kid/minor-arrow-flight-opt branch 2 times, most recently from 2c00600 to 337abd5, June 12, 2026 18:02
Comment thread on arrow-ipc/src/compression.rs (outdated)
Rich-T-kid force-pushed the rich-T-kid/minor-arrow-flight-opt branch from 337abd5 to 094579b, June 12, 2026 21:03
Rich-T-kid (Contributor, Author) commented Jun 13, 2026

@Jefffrey I meant to ping you on this PR. Sorry about that!

Jefffrey (Contributor)

run benchmarks flight

adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4697178377-565-qnbtn 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux


Comparing rich-T-kid/minor-arrow-flight-opt (505fb20) to 826b808 (merge-base) diff
BENCH_NAME=flight
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench flight
BENCH_FILTER=
Results will be posted here when complete



adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                          main                                   rich-T-kid_minor-arrow-flight-opt
-----                          ----                                   ---------------------------------
encode/dict/65536x1            1.01    273.2±1.44µs   920.4 MB/sec    1.00    271.3±0.45µs   926.7 MB/sec
encode/dict/65536x16                                                  1.00     17.3±0.22ms   232.8 MB/sec
encode/dict/65536x4                                                   1.00   1180.2±4.95µs   852.1 MB/sec
encode/dict/65536x8            1.42      8.9±0.19ms   225.1 MB/sec    1.00      6.3±0.11ms   318.6 MB/sec
encode/dict/8192x1             1.00     35.2±0.03µs   928.7 MB/sec    1.00     35.2±0.04µs   927.0 MB/sec
encode/dict/8192x16                                                   1.00    630.1±2.04µs   829.3 MB/sec
encode/dict/8192x4                                                    1.00    143.2±0.12µs   912.3 MB/sec
encode/dict/8192x8             1.00    298.5±2.75µs   875.4 MB/sec    1.00    298.1±0.83µs   876.5 MB/sec
encode/fixed/65536x1           1.08     10.6±0.02µs    46.0 GB/sec    1.00      9.8±0.01µs    49.7 GB/sec
encode/fixed/65536x16                                                 1.00      2.4±0.03ms     3.3 GB/sec
encode/fixed/65536x4                                                  1.00     49.8±0.17µs    39.3 GB/sec
encode/fixed/65536x8           1.00   1110.2±5.22µs     3.5 GB/sec    1.02   1135.8±3.38µs     3.4 GB/sec
encode/fixed/8192x1            1.00      3.2±0.01µs    19.0 GB/sec    1.03      3.3±0.01µs    18.5 GB/sec
encode/fixed/8192x16                                                  1.00     36.2±0.18µs    27.0 GB/sec
encode/fixed/8192x4                                                   1.00      8.8±0.01µs    27.8 GB/sec
encode/fixed/8192x8            1.04     17.4±0.05µs    28.1 GB/sec    1.00     16.7±0.02µs    29.3 GB/sec
encode/nested/65536x1          1.00     28.1±0.20µs    43.5 GB/sec    1.04     29.3±0.30µs    41.7 GB/sec
encode/nested/65536x16                                                1.00      7.1±0.18ms     2.8 GB/sec
encode/nested/65536x4                                                 1.00  1485.8±19.84µs     3.3 GB/sec
encode/nested/65536x8          1.00      3.2±0.06ms     3.0 GB/sec    1.00      3.2±0.08ms     3.0 GB/sec
encode/nested/8192x1           1.16      6.8±0.01µs    22.6 GB/sec    1.00      5.8±0.01µs    26.2 GB/sec
encode/nested/8192x16                                                 1.00    148.7±0.41µs    16.4 GB/sec
encode/nested/8192x4                                                  1.00     21.3±0.03µs    28.7 GB/sec
encode/nested/8192x8           1.00     46.2±0.23µs    26.4 GB/sec    1.06     48.8±0.11µs    25.0 GB/sec
encode/variable/65536x1        1.59     81.4±0.51µs    27.0 GB/sec    1.00     51.2±0.22µs    42.9 GB/sec
encode/variable/65536x16                                              1.00     11.2±0.14ms     3.1 GB/sec
encode/variable/65536x4                                               1.00      2.4±0.05ms     3.6 GB/sec
encode/variable/65536x8        1.05      5.4±0.08ms     3.2 GB/sec    1.00      5.1±0.10ms     3.4 GB/sec
encode/variable/8192x1         1.17      7.0±0.01µs    39.1 GB/sec    1.00      6.0±0.01µs    45.8 GB/sec
encode/variable/8192x16                                               1.00   1171.6±7.63µs     3.8 GB/sec
encode/variable/8192x4                                                1.00     24.9±0.04µs    44.2 GB/sec
encode/variable/8192x8         1.06     80.7±0.13µs    27.2 GB/sec    1.00     76.0±0.22µs    28.9 GB/sec
roundtrip/dict/65536x1         1.01  1330.0±45.25µs   189.0 MB/sec    1.00  1315.9±45.27µs   191.1 MB/sec
roundtrip/dict/65536x16                                               1.00     29.5±1.10ms   136.5 MB/sec
roundtrip/dict/65536x4                                                1.00      6.7±0.23ms   150.9 MB/sec
roundtrip/dict/65536x8         1.06     15.3±0.72ms   131.9 MB/sec    1.00     14.3±0.54ms   140.2 MB/sec
roundtrip/dict/8192x1          1.00    212.8±5.92µs   153.4 MB/sec    1.00    212.4±6.06µs   153.8 MB/sec
roundtrip/dict/8192x16                                                1.00      2.4±0.05ms   216.8 MB/sec
roundtrip/dict/8192x4                                                 1.00   687.6±23.18µs   190.0 MB/sec
roundtrip/dict/8192x8          1.00  1355.1±49.83µs   192.8 MB/sec    1.00  1357.6±52.34µs   192.5 MB/sec
roundtrip/fixed/65536x1        1.01    319.7±3.74µs  1564.3 MB/sec    1.00    315.2±4.71µs  1586.4 MB/sec
roundtrip/fixed/65536x16                                              1.00      7.0±0.22ms  1142.9 MB/sec
roundtrip/fixed/65536x4                                               1.00  1306.1±82.37µs  1531.6 MB/sec
roundtrip/fixed/65536x8        1.00      2.3±0.08ms  1733.1 MB/sec    1.00      2.3±0.06ms  1727.3 MB/sec
roundtrip/fixed/8192x1         1.04     95.5±1.40µs   655.5 MB/sec    1.00     92.2±1.00µs   678.7 MB/sec
roundtrip/fixed/8192x16                                               1.00    654.1±8.15µs  1531.1 MB/sec
roundtrip/fixed/8192x4                                                1.00    197.5±3.38µs  1267.8 MB/sec
roundtrip/fixed/8192x8         1.00    339.6±4.53µs  1474.5 MB/sec    1.00    338.7±5.18µs  1478.5 MB/sec
roundtrip/nested/65536x1       1.03   882.8±43.55µs  1416.1 MB/sec    1.00   859.9±42.06µs  1453.9 MB/sec
roundtrip/nested/65536x16                                             1.00     19.3±0.68ms  1036.5 MB/sec
roundtrip/nested/65536x4                                              1.00      3.8±0.23ms  1305.6 MB/sec
roundtrip/nested/65536x8       1.24     10.7±0.73ms   931.4 MB/sec    1.00      8.7±0.28ms  1152.5 MB/sec
roundtrip/nested/8192x1        1.03    162.9±5.47µs   960.6 MB/sec    1.00    158.7±5.99µs   986.1 MB/sec
roundtrip/nested/8192x16                                              1.00  1628.2±41.63µs  1537.4 MB/sec
roundtrip/nested/8192x4                                               1.00   470.5±21.40µs  1330.1 MB/sec
roundtrip/nested/8192x8        1.00   930.5±41.73µs  1345.1 MB/sec    1.00   926.5±44.00µs  1350.9 MB/sec
roundtrip/variable/65536x1     1.01  1249.7±39.83µs  1800.5 MB/sec    1.00  1236.9±36.21µs  1819.2 MB/sec
roundtrip/variable/65536x16                                           1.00     31.3±1.17ms  1150.4 MB/sec
roundtrip/variable/65536x4                                            1.00      8.1±0.31ms  1115.6 MB/sec
roundtrip/variable/65536x8     1.04     17.0±0.50ms  1059.5 MB/sec    1.00     16.4±0.70ms  1100.1 MB/sec
roundtrip/variable/8192x1      1.03    214.7±5.60µs  1310.8 MB/sec    1.00    208.7±6.21µs  1348.2 MB/sec
roundtrip/variable/8192x16                                            1.00      3.3±0.27ms  1367.4 MB/sec
roundtrip/variable/8192x4                                             1.00   680.5±24.00µs  1654.1 MB/sec
roundtrip/variable/8192x8      1.03  1267.2±30.87µs  1776.7 MB/sec    1.00  1228.6±32.30µs  1832.4 MB/sec

Resource Usage

Metric          base (merge-base)   branch
Wall time       340.1s              660.1s
Peak memory     100.2 MiB           146.3 MiB
Avg memory      38.1 MiB            47.0 MiB
CPU user        338.8s              620.5s
CPU sys         75.8s               187.9s
Peak spill      0 B                 0 B


Rich-T-kid (Contributor, Author)

Nice, the regressions are gone. We should re-run once 54faeda is merged. I expected a larger improvement for batches with more rows/columns; I'll profile and update the PR.


4 participants