[arrow-flight][#10125] add dictionary focused benchmarks that hydrate/resend #10126
Rich-T-kid wants to merge 4 commits
|
This should provide better visibility for any optimizations we make to the encoding path in |
|
run benchmark flight |
alamb
left a comment
looks good to me -- thank you @Rich-T-kid
|
🤖 Arrow criterion benchmark running (GKE). Comparing rich-t-kid/investigate-arrow-flight (ef4ddcd) to 826b808 (merge-base). |
|
🤖 Arrow criterion benchmark completed (GKE). |
|
@alamb I'm updating this PR to include the 4-column case as well as [1,8]. This is mostly to illustrate a point in #10137. I updated my local benchmarks on main & my branch to use [1,4,8,16], as it shows a more gradual trend line. 16 is overkill; I think 8 columns is a good max, but 1 -> 8 is too steep of a jump. |
|
@Jefffrey do you mind running the arrow-flight benchmarks on this PR? Thx |
|
run benchmark flight |
|
🤖 Arrow criterion benchmark running (GKE). Comparing rich-t-kid/investigate-arrow-flight (54faeda) to 826b808 (merge-base). |
|
I have too many PR tabs open, I meant to write that for #10137 |
|
🤖 Arrow criterion benchmark completed (GKE).
Which issue does this PR close?
Part of
Rationale for this change
Going through the arrow-flight codebase, I noticed that by default DictionaryHandling is set to Hydrate. This means it expands dictionary arrays out to their logical form. In other words, when the variant is set to Hydrate, arrow-ipc::IpcDataGenerator::encode_all_dicts() never actually runs.
This is important because of the arrow-ipc work that @alamb, @JakeDern, and I have been doing to optimize arrow-ipc's use of dictionaries. This PR makes those changes visible through the arrow-flight benchmarks.
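To make the Hydrate behaviour concrete, here is a minimal std-only sketch of what "expanding to logical form" means for a dictionary-encoded column, and why the dictionary encoding path is skipped in that case. `ToyDictColumn`, `hydrate`, `hydrated_payload_bytes`, and `resend_payload_bytes` are hypothetical names made up for illustration; they are not the arrow-rs `DictionaryArray` or arrow-flight APIs, and the byte counts are rough approximations that ignore IPC framing.

```rust
/// A toy dictionary-encoded string column: `keys[i]` indexes into `values`.
/// Hypothetical type for illustration only; not the arrow-rs `DictionaryArray`.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyDictColumn {
    pub keys: Vec<usize>,
    pub values: Vec<String>,
}

/// "Hydrate": expand the column to its logical form, one full value per row.
/// Returns `None` if any key is out of bounds for the dictionary.
pub fn hydrate(col: &ToyDictColumn) -> Option<Vec<String>> {
    col.keys
        .iter()
        .map(|&k| col.values.get(k).cloned())
        .collect()
}

/// Approximate payload size when hydrated: every row carries its full string.
pub fn hydrated_payload_bytes(col: &ToyDictColumn) -> usize {
    col.keys
        .iter()
        .filter_map(|&k| col.values.get(k))
        .map(|v| v.len())
        .sum()
}

/// Approximate payload size when the dictionary is sent alongside the keys:
/// the dictionary values once, plus one fixed-width key per row.
pub fn resend_payload_bytes(col: &ToyDictColumn, key_width: usize) -> usize {
    let dict_bytes: usize = col.values.iter().map(|v| v.len()).sum();
    dict_bytes + col.keys.len() * key_width
}
```

Once a column has been hydrated there is no dictionary left to encode, so any optimization to dictionary encoding can only show up in a benchmark that keeps the column dictionary-encoded.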
What changes are included in this PR?
This PR adds a benchmark for arrow-flight's do_put endpoint using dictionary arrays, measuring the latency difference between the two DictionaryHandling variants.
Are these changes tested?
The changes are themselves benchmarks.
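As a rough illustration of the shape of such a benchmark, the sketch below builds a matrix of (handling variant, column count) cases and times a closure with a plain wall-clock loop. It uses only the standard library and is an assumption about structure, not the benchmark code in this PR, which runs under criterion. `ToyHandling`, `case_matrix`, and `mean_latency` are hypothetical names; the column counts are whatever the caller passes in.

```rust
use std::time::{Duration, Instant};

/// Stand-in for the two dictionary handling variants being compared.
/// Hypothetical enum for illustration; not arrow-flight's `DictionaryHandling`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ToyHandling {
    Hydrate,
    Resend,
}

/// Build every (variant, column count) pair so both variants are measured
/// at each column count.
pub fn case_matrix(column_counts: &[usize]) -> Vec<(ToyHandling, usize)> {
    let mut cases = Vec::new();
    for &cols in column_counts {
        for handling in [ToyHandling::Hydrate, ToyHandling::Resend] {
            cases.push((handling, cols));
        }
    }
    cases
}

/// Run `work` `iters` times and return the mean wall-clock time per call.
/// A real benchmark harness adds warm-up and statistical analysis on top.
pub fn mean_latency<F: FnMut()>(iters: u32, mut work: F) -> Duration {
    assert!(iters > 0, "iters must be positive");
    let start = Instant::now();
    for _ in 0..iters {
        work();
    }
    start.elapsed() / iters
}
```

Calling `case_matrix(&[1, 4, 8])` yields six cases, and each case would wrap one encode-and-send round trip in the closure passed to `mean_latency`.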
Are there any user-facing changes?
no