Fix shared TopK early exit with global prefix threshold#1
Open
geoffreyclaude wants to merge 1 commit into
Conversation
Author
|
@codex review |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Supplements apache#22852 for apache#22849.
Rationale for this change
PR apache#22852 fixes the local all-filtered-batch path by calling
attempt_early_completionbefore returning. The remaining regression is in partitionedSortExec: every localTopKshares oneTopKDynamicFilters. One partition can establish a global threshold before another partition has filled its local heap. The second partition then sees fully rejected batches, butheap.max()is stillNonelocally, so it cannot prove completion and keeps draining sorted input.This patch stores the common-prefix row for the shared global threshold, and each local
TopKchecks that shared prefix before falling back to its local heap prefix. It also prevents local partitionTopKs from marking a shared dynamic filter complete while sibling partitions can still tighten it. Single-partition behavior is unchanged.What changes are included in this PR?
TopKDynamicFilters.attempt_early_completionbefore local heap fallback.TopKfilters complete from individual local partitions.Are these changes tested?
Commands run on the final rebased branch:
cargo fmt --allcargo test -p datafusion-physical-plan topk --libcargo test -p datafusion-physical-plan sort --libcargo clippy -p datafusion-physical-plan --lib -- -D warningscargo clippy --all-targets --all-features -- -D warningscargo build --release --bin dfbenchBenchmark command:
target/release/dfbench sort-tpch --sorted --limit 10 --iterations 5 --path /tmp/df-topk-bench-data/tpch_sf1 -o /tmp/df-patched-rerun2-top10_sorted_tpch.jsonResults, using the clean reruns and ignoring the earlier noisy iteration:
Key comparisons:
5.60 -> 3.98 ms, Q97.77 -> 5.89 ms, Q1010.20 -> 8.16 ms.28.48 -> 3.98 ms, Q939.47 -> 5.89 ms, Q1057.70 -> 8.16 ms.Debug proof for the bounded-read shape after this patch:
DataSourceExecoutput_rows=81.92K, output_batches=10, files_processed=0, bytes_scanned=15.79M.DataSourceExecoutput_rows=81.92K, output_batches=10, files_processed=0, bytes_scanned=20.89M.DataSourceExecoutput_rows=81.92K, output_batches=10, files_processed=0, bytes_scanned=34.69M.Those Q8/Q9/Q10 debug runs show the scan returns to one batch per partition instead of draining millions of rows across the remaining file ranges.
Are there any user-facing changes?
No. This is an internal physical execution optimization fix.