perf: Improve ConstantVector performance in OptimizedHashPartitionFunction #2014

Open
yingsu00 wants to merge 3 commits into IBM:optimized_partitionedoutput from yingsu00:ConstantVectorHashing

Collaborator

@yingsu00 yingsu00 commented May 9, 2026

Before

    ----------------------------------------------------------------------------
    partition_bool_remote_p16_constant_all_null                 7.15us   139.87K
    optimized_partition_bool_remote_p16_constant_al 349.51%     2.05us   488.84K
    ----------------------------------------------------------------------------
    partition_bool_remote_p100_constant_no_null                 7.14us   139.97K
    optimized_partition_bool_remote_p100_constant_n 190.04%     3.76us   265.99K
    ----------------------------------------------------------------------------

After

    ----------------------------------------------------------------------------
    partition_bool_remote_p16_constant_all_null                 7.15us   139.82K
    optimized_partition_bool_remote_p16_constant_al 15456.%    46.28ns    21.61M
    ----------------------------------------------------------------------------
    partition_bool_remote_p100_constant_no_null                 7.14us   140.12K
    optimized_partition_bool_remote_p100_constant_n 14819.%    48.16ns    20.76M
    ----------------------------------------------------------------------------
Other data types show the same pattern.
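
For readers checking the numbers, here is a minimal sketch of how the percentage column relates to the two time columns. It assumes the column is the baseline time divided by the optimized time, expressed as a percentage, which is the usual convention for relative benchmark output. The function name `relativePercent` is hypothetical, and the inputs are the rounded times from the tables above, so the recomputed values will differ slightly from the printed ones.

```cpp
#include <cstdio>

// Relative speed in percent: baseline time divided by candidate time.
// Both times must use the same unit (nanoseconds here).
// Assumption: this mirrors how the benchmark's percentage column is derived.
double relativePercent(double baselineNs, double candidateNs) {
  return baselineNs / candidateNs * 100.0;
}

// Prints the recomputed column for the p16 all-null rows quoted above.
void printRelativeSpeeds() {
  // Before: 7.15us baseline vs 2.05us optimized.
  std::printf("before: %.0f%%\n", relativePercent(7150.0, 2050.0));
  // After: 7.15us baseline vs 46.28ns optimized.
  std::printf("after:  %.0f%%\n", relativePercent(7150.0, 46.28));
}
```

Read this way, the "After" rows correspond to roughly a 150x speedup over the baseline for constant inputs.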

Depends on #2016

@yingsu00 yingsu00 requested a review from xin-zhang2 May 9, 2026 21:59
@yingsu00 yingsu00 self-assigned this May 9, 2026
    if (!decoded_.isConstantMapping()) {
      return std::nullopt;
    }

Member

I think we also need an early return when decoded_.size() == 0

Collaborator Author

done
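
The two early-return conditions discussed in this thread can be sketched as follows. This is not the PR's code: `DecodedStub`, `tryHashConstant`, and `kNullHashStub` are hypothetical stand-ins, and returning `std::nullopt` for an empty input is an assumption, since the thread only says an early return is needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical stand-in for a decoded vector; not the real decoded-vector class.
struct DecodedStub {
  bool constantMapping; // true when every row maps to the same value
  std::size_t numRows;
  bool isNull;          // whether the single constant value is null
  bool value;           // the constant value (bool, as in the benchmark)

  bool isConstantMapping() const { return constantMapping; }
  std::size_t size() const { return numRows; }
};

// Assumed placeholder hash for a null constant.
constexpr uint64_t kNullHashStub = 1;

// Returns one hash for the whole input when every row holds the same value,
// or std::nullopt when the caller has to fall back to per-row hashing.
std::optional<uint64_t> tryHashConstant(const DecodedStub& decoded) {
  if (decoded.size() == 0) {
    return std::nullopt; // nothing to hash (assumed behavior for empty input)
  }
  if (!decoded.isConstantMapping()) {
    return std::nullopt; // rows may differ, so no shortcut is possible
  }
  if (decoded.isNull) {
    return kNullHashStub;
  }
  return static_cast<uint64_t>(std::hash<bool>{}(decoded.value));
}
```

Checking the size first keeps the constant path from reading a value that does not exist when the input has no rows.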

Member

@xin-zhang2 xin-zhang2 left a comment

Looks good. The unit tests for ConstantVector are still using the hash function. Can you also update them?

@yingsu00 yingsu00 force-pushed the ConstantVectorHashing branch from 06dfd48 to 00f3559 Compare May 19, 2026 00:25
Introduce OptimizedHashPartitionFunction as a faster drop-in replacement
for HashPartitionFunction, gated behind a new query config flag
optimized_hash_partition_function_enabled (default false). partition()
speedups range from 50% to over 200x.

Add HashPartitionFunctionBase as a common base exposing numPartitions(),
and createHashPartitionFunction() factories that select the
implementation based on the flag. Thread QueryConfig* through
PartitionFunctionSpec::create() and update callsites (LocalPartition,
PartitionedOutput, MarkDistinct, RowNumber, Window,
SubPartitionedSortWindowBuild, HiveConnector) to construct partition
functions via the factory.

Register CMake targets for the new test and benchmark binaries.
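
The flag-gated factory described in this commit message could look roughly like the sketch below. It is a simplified illustration, not the actual implementation: the classes in namespace `sketch` are stand-ins that only reuse the names from the message, the `QueryConfig` struct models just the one flag, the `name()` method exists only to make the selection visible, and falling back to the default implementation when the config pointer is null is an assumption.

```cpp
#include <memory>
#include <string>

namespace sketch {

// Hypothetical stand-in for the query config; only the one flag is modeled.
struct QueryConfig {
  bool optimizedHashPartitionFunctionEnabled = false; // default false
};

// Common base exposing numPartitions(), as described in the commit message.
class HashPartitionFunctionBase {
 public:
  explicit HashPartitionFunctionBase(int numPartitions)
      : numPartitions_(numPartitions) {}
  virtual ~HashPartitionFunctionBase() = default;

  int numPartitions() const { return numPartitions_; }

  // Illustrative only: reports which implementation was selected.
  virtual std::string name() const = 0;

 private:
  int numPartitions_;
};

class HashPartitionFunction : public HashPartitionFunctionBase {
 public:
  using HashPartitionFunctionBase::HashPartitionFunctionBase;
  std::string name() const override { return "HashPartitionFunction"; }
};

class OptimizedHashPartitionFunction : public HashPartitionFunctionBase {
 public:
  using HashPartitionFunctionBase::HashPartitionFunctionBase;
  std::string name() const override { return "OptimizedHashPartitionFunction"; }
};

// Selects the implementation based on the flag.
// Assumption: a null config means "use the default implementation".
std::unique_ptr<HashPartitionFunctionBase> createHashPartitionFunction(
    int numPartitions, const QueryConfig* config) {
  if (config != nullptr && config->optimizedHashPartitionFunctionEnabled) {
    return std::make_unique<OptimizedHashPartitionFunction>(numPartitions);
  }
  return std::make_unique<HashPartitionFunction>(numPartitions);
}

} // namespace sketch
```

Routing every callsite through one factory keeps the flag check in a single place, so operators do not need to know which implementation they received.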
@yingsu00 yingsu00 force-pushed the ConstantVectorHashing branch from 00f3559 to 4ba78f8 Compare May 21, 2026 00:42
@yingsu00 yingsu00 requested a review from majetideepak as a code owner May 21, 2026 00:42
@yingsu00 yingsu00 changed the title perf: Improve ConstantVector hashing in OptimizedVectorHasher perf: Improve ConstantVector performance in OptimizedHashPartitionFunction May 21, 2026
@yingsu00 yingsu00 force-pushed the ConstantVectorHashing branch from 4ba78f8 to 740ba4d Compare May 21, 2026 01:22
Collaborator Author

> Looks good. The unit tests for ConstantVector are still using the hash function. Can you also update them?

The hash function is still the entry point for vectors. I added the new tests scalarHashPrecomputed, scalarHashConstant, and scalarHashConstantEmpty in OptimizedVectorHasherTest.

@yingsu00 yingsu00 force-pushed the ConstantVectorHashing branch from 740ba4d to d2e4084 Compare May 21, 2026 01:38