[feat] Hybrid layout for HashJoin/Sort by ruochenj123 · Pull Request #119 · bytedance/bolt

ruochenj123 · 2026-01-14T20:41:58Z

What problem does this PR solve?

Issue Number: #11

Type of Change

🐛 Bug fix (non-breaking change which fixes an issue)
✨ New feature (non-breaking change which adds functionality)
🚀 Performance improvement (optimization)
⚠️ Breaking change (fix or feature that would cause existing functionality to change)
🔨 Refactoring (no logic changes)
🔧 Build/CI or Infrastructure changes
📝 Documentation only

Description

Currently, HashJoin and Sort store all input columns in the row-based RowContainer, which incurs non-trivial overhead from converting data from columnar to row layout and back again during extraction. This PR introduces a hybrid storage design that stores only the keys in RowContainer and keeps payload columns in their original columnar format, reducing this layout conversion overhead.
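To make the idea concrete, here is a minimal C++ sketch of a hybrid layout, assuming a single int64 key and a single double payload column. `HybridTable`, `KeyRow`, `addBatch`, and `extractPayload` are hypothetical names for illustration only, not Bolt's `HybridContainer` API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical key row: the key is stored row-wise (as RowContainer would),
// together with the index of the row's payload in the columnar storage.
struct KeyRow {
  int64_t key;
  std::size_t payloadRow;
};

// Hypothetical hybrid table: keys in rows, payload kept as a column.
struct HybridTable {
  std::vector<KeyRow> keyRows;       // row-based key storage
  std::vector<double> payloadColumn; // payload stays columnar

  // Keys are copied into rows; payload values are appended to the column
  // as-is, without any row conversion.
  void addBatch(const std::vector<int64_t>& keys,
                const std::vector<double>& payload) {
    assert(keys.size() == payload.size());
    const std::size_t base = payloadColumn.size();
    for (std::size_t i = 0; i < keys.size(); ++i) {
      keyRows.push_back({keys[i], base + i});
    }
    payloadColumn.insert(payloadColumn.end(), payload.begin(), payload.end());
  }

  // Materializes the payload for the selected key rows (join matches or
  // sorted order) by gathering from the columnar payload.
  std::vector<double> extractPayload(const std::vector<std::size_t>& selected) const {
    std::vector<double> out;
    out.reserve(selected.size());
    for (std::size_t idx : selected) {
      out.push_back(payloadColumn[keyRows[idx].payloadRow]);
    }
    return out;
  }
};
```

In a Sort, for example, only `keyRows` would be reordered; the payload is then gathered once in the final order, so only the key columns pay the row-conversion cost.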

Main Changes

HybridContainer
- Separates key storage (in RowContainer) from payload storage (kept as RowVectorPtr).
- Introduces encoded HybridRowId to reference payload rows.
- HybridRowId encodes {containerId, rowId} to support multi-driver parallel execution in HashJoin.
Multi-driver support in HashJoin
- Each driver builds its own HybridContainer.
- After table merge, the allContainers_ map enables cross-container payload extraction during the probe phase.
Extraction optimizations
- coalesceBatches() flattens multiple payload batches into a single contiguous batch to reduce TLB misses during extraction.
- sortByContainerId() reorders rows by containerId before extraction to improve cache locality in multi-container scenarios.
- Prefetches payload data during extraction to hide memory latency and reduce data loading time.
- isSingleContainer() provides a fast path that skips sorting overhead in the single-driver scenario.
Configuration options
- hybrid_join_enabled / hybrid_sort_enabled to opt in to hybrid execution.
- hybrid_join_reorder_enabled to control row reordering (disabled in tests to preserve deterministic output).
- hybrid_join_scattered_mode_enabled / hybrid_sort_scattered_mode_enabled to control the payload reconstruction method. Scattered mode requires less memory but may incur higher reconstruction overhead due to cache/TLB misses.
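The sortByContainerId() and isSingleContainer() optimizations above can be sketched as follows. This assumes a stable counting sort over container ids (the PR does not say which sort it uses), and `RowRef` and `extractionOrder` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoded row reference: which per-driver container holds the
// payload, and the row's position inside that container.
struct RowRef {
  uint32_t containerId;
  uint64_t rowId;
};

// Returns the order in which to visit `refs` so that rows from the same
// container are extracted together. The counting sort is stable, so rows
// keep their relative order within each container.
std::vector<std::size_t> extractionOrder(const std::vector<RowRef>& refs,
                                         uint32_t numContainers) {
  std::vector<std::size_t> order(refs.size());
  if (numContainers <= 1) {
    // Fast path (single driver): nothing to group, keep the original order.
    for (std::size_t i = 0; i < refs.size(); ++i) order[i] = i;
    return order;
  }
  // Count rows per container, then turn counts into bucket start offsets.
  std::vector<std::size_t> offsets(numContainers + 1, 0);
  for (const auto& r : refs) {
    assert(r.containerId < numContainers);
    ++offsets[r.containerId + 1];
  }
  for (uint32_t c = 0; c < numContainers; ++c) offsets[c + 1] += offsets[c];
  // Place each row index into its container's bucket.
  for (std::size_t i = 0; i < refs.size(); ++i) {
    order[offsets[refs[i].containerId]++] = i;
  }
  return order;
}
```

Because this reordering changes the output row order, it makes sense that the PR gates it behind hybrid_join_reorder_enabled so tests can keep deterministic output.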

Performance Impact

[] No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).
Positive Impact: I have run benchmarks.


Production Workload Results

Sort + Window Queries

Queries with Sort followed by Window functions (RANK, ROW_NUMBER, LEAD/LAG).

| Metric | Q1 | Q2 | Q3 | Q4 | Q5 |
|---|---|---|---|---|---|
| Total Impr. | 1.04x | 1.22x | 1.08x | 1.03x | 1.47x |
| Sort Impr. | 1.41x | 1.09x | 1.69x | 1.60x | 2.81x |

Summary: Sort speedups 1.09x–2.81x; Total speedups 1.03x–1.47x.

Dynamic Partition Insert Queries

Queries writing to partitioned tables with Sort for partition ordering.

| Metric | Q1 | Q2 | Q3 | Q4 | Q5 |
|---|---|---|---|---|---|
| Total Impr. | 1.14x | 1.15x | 1.05x | 1.16x | 1.03x |
| Sort Impr. | 1.50x | 2.08x | 2.19x | 2.18x | 0.93x |

Summary: Sort speedups 1.50x–2.19x (excluding Q5 regression); Total speedups 1.03x–1.16x.

Suboptimal Join Order Queries

Queries with suboptimal join orders placing larger tables on the build side.

| Metric | Q1 | Q2 | Q3 | Q4 | Q5 |
|---|---|---|---|---|---|
| Total Impr. | 1.01x | 1.07x | 1.06x | 1.02x | 1.05x |
| Join Impr. | 1.19x | 1.53x | 1.04x | 1.05x | 0.97x |

Summary: Join speedups 1.04x–1.53x (excluding Q5 regression); Total speedups 1.01x–1.07x.
Negative Impact: Explained below (e.g., trade-off for correctness).

Release Note

Release Note:
- Add hybrid execution model for HashJoin and Sort.
- Add configuration options to select the extraction method used by the hybrid model.
- Experiments on production workloads show up to 1.47x end-to-end improvement (up to 2.81x on Sort alone).

Checklist (For Author)

I have added/updated unit tests (ctest).
I have verified the code with local build (Release/Debug).
I have run clang-format / linters.
(Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
No need to test or manual test.

Breaking Changes

No

Yes (Description: ...)


Breaking Changes:
- Description of the breaking change.
- Possible solutions or workarounds.
- Any other relevant information.

CLAassistant · 2026-01-14T20:42:05Z

All committers have signed the CLA.

yangzhg · 2026-01-21T02:24:47Z

            << ", spill enabled: " << spillEnabled()
-            << ", maxHashTableSize = " << maxHashTableBucketCount_;
+            << ", maxHashTableSize = " << maxHashTableBucketCount_
+            << ", hybrid mode " << (hybridJoin_ ? "enabled" : "disbaled");


Typo: `disbaled` should be `disabled`.

yangzhg · 2026-01-21T02:25:36Z

+  if (hybridJoin_) {
+    BOLT_CHECK_LE(
+        driverId_,
+        255,


Why hardcode the limit to 255?

Currently we store the row ID as a BIGINT (64 bits): the top 8 bits represent the driverId and the remaining 56 bits represent the row ID within that driver. So the largest supported driverId is 255. Maybe we can make it a config.
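The 8/56-bit split described above can be sketched as plain bit manipulation. The helper names and constants here are hypothetical, not the PR's actual code; only the bit layout comes from the reply.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants for the layout: low 56 bits = row id within a
// driver's container, high 8 bits = driver (container) id.
constexpr unsigned kRowIdBits = 56;
constexpr uint64_t kRowIdMask = (uint64_t{1} << kRowIdBits) - 1;
constexpr uint64_t kMaxDriverId = UINT64_MAX >> kRowIdBits; // 255 for 56 row bits

inline uint64_t encodeHybridRowId(uint64_t driverId, uint64_t rowId) {
  assert(driverId <= kMaxDriverId); // same limit as the BOLT_CHECK_LE above
  assert(rowId <= kRowIdMask);
  return (driverId << kRowIdBits) | rowId;
}

// Decode with unsigned shifts; if the value is stored in a signed BIGINT,
// driver ids >= 128 make it negative, so reinterpret as uint64_t first.
inline uint64_t decodeDriverId(uint64_t id) { return id >> kRowIdBits; }
inline uint64_t decodeRowId(uint64_t id) { return id & kRowIdMask; }
```

Making the split configurable, as suggested, trades row bits for driver bits: with 10 driver bits, for example, driverId could go up to 1023 while still leaving 54 bits (about 1.8 x 10^16 rows) per driver.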

yangzhg · 2026-01-21T03:10:09Z

+    const T* rawValues = flatChild->rawValues();
+    const uint64_t* rawNulls = flatChild->rawNulls();
+
+    constexpr vector_size_t kPrefetchDist = 16;


Does the magic number 16 suit all architectures, e.g. x86 and ARM?

I'm not sure. I tuned it on an x86_64 machine; prefetch distances between 8 and 32 performed similarly.
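For context, a software-prefetched gather of the kind discussed here might look like the sketch below. `gatherWithPrefetch` is a hypothetical name, and `__builtin_prefetch` is a GCC/Clang builtin (an assumption about the toolchain). The prefetch is only a hint, so the distance affects speed but never results.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Prefetch distance in rows; the thread reports 8-32 performing similarly
// on x86_64. Other architectures are untested here.
constexpr std::size_t kPrefetchDist = 16;

// Gathers values[rowIds[i]] into out[i], prefetching the source element
// that will be read kPrefetchDist iterations later.
template <typename T>
void gatherWithPrefetch(const T* values, const uint64_t* rowIds,
                        std::size_t n, T* out) {
  assert(n == 0 || (values != nullptr && rowIds != nullptr && out != nullptr));
  for (std::size_t i = 0; i < n; ++i) {
    if (i + kPrefetchDist < n) {
      __builtin_prefetch(values + rowIds[i + kPrefetchDist]);
    }
    out[i] = values[rowIds[i]];
  }
}
```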


kexianda · 2026-04-01T15:19:48Z

+
+        const auto rid = rowIdPtr[idx].rowId_;
+        if (rawNulls != nullptr && bits::isBitNull(rawNulls, rid)) {
+          bits::setNull(nulls, resultIndex, true);


isBitNull & setNull are not very efficient. How about setting 4/8 bits at a time?

bits::setNull(nulls, index, byte, mask) // add a new interface?

With that, we could use only bitwise OR/AND and eliminate this if branch.

Yeah, we can set nulls for 4 or 8 bits (rows) at a time by modifying a uint8_t. But the if branch also serves to skip the value copy when the row is null, so I'm not sure we can really eliminate it.

Yeah, it's not easy to eliminate the if branch.
We could use a gather instruction to fetch the scattered null bytes, then movemask them into an 8-bit result.
But the gather instruction shows no obvious advantage, so it is OK to keep the if branch.
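The compromise reached above (build the result null bits a byte at a time, but keep the branch so null rows skip the value copy) could be sketched like this. It assumes a Velox-style null buffer where a set bit means "not null", LSB-first within each byte, and that the output starts at a byte boundary; `isNotNull` and `gatherWithNulls` are hypothetical helpers, not the `bits::` API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed convention: bit i set => row i is NOT null, LSB-first per byte.
inline bool isNotNull(const uint8_t* bits, uint64_t i) {
  return (bits[i / 8] >> (i % 8)) & 1;
}

// Gathers rowIds from the source into dst. The null byte for 8 output rows
// is assembled in a register and stored once, instead of a read-modify-write
// per row. The branch is kept so null rows skip the value copy.
template <typename T>
void gatherWithNulls(const T* srcValues, const uint8_t* srcNotNull,
                     const uint64_t* rowIds, std::size_t n,
                     T* dstValues, uint8_t* dstNotNull) {
  for (std::size_t base = 0; base < n; base += 8) {
    const std::size_t end = base + 8 < n ? base + 8 : n;
    uint8_t byte = 0;
    for (std::size_t i = base; i < end; ++i) {
      const uint64_t rid = rowIds[i];
      if (isNotNull(srcNotNull, rid)) {
        byte |= static_cast<uint8_t>(1u << (i - base));
        dstValues[i] = srcValues[rid];
      }
    }
    dstNotNull[base / 8] = byte; // unused high bits of the last byte stay 0
  }
}
```

A real implementation would also need to handle a result index that is not byte-aligned, e.g. by masking the first and last bytes.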

kexianda · 2026-04-02T08:41:22Z

@ruochenj123 I have completed the code review. The design and implementation look fine to me, and I have left some minor comments.

ruochenj123 changed the title ~~Hybrid layout design for HashJoin/Sort~~ [WIP] Hybrid layout design for HashJoin/Sort Jan 14, 2026

markjin1990 requested review from kexianda and markjin1990 January 14, 2026 21:34

markjin1990 added the performance performance improvement needed label Jan 14, 2026

markjin1990 requested a review from fzhedu January 14, 2026 21:35

ruochenj123 force-pushed the hybrid-design branch 2 times, most recently from 03120aa to dfe7dce January 15, 2026 15:11

markjin1990 changed the title ~~[WIP] Hybrid layout design for HashJoin/Sort~~ Hybrid layout design for HashJoin/Sort Jan 20, 2026

yangzhg reviewed Jan 21, 2026


ruochenj123 force-pushed the hybrid-design branch from dfe7dce to 42a50e9 February 11, 2026 20:32

ruochenj123 force-pushed the hybrid-design branch from 2d125ec to f8bb01e February 24, 2026 22:23

guhaiyan0221 force-pushed the main branch from 906a3a9 to b88fc0e March 4, 2026 15:38

ruochenj123 force-pushed the hybrid-design branch 3 times, most recently from 7fbf61f to 8822918 March 26, 2026 00:38

markjin1990 requested changes Mar 31, 2026


ruochenj123 and others added 11 commits March 31, 2026 14:23

WIP hybrid design

f865dad

implement hybrid for spill

f9f76cf

add fast path

d10ec5e

clang-format

d982d9b

fix lisence

de2ed7d

fix empty container case

7ce60d1

Fix: Disable sorting in extractColumns when probe columns exist

a505d55

Fix empty build side bug in hybrid join: ensure coalesceBatches creates empty batch

ea15132

add scattered mode support for hybrid join

087672f

Fix hybrid sort spill memory accounting and add scattered mode

2c2bb27

[fix] Add support for ROW

e3bf098

Address code review feedback

142e8ff

ruochenj123 force-pushed the hybrid-design branch from 8822918 to 142e8ff March 31, 2026 18:23

markjin1990 changed the title ~~Hybrid layout design for HashJoin/Sort~~ [feat] Hybrid layout design for HashJoin/Sort Mar 31, 2026

markjin1990 reviewed Mar 31, 2026


Comment thread .gitignore Outdated

Comment thread .gitignore Outdated

ruochenj123 added 4 commits March 31, 2026 15:34

Revert .gitignore to upstream version

6bdd35c

Revert TpchBenchmark.cpp to upstream version

6c95d95

clang-format

fd46eb2

Fix typo: disbaled -> disabled

dbc1099

markjin1990 added the enhancement New feature or request label Apr 1, 2026

markjin1990 changed the title ~~[feat] Hybrid layout design for HashJoin/Sort~~ [feat] Hybrid layout for HashJoin/Sort Apr 1, 2026

kexianda reviewed Apr 1, 2026


markjin1990 approved these changes Apr 2, 2026


markjin1990 added this pull request to the merge queue Apr 8, 2026

Merged via the queue into bytedance:main with commit 537fbdf Apr 8, 2026
7 checks passed


5 participants
