Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 10 additions & 2 deletions native-engine/datafusion-ext-plans/src/joins/join_hash_map.rs
Original file line number Diff line number Diff line change
Expand Up @@ -253,15 +253,23 @@ impl Table {
let mut e = entries![i] as usize;
loop {
let hash_matched = self.map[e].hashes.simd_eq(Simd::splat(hashes[i]));
let empty = self.map[e].hashes.simd_eq(Simd::splat(0));

if let Some(pos) = (hash_matched | empty).first_set() {
// Fast path: check hash match first (common case)
if let Some(pos) = hash_matched.first_set() {
hashes[i] = unsafe {
// safety: transmute MapValue(u32) to u32
std::mem::transmute(self.map[e].values[pos])
};
break;
}

// Slow path: check empty slot only when no match
let empty = self.map[e].hashes.simd_eq(Simd::splat(0));
if empty.any() {
Comment on lines 255 to +268
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correctness of checking hash_matched.first_set() before computing the empty mask relies on an invariant that, within a MapValueGroup, all occupied lanes are packed from the beginning (i.e., there cannot be an empty lane before a later occupied lane). That invariant currently holds because insertion always uses empty.first_set(), but it's not stated here and a future change (e.g., tombstones/deletes or a different insertion strategy) could break this lookup logic. Please document this invariant explicitly (or add a debug-only assertion) so this fast-path doesn't become subtly incorrect later.

Copilot uses AI. Check for mistakes.
hashes[i] = MapValue::EMPTY.0;
break;
}

e += 1;
e %= 1 << self.map_mod_bits;
}
Expand Down
Loading