Optimize `RightSemi`, `RightAnti` joins with existence-probes

### Is your feature request related to a problem or challenge?

Suppose we have A semijoin B, and the optimizer chooses to implement that as as a `RightSemi` join. That means we build B and stream A, doing lookups against the B hash table. The current implementation does the following:

1. For a given A value, we lookup all the matching B values, walking the hash chain and checking for equality.
2. For each match, produce a `(probe_idx, build_idx)` pair.
3. Apply the join filter if there's a non-equijoin filter
4. Remove duplicates from the list of all `probe_idx` values
5. Materialize the output `RecordBatch` using the distinct `probe_idx` values

In part we do this because we share code with other join implementations, but some of this work is redundant for `RightSemi` joins (and `RightAnti` joins as well):

1. We can stop once we hit the first matching B value, rather than walking the rest of the chain
2. We never need `build_idx` values
3. We don't need to remove duplicates from the `probe_idx` array, since we never produced them in the first place

The win is biggest for semijoins where many build values match each probe value, but it should still be a win for lower fan-out semi-joins.

We could go further and specialize the hash table for semijoins: we don't need to keep duplicate build-side values in the hash table in the first place, the `visited` bitmap, and so on. But just optimizing the join algorithm itself should get us most of the win.

### Describe the solution you'd like

_No response_

### Describe alternatives you've considered

_No response_

### Additional context

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize `RightSemi`, `RightAnti` joins with existence-probes #22854

Is your feature request related to a problem or challenge?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Optimize RightSemi, RightAnti joins with existence-probes #22854

Description

Is your feature request related to a problem or challenge?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Optimize `RightSemi`, `RightAnti` joins with existence-probes #22854