Skip to content

Optimize Arrow boolean/null filtering with BMI isntructions #10098

@alamb

Description

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
arrow-select::filter::filter_null_mask filters validity bitmaps through filter_bits, which currently gathers selected bits via index iteration or copies contiguous slices. For dense or irregular predicates, this can do more per-bit work than necessary when compacting a source bitmap by a predicate bitmap.

Describe the solution you'd like
Add or reuse a compress(value: u64, mask: u64) -> u64 bit utility, equivalent to Intel BMI2 _pext_u64, in arrow-buffer and use it in arrow-select::filter_bits. The implementation could process 64-bit chunks of (source_bits, predicate_bits), append compress(source_bits, predicate_bits) with predicate_bits.count_ones() bits, and fall back to existing handling for offsets/remainders.

Describe alternatives you've considered
The existing index and slice strategies are general and correct, and slice copying remains good for long contiguous true runs. Another option is to keep the helper local to parquet, but filter_null_mask lives in arrow-select, so sharing it from arrow-buffer seems more reusable.

Additional context
PR #9848 adds a parquet-local compress helper for compacting validity bits while decoding definition levels. The same primitive appears applicable to filtering BooleanBuffer values and null masks in arrow-select::filter_bits, especially for boolean arrays and filtered validity buffers.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementAny new improvement worthy of a entry in the changelog

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions