Is your feature request related to a problem or challenge? Please describe what you are trying to do.
arrow-select::filter::filter_null_mask filters validity bitmaps through filter_bits, which currently gathers selected bits via index iteration or copies contiguous slices. For dense or irregular predicates, this can do more per-bit work than necessary when compacting a source bitmap by a predicate bitmap.
Describe the solution you'd like
Add or reuse a compress(value: u64, mask: u64) -> u64 bit utility, equivalent to Intel BMI2 _pext_u64, in arrow-buffer and use it in arrow-select::filter_bits. The implementation could process 64-bit chunks of (source_bits, predicate_bits), append compress(source_bits, predicate_bits) with predicate_bits.count_ones() bits, and fall back to existing handling for offsets/remainders.
Describe alternatives you've considered
The existing index and slice strategies are general and correct, and slice copying remains good for long contiguous true runs. Another option is to keep the helper local to parquet, but filter_null_mask lives in arrow-select, so sharing it from arrow-buffer seems more reusable.
Additional context
PR #9848 adds a parquet-local compress helper for compacting validity bits while decoding definition levels. The same primitive appears applicable to filtering BooleanBuffer values and null masks in arrow-select::filter_bits, especially for boolean arrays and filtered validity buffers.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
arrow-select::filter::filter_null_maskfilters validity bitmaps throughfilter_bits, which currently gathers selected bits via index iteration or copies contiguous slices. For dense or irregular predicates, this can do more per-bit work than necessary when compacting a source bitmap by a predicate bitmap.Describe the solution you'd like
Add or reuse a
compress(value: u64, mask: u64) -> u64bit utility, equivalent to Intel BMI2_pext_u64, inarrow-bufferand use it inarrow-select::filter_bits. The implementation could process 64-bit chunks of(source_bits, predicate_bits), appendcompress(source_bits, predicate_bits)withpredicate_bits.count_ones()bits, and fall back to existing handling for offsets/remainders.Describe alternatives you've considered
The existing index and slice strategies are general and correct, and slice copying remains good for long contiguous true runs. Another option is to keep the helper local to parquet, but
filter_null_masklives inarrow-select, so sharing it fromarrow-bufferseems more reusable.Additional context
PR #9848 adds a parquet-local
compresshelper for compacting validity bits while decoding definition levels. The same primitive appears applicable to filteringBooleanBuffervalues and null masks inarrow-select::filter_bits, especially for boolean arrays and filtered validity buffers.