You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
perf: Add bulkGet64WithBaseline and 8-byte fast path for FixedBitWidthEncoding (facebookincubator#641)
Summary:
Referenced from MRS AusList decode optimization D98819389 (AusLongListForBitpackEncoder). Ports the key branchless byte-aligned load technique to Nimble's FixedBitWidthEncoding for general use.
Add bulk decode optimizations for 64-bit types in FixedBitWidthEncoding, targeting the selective reader and serializer/deserializer materialize() paths.
Changes:
FixedBitArray: Add bulkGet64WithBaseline() for 64-bit output with arbitrary bitWidth. Three code paths by bit width:
- bitWidth <= 32: delegates to the optimized template-unrolled 32-bit path (bulkGetWithBaseline32Into64).
- bitWidth 33-57: branchless byte-aligned loads — since the sub-byte offset is at most 7, bitWidth + remainder <= 57 + 7 = 64, so each value fits in a single 64-bit load with no cross-word boundary branch. This eliminates the branch in the hot loop and enables better instruction-level parallelism.
- bitWidth > 57: falls back to per-element get() for cross-word handling.
FixedBitWidthEncoding: Extend the selective reader fast path (bulkScan + readWithVisitorFast) from 4-byte-only to also support 8-byte integral types (int64/uint64). Previously, 64-bit columns always used the slow per-element path.
Legacy FixedBitWidthEncoding: Updated materialize() to use bulkGet64WithBaseline for 8-byte types.
Differential Revision: D99154749
0 commit comments