feat: Add kCompactRaw raw encoding support and simplify deserialization API (facebookincubator#633)
Summary:
CONTEXT: The nimble serializer/deserializer needed several improvements:
(1) Stream sizes in the kCompactRaw trailer supported only Trivial and Varint
encoding. Stream sizes tend to be similar across streams, which makes delta
encoding a natural fit. In practice, however, delta-varint gives only marginal
savings over plain varint, because the sizes vary enough that the deltas do not
compress much better.
(2) DeserializerOptions required callers to pass a full SerializationVersion,
but the deserializer auto-detects the version from the header byte, so only a
bool flag is needed.
(3) StreamData took two separate bool flags (encodingEnabled, useVarintRowCount)
instead of deriving them from the version.
(4) Several helper functions were missing for version classification.
WHAT:
- Add Zigzag.h utility (branchless zigzag encode/decode for int32/int64) and
Delta encoding support for kCompactRaw stream sizes. Delta encoding stores
the first value as-is, then zigzag+varint encodes consecutive deltas.
Wire: [0x08][count:varint][first:varint][zigzag(delta_1):varint]...[trailer_size:u32]
- Simplify DeserializerOptions: replace `std::optional<SerializationVersion> version`
with `bool hasHeader`. When true, auto-detect version from the first byte of
serialized data. When false, assume kLegacy (no version header).
- Simplify StreamData: take SerializationVersion instead of two bool flags
(encodingEnabled, useVarintRowCount), derive both internally.
- Add version helper functions in Options.h:
  - `nonLegacyFormat()`: returns true for kCompact/kCompactRaw/kTabletRaw
  - `isRawFormat()`: returns true for kCompactRaw/kTabletRaw
  - `isTabletRawFormat()`: returns true for kTabletRaw only
- Add validation in Serializer constructor: reject explicit kLegacy and kTabletRaw
versions, and validate that streamSizesEncodingType is used only with a
non-legacy version.
- Extract `StreamDataReader::readVersion()` method for version header parsing.
- Add fuzz tests for mixed-version serialization and projection (cycles through
kCompact, kCompactRaw, kCompactRaw+Delta) with decode verification.
- SST writer benchmark: remove sst_partitioned_index flag, change
nimble_stream_sizes_encoding default from Trivial to Delta.
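For illustration, the delta scheme described above (first value as-is, then zigzag+varint for each consecutive delta) can be sketched as follows. This is a minimal standalone sketch, not the code in Zigzag.h or the nimble serializer: the names `zigzagEncode64`, `zigzagDecode64`, `writeVarint`, `readVarint`, `encodeDeltaSizes`, and `decodeDeltaSizes` are hypothetical, stream sizes are assumed to fit in uint32_t, and the leading 0x08 byte and trailing trailer_size:u32 from the wire layout are omitted, so only the [count][first][zigzag(delta)...] portion is produced.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Branchless zigzag: maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
// Relies on arithmetic right shift of signed values (guaranteed since C++20).
inline uint64_t zigzagEncode64(int64_t v) {
  return (static_cast<uint64_t>(v) << 1) ^ static_cast<uint64_t>(v >> 63);
}

inline int64_t zigzagDecode64(uint64_t u) {
  return static_cast<int64_t>((u >> 1) ^ (uint64_t{0} - (u & 1)));
}

// LEB128-style varint: 7 payload bits per byte, high bit marks continuation.
inline void writeVarint(uint64_t v, std::vector<uint8_t>& out) {
  while (v >= 0x80) {
    out.push_back(static_cast<uint8_t>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<uint8_t>(v));
}

inline uint64_t readVarint(const std::vector<uint8_t>& in, size_t& pos) {
  uint64_t result = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    const uint8_t byte = in.at(pos++);  // at() throws on truncated input
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {
      return result;
    }
  }
  throw std::runtime_error("varint too long");
}

// Hypothetical helper: emits [count][first][zigzag(delta_1)]...
std::vector<uint8_t> encodeDeltaSizes(const std::vector<uint32_t>& sizes) {
  std::vector<uint8_t> out;
  writeVarint(sizes.size(), out);
  if (sizes.empty()) {
    return out;
  }
  writeVarint(sizes[0], out);
  for (size_t i = 1; i < sizes.size(); ++i) {
    const int64_t delta =
        static_cast<int64_t>(sizes[i]) - static_cast<int64_t>(sizes[i - 1]);
    writeVarint(zigzagEncode64(delta), out);
  }
  return out;
}

// Hypothetical helper: inverse of encodeDeltaSizes.
std::vector<uint32_t> decodeDeltaSizes(const std::vector<uint8_t>& in) {
  size_t pos = 0;
  const uint64_t count = readVarint(in, pos);
  std::vector<uint32_t> sizes;
  if (count == 0) {
    return sizes;
  }
  int64_t current = static_cast<int64_t>(readVarint(in, pos));
  sizes.push_back(static_cast<uint32_t>(current));
  for (uint64_t i = 1; i < count; ++i) {
    current += zigzagDecode64(readVarint(in, pos));
    sizes.push_back(static_cast<uint32_t>(current));
  }
  return sizes;
}
```

With this sketch, the sizes {100, 104, 98, 98, 4000} encode to 7 bytes: one each for the count, the first value, and the deltas +4, -6, and 0, plus two for the delta +3902. Zigzag keeps small negative deltas as short as small positive ones, which is why it is paired with varint here.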
Reviewed By: srsuryadev, jiahaol-work, tanjialiang
Differential Revision: D98941222