Implements the FSST12 codec described in the FastLanes File Format paper: 12-bit codes (4096 entries), the first 256 codes reserved as single-byte identity codes, no escape mechanism, bit-packed output at 1.5 bytes per code. Closes #142. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: mprammer <martin@spiraldb.com>
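To illustrate what "first 256 codes reserved as single-byte identity codes, no escape mechanism" implies for decoding, here is a minimal sketch. `Table12` and its `symbols` field are hypothetical names for this example only, not the types this PR adds; the sketch assumes codes 0..=255 each decode to the byte of the same value and codes 256..=4095 index a learned symbol table.

```rust
/// Hypothetical symbol table for illustration (not the crate's actual type).
/// Codes 0..=255 are implicit identity codes; code 256 + i maps to `symbols[i]`.
struct Table12 {
    symbols: Vec<Vec<u8>>,
}

impl Table12 {
    /// Decode a sequence of 12-bit codes into bytes.
    /// Returns `None` if a code is out of range or has no table entry.
    fn decode(&self, codes: &[u16]) -> Option<Vec<u8>> {
        let mut out = Vec::new();
        for &code in codes {
            match code {
                // Identity code: emit the byte itself, so no escape marker is needed.
                0..=255 => out.push(code as u8),
                // Learned symbol: look it up in the table.
                256..=4095 => {
                    let symbol = self.symbols.get(usize::from(code) - 256)?;
                    out.extend_from_slice(symbol);
                }
                // Anything above 4095 does not fit in 12 bits.
                _ => return None,
            }
        }
        Some(out)
    }
}
```

Because every byte value has its own code, any input can be represented without an escape byte, at the cost of spending 256 of the 4096 table slots on identities.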
Merging this PR will not alter performance
FSST12 originates in the cwida/fsst reference implementation by the original FSST authors. The FastLanes paper only mentions it in passing. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: mprammer <martin@spiraldb.com>
…nces Move the FSST12 generations schedule into a private GENERATIONS12 constant in src/fsst12/builder.rs so it can be tuned independently of the classic 8-bit codec, and update the PHT-size comment with the same rationale. Sweep on dbtext (wikipedia / l_comment / urls) and tests/fixtures showed that adopting cwida/fsst's FSST12-specific knobs ([14,52,90,128] schedule and 1<<14 PHT) trades 5-10% worse text compression for ~3% better URL compression. Keep our values, document the divergence. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: mprammer <martin@spiraldb.com>
The link pointed at a docs.rs page that does not yet exist (crate is still at 0.0.0 and FSST12 has not been released), and the README is embedded in the crate's rustdoc, where docs.rs is a roundabout target anyway. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: mprammer <martin@spiraldb.com>
Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: mprammer <martin@spiraldb.com>
Adds the FSST12 codec as a sibling module to the classic 8-bit codec, matching the variant defined in the cwida/fsst reference implementation: 12-bit codes (4096 entries), first 256 reserved as single-byte identity codes, no escape mechanism, output bit-packed at 1.5 bytes per code. Closes #142. Public surface is fsst::fsst12::{Compressor12, CompressorBuilder12, Decompressor12}, mirroring the classic API; the classic 8-bit codec is unchanged.
🤖 Generated with Claude Code
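The "1.5 bytes per code" figure follows from packing two 12-bit codes into three bytes. The sketch below shows one way to do that. The function names `pack12` and `unpack12` and the little-endian bit order (first code in the low 12 bits) are assumptions made for illustration; the layout used by this PR or by cwida/fsst may differ.

```rust
/// Pack 12-bit codes into bytes, two codes per three bytes.
/// Assumed layout: the first code of each pair occupies the low 12 bits.
/// A trailing unpaired code takes two bytes.
fn pack12(codes: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity((codes.len() * 3 + 1) / 2);
    for pair in codes.chunks(2) {
        let a = u32::from(pair[0] & 0x0FFF);
        match pair.get(1) {
            Some(&b) => {
                // Combine both codes into one 24-bit value and emit 3 bytes.
                let v = a | (u32::from(b & 0x0FFF) << 12);
                out.extend_from_slice(&[v as u8, (v >> 8) as u8, (v >> 16) as u8]);
            }
            // Odd code count: the last code is stored alone in 2 bytes.
            None => out.extend_from_slice(&[a as u8, (a >> 8) as u8]),
        }
    }
    out
}

/// Inverse of `pack12`. The caller must supply the number of codes,
/// since the packed length alone cannot distinguish 2n-1 from 2n codes
/// in every case.
fn unpack12(bytes: &[u8], count: usize) -> Vec<u16> {
    let mut out = Vec::with_capacity(count);
    for i in 0..count {
        let bit = i * 12;
        let byte = bit / 8;
        // Read the 16-bit little-endian window that contains this code.
        let word = u16::from(bytes[byte]) | (u16::from(bytes[byte + 1]) << 8);
        // Even-indexed codes start on a byte boundary; odd ones start 4 bits in.
        let code = if bit % 8 == 0 { word & 0x0FFF } else { word >> 4 };
        out.push(code);
    }
    out
}
```

Under this layout, packing the codes `0x123` and `0x456` yields the three bytes `0x23 0x61 0x45`, and `unpack12` with a count of 2 recovers both codes.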