
feat: add FSST12 (12-bit-code) variant #216

Draft
mprammer wants to merge 5 commits into develop from mp/fsst12

mprammer commented May 12, 2026

Adds the FSST12 codec as a sibling module to the classic 8-bit codec, matching the variant defined in the cwida/fsst reference implementation: 12-bit codes (4096 entries), with the first 256 reserved as single-byte identity codes, no escape mechanism, and output bit-packed at 1.5 bytes per code. Closes #142. The public surface is `fsst::fsst12::{Compressor12, CompressorBuilder12, Decompressor12}`, mirroring the classic API; the classic 8-bit codec is unchanged.
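To make the "1.5 bytes per code" figure concrete, here is a minimal Rust sketch of packing 12-bit codes two at a time into three bytes. It assumes a little-endian layout (first code in the low 12 bits of each 24-bit group) and pads an odd trailing code to two bytes; the real bit order used by `fsst::fsst12` may differ, and `pack12`/`unpack12` are hypothetical helpers, not part of the crate's API.

```rust
/// Hypothetical helper: pack 12-bit codes so that each pair of codes
/// occupies exactly 3 bytes. `n` codes need ceil(1.5 * n) bytes.
/// Assumed layout: first code of a pair in bits 0..12, second in bits 12..24.
fn pack12(codes: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity((codes.len() * 3 + 1) / 2);
    for pair in codes.chunks(2) {
        let a = pair[0];
        debug_assert!(a < 4096, "code out of 12-bit range");
        match pair.get(1) {
            Some(&b) => {
                debug_assert!(b < 4096, "code out of 12-bit range");
                // Combine both codes into one 24-bit word, then emit 3 bytes.
                let word = (a as u32) | ((b as u32) << 12);
                out.push(word as u8);
                out.push((word >> 8) as u8);
                out.push((word >> 16) as u8);
            }
            None => {
                // Odd trailing code: its 12 bits are rounded up to 2 bytes.
                out.push(a as u8);
                out.push((a >> 8) as u8);
            }
        }
    }
    out
}

/// Hypothetical inverse of `pack12`: read `n` codes back out.
/// Code `i` starts at bit `12 * i`; panics if `n` exceeds the packed count.
fn unpack12(bytes: &[u8], n: usize) -> Vec<u16> {
    let mut codes = Vec::with_capacity(n);
    for i in 0..n {
        let bit = i * 12;
        let byte = bit / 8;
        // Any 12-bit code spans at most two consecutive bytes.
        let lo = bytes[byte] as u32;
        let hi = bytes.get(byte + 1).copied().unwrap_or(0) as u32;
        let word = lo | (hi << 8);
        codes.push(((word >> (bit % 8)) & 0xFFF) as u16);
    }
    codes
}
```

For five codes this yields 8 bytes: two full pairs at 3 bytes each, plus 2 bytes for the odd code.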

🤖 Generated with Claude Code

Implements the FSST12 codec described in the FastLanes File Format
paper: 12-bit codes (4096 entries), the first 256 codes reserved as
single-byte identity codes, no escape mechanism, bit-packed output at
1.5 bytes per code. Closes #142.
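Since codes 0 to 255 are identity codes, a decoder can treat them as ordinary one-byte entries in a 4096-entry table, so there is no escape case to test for. The following is a minimal Rust sketch under an assumed table layout (learned symbols stored as `Vec<u8>` at codes 256 and up); `build_table`, `decode12` and the size helpers are hypothetical and do not describe the internals of `Decompressor12`. The two size helpers contrast worst-case output when no learned symbol matches: classic FSST emits its escape code plus the literal byte (2 bytes per input byte), while FSST12 emits one 12-bit identity code.

```rust
const NUM_CODES: usize = 1 << 12; // 4096 codes in total
const NUM_IDENTITY: usize = 256; // codes 0..=255 decode to that byte

/// Hypothetical table layout: entries 0..=255 are the identity codes,
/// followed by up to 3840 learned symbols at codes 256..4096.
fn build_table(learned: &[&[u8]]) -> Vec<Vec<u8>> {
    assert!(
        learned.len() <= NUM_CODES - NUM_IDENTITY,
        "at most 3840 learned symbols fit in the 12-bit code space"
    );
    let mut table: Vec<Vec<u8>> = (0..=255u8).map(|b| vec![b]).collect();
    table.extend(learned.iter().map(|s| s.to_vec()));
    table
}

/// Every code is a plain table lookup: no escape byte, no special branch.
fn decode12(codes: &[u16], table: &[Vec<u8>]) -> Vec<u8> {
    let mut out = Vec::new();
    for &code in codes {
        out.extend_from_slice(&table[code as usize]);
    }
    out
}

/// Worst case for classic FSST: escape code + literal for every input byte.
fn worst_case_classic(n: usize) -> usize {
    2 * n
}

/// Worst case for FSST12: one 12-bit identity code per input byte, bit-packed.
fn worst_case_fsst12(n: usize) -> usize {
    (n * 12 + 7) / 8
}
```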

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: mprammer <martin@spiraldb.com>
mprammer marked this pull request as draft May 12, 2026 20:33

codspeed-hq Bot commented May 12, 2026

Merging this PR will not alter performance

✅ 25 untouched benchmarks
🆕 17 new benchmarks

Performance Changes

|  | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| 🆕 | Simulation | compress | N/A | 2.2 ms | N/A |
| 🆕 | Simulation | decompress | N/A | 1.1 ms | N/A |
| 🆕 | Simulation | compress | N/A | 10.8 ms | N/A |
| 🆕 | Simulation | decompress | N/A | 5.6 ms | N/A |
| 🆕 | Simulation | 1mb-abcdefgh | N/A | 10.1 ms | N/A |
| 🆕 | Simulation | train-and-compress | N/A | 22.3 ms | N/A |
| 🆕 | Simulation | compress-only | N/A | 31.3 ms | N/A |
| 🆕 | Simulation | decompress | N/A | 15 ms | N/A |
| 🆕 | Simulation | train-and-compress | N/A | 41.5 ms | N/A |
| 🆕 | Simulation | compress-only | N/A | 17.5 ms | N/A |
| 🆕 | Simulation | decompress | N/A | 8.2 ms | N/A |
| 🆕 | Simulation | train-and-compress | N/A | 29.9 ms | N/A |
| 🆕 | Simulation | compress-only | N/A | 3.3 µs | N/A |
| 🆕 | Simulation | decompress | N/A | 2.3 µs | N/A |
| 🆕 | Simulation | train-and-compress | N/A | 7.7 ms | N/A |
| 🆕 | Simulation | compress-only | N/A | 11.9 ms | N/A |
| 🆕 | Simulation | decompress | N/A | 5.6 ms | N/A |

Comparing mp/fsst12 (17bfe80) with develop (76baf7a)


mprammer and others added 4 commits May 12, 2026 16:35
FSST12 originates in the cwida/fsst reference implementation by the
original FSST authors. The FastLanes paper only mentions it in passing.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: mprammer <martin@spiraldb.com>
…nces

Move the FSST12 generations schedule into a private GENERATIONS12 constant
in src/fsst12/builder.rs so it can be tuned independently of the classic
8-bit codec, and update the PHT-size comment with the same rationale.

Sweep on dbtext (wikipedia / l_comment / urls) and tests/fixtures showed
that adopting cwida/fsst's FSST12-specific knobs ([14,52,90,128] schedule
and 1<<14 PHT) trades 5-10% worse text compression for ~3% better URL
compression. Keep our values, document the divergence.
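For context on what the `[14,52,90,128]` schedule controls: as we read cwida/fsst's training loop, each entry is the fraction (out of 128) of sample strings counted in that generation, with the last generation using the whole sample. That interpretation is an assumption here, not a description of `GENERATIONS12`, whose values this PR does not list. The minimal Rust sketch below computes the expected number of strings counted per generation and the total counting work relative to running every generation on the full sample; the function names are hypothetical.

```rust
/// The cwida/fsst FSST12 schedule quoted in the commit message.
/// Assumed meaning: entry g = share (out of 128) of the sample counted in generation g.
const CWIDA_FSST12_SCHEDULE: [usize; 4] = [14, 52, 90, 128];

/// Expected number of sample strings counted in each generation.
/// (The reference filters strings pseudo-randomly, so real counts vary.)
fn expected_strings_counted(schedule: &[usize], sample_len: usize) -> Vec<usize> {
    schedule.iter().map(|&frac| sample_len * frac / 128).collect()
}

/// Total counting work relative to counting the full sample in every generation.
fn relative_work(schedule: &[usize]) -> f64 {
    schedule.iter().sum::<usize>() as f64 / (schedule.len() * 128) as f64
}
```

Under this reading, the cwida schedule counts about 55% (284/512) of the strings that four full passes would, which is the trade-off the earlier, partial generations are there to buy.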

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: mprammer <martin@spiraldb.com>
The link pointed at a docs.rs page that does not yet exist (crate is still
at 0.0.0 and FSST12 has not been released), and the README is embedded in
the crate's rustdoc, where docs.rs is a roundabout target anyway.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: mprammer <martin@spiraldb.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: mprammer <martin@spiraldb.com>


Development

Successfully merging this pull request may close these issues.

Consider FSST12 (12 bit codes, avoid escape character)
