feat: block splitting for improved compression ratio #23

## Summary

The C reference can split blocks to use different entropy tables for different sections of data. This improves compression ratio when data characteristics change within a block (e.g., text followed by binary data).

## C reference implementation

### Superblock (zstd_compress_superblock.c)
- Analyzes symbol distribution within a block
- Splits into sub-blocks with independently optimized entropy tables
- Each sub-block gets its own Huffman/FSE tables tuned to its content
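To illustrate why independently tuned tables can help, here is a minimal sketch that estimates the entropy-coded size of a buffer from its byte histogram. The function name `estimated_bits` is hypothetical, and Shannon entropy is only an idealized stand-in for real Huffman/FSE tables: it ignores table headers and sequence coding, so treat the numbers as rough lower bounds rather than what the C code computes.

```rust
/// Idealized entropy-coded size of `data` in bits, from the Shannon entropy
/// of its byte histogram. Hypothetical helper: table headers are ignored.
fn estimated_bits(data: &[u8]) -> f64 {
    let mut counts = [0u32; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let total = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let c = c as f64;
            // Each occurrence of this symbol costs -log2(p) bits.
            -c * (c / total).log2()
        })
        .sum()
}
```

For a block whose first half uses only two byte values and whose second half uses two different ones, the unified estimate is 2 bits per byte, while estimating each half on its own gives 1 bit per byte. That gap is the gain the split has to weigh against the cost of emitting a second set of tables.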

### Pre-split (zstd_preSplit.c)
- Pre-splitting for parallelization and better entropy adaptation
- Detects content boundaries before compression
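One possible way to detect content boundaries is to compare coarse fingerprints of adjacent chunks and flag positions where they diverge. The sketch below uses plain byte histograms and a normalized L1 distance; the names `fingerprint`, `distance`, and `detect_boundaries`, as well as the chunk size and threshold, are assumptions for illustration, and the heuristic in `zstd_preSplit.c` may differ.

```rust
/// Coarse fingerprint of a chunk: a byte-value histogram.
fn fingerprint(chunk: &[u8]) -> [u32; 256] {
    let mut hist = [0u32; 256];
    for &b in chunk {
        hist[b as usize] += 1;
    }
    hist
}

/// L1 distance between the normalized histograms of two non-empty chunks.
/// Ranges from 0.0 (identical distributions) to 2.0 (disjoint alphabets).
fn distance(a: &[u8], b: &[u8]) -> f64 {
    let (fa, fb) = (fingerprint(a), fingerprint(b));
    let (na, nb) = (a.len() as f64, b.len() as f64);
    fa.iter()
        .zip(fb.iter())
        .map(|(&x, &y)| (x as f64 / na - y as f64 / nb).abs())
        .sum()
}

/// Returns byte offsets (multiples of `chunk_size`) where two adjacent
/// chunks differ by more than `threshold`.
fn detect_boundaries(data: &[u8], chunk_size: usize, threshold: f64) -> Vec<usize> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let chunks: Vec<&[u8]> = data.chunks(chunk_size).collect();
    let mut boundaries = Vec::new();
    for (i, pair) in chunks.windows(2).enumerate() {
        if distance(pair[0], pair[1]) > threshold {
            boundaries.push((i + 1) * chunk_size);
        }
    }
    boundaries
}
```

Detected offsets are only as precise as `chunk_size`; a real implementation would probably refine the position or feed the candidates into a cost-based decision.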

### Literal compression decisions (zstd_compress_literals.c)
- **RLE detection**: `allBytesIdentical()` for repeated bytes
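- **Min threshold**: strategy-dependent; in recent zstd releases, 8 bytes for btultra2, doubling for each faster strategy up to a 64-byte cap (so 64 bytes for dfast), or 6 bytes when the previous Huffman table can be reused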
- **Stream count**: <256 bytes → single stream, ≥256 → 4 streams
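The decisions above could be sketched in Rust roughly as follows. `LiteralsMode` and `choose_literals_mode` are hypothetical names, the minimum length is passed in as a parameter because it depends on the strategy, and the order of the checks is an assumption rather than a transcription of the C control flow.

```rust
/// How a literals section would be encoded (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum LiteralsMode {
    Raw,
    Rle,
    Huffman { streams: u8 },
}

/// True when `src` is non-empty and every byte equals the first one.
fn all_bytes_identical(src: &[u8]) -> bool {
    match src.split_first() {
        Some((first, rest)) => rest.iter().all(|b| b == first),
        None => false,
    }
}

/// Picks an encoding for the literals of one block.
/// `min_compress_len` is the strategy-dependent floor below which
/// Huffman compression is not attempted.
fn choose_literals_mode(src: &[u8], min_compress_len: usize) -> LiteralsMode {
    if all_bytes_identical(src) {
        return LiteralsMode::Rle;
    }
    if src.len() < min_compress_len {
        return LiteralsMode::Raw;
    }
    // Short inputs use a single stream; 256 bytes and up use four.
    let streams = if src.len() < 256 { 1 } else { 4 };
    LiteralsMode::Huffman { streams }
}
```

A real encoder would still fall back to raw literals when the Huffman output turns out no smaller than the input; that check is omitted here.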

## Current Rust state

- Single block = single set of entropy tables
- No content-aware splitting
- No superblock optimization

## What needs to be implemented

1. **Content analysis** — detect entropy changes within block
2. **Split point detection** — find optimal boundaries for table changes
3. **Per-sub-block encoding** — independent Huffman/FSE tables per section
4. **Threshold tuning** — when to split vs. keep unified
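As a starting point for steps 2 and 4, the split decision can be framed as a cost comparison: try candidate split points, add a fixed overhead for the extra tables and headers, and split only if the total beats the unified cost. The sketch below assumes a caller-supplied cost estimator; `SplitDecision`, `decide_split`, `stride`, and `overhead_bits` are hypothetical names and parameters, not part of the C reference.

```rust
/// Outcome of the split decision for one block (hypothetical type).
#[derive(Debug, PartialEq)]
enum SplitDecision {
    KeepUnified,
    SplitAt(usize),
}

/// Tries split points at every multiple of `stride` and keeps the cheapest.
/// `cost_bits` estimates the encoded size of a slice; `overhead_bits` models
/// the extra table and header cost a split introduces.
fn decide_split<F>(block: &[u8], stride: usize, overhead_bits: f64, cost_bits: F) -> SplitDecision
where
    F: Fn(&[u8]) -> f64,
{
    assert!(stride > 0, "stride must be non-zero");
    let unified = cost_bits(block);
    let mut best: Option<(usize, f64)> = None;
    let mut pos = stride;
    while pos < block.len() {
        let cost = cost_bits(&block[..pos]) + cost_bits(&block[pos..]) + overhead_bits;
        if best.map_or(true, |(_, c)| cost < c) {
            best = Some((pos, cost));
        }
        pos += stride;
    }
    match best {
        // Split only when it is strictly cheaper than one set of tables.
        Some((pos, cost)) if cost < unified => SplitDecision::SplitAt(pos),
        _ => SplitDecision::KeepUnified,
    }
}
```

Any estimator can be plugged in as `cost_bits`, for example a byte-histogram entropy estimate. Tuning `overhead_bits` is what keeps homogeneous blocks unified, which is the "no regression" case in the acceptance criteria.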

## Acceptance criteria
- [ ] Blocks with mixed content are split at or near the point where the content changes
- [ ] Compression ratio improves on heterogeneous data
- [ ] Output remains valid and is decompressed by C zstd to the original input
- [ ] No compression-ratio regression on homogeneous data

## Dependencies
- #5 (Default level) minimum for meaningful testing

## Time estimate
3d

## Blocked by
- #5 (Default level) — need working compression for meaningful block splitting

