Summary
The highest compression ratios in zstd come from optimal parsing strategies (levels 16-22). These use price-based optimal parsing with binary trees — Dijkstra-style shortest path through a match/literal graph.
C reference implementation (zstd_opt.c)
Algorithm
- Build match graph — at each position, find all possible matches (btree) and literals
- Price calculation — each edge (match or literal) has a bit cost based on entropy statistics
- Optimal path — find minimum-cost path through the graph (dynamic programming)
- Adaptive statistics — frequency tables updated as encoding progresses
Key structures
optState_t: Frequency tables, match storage, price cache
ZSTD_optimal_t: Per-position state (price, offset, match_len, literal_len)
BITCOST_MULTIPLIER = 256: Fractional bit precision (fixed-point)
Strategies
- btopt (levels 16-17): Optimal parsing with moderate search depth
- btultra (levels 18-19): Full search, larger match graph
- btultra2 (levels 20-22): Ultra with additional second pass for parameter tuning
What needs to be implemented
- Price calculation — fixed-point bit cost based on symbol frequency
- Match graph construction — all matches at each position via binary tree
- Dynamic programming solver — minimum cost path selection
- Adaptive statistics — rescale frequency tables to prevent overflow
- btopt/btultra/btultra2 variants — different search depths and pass counts
Acceptance criteria
Dependencies
Time estimate
5d
Blocked by
Summary
The highest compression ratios in zstd come from optimal parsing strategies (levels 16-22). These use price-based optimal parsing with binary trees — Dijkstra-style shortest path through a match/literal graph.
C reference implementation (zstd_opt.c)
Algorithm
Key structures
optState_t: Frequency tables, match storage, price cacheZSTD_optimal_t: Per-position state (price, offset, match_len, literal_len)BITCOST_MULTIPLIER = 256: Fractional bit precision (fixed-point)Strategies
What needs to be implemented
Acceptance criteria
Dependencies
Time estimate
5d
Blocked by