perf(lzma2/xz): linear-time match finder + faster optimal parse by MagicalTux · Pull Request #105 · KarpelesLab/compcol

MagicalTux · 2026-06-30T08:53:30Z

Problem

The xz/lzma2 optimal-parse encoder used a fixed 64 Ki-bucket hash head table. For incompressible or low-match input, distinct 3-byte prefixes collided into the same buckets as the input grew, so per-bucket chains lengthened with input size and every probe walked O(n/table) entries. Encode was effectively O(n²) until the max_chain cap engaged. xz encode of 4 MiB of random data took ~6.7 s and kept worsening with size; native xz handles the same input in linear time.
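To make the scaling concrete, here is a minimal sketch of the arithmetic, not the code from this PR. The names `avg_chain_len`, `head_table_len`, and `hash3`, the hash constant, and the power-of-two rounding are assumptions for illustration; only the 64 Ki bucket count and the idea of sizing the table to the window come from the description.

```rust
/// Fixed head table size from the description above: 64 Ki buckets.
const FIXED_BUCKETS: usize = 64 * 1024;

/// Average hash-chain length when `n` positions with well-spread hashes
/// are distributed over `buckets` buckets.
fn avg_chain_len(n: usize, buckets: usize) -> f64 {
    n as f64 / buckets as f64
}

/// Hypothetical sizing rule: roughly one bucket per window position,
/// rounded up to a power of two so that a bit mask can select the bucket.
fn head_table_len(window: usize) -> usize {
    window.max(1).next_power_of_two()
}

/// Hypothetical 3-byte hash returning a full 32-bit mix. The caller masks
/// the result down to the current table size at each probe.
fn hash3(b: &[u8]) -> u32 {
    let v = u32::from(b[0]) | (u32::from(b[1]) << 8) | (u32::from(b[2]) << 16);
    let h = v.wrapping_mul(0x9E37_79B1); // illustrative odd multiplier
    h ^ (h >> 15) // fold high bits down so the low (masked) bits are mixed
}
```

With 4 MiB of random input, `avg_chain_len(4 << 20, FIXED_BUCKETS)` is 64 entries per chain and keeps growing with the input, while a table of `head_table_len(4 << 20)` buckets keeps the average at 1. A bucket index would then be `hash3(bytes) as usize & (table_len - 1)`.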

Changes (compressed output is byte-for-byte unchanged at every level)

Linear match finder — size the hash head table to the match-finder window (a Vec + head_mask), like liblzma sizes its hash to the dictionary, so average chain length stays O(1). hash3 now returns a full 32-bit mix masked per-probe.
Length-price cache — cache length-symbol prices per pos_state, refreshed every 128 committed decisions, instead of an 8-bit bittree walk per length per position.
Distance-price hoist — the new-match distance price depends on length only through the dist-state bucket (saturates at DIST_STATES), so it's recomputed only when that bucket changes (one call for the common length≥5 band).
Word-at-a-time match comparison — match_len_at compares 8 bytes per step via LE u64 + trailing_zeros.
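The word-at-a-time comparison in the last item can be sketched as follows. This is an illustration under assumptions, not the function from this PR: the signature (a single buffer, two positions, and a length cap) and the helper `load_le_u64` are hypothetical. Only the technique (little-endian `u64` loads plus `trailing_zeros`) is taken from the description.

```rust
/// Load 8 bytes starting at `i` as a little-endian u64.
fn load_le_u64(data: &[u8], i: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&data[i..i + 8]);
    u64::from_le_bytes(buf)
}

/// Length of the common prefix of `data[a..]` and `data[b..]`, capped at
/// `max_len`. Compares 8 bytes per step, then finishes byte by byte.
fn match_len_at(data: &[u8], a: usize, b: usize, max_len: usize) -> usize {
    // Never read past the end of the buffer from either position.
    let limit = max_len.min(data.len().saturating_sub(a.max(b)));
    let mut len = 0;
    while len + 8 <= limit {
        let diff = load_le_u64(data, a + len) ^ load_le_u64(data, b + len);
        if diff != 0 {
            // With a little-endian load, the lowest set bit of the XOR
            // falls inside the first byte that differs.
            return len + (diff.trailing_zeros() / 8) as usize;
        }
        len += 8;
    }
    // Tail: fewer than 8 bytes remain below the limit.
    while len < limit && data[a + len] == data[b + len] {
        len += 1;
    }
    len
}
```

For example, `match_len_at(b"abcdefghabcXefgh", 0, 8, 273)` returns 3: the first eight-byte words differ in byte 3, which `trailing_zeros() / 8` locates without a per-byte loop. The little-endian load matters here, since a big-endian load would need `leading_zeros` instead.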

Results

Deterministic instruction counts (2 MiB, before → after): 3.1× fewer instructions on text, 4.0× fewer on all-zeros, and 1.6× fewer on mixed source code. Random encode is now linear and ~1.1× faster than native xz -6; realistic source code encodes at ~0.85–0.9× native speed (previously far slower).

Verification

Full suite: 61/61 test binaries pass.
60-case randomized roundtrip fuzz (sizes incl. 64 KiB chunk boundaries, levels 0–9) decoded by both native xz and our own decoder.
Cross-checked output decodes under native xz for text/zeros/random/source.

🤖 Generated with Claude Code

The xz/lzma2 optimal-parse encoder had a fixed 64 Ki-bucket hash head table, so for incompressible or low-match input the per-bucket chains lengthened with the input and every probe walked work that scaled with length: encoding 4 MiB of random data took ~6.7 s and kept worsening.

Size the hash head table to the match-finder window (as liblzma sizes its hash to the dictionary) so chains stay O(1) and encode is linear. Then cut constant factors in the optimal parser: cache length-symbol prices per pos_state (refreshed periodically instead of an 8-bit bittree walk per length per position), compute the new-match distance price once per dist-state band, and compare match bytes eight at a time.

Deterministic instruction counts (2 MiB, before -> after): ~3x fewer on text, ~4x on long-run data, ~1.6x on mixed source code; random encode is now linear and ~1.1x faster than native xz. Compressed output is byte-for-byte unchanged at every level.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

MagicalTux force-pushed the perf/lzma2-xz-encoder branch from 8bca525 to 6093af7 Compare June 30, 2026 08:55

MagicalTux merged commit 47e11cf into master Jun 30, 2026
42 checks passed

MagicalTux deleted the perf/lzma2-xz-encoder branch June 30, 2026 08:59

MagicalTux merged 1 commit into master from perf/lzma2-xz-encoder
