KarpelesLab · MagicalTux · Jun 14, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,37 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+
+- **Raw LZMA2 encoder** (`lzma2`): `compcol::lzma2::Lzma2` now encodes as well
+  as decodes. It emits the raw 7-Zip LZMA2 chunk stream (full dict/props/state
+  reset per chunk, uncompressed-chunk fallback when compression would expand,
+  `0x00` end marker), reusing the xz LZMA2 chunk codec. The dictionary size is
+  carried out of band (the 7z coder property), so the encoder assumes the 4 MiB
+  default and a default-config decoder round-trips its output. Validated by
+  round-trip and by decoding the output through the shared xz LZMA2 codec.
+- **LZFSE `bvx2` decoding** (`lzfse`): the core LZFSE v2 block type (LZ77 +
+  Finite State Entropy) now decodes: full v2 header parse, 4-way interleaved
+  literal FSE, three interleaved L/M/D FSE streams (reverse bitstreams), and LZ
+  reconstruction. The FSE table construction matches Apple's general
+  `fse_init_decoder_table` (the `k`/`k-1` split), so arbitrary frequency tables
+  are handled, not just power-of-two ones. Validated by round-trip against an
+  in-crate v2 encoder plus a frozen hand-written non-dyadic vector. There is no
+  Apple `lzfse` tool in the build environment, so interop with real Apple
+  streams is untested; the decoder follows the documented format. `bvx1` (v1)
+  remains `Unsupported`.
+
+### Changed
+
+- **lz5 (Lizard) Huffman sub-streams** stay `Unsupported`, now with a precise
+  rationale in the module docs: the Huff0 entropy stage selects X1/X2 from
+  `(regenSize, comprLen)` at runtime, and there is no reference encoder or
+  fixture available to validate a decoder bit-exactly. Consistent with the
+  crate's `lzham`/`sit13` policy, the path is rejected rather than shipped
+  unvalidated. The docs record the reuse path for a future round with fixtures:
+  zstd's X1 Huff0 decoder + an X2 decoder + the `HUF_selectDecoder` heuristic.
+
+
 ### Added
 
 - **HTTP/3 QPACK header compression** (RFC 9204) behind the new `qpack`

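Reviewer note on the raw LZMA2 chunk stream named in the changelog entry above: the sketch below classifies one chunk header by its control byte. It is a minimal illustration, not code from this patch. The byte layout follows the LZMA2 framing as commonly documented for xz and 7-Zip, and the names `Chunk` and `parse_chunk_header` are hypothetical.

```rust
/// What a single LZMA2 chunk header announces (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum Chunk {
    /// Control byte 0x00: end of the chunk stream.
    End,
    /// Control 0x01 (with dictionary reset) or 0x02 (without): stored bytes.
    Uncompressed { dict_reset: bool, size: usize },
    /// Control 0x80..=0xFF: an LZMA-coded chunk.
    /// `reset` is the 2-bit reset level; 3 means state + props + dictionary reset.
    Lzma { reset: u8, unpacked: usize, packed: usize, props: Option<u8> },
}

/// Parse one chunk header at the start of `buf`.
/// Returns the chunk description and the header length in bytes, or `None`
/// when the header is truncated or the control byte is not valid.
pub fn parse_chunk_header(buf: &[u8]) -> Option<(Chunk, usize)> {
    let control = *buf.first()?;
    match control {
        0x00 => Some((Chunk::End, 1)),
        0x01 | 0x02 => {
            // Sizes are stored big-endian, minus one.
            let size = u16::from_be_bytes([*buf.get(1)?, *buf.get(2)?]) as usize + 1;
            Some((Chunk::Uncompressed { dict_reset: control == 0x01, size }, 3))
        }
        0x03..=0x7F => None,
        _ => {
            let reset = (control >> 5) & 0x3;
            // The low 5 control bits are bits 16..=20 of (unpacked size - 1).
            let hi = (control & 0x1F) as usize;
            let lo = u16::from_be_bytes([*buf.get(1)?, *buf.get(2)?]) as usize;
            let unpacked = ((hi << 16) | lo) + 1;
            let packed = u16::from_be_bytes([*buf.get(3)?, *buf.get(4)?]) as usize + 1;
            // A props byte follows only when the reset level is 2 or 3.
            if reset >= 2 {
                let props = Some(*buf.get(5)?);
                Some((Chunk::Lzma { reset, unpacked, packed, props }, 6))
            } else {
                Some((Chunk::Lzma { reset, unpacked, packed, props: None }, 5))
            }
        }
    }
}
```

Under this layout, a chunk with a full dict/props/state reset carries control byte `0xE0` plus the high size bits, and the uncompressed fallback with a dictionary reset uses `0x01`.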
diff --git a/README.md b/README.md
@@ -47,14 +47,14 @@ flag, and a `compcol` binary turns the library into a Unix-style filter.
 | LZW (`compress(1)` `.Z`) | `lzw` | `.lzw` | full | full | `compress(1)` / `uncompress(1)` |
 | LZMA (legacy `.lzma`) | `lzma` | `.lzma` | full | full | `python3 -m lzma` (FORMAT_ALONE) |
 | xz | `xz` | `.xz` | compressed-LZMA2 chunks + uncompressed fallback | full envelope + all reset variants | `xz(1)` both directions |
-| Raw LZMA2 (7z coder 21) | `lzma2` | `.lzma2` | `Unsupported` (decode-only) | full (raw LZMA2 chunk stream; reuses the xz LZMA2 engine) | round-trip vs the xz LZMA2 encoder |
+| Raw LZMA2 (7z coder 21) | `lzma2` | `.lzma2` | full (raw LZMA2 chunk stream; reuses the xz LZMA2 engine) | full (raw LZMA2 chunk stream; reuses the xz LZMA2 engine) | round-trip + cross-decode via the shared xz LZMA2 codec |
 | Zstandard (RFC 8478) | `zstd` | `.zst` | LZ77 + Huffman literals + FSE_Compressed_Mode sequences + repeat offsets + RLE blocks | full Compressed_Block | `zstd(1)` both directions |
 | Brotli (RFC 7932) | `brotli` | `.br` | LZ77 + length-limited Huffman + 704-symbol IC alphabet + static-dictionary refs | full (with 122 KiB static dictionary) | `brotli(1)` both directions |
 | LZO (LZO1X-1) | `lzo` | `.lzo` | LZ77 hash matcher | full | `python3 -c "import lzo"` |
 | LZX (Microsoft CAB / WIM) | `lzx` | `.lzx` | uncompressed blocks only | full (verbatim + aligned-offset + uncompressed; E8 filter) | — |
 | Amiga LZX (original 1995 Forbes) | `amiga_lzx` | — (`.lzx` claimed by MS LZX) | uncompressed blocks only | full (verbatim + aligned + uncompressed; fixed 64 KiB window, no chunk reset, no E8 filter) | — |
 | Quantum (Stac, old CAB) | `quantum` | `.q` | `Unsupported` (no public encoder exists) | full (libmspack-equivalent) | libmspack regression fixtures |
-| LZFSE (Apple) | `lzfse` | `.lzfse` | `Unsupported` (decoder-only) | `bvx-` raw + `bvxn` (LZVN); `bvx2` returns `Unsupported` | hand-built fixtures (no Apple toolchain bundled) |
+| LZFSE (Apple) | `lzfse` | `.lzfse` | `Unsupported` (decoder-only) | `bvx-` raw + `bvxn` (LZVN) + `bvx2` (LZ77 + FSE); `bvx1` returns `Unsupported` | hand-built fixtures + `bvx2` round-trip vs in-crate FSE encoder (no Apple toolchain bundled) |
 | ADC (Apple DMG) | `adc` | `.adc` | LZSS-style greedy match-finder | full | hand-built fixtures |
 | bzip2 | `bzip2` | `.bz2` | full (RLE-1 + SA-IS BWT + MTF + RLE-2 + dynamic Huffman) | full | `bzip2(1)` both directions |
 | PPMd (Shkarin's PPMII variant H) | `ppmd` | `.ppmd` | `Unsupported` (decoder-only; PPM model is intricate) | full (used in 7z / RAR3+ / ZIP method 98) | `python3 ppmd-cffi` |
@@ -427,7 +427,7 @@ lzw     = ["alloc"]
 lzo     = ["alloc"]
 lzx     = ["alloc"]
 quantum = ["alloc"]
-lzfse   = ["alloc"]            # decoder-only, bvx2 returns Unsupported
+lzfse   = ["alloc"]            # decoder-only; decodes bvx-/bvxn/bvx2, bvx1 is Unsupported
 adc     = ["alloc"]
 rar1    = ["alloc"]
 rar2    = ["alloc"]

diff --git a/src/lz5/block.rs b/src/lz5/block.rs
@@ -10,6 +10,19 @@
 //! Only the LZ4-codeword sequence loop (levels 10..=19, 30..=39) with
 //! all sub-streams stored raw (no Huffman entropy stage) is
 //! implemented; everything else returns [`Error::Unsupported`].
+//!
+//! Two paths stay `Unsupported` for documented, validation-driven
+//! reasons (see the inline comments at the `huffman_bits` and LIZv1
+//! rejections below):
+//!
+//!  * **Huff0 entropy stage** (any sub-stream flag bit set): Lizard's
+//!    generic `HUF_decompress` recomputes an X1-vs-X2 decoder choice
+//!    that is never carried in the stream; the crate has only an X1
+//!    Huff0 decoder (private to `zstd`), and there is no `lizard` CLI
+//!    or fixture here to validate an X2 decoder against. A round-trip
+//!    against our own X1-only encoder would prove nothing.
+//!  * **LIZv1 codewords** (levels 20..=29, 40..=49): a separate, larger
+//!    sequence format, out of scope for this round.
 
 use alloc::vec::Vec;
 
@@ -61,6 +74,12 @@ pub fn decode_compressed_block(input: &[u8], out: &mut Vec<u8>, cap: usize) -> R
     // Lizard groups levels by decompression strategy:
     //   10..=19, 30..=39  →  LZ4 codewords (this build supports)
     //   20..=29, 40..=49  →  LIZv1 codewords (not supported)
+    //
+    // LIZv1 is a distinct, larger sequence format (`Lizard_decompress_LIZv1`
+    // vs `Lizard_decompress_LZ4` in the reference): different token layout,
+    // explicit `lengths`/`offset16`/`offset24` streams, and a 24-bit offset
+    // path. Implementing it is a separate effort from the Huffman stage and
+    // is out of scope for this round, so it stays `Unsupported`.
     let is_lz4_mode = matches!(clevel, 10..=19 | 30..=39);
     if !is_lz4_mode {
         return Err(Error::Unsupported);
@@ -96,8 +115,38 @@ pub fn decode_compressed_block(input: &[u8], out: &mut Vec<u8>, cap: usize) -> R
     if res & FLAG_LEN != 0 {
         return Err(Error::Corrupt);
     }
-    // Any Huffman bit set on a sub-stream means we'd need to FSE-Huffman
-    // decode that stream. Out of scope.
+    // Any Huffman bit set on a sub-stream means the stream is entropy-coded
+    // with Huff0 (Yann Collet's FiniteStateEntropy library) and must be
+    // `HUF_decompress`'d before the sequence loop runs. Each such sub-stream
+    // is framed as a 6-byte header (3-byte LE regenerated size `regenSize` +
+    // 3-byte LE compressed size `comprLen`) followed by `comprLen` bytes of Huff0
+    // payload (`Lizard_readStream` → `HUF_decompress(op, regenSize, ip + 6, comprLen)`).
+    //
+    // This stays `Unsupported`. The decision is deliberate, not a TODO —
+    // there is no faithful way to *validate* such a decoder in this
+    // environment, and the crate's policy (see `lzham`, `sit13`) is to mark
+    // formats we cannot validate bit-exactly as `Unsupported` rather than
+    // ship a blind decoder. Concretely:
+    //
+    //   * The crate already has a Huff0 decoder in `src/zstd/huffman.rs`, but
+    //     it is (a) private to the `zstd` module (`mod huffman;`, not
+    //     reachable from here without re-exporting it) and (b) implements
+    //     only the **X1** (single-symbol) decode table that zstd's *literals*
+    //     spec restricts itself to.
+    //   * Lizard calls the *generic* `HUF_decompress`, which selects **X1 or
+    //     X2** (double-symbol) at runtime via `HUF_selectDecoder`. That
+    //     choice is **recomputed from (regenSize, comprLen)** and is **never
+    //     stored in the stream**, so a conformant decoder must implement both
+    //     X1 and X2 *and* reproduce `HUF_selectDecoder`'s timing heuristic
+    //     exactly. The crate has no X2 decoder anywhere. (The 4-stream jump
+    //     table — three LE u16 sizes — does match zstd's literals framing, so
+    //     that part would be reusable; the X1/X2 split is the blocker.)
+    //   * The lz5 encoder here is store-only, and there is no `lizard` CLI or
+    //     Huff0 fixture in this environment. A round-trip against a
+    //     hand-written X1-only encoder would always select X1 and "pass"
+    //     while proving nothing about a real (possibly X2) Lizard block — a
+    //     self-validating fiction. Absent a real fixture or reference
+    //     encoder there is no honest round-trip, so we do not ship.
     let huffman_bits = res & (FLAG_LITERALS | FLAG_FLAGS | FLAG_OFFSET16 | FLAG_OFFSET24);
     if huffman_bits != 0 {
         return Err(Error::Unsupported);

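Reviewer note on the 6-byte sub-stream framing described in the comment above: the sketch below only splits a sub-stream into its header fields, Huff0 payload, and remainder. It does not decode Huff0. The names `SubStreamHeader`, `read_le24`, and `split_sub_stream` are hypothetical and do not appear in the crate.

```rust
/// Sizes announced by one Huffman-coded sub-stream header (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct SubStreamHeader {
    /// Regenerated (decoded) size, called `regenSize` in the comment above.
    pub regen_size: usize,
    /// Compressed payload length, called `comprLen` in the comment above.
    pub compr_len: usize,
}

/// 3-byte LE regenerated size + 3-byte LE compressed size.
pub const HEADER_LEN: usize = 6;

/// Read a 24-bit little-endian integer from the first three bytes of `b`.
fn read_le24(b: &[u8]) -> usize {
    (b[0] as usize) | ((b[1] as usize) << 8) | ((b[2] as usize) << 16)
}

/// Split `input` into (header, Huff0 payload, remaining bytes).
/// Returns `None` when the header or the announced payload is truncated.
pub fn split_sub_stream(input: &[u8]) -> Option<(SubStreamHeader, &[u8], &[u8])> {
    if input.len() < HEADER_LEN {
        return None;
    }
    let regen_size = read_le24(&input[0..3]);
    let compr_len = read_le24(&input[3..6]);
    let end = HEADER_LEN.checked_add(compr_len)?;
    let payload = input.get(HEADER_LEN..end)?;
    Some((SubStreamHeader { regen_size, compr_len }, payload, &input[end..]))
}
```

The two sizes returned here are exactly the inputs that `HUF_selectDecoder` uses for the X1/X2 choice, which is why the framing alone is not enough to decode the payload.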
diff --git a/src/lz5/mod.rs b/src/lz5/mod.rs
@@ -29,9 +29,34 @@
 //! **Decoder**: implemented for the **LZ4 codeword path with all
 //! sub-streams stored raw** (the most common shape produced by the
 //! reference CLI at levels 10..=19 on non-tiny inputs). Frames whose
-//! blocks use the LIZv1 sequence format (levels 20..=29) or any
-//! Huffman-coded sub-stream (levels 30+) are rejected with
-//! [`Error::Unsupported`]. The frame-level uncompressed block path
+//! blocks use the LIZv1 sequence format (levels 20..=29, 40..=49) or any
+//! Huffman-coded sub-stream are rejected with [`Error::Unsupported`].
+//!
+//! The Huffman path stays `Unsupported` because it cannot be validated
+//! here, not because it was overlooked. Lizard's entropy stage is
+//! Huff0 (`HUF_decompress` from Yann Collet's FiniteStateEntropy), the
+//! same family as zstd's literals Huffman, and each Huffman sub-stream is
+//! framed as a 6-byte header (3-byte LE regenerated size + 3-byte LE
+//! compressed size) then the Huff0 payload. But the *generic*
+//! `HUF_decompress` Lizard calls selects between **X1** (single-symbol)
+//! and **X2** (double-symbol) decode tables via `HUF_selectDecoder`, and
+//! that choice is **recomputed from the regenerated/compressed sizes,
+//! never stored in the stream**. This crate's Huff0 decoder
+//! (`src/zstd/huffman.rs`) is X1-only and is private to the `zstd`
+//! module; it covers neither X2 nor the size-driven selector. With no
+//! `lizard` CLI and no Huff0 fixtures in this environment, the only
+//! "test" available would be a round-trip against a hand-written
+//! X1-only encoder, which would always pick X1 and therefore validate
+//! nothing about real (possibly X2) blocks. Per the crate's
+//! `lzham`/`sit13` policy, an unvalidatable decoder is worse than an
+//! honest `Unsupported`, so we do not ship one.
+//!
+//! A future round could lift this once validation is possible: expose
+//! zstd's X1 Huff0 decoder as `pub(crate)`, add an X2 decoder plus the
+//! `HUF_selectDecoder` heuristic, and validate against fixtures from the
+//! `lizard` CLI (e.g. `lizard -30`). The 6-byte sub-stream header and the
+//! 4-stream jump table (three LE u16 sizes) already match formats this
+//! crate parses elsewhere. The frame-level uncompressed block path
 //! (high bit on block-size word) is handled fully, so frames where
 //! every block stored raw decode without ever exercising the sequence
 //! loop. Block checksums (FLG bit 4) and external dictionaries are

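Reviewer note on the level grouping the module docs rely on: the sketch below maps a Lizard compression level to its codeword family, mirroring the `matches!(clevel, 10..=19 | 30..=39)` check in `src/lz5/block.rs`. The enum `Codewords` and the function `codewords_for_level` are hypothetical, and returning `None` for levels outside 10..=49 is an assumption, since the patch does not say how such levels are treated.

```rust
/// Codeword family selected by a Lizard compression level (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Codewords {
    /// Levels 10..=19 and 30..=39: LZ4 codewords (the supported sequence loop).
    Lz4,
    /// Levels 20..=29 and 40..=49: LIZv1 codewords (rejected as `Unsupported`).
    LizV1,
}

/// Classify a compression level. `None` marks levels outside 10..=49,
/// which this sketch treats as invalid (an assumption, see the note above).
pub fn codewords_for_level(clevel: u8) -> Option<Codewords> {
    match clevel {
        10..=19 | 30..=39 => Some(Codewords::Lz4),
        20..=29 | 40..=49 => Some(Codewords::LizV1),
        _ => None,
    }
}
```

Note that an LZ4-codeword level is necessary but not sufficient for decoding: a block at level 30..=39 can still set a Huffman flag on a sub-stream, which is rejected separately.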
diff --git a/src/lzfse/decoder.rs b/src/lzfse/decoder.rs
@@ -59,10 +59,12 @@ enum State {
 enum BlockKind {
     Uncompressed,
     Lzvn,
-    /// `bvx2` returns Unsupported once we've parsed its header far enough
-    /// to know we hit it; this variant exists so the state machine can
-    /// surface that decision uniformly with the other block kinds.
+    /// `bvx2` (LZFSE v2): FSE + LZ77. Decoded by [`lzfse_v2::decode_block`]
+    /// once the whole block (variable-length header + both payload streams)
+    /// is buffered.
     V2,
+    /// `bvx1` (LZFSE v1, uncompressed-freq variant): not emitted by modern
+    /// encoders; returns [`Error::Unsupported`].
     V1,
 }
 
@@ -216,23 +218,56 @@ impl Decoder {
                         };
                     }
                     BlockKind::V2 => {
-                        // We don't decode v2 in this build, but we need to
-                        // skip past the block cleanly so callers don't
-                        // confuse "block we can't decode" with "garbage".
-                        // Parse the n_payload_bytes field from the header.
-                        if self.input_buf.len() < lzfse_v2::V2_HEADER_FIXED_BYTES {
+                        // The v2 header is variable-length (FSE frequency
+                        // tables follow the fixed packed fields). Buffer the
+                        // fixed 28 bytes (post-magic: n_raw + three u64 words)
+                        // first so we can read `header_size` and the payload
+                        // sizes, then arrange to buffer the whole block (header
+                        // + payload) before decoding it in one shot.
+                        let fixed = lzfse_v2::V2_HEADER_FIXED_BYTES;
+                        if self.input_buf.len() < fixed {
                             return Ok(RawProgress {
                                 consumed,
                                 written,
                                 done: false,
                             });
                         }
-                        // We *could* skip past the v2 block, but the spec is
-                        // explicit that the encoder may mix block types
-                        // freely. Returning Unsupported here is the
-                        // documented behaviour for v2 in this build.
-                        self.poisoned = true;
-                        return Err(Error::Unsupported);
+                        let header_size = match lzfse_v2::parse_header_size(&self.input_buf) {
+                            Ok(h) => h as usize,
+                            Err(e) => {
+                                self.poisoned = true;
+                                return Err(e);
+                            }
+                        };
+                        let n_payload = match lzfse_v2::parse_payload_size(&self.input_buf) {
+                            Ok(n) => n as usize,
+                            Err(e) => {
+                                self.poisoned = true;
+                                return Err(e);
+                            }
+                        };
+                        // `header_size` includes the 4-byte magic we already
+                        // dropped; remaining block bytes after the magic are
+                        // `header_size - 4 + n_payload`.
+                        let header_len = match header_size.checked_sub(4) {
+                            Some(h) if h >= fixed => h,
+                            _ => {
+                                self.poisoned = true;
+                                return Err(Error::Corrupt);
+                            }
+                        };
+                        let block_len = match header_len.checked_add(n_payload) {
+                            Some(b) => b,
+                            None => {
+                                self.poisoned = true;
+                                return Err(Error::Corrupt);
+                            }
+                        };
+                        self.state = State::AwaitPayload {
+                            kind: BlockKind::V2,
+                            payload_len: block_len,
+                            decoded_size: 0,
+                        };
                     }
                     BlockKind::V1 => {
                         self.poisoned = true;
@@ -287,7 +322,33 @@ impl Decoder {
                             self.input_buf.drain(..payload_len);
                             self.state = State::AwaitMagic;
                         }
-                        BlockKind::V2 | BlockKind::V1 => {
+                        BlockKind::V2 => {
+                            // The whole block (header + both payload streams)
+                            // is now buffered in `payload_len` bytes. Decode in
+                            // one shot. Bound the up-front output reservation by
+                            // a payload-derived hint of 32x the block length plus
+                            // 64 KiB (an FSE block can expand more than LZVN, but
+                            // the decoder enforces the exact `n_raw_bytes` internally).
+                            let cap_hint = payload_len.saturating_mul(32).saturating_add(1 << 16);
+                            let mut block_out = Vec::new();
+                            match lzfse_v2::decode_block(
+                                &self.input_buf[..payload_len],
+                                &mut block_out,
+                                cap_hint,
+                            ) {
+                                Ok(consumed_block) => {
+                                    debug_assert_eq!(consumed_block, payload_len);
+                                }
+                                Err(e) => {
+                                    self.poisoned = true;
+                                    return Err(e);
+                                }
+                            }
+                            self.output_buf.append(&mut block_out);
+                            self.input_buf.drain(..payload_len);
+                            self.state = State::AwaitMagic;
+                        }
+                        BlockKind::V1 => {
                             // Unreachable — header step would have errored.
                             self.poisoned = true;
                             return Err(Error::Unsupported);
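
Reviewer note on the `bvx2` length arithmetic in the header step above: the sketch below isolates the computation of how many bytes must be buffered after the magic, using the same checked operations as the patch. The constant value 28 comes from the comment in the diff. The names `MAGIC_LEN`, `LenError`, and `v2_block_len` are hypothetical stand-ins, with `LenError::Corrupt` standing in for the crate's `Error::Corrupt`.

```rust
/// Fixed post-magic header bytes: n_raw (4) + three u64 words (24).
pub const V2_HEADER_FIXED_BYTES: usize = 28;
/// Length of the block magic that the decoder has already consumed.
pub const MAGIC_LEN: usize = 4;

/// Stand-in for the crate's error type (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum LenError {
    Corrupt,
}

/// Bytes to buffer after the magic: `header_size - 4 + n_payload`.
/// Rejects a header too small to hold the fixed fields, and any overflow.
pub fn v2_block_len(header_size: usize, n_payload: usize) -> Result<usize, LenError> {
    let header_len = header_size
        .checked_sub(MAGIC_LEN)
        .filter(|&h| h >= V2_HEADER_FIXED_BYTES)
        .ok_or(LenError::Corrupt)?;
    header_len.checked_add(n_payload).ok_or(LenError::Corrupt)
}
```

For example, a 32-byte header (magic included) with a 100-byte payload needs 128 more bytes buffered, while a declared `header_size` of 31 is rejected because the fixed fields would not fit.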