Merged
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- **HTTP/2 HPACK header compression** (RFC 7541) behind the new `hpack`
feature. `compcol::hpack::{HpackEncoder, HpackDecoder}` implement the full
header codec — static + dynamic indexing tables, N-bit-prefix integers,
string literals, and all field representations (indexed, literal
with/without indexing, never-indexed, dynamic-table size update). Validated
byte-for-byte against the RFC 7541 Appendix C worked examples. The §5.2
string Huffman primitive is also exposed as the `Http2Huffman` codec
(name `h2-huffman`) through the uniform `Encoder`/`Decoder` traits.
- **LHA `-lh2-`** added to the `lha` feature: 8 KiB-window LZSS with adaptive
(dynamic) Huffman for both literals/lengths and match positions. Like `lh1`
it is continuous and size-terminated, so its decoder takes the uncompressed
length via `DecoderConfig::with_len`. Clean-room, round-trip validated.
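
The N-bit-prefix integer coding named in the HPACK entry above is defined in RFC 7541 §5.1. The following is a minimal standalone sketch of that scheme, not the `compcol::hpack` implementation: the function names `encode_int` and `decode_int` and their signatures are hypothetical and chosen only for illustration.

```rust
/// Encode `value` with an N-bit prefix (RFC 7541 §5.1).
/// `flags` carries the representation bits above the prefix and must not
/// overlap the low `prefix_bits` bits. Hypothetical helper, for illustration.
fn encode_int(value: u64, prefix_bits: u8, flags: u8, out: &mut Vec<u8>) {
    assert!((1..=8).contains(&prefix_bits));
    let max_prefix: u64 = (1u64 << prefix_bits) - 1;
    if value < max_prefix {
        // The value fits in the prefix itself.
        out.push(flags | value as u8);
        return;
    }
    // Prefix is all ones; the remainder follows in 7-bit groups,
    // least significant group first, high bit set on all but the last.
    out.push(flags | max_prefix as u8);
    let mut rest = value - max_prefix;
    while rest >= 128 {
        out.push((rest % 128) as u8 | 0x80);
        rest /= 128;
    }
    out.push(rest as u8);
}

/// Decode an N-bit-prefix integer. Returns the value and the number of
/// bytes consumed, or `None` on truncated or overflowing input, so that
/// hostile input is rejected instead of panicking.
fn decode_int(input: &[u8], prefix_bits: u8) -> Option<(u64, usize)> {
    if !(1..=8).contains(&prefix_bits) {
        return None;
    }
    let max_prefix: u64 = (1u64 << prefix_bits) - 1;
    let first = *input.first()?;
    let mut value = u64::from(first) & max_prefix;
    if value < max_prefix {
        return Some((value, 1));
    }
    let mut shift = 0u32;
    for (i, &b) in input[1..].iter().enumerate() {
        // A 7-bit group shifted by more than 56 could overflow u64.
        if shift > 56 {
            return None;
        }
        value = value.checked_add(u64::from(b & 0x7f) << shift)?;
        if b & 0x80 == 0 {
            return Some((value, i + 2));
        }
        shift += 7;
    }
    None // ran out of bytes before the terminating group
}
```

With a 5-bit prefix, the value 1337 encodes to the three bytes `1f 9a 0a`, which is the worked example in RFC 7541 Appendix C.1.2.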

## [0.6.0](https://github.com/KarpelesLab/compcol/compare/v0.5.1...v0.6.0) - 2026-06-03

### Other
7 changes: 7 additions & 0 deletions Cargo.toml
@@ -50,6 +50,7 @@ all = [
"lha",
"bcj", "bcj2", "delta",
"arc_crunch", "arc_squeeze", "arc_squash",
"hpack",
]
# Enables `alloc`-backed conveniences (e.g. the `factory` module, the
# `compcol::vec` one-shot helpers). Pulled in automatically by features
@@ -239,6 +240,12 @@ arc_squeeze = ["alloc"]
# block-mode CLEAR code and no header byte (no RLE pre-pass). Encoder and
# decoder both implemented and validated by round-trip.
arc_squash = ["alloc"]
# HTTP/2 HPACK header compression (RFC 7541): static + dynamic indexing
# tables, integer/string coding, and the §5.2 string Huffman code. The
# Huffman primitive is also exposed as the `Http2Huffman` codec
# (name `"h2-huffman"`). The full header codec lives behind its own
# `compcol::hpack` API (it is stateful over header lists, not a byte stream).
hpack = ["alloc"]
# `compcol::tokio_io` — async mirrors of compcol::io for the tokio
# runtime. Pulls the tokio dependency for its AsyncRead/AsyncWrite
# trait definitions; the rest of the crate stays dep-free.
9 changes: 8 additions & 1 deletion README.md
@@ -61,7 +61,7 @@ flag, and a `compcol` binary turns the library into a Unix-style filter.
| Microsoft Xpress (plain LZ77) | `xpress` | `.xpress` | full | full (per [MS-XCA] §2.2) | hand-built fixtures |
| Microsoft Xpress Huffman | `xpress_huffman` | `.xph` | full (LZ77 + canonical Huffman) | full (per [MS-XCA] §2.1; used in WIM / CompactOS NTFS) | hand-built fixtures |
| LZNT1 (NTFS native compression) | `lznt1` | `.lznt1` | full | full (per [MS-XCA] §2.5; 4 KiB-chunked LZ77, no entropy coding) | hand-built fixtures |
| LHA / LZH (`-lh1-`/`-lh4-`/`-lh5-`/`-lh6-`/`-lh7-`) | `lha` | `.lzh` | full (lh1 adaptive Huffman; lh4/5/6/7 static Huffman) | full (clean-room from Okumura LZHUF / ar002) | own round-trip (no reference fixture) |
| LHA / LZH (`-lh1-`/`-lh2-`/`-lh4-`/`-lh5-`/`-lh6-`/`-lh7-`) | `lha` | `.lzh` | full (lh1/lh2 adaptive Huffman; lh4/5/6/7 static Huffman) | full (clean-room from Okumura LZHUF / ar002) | own round-trip (no reference fixture) |
| BCJ branch filters (x86, ARM, ARMT, ARM64, PPC, SPARC, IA-64, RISC-V) | `bcj` | `bcj-<arch>` | full (reversible filter) | full | round-trip identity (public-domain LZMA SDK transform) |
| BCJ2 (7z 4-stream x86 filter) | `bcj2` | — | `bcj2::encode` (fn API) | `bcj2::decode` (fn API) | round-trip identity (LZMA SDK algorithm) |
| Delta filter (distance 1..=256) | `delta` | `delta` | full (reversible filter) | full | round-trip identity |
@@ -76,6 +76,13 @@ flag, and a `compcol` binary turns the library into a Unix-style filter.
| RAR 2.x | `rar2` | `.rar` | `Unsupported` (license) | full LZ77+Huffman + audio predictor | real rar-2.60 fixtures |
| RAR 3.x | `rar3` | `.rar` | `Unsupported` (license) | full LZ77+Huffman + E8 filter; PPMd & VM filters refused | libarchive RAR3 fixtures |
| RAR 5.x | `rar5` | `.rar` | `Unsupported` (license) | full LZ77+Huffman + x86 filter; Delta/ARM refused | RARLAB-CLI fixtures |
| HTTP/2 HPACK (RFC 7541) | `hpack` | — | full (header codec + `h2-huffman` string codec) | full (static+dynamic tables, integer/string coding) | RFC 7541 Appendix C vectors |

HPACK is HTTP/2's header-compression codec, not a byte-stream codec: it
operates on `(name, value)` header lists with per-connection dynamic-table
state, so it lives behind its own `compcol::hpack` API (`HpackEncoder` /
`HpackDecoder`). The §5.2 string Huffman primitive is also exposed as the
`Http2Huffman` codec (name `h2-huffman`) through the uniform trait surface.
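
The per-connection dynamic-table state described above follows the size accounting of RFC 7541 §4.1 and the eviction rules of §4.3 and §4.4. The sketch below illustrates those rules in isolation, assuming a hypothetical `DynamicTable` type; it is not the data structure used inside `compcol::hpack`.

```rust
use std::collections::VecDeque;

/// Per-entry overhead added to name and value lengths (RFC 7541 §4.1).
const ENTRY_OVERHEAD: usize = 32;

/// Hypothetical dynamic table: newest entry at the front, oldest at the back.
struct DynamicTable {
    entries: VecDeque<(Vec<u8>, Vec<u8>)>,
    size: usize,
    max_size: usize,
}

impl DynamicTable {
    fn new(max_size: usize) -> Self {
        Self { entries: VecDeque::new(), size: 0, max_size }
    }

    fn entry_size(name: &[u8], value: &[u8]) -> usize {
        name.len() + value.len() + ENTRY_OVERHEAD
    }

    /// Drop the oldest entries until the table size is at most `limit`.
    fn evict_to(&mut self, limit: usize) {
        while self.size > limit {
            match self.entries.pop_back() {
                Some((n, v)) => self.size -= Self::entry_size(&n, &v),
                None => break,
            }
        }
    }

    /// Insert a new entry, evicting old ones first (§4.4). An entry larger
    /// than the whole table empties the table and is not stored.
    fn insert(&mut self, name: &[u8], value: &[u8]) {
        let needed = Self::entry_size(name, value);
        if needed > self.max_size {
            self.entries.clear();
            self.size = 0;
            return;
        }
        self.evict_to(self.max_size - needed);
        self.entries.push_front((name.to_vec(), value.to_vec()));
        self.size += needed;
    }

    /// Apply a dynamic-table size update (§4.3), evicting as needed.
    fn set_max_size(&mut self, new_max: usize) {
        self.max_size = new_max;
        self.evict_to(new_max);
    }

    /// Look up an entry by 0-based position, newest first.
    fn get(&self, index: usize) -> Option<(&[u8], &[u8])> {
        self.entries.get(index).map(|(n, v)| (n.as_slice(), v.as_slice()))
    }

    fn size(&self) -> usize {
        self.size
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Because both peers must evict identically, the encoder and decoder each keep their own copy of this state per connection, which is why a header list cannot be treated as a stateless byte stream.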

The RAR encoders are permanently `Unsupported` per RARLAB's unRAR
license terms (every clean-room RAR reader — libarchive, The
7 changes: 7 additions & 0 deletions fuzz/Cargo.toml
@@ -199,6 +199,13 @@ test = false
doc = false
bench = false

[[bin]]
name = "decoder_h2_huffman"
path = "fuzz_targets/decoder_h2_huffman.rs"
test = false
doc = false
bench = false

[[bin]]
name = "decoder_bcj"
path = "fuzz_targets/decoder_bcj.rs"
20 changes: 20 additions & 0 deletions fuzz/fuzz_targets/decoder_h2_huffman.rs
@@ -0,0 +1,20 @@
#![no_main]
use compcol::hpack::{huffman, HpackDecoder};
use libfuzzer_sys::fuzz_target;

// Smoke property: neither the HPACK header decoder nor the standalone
// "h2 huffman" string decoder may panic on arbitrary attacker-controlled
// input. libfuzzer feeds us garbage; any panic/abort trips the harness.
//
// Both are pure whole-buffer transforms (the HPACK header block decoder is
// the primary attack surface — it walks the integer/string/index
// representations), so we just call them and discard the result.
fuzz_target!(|data: &[u8]| {
// HPACK header block: bounded table so a hostile size update can't grow
// state without limit.
let mut dec = HpackDecoder::with_max_table_size(4096);
let _ = dec.decode(data);

// The §5.2 Huffman string primitive on its own.
let _ = huffman::decode(data);
});
19 changes: 19 additions & 0 deletions src/factory.rs
@@ -127,13 +127,19 @@ pub fn encoder_by_name(name: &str) -> Option<Box<dyn Encoder>> {
#[cfg(feature = "lha")]
crate::lha::Lh1::NAME => Some(Box::new(<crate::lha::Lh1 as Algorithm>::encoder())),
#[cfg(feature = "lha")]
crate::lha::Lh2::NAME => Some(Box::new(<crate::lha::Lh2 as Algorithm>::encoder())),
#[cfg(feature = "lha")]
crate::lha::Lh4::NAME => Some(Box::new(<crate::lha::Lh4 as Algorithm>::encoder())),
#[cfg(feature = "lha")]
crate::lha::Lh5::NAME => Some(Box::new(<crate::lha::Lh5 as Algorithm>::encoder())),
#[cfg(feature = "lha")]
crate::lha::Lh6::NAME => Some(Box::new(<crate::lha::Lh6 as Algorithm>::encoder())),
#[cfg(feature = "lha")]
crate::lha::Lh7::NAME => Some(Box::new(<crate::lha::Lh7 as Algorithm>::encoder())),
#[cfg(feature = "hpack")]
crate::hpack::Http2Huffman::NAME => Some(Box::new(
<crate::hpack::Http2Huffman as Algorithm>::encoder(),
)),
#[cfg(feature = "bcj")]
crate::bcj::BcjX86::NAME => Some(Box::new(<crate::bcj::BcjX86 as Algorithm>::encoder())),
#[cfg(feature = "bcj")]
@@ -359,13 +365,19 @@ pub fn decoder_by_name(name: &str) -> Option<Box<dyn Decoder>> {
#[cfg(feature = "lha")]
crate::lha::Lh1::NAME => Some(Box::new(<crate::lha::Lh1 as Algorithm>::decoder())),
#[cfg(feature = "lha")]
crate::lha::Lh2::NAME => Some(Box::new(<crate::lha::Lh2 as Algorithm>::decoder())),
#[cfg(feature = "lha")]
crate::lha::Lh4::NAME => Some(Box::new(<crate::lha::Lh4 as Algorithm>::decoder())),
#[cfg(feature = "lha")]
crate::lha::Lh5::NAME => Some(Box::new(<crate::lha::Lh5 as Algorithm>::decoder())),
#[cfg(feature = "lha")]
crate::lha::Lh6::NAME => Some(Box::new(<crate::lha::Lh6 as Algorithm>::decoder())),
#[cfg(feature = "lha")]
crate::lha::Lh7::NAME => Some(Box::new(<crate::lha::Lh7 as Algorithm>::decoder())),
#[cfg(feature = "hpack")]
crate::hpack::Http2Huffman::NAME => Some(Box::new(
<crate::hpack::Http2Huffman as Algorithm>::decoder(),
)),
#[cfg(feature = "bcj")]
crate::bcj::BcjX86::NAME => Some(Box::new(<crate::bcj::BcjX86 as Algorithm>::decoder())),
#[cfg(feature = "bcj")]
@@ -701,6 +713,9 @@ pub const fn extension(name: &str) -> Option<&'static str> {
if str_eq(name, "lh1") && cfg!(feature = "lha") {
return Some("lzh");
}
if str_eq(name, "lh2") && cfg!(feature = "lha") {
return Some("lzh");
}
if str_eq(name, "lh4") && cfg!(feature = "lha") {
return Some("lzh");
}
@@ -861,13 +876,17 @@ pub const fn names() -> &'static [&'static str] {
#[cfg(feature = "lha")]
crate::lha::Lh1::NAME,
#[cfg(feature = "lha")]
crate::lha::Lh2::NAME,
#[cfg(feature = "lha")]
crate::lha::Lh4::NAME,
#[cfg(feature = "lha")]
crate::lha::Lh5::NAME,
#[cfg(feature = "lha")]
crate::lha::Lh6::NAME,
#[cfg(feature = "lha")]
crate::lha::Lh7::NAME,
#[cfg(feature = "hpack")]
crate::hpack::Http2Huffman::NAME,
#[cfg(feature = "bcj")]
crate::bcj::BcjX86::NAME,
#[cfg(feature = "bcj")]