fix(decoder): report OutputFull when the output buffer is full #106

Merged: MagicalTux merged 1 commit into master from fix/bzip2-decoder-bridge on Jun 30, 2026

@MagicalTux (Member)

Problem

compcol's bzip2 decoder failed a naive decode loop with UnexpectedEnd — even on its own encoder's output — whenever a decoded block was larger than the caller's output buffer.

The root cause is in the shared RawDecoder->Decoder bridge, not the bzip2 logic. bzip2's decoder absorbs the entire input slice up front (it needs random access over a whole BWT block, up to 900 KB), then drains the decoded block into output. When output filled mid-block, the bridge derived Status purely from consumed >= input.len(). That condition was true because all input had already been swallowed, so the bridge returned InputEmpty instead of OutputFull. A caller following the documented decode loop breaks on InputEmpty with the block half-drained, calls finish(), and gets UnexpectedEnd.
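For illustration, the pre-fix derivation can be modeled roughly as below. This is a minimal sketch, not the bridge's actual code: the `Status` enum here is a stand-in that borrows only the variant names mentioned above, and `status_before_fix` and its parameters are hypothetical names.

```rust
/// Stand-in for the crate's status type. Only the variant names
/// `InputEmpty` and `OutputFull` come from this PR; the rest is assumed.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Status {
    InputEmpty,
    OutputFull,
}

/// Hypothetical model of the pre-fix derivation: the status depends only
/// on how much input was consumed, never on whether output is full.
pub fn status_before_fix(consumed: usize, input_len: usize) -> Status {
    if consumed >= input_len {
        // A block-buffering decoder lands here on its first call, even
        // when most of the decoded block is still waiting to be drained.
        Status::InputEmpty
    } else {
        Status::OutputFull
    }
}
```

With a 900,000-byte input slice absorbed in one call, `status_before_fix(900_000, 900_000)` yields `InputEmpty` regardless of how small the caller's output buffer was.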

Existing tests missed it because the test helper had a defensive "drain after InputEmpty" loop that papered over the contract violation.

Fix

Return OutputFull whenever the output buffer is full (and non-empty) — always the correct "drain and call again" signal. It converges for every decoder (a later call with no remaining input simply yields InputEmpty), and genuine truncation still errors (that path returns with output not full). The change is in the shared bridge, so any internally-buffering decoder benefits.
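The rule described above could be sketched as follows. The function name, its parameters, and the `Status` stand-in are assumptions made for illustration; the real bridge may structure the check differently.

```rust
/// Stand-in for the crate's status type. Only the variant names
/// `InputEmpty` and `OutputFull` come from this PR; the rest is assumed.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Status {
    InputEmpty,
    OutputFull,
}

/// Hypothetical model of the post-fix derivation: a full, non-empty
/// output buffer takes priority over the "all input consumed" check.
pub fn status_after_fix(
    consumed: usize,
    input_len: usize,
    written: usize,
    output_len: usize,
) -> Status {
    let output_full = output_len > 0 && written == output_len;
    if output_full || consumed < input_len {
        // Caller should drain the output buffer and call again.
        Status::OutputFull
    } else {
        // Output has room left and no input remains.
        Status::InputEmpty
    }
}
```

In this model a call that fills a 64-byte buffer reports `OutputFull` even when all input is consumed, while a later call that writes only 40 of 64 bytes with no input left reports `InputEmpty`. A zero-length output buffer never counts as full, which matches the "(and non-empty)" condition.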

Test

Adds round_trip_small_output_buffer_naive_loop, which drives the exact documented decode loop (no workaround) with output buffers of 1 / 64 / 4096 / 65536 B over 100 KB–1 MB inputs. Confirmed it fails with UnexpectedEnd without the fix and passes with it. Full suite: 61/61 test binaries pass; verified our decoder also handles native bzip2 -9 output with small buffers.
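The failure mode the test targets can be reproduced with a toy model. Everything below is a hypothetical sketch: `ToyBlockDecoder`, its `decode` signature, and `naive_decode` are invented for illustration and are not compcol's API. The loop shape (break on `InputEmpty`, then call `finish()`) follows the description above. "Decoding" is the identity function, but the decoder absorbs all input before draining, as bzip2 does with a block.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Status {
    InputEmpty,
    OutputFull,
}

#[derive(Debug, PartialEq, Eq)]
pub struct UnexpectedEnd;

/// Toy decoder (hypothetical): buffers the whole input, then drains it.
pub struct ToyBlockDecoder {
    pending: Vec<u8>,
    pos: usize,
    fixed: bool, // true = post-fix status rule, false = pre-fix rule
}

impl ToyBlockDecoder {
    pub fn new(fixed: bool) -> Self {
        Self { pending: Vec::new(), pos: 0, fixed }
    }

    /// Returns (bytes consumed, bytes written, status).
    pub fn decode(&mut self, input: &[u8], output: &mut [u8]) -> (usize, usize, Status) {
        // Absorb the entire input slice up front.
        self.pending.extend_from_slice(input);
        let consumed = input.len();
        // Drain as much pending data as fits into the output buffer.
        let n = (self.pending.len() - self.pos).min(output.len());
        output[..n].copy_from_slice(&self.pending[self.pos..self.pos + n]);
        self.pos += n;
        let output_full = !output.is_empty() && n == output.len();
        let status = if self.fixed && output_full {
            Status::OutputFull
        } else {
            // Pre-fix rule: consumed >= input.len() always holds here,
            // because everything was absorbed above.
            Status::InputEmpty
        };
        (consumed, n, status)
    }

    /// Errors if decoded bytes are still waiting to be drained.
    pub fn finish(&self) -> Result<(), UnexpectedEnd> {
        if self.pos < self.pending.len() { Err(UnexpectedEnd) } else { Ok(()) }
    }
}

/// Naive loop: stop on InputEmpty, then finish. No extra drain pass.
pub fn naive_decode(
    dec: &mut ToyBlockDecoder,
    mut input: &[u8],
    buf_len: usize,
) -> Result<Vec<u8>, UnexpectedEnd> {
    let mut out = Vec::new();
    let mut buf = vec![0u8; buf_len];
    loop {
        let (consumed, written, status) = dec.decode(input, &mut buf);
        input = &input[consumed..];
        out.extend_from_slice(&buf[..written]);
        if status == Status::InputEmpty {
            break;
        }
    }
    dec.finish()?;
    Ok(out)
}
```

In this model, `naive_decode(&mut ToyBlockDecoder::new(false), &data, 64)` on a 1000-byte input returns `Err(UnexpectedEnd)` after draining only the first 64 bytes, while the same call with `ToyBlockDecoder::new(true)` returns the full input.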

🤖 Generated with Claude Code

@MagicalTux MagicalTux force-pushed the fix/bzip2-decoder-bridge branch from fc30171 to d96fe4b Compare June 30, 2026 08:56
A RawDecoder that buffers a whole block internally (notably bzip2, which
absorbs an entire BWT block before draining it) could make a naive decode
loop fail with UnexpectedEnd. When the caller's output buffer filled
mid-block, the RawDecoder->Decoder bridge derived Status purely from
consumed >= input.len(); since the decoder had already swallowed all the
input, it returned InputEmpty instead of OutputFull. A loop that stops on
InputEmpty then called finish() on a half-drained stream and got
UnexpectedEnd — even on the decoder's own encoder output.

Return OutputFull whenever the output buffer is full (and non-empty),
which is always the correct "drain and call again" signal; a later call
with no remaining input yields InputEmpty once pending bytes are out.
Genuine truncation still errors (that path returns with output not full).

Adds round_trip_small_output_buffer_naive_loop, which drives the exact
documented decode loop with 1/64/4096/65536-byte output buffers over
100 KB-1 MB inputs and failed with UnexpectedEnd before this change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@MagicalTux MagicalTux force-pushed the fix/bzip2-decoder-bridge branch from d96fe4b to fe1327d Compare June 30, 2026 09:00
@MagicalTux MagicalTux merged commit 3f62896 into master Jun 30, 2026
42 checks passed
@MagicalTux MagicalTux deleted the fix/bzip2-decoder-bridge branch June 30, 2026 09:04