fix(decoder): report OutputFull when the output buffer is full #106
Merged
A RawDecoder that buffers a whole block internally (notably bzip2, which absorbs an entire BWT block before draining it) could make a naive decode loop fail with UnexpectedEnd. When the caller's output buffer filled mid-block, the RawDecoder->Decoder bridge derived Status purely from consumed >= input.len(); since the decoder had already swallowed all the input, it returned InputEmpty instead of OutputFull. A loop that stops on InputEmpty then called finish() on a half-drained stream and got UnexpectedEnd, even on the decoder's own encoder output.

Return OutputFull whenever the output buffer is full (and non-empty), which is always the correct "drain and call again" signal; a later call with no remaining input yields InputEmpty once pending bytes are out. Genuine truncation still errors (that path returns with output not full).

Adds round_trip_small_output_buffer_naive_loop, which drives the exact documented decode loop with 1/64/4096/65536-byte output buffers over 100 KB-1 MB inputs and failed with UnexpectedEnd before this change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
compcol's bzip2 decoder failed a naive decode loop with UnexpectedEnd, even on its own encoder's output, whenever a decoded block was larger than the caller's output buffer.

Root cause is in the shared RawDecoder→Decoder bridge, not the bzip2 logic. bzip2's decoder absorbs the entire input slice up front (it needs random access over a whole BWT block, up to 900 KB), then drains the decoded block into output. When output filled mid-block, the bridge derived Status purely from consumed >= input.len(), which was true (all input already swallowed), so it returned InputEmpty instead of OutputFull. A caller following the documented decode loop breaks on InputEmpty with the block half-drained, calls finish(), and gets UnexpectedEnd.

Existing tests missed it because the test helper had a defensive "drain after InputEmpty" loop that papered over the contract violation.

Fix
Return OutputFull whenever the output buffer is full (and non-empty), which is always the correct "drain and call again" signal. It converges for every decoder (a later call with no remaining input simply yields InputEmpty), and genuine truncation still errors (that path returns with output not full). The change is in the shared bridge, so any internally-buffering decoder benefits.

Test
Adds round_trip_small_output_buffer_naive_loop, which drives the exact documented decode loop (no workaround) with output buffers of 1 / 64 / 4096 / 65536 B over 100 KB–1 MB inputs. Confirmed it fails with UnexpectedEnd without the fix and passes with it. Full suite: 61/61 test binaries pass; verified our decoder also handles native bzip2 -9 output with small buffers.

🤖 Generated with Claude Code
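For readers who want to see the failure mode in isolation, here is a minimal, self-contained sketch of the two status rules described in this PR and of a naive loop that stops on InputEmpty. It is not compcol's code: the Status enum, status_before, status_after, BlockDecoder, and naive_loop are hypothetical stand-ins, the "decoder" just copies bytes, and the loop shape is an assumption about what the documented decode loop looks like.

```rust
// Illustrative sketch only. All names here are hypothetical; compcol's real
// types, signatures, and bridge logic may differ.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    InputEmpty,
    OutputFull,
}

/// Pre-fix rule as described in the PR: status depends only on input consumption.
fn status_before(consumed: usize, input_len: usize, _written: usize, _output_len: usize) -> Status {
    if consumed >= input_len { Status::InputEmpty } else { Status::OutputFull }
}

/// Post-fix rule: a full, non-empty output buffer always reports OutputFull.
/// The fallback branches are an assumption that mirrors the pre-fix rule.
fn status_after(consumed: usize, input_len: usize, written: usize, output_len: usize) -> Status {
    if output_len > 0 && written == output_len {
        Status::OutputFull
    } else if consumed >= input_len {
        Status::InputEmpty
    } else {
        Status::OutputFull
    }
}

/// Toy stand-in for an internally buffering decoder: it swallows the whole
/// input slice up front, then drains it (unchanged) into the output buffer.
struct BlockDecoder {
    pending: Vec<u8>,
    drained: usize,
}

impl BlockDecoder {
    /// Returns (bytes consumed from input, bytes written to output).
    fn raw_decode(&mut self, input: &[u8], output: &mut [u8]) -> (usize, usize) {
        self.pending.extend_from_slice(input);
        let n = (self.pending.len() - self.drained).min(output.len());
        output[..n].copy_from_slice(&self.pending[self.drained..self.drained + n]);
        self.drained += n;
        (input.len(), n)
    }

    /// Errors if buffered bytes were never drained, like a half-drained block.
    fn finish(&self) -> Result<(), &'static str> {
        if self.drained < self.pending.len() { Err("UnexpectedEnd") } else { Ok(()) }
    }
}

/// Naive loop: decode, collect output, stop on InputEmpty, then call finish().
fn naive_loop(
    data: &[u8],
    buf_len: usize,
    rule: fn(usize, usize, usize, usize) -> Status,
) -> Result<Vec<u8>, &'static str> {
    let mut dec = BlockDecoder { pending: Vec::new(), drained: 0 };
    let mut out = Vec::new();
    let mut buf = vec![0u8; buf_len];
    let mut input = data;
    loop {
        let (consumed, written) = dec.raw_decode(input, &mut buf);
        out.extend_from_slice(&buf[..written]);
        let status = rule(consumed, input.len(), written, buf.len());
        input = &input[consumed..];
        if status == Status::InputEmpty {
            break;
        }
    }
    dec.finish()?;
    Ok(out)
}
```

In this toy model, running naive_loop with status_before and an output buffer smaller than the data stops after the first call and fails in finish(), while status_after keeps returning OutputFull until a call leaves the buffer partly empty, at which point the loop ends cleanly.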