test_bz2.testDecompressorChunksMaxsize is flaky due to non-deterministic BIG_TEXT #145607

@colesbury

Description

Bug report

test_bz2 concatenates a bunch of Python test files to get 128 KiB of test data:

# Some tests need more than one block of uncompressed data. Since one block
# is at least 100,000 bytes, we gather some data dynamically and compress it.
# Note that this assumes that compression works correctly, so we cannot
# simply use the bigger test data for all tests.
test_size = 0
BIG_TEXT = bytearray(128*1024)
for fname in glob.glob(os.path.join(glob.escape(os.path.dirname(__file__)), '*.py')):
    with open(fname, 'rb') as fh:
        test_size += fh.readinto(memoryview(BIG_TEXT)[test_size:])
    if test_size > 128*1024:
        break
BIG_DATA = bz2.compress(BIG_TEXT, compresslevel=1)

The exact contents depend on the order of results returned by glob.glob(), which is arbitrary but typically consistent on a single machine. Some orderings of the globbed files lead to test failures.

Below is mostly Claude's summary, which seems right to me:

The testDecompressorChunksMaxsize test feeds BIG_DATA[:len(BIG_DATA)-64] to BZ2Decompressor.decompress with max_length=100 and asserts needs_input is False. This assumes the truncated data contains at least one complete bz2 block so the decompressor can produce output. But bz2 is a block compressor - it cannot produce any output until an entire compressed block is available.

With certain file orderings, the first bz2 block's compressed data extends into the last 64 bytes of BIG_DATA. The truncation then produces an incomplete block, the decompressor consumes all input, returns 0 bytes, and correctly sets needs_input=True.
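The behavior described above can be reproduced without test_bz2's globbed data. The sketch below uses seeded pseudo-random (hence effectively incompressible) input, so each ~100,000-byte bz2 block stays large in compressed form; it contrasts the passing case (truncation leaves the first block intact) with the failing case (truncation cuts inside the first block):

```python
import bz2
import random

random.seed(0)
data = random.randbytes(256 * 1024)  # incompressible: compressed blocks stay large
compressed = bz2.compress(data, compresslevel=1)

# Passing case: dropping 64 bytes from the end still leaves the first
# compressed block complete, so decompress() can emit output and
# needs_input stays False (unconsumed input is buffered internally).
d = bz2.BZ2Decompressor()
out = d.decompress(compressed[:-64], max_length=100)
assert len(out) == 100
assert d.needs_input is False

# Failing case: truncating inside the first block leaves no complete
# block. The decompressor consumes everything, produces nothing, and
# correctly reports needs_input=True.
d = bz2.BZ2Decompressor()
out = d.decompress(compressed[:1000], max_length=100)
assert out == b''
assert d.needs_input is True
```

With BIG_TEXT's concatenated source files the data is compressible, so whether the last 64 bytes fall inside the first block depends on the file ordering, which is why the test only fails for some orderings.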

Metadata

Labels: tests (Tests in the Lib/test dir), type-bug (An unexpected behavior, bug, or error)
