gh-151497: Avoid huge pre-allocation for oversized tarfile extended headers by iamsharduld · Pull Request #151498 · python/cpython

iamsharduld · 2026-06-15T10:33:42Z

tarfile reads a member's extended header (a GNU long name/link, or a pax
header) with a single read sized directly by the header's size field:

buf = tarfile.fileobj.read(self._block(self.size))

self.size is taken from the archive and is not validated, so a crafted file
of about 512 bytes can claim several gigabytes (or, via base-256 encoding, far
more) and make read() pre-allocate that much memory. This happens on
open/iterate (tarfile.open(...).getmembers()), before any extraction filter
runs. A 512-byte archive claiming 1 GiB drives a ~950 MiB resident allocation;
a claim of 1 TiB raises MemoryError even on high-RAM machines.
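
To illustrate why the claimed size can be so large, here is a rough sketch of how a 12-byte tar size field decodes. The helper name decode_size_field is hypothetical and is not tarfile's internal parser. It handles only the plain octal form and the positive GNU base-256 form (first byte 0x80), which is a simplification.

```python
def decode_size_field(field: bytes) -> int:
    """Decode a 12-byte tar header size field (illustrative only)."""
    if len(field) != 12:
        raise ValueError("size field must be exactly 12 bytes")
    if field[0] == 0x80:
        # GNU base-256: marker byte, then an 11-byte big-endian integer.
        return int.from_bytes(field[1:], "big")
    # Classic form: ASCII octal digits, padded with NULs and/or spaces.
    text = field.strip(b"\0 ").decode("ascii")
    return int(text, 8) if text else 0


# Largest value the octal form can express: 8**11 - 1 bytes (just under 8 GiB).
octal_max = decode_size_field(b"77777777777\0")

# Base-256 can express far larger values, e.g. a 1 TiB claim.
one_tib = decode_size_field(b"\x80" + (1 << 40).to_bytes(11, "big"))
```

Either way the value is whatever the archive says it is, so it has to be treated as untrusted before being used to size a read.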

This change reads the extended-header data in bounded chunks instead, so an
oversized or truncated header can no longer force a huge up-front allocation.
The bytes returned for valid archives are unchanged, and the change is safe for
both seekable and streaming (r|) tars.
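
For readers who want the general shape of a bounded read, a minimal sketch follows. It is not the code from this PR's diff: the helper name read_bounded and the 64 KiB chunk size are assumptions chosen for illustration.

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed value for illustration, not taken from the PR


def read_bounded(fileobj, size, chunk_size=CHUNK_SIZE):
    """Read up to `size` bytes, requesting at most `chunk_size` per call.

    Memory grows only with the data actually present in the stream, so a
    header that claims far more than the file contains stops at EOF
    instead of triggering one huge allocation.
    """
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = fileobj.read(min(remaining, chunk_size))
        if not chunk:
            break  # truncated input: return what was read, like read() would
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# A valid payload comes back unchanged, even with a tiny chunk size.
full = read_bounded(io.BytesIO(b"x" * 1000), 1000, chunk_size=64)

# A 3-byte stream that "claims" 1 TiB yields only the 3 bytes present.
short = read_bounded(io.BytesIO(b"abc"), 1 << 40)
```

Because the loop only ever calls read() moving forward, the same approach works for non-seekable streams. The caller still has to compare the returned length against the claimed size to detect truncation.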

Issue: tarfile: memory exhaustion via oversized extended-header (GNU long name / pax) size field #151497


iamsharduld requested a review from ethanfurman as a code owner June 15, 2026 10:33

bedevere-app Bot added the awaiting review label Jun 15, 2026

bedevere-app Bot mentioned this pull request Jun 15, 2026

tarfile: memory exhaustion via oversized extended-header (GNU long name / pax) size field #151497

Open

iamsharduld wants to merge 1 commit into python:main from iamsharduld:gh-tarfile-extheader-memory
