Skip to content

gh-151497: Avoid huge pre-allocation for oversized tarfile extended headers#151498

Open
iamsharduld wants to merge 1 commit into
python:mainfrom
iamsharduld:gh-tarfile-extheader-memory
Open

gh-151497: Avoid huge pre-allocation for oversized tarfile extended headers#151498
iamsharduld wants to merge 1 commit into
python:mainfrom
iamsharduld:gh-tarfile-extheader-memory

Conversation

@iamsharduld

Copy link
Copy Markdown
Contributor

tarfile reads a member's extended header (a GNU long name/link, or a pax
header) with a single read sized directly by the header's size field:

buf = tarfile.fileobj.read(self._block(self.size))

self.size is taken from the archive and is not validated, so a ~512-byte
crafted file can claim several gigabytes (or, via base-256 encoding, far more)
and make read() pre-allocate that much memory — on open/iterate
(tarfile.open(...).getmembers()), before any extraction filter runs. A
512-byte archive claiming 1 GiB drives a ~950 MiB resident allocation; a claim
of 1 TiB raises MemoryError even on high-RAM machines.

This reads the extended-header data in bounded chunks instead, so an oversized
or truncated header can no longer force a huge up-front allocation. The bytes
returned for valid archives are unchanged, and the change is safe for both
seekable and streaming (r|) tars.

…nded headers

tarfile reads a member's extended header (a GNU long name/link or a pax
header) with a single read sized by the header's size field:

    buf = tarfile.fileobj.read(self._block(self.size))

The size is taken from the archive and is not validated, so a ~512-byte
crafted file can claim several gigabytes (or, via base-256 encoding, far
more) and make read() pre-allocate that much memory -- on open/iterate,
before any extraction filter runs.

Read the extended-header data in bounded chunks instead, so an oversized
or truncated header can no longer force a huge allocation. The bytes
returned for valid archives are unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant