Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
124 changes: 118 additions & 6 deletions THREAT_MODEL.md
Original file line number Diff line number Diff line change
Expand Up @@ -538,11 +538,123 @@ client-side, the writer adds masks to outgoing frames.
- No formal CVE has been published against the WebSocket framing layer to
date.

---

### 5.4. Multipart parsing & encoding

**Scope.** Parsing of incoming `multipart/*` bodies (`MultipartReader`,
`BodyPartReader`) and construction of outgoing multipart bodies
(`MultipartWriter`, `FormData`). This includes boundary handling, per-part
header parsing, `Content-Disposition` extraction, `Content-Transfer-Encoding`
decoding, charset detection, and the `Request.post()` path that wraps
`multipart/form-data`. Out of scope: the underlying compression codecs ([§5.5](#55-compression-codecs)),
streams/payloads ([§5.6](#56-streams--payloads)), and the connection-level read/write timeouts that
back-stop slow drips ([§5.7](#57-server-connection-lifecycle)).

**Components covered.**

- `aiohttp/multipart.py` — `MultipartReader`, `BodyPartReader`,
`MultipartWriter`, `parse_content_disposition`,
`content_disposition_filename`.
- `aiohttp/formdata.py` — `FormData` builder.
- `aiohttp/web_request.py` — `Request.post()`, `Request.multipart()`.
- `aiohttp/payload.py` — multipart payload wrapping (overlap with [§5.6](#56-streams--payloads)).

**Trust boundaries & data flow.**

```mermaid
flowchart LR
Wire([Untrusted body]) --> Reader[MultipartReader]
Reader --> Parts[BodyPartReader per part]
Parts -->|headers, name, filename| App([User handler])
Parts -->|streamed body| App
Reader -. nested .-> Reader

AppOut([User code]) --> FD[FormData] --> Writer[MultipartWriter]
Writer --> WireOut([Outbound body])
```

The reader's input is fully attacker-controlled on the server side and
upstream-controlled on the client side. Per-part values surfaced to user
code (`headers`, `name`, `filename`, body bytes) are all untrusted. The
writer's input is trusted in the threat-model sense, but `FormData` is the
boundary at which user-supplied strings can become wire bytes.

**Assets at risk (chunk-specific).**

- **Process memory** — number of parts, per-part body size, recursion depth
for nested multipart.
- **Filesystem safety in user code** — `filename` round-trips from wire to
application without sanitisation.
- **Outbound framing integrity** — boundary uniqueness, header validity in
`FormData.add_field`.

**Threats (STRIDE).**

| # | Component / Vector | STRIDE | Threat | Risk |
| :--- | :--- | :--- | :--- | :--- |
| 4.1 | Boundary parameter parsing | T | Malformed boundary parameter (oversized, missing, or containing bytes outside the RFC 2046 §5.1.1 safe set — digits, letters, and a small punctuation set) could enable multipart parser confusion or smuggling. | Low |
| 4.2 | Number of parts per body | D | A peer submits a body packed with many tiny parts (e.g. ten thousand 100-byte parts inside a 1 MiB body). Each part allocates a `BodyPartReader` plus header dict, so the live-Python-object footprint is far larger than the on-wire byte count. `client_max_size` caps the wire bytes but not the per-part allocation amplification. | Low |
| 4.3 | Nested multipart recursion | D | `MultipartReader.next()` recurses into nested multiparts without a depth cap; deeply nested input can hit `RecursionError`. `Request.post()` short-circuits this by rejecting any nested multipart it sees, but the bare API does not. | Medium |
| 4.4 | Per-part header block size | D | A peer submits a part with an oversized header block (very long field values, or hundreds of headers per part) to drive memory growth at parse time, multiplied across many parts. | Low |
| 4.5 | Per-part body size | D | A peer submits a single part with a body that grows arbitrarily large before any framing boundary — if size checking happens only after buffering the whole part, memory blows up before the cap fires. | Low |
| 4.6 | `Content-Disposition` filename | T / I | Filename is parsed (incl. RFC 2231 `filename*=`) and returned verbatim to the application. Path-traversal strings (`../`), absolute paths, NUL/CR/LF all reach user code unsanitised. | Medium |
| 4.7 | `Content-Disposition` name | T | Field name returned verbatim. CR/LF in a part's `name` doesn't cross the framing boundary (the multipart parser delimits parts by boundary, not by CR/LF in field names), but downstream loggers / dashboards may misrender. | Low |
| 4.8 | `_charset_` magic form field | T | An attacker can include a form field named `_charset_` whose value retroactively becomes the *default charset* used to decode subsequent text fields (`multipart.py:MultipartReader.next _charset_ form handling`). Surprising behaviour; documented in the standard. | Low |
| 4.9 | `Content-Transfer-Encoding` (base64 / quoted-printable) | T / D | A peer sets `Content-Transfer-Encoding: base64` (or `quoted-printable`) and submits malformed encoded data — the decoder raises mid-stream and the partially-decoded part is surfaced to the handler with an exception. | Low |
| 4.10 | `Content-Encoding` (gzip/deflate within a part) | D | A peer submits a multipart part with `Content-Encoding: gzip` (or `deflate`) carrying a zip-bomb payload — a small compressed body that expands to a vastly larger decoded body, driving memory exhaustion in the receiving handler. | Low |
| 4.11 | `MultipartWriter` boundary collision | T | If a part body contains a byte sequence matching the boundary string, the writer emits it as-is — a receiver parsing the multipart will see a fake part break early. Negligible against the default `uuid.uuid4().hex` boundary, but real if user code supplies a short or predictable one. | Low |
| 4.12 | `FormData.add_field` argument injection | T | When user code opts into `FormData(quote_fields=False)`, attacker-controlled `name` or `filename` reach the outbound `Content-Disposition` value with only `\` / `"` escaped — CR/LF/NUL pass through, enabling injection of extra disposition parameters or a fake-part break-out. With the default `quote_fields=True` the values are percent-encoded and safe. `add_field` itself only validates `content_type` (Feb 2026 fix). | Low–Med |
| 4.13 | `Request.post()` temp-file lifetime | I / D | Uploaded files are streamed to `tempfile.TemporaryFile()` and lifetime is bound to the `FileField` object. If user code drops the field reference without reading or closing, garbage collection eventually closes the temp file. | Low |

**Mitigations.**

| # | Threat | Existing | Recommended |
| :--- | :--- | :--- | :--- |
| 4.1 | Boundary parameter | 70-char cap; missing-boundary raises; HTTP header layer ([§5.1](#51-http1-parser)) catches CR/LF/NUL. | None. |
| 4.2 | Many small parts | `client_max_size` caps total bytes. | Documented design decision: rely on `client_max_size` rather than introducing a `max_parts` knob. **User**: operators sensitive to live-object count should reduce `client_max_size`. |
| 4.3 | Nested-multipart recursion | `Request.post()` rejects any nested multipart with `ValueError` ("To decode nested multipart you need to use custom reader") (`web_request.py:BaseRequest.post`). | **Open question.** Direct `MultipartReader` users get unlimited recursion. Consider adding a `max_nesting_depth` parameter (default e.g. 10) to fail cleanly before `RecursionError`. Tracked as a follow-up. |
| 4.4 | Per-part headers bounded | `max_field_size` / `max_headers` plumbed since 5fe9dfb64 (Mar 2026). | None. |
| 4.5 | Per-part body bounded | Per-iteration size check since 9cc4b917c (Mar 2026). | None. |
| 4.6 | Filename path traversal | None; verbatim return is the documented behaviour. | **User**: sanitise `BodyPartReader.filename` before opening / saving / reflecting (strip path components, control bytes, known-bad sequences). **Open question** for aiohttp: ship a public `aiohttp.web.safe_filename()` helper and/or a clearer docstring warning; default behaviour stays as-is. |
| 4.7 | Field-name pass-through | None. | **User**: `request.post()` keys come from attacker-controlled `name` parameters; do not feed them directly to filesystem / DB / template paths. |
| 4.8 | `_charset_` magic field | Behaviour follows the HTML5 spec for multipart form charset selection. | **Document the surprising behaviour: an attacker who can post any field can change how subsequent fields are decoded.** **User**: if your form-handling code makes decisions based on string content, be aware that the decode charset is attacker-influenceable. |
| 4.9 | CTE decoding | `BodyPartReader._decode_content_transfer` dispatches to `base64.b64decode` / `binascii.a2b_qp` / pass-through for `binary`/`8bit`/`7bit`, and raises `RuntimeError` on any other token. Both decoders raise on malformed input; decoded output is still subject to the per-part `client_max_size` cap. For base64, `BodyPartReader.read_chunk` extends each chunk read until its base64-significant character count is a multiple of 4, so mid-quartet splits cannot silently corrupt the decoded output. | None. |
| 4.10 | Content-Encoding decompression | Per-chunk `max_decompress_size`, per-part `client_max_size` cap. | Inherits the [§5.5](#55-compression-codecs) caveat about backend `max_length` honouring. |
| 4.11 | Outbound boundary collision | UUID4-hex default; HMAC-grade entropy means collision with random body bytes is statistically negligible. | **User**: callers supplying their own boundary must use a random, sufficiently long value. |
| 4.12 | `FormData` argument injection | `content_type` rejects `\r` / `\n` at `add_field` (`formdata.py:FormData.add_field`, dab9e879b, Feb 2026). `name` / `filename` are percent-encoded by `content_disposition_header` (`helpers.py:content_disposition_header`) when `quote_fields=True` (default). | **Extend the CR/LF/NUL check to `name` and `filename` at `add_field` so the `quote_fields=False` opt-out path is also defended.** **User**: do not pass `quote_fields=False` when any of `name` / `filename` is attacker-influenced. |
| 4.13 | Temp-file lifetime | `tempfile.TemporaryFile()` is unlinked at creation on POSIX; closing the FD (whether explicit or via GC) reclaims the disk. | **User**: explicitly close `FileField`s you don't intend to consume to release the temp file promptly under high load. |

**Past advisories / hardening (recap).**

- **GHSA-5m98-qgg9-wh84 (CVE-2024-30251)** (3.9.4) — infinite loop in multipart
`_read_chunk_from_length` on a malformed POST body.
- **GHSA-jj3x-wxrx-4x23 (CVE-2025-69227)** (3.13.3) — `Request.post()` entered
an infinite loop on malformed bodies when `PYTHONOPTIMIZE` stripped `assert`
statements (asserts were doing real control-flow work in the
multipart reader). Fixed by replacing the assertions with explicit
checks. Code audit follow-up: any other `assert` used for control
flow should be identified and removed.
- **GHSA-6jhg-hg63-jvvf (CVE-2025-69228)** (3.13.3) — DoS via large
`Request.post()` payloads.
- **GHSA-2vrm-gr82-f7m5 (CVE-2026-34514)** (3.13.4) —
`FormData.add_field` rejects CR/LF in `content_type` (header
injection on outbound multipart).
- **GHSA-m5qp-6w8w-w647 (CVE-2026-34516)** (3.13.4) — `MultipartReader`
now honours `max_field_size` and `max_headers` from the underlying
protocol, plugging an unbounded per-part header memory growth path.
- **GHSA-3wq7-rqq7-wx6j (CVE-2026-34517)** (3.13.4) — `Request.post()` enforces
`client_max_size` during iteration rather than after buffering,
plugging a memory-blow-up on large form fields.

These are all currently in place; this section assumes no regression.

**Open questions.**

1. Should server-side reader reject unmasked frames (and client-side reject
masked ones) per RFC 6455 §5.1? (Threat 3.1 — recommended.)
2. Is the PRNG mask source (`random.getrandbits`) sufficient, or should it be
migrated to `secrets`/`os.urandom`? (Threat 3.2.)
3. For long-lived WebSocket sessions, is there a use case for *forcing*
`no_context_takeover` defaults to limit memory growth? (Threat 3.10.)
1. Should `MultipartReader` accept a `max_nesting_depth` parameter to fail
cleanly on deeply nested input (threat 4.3)?
2. Should aiohttp expose an opt-in `safe_filename()` helper, or improve
filename docstrings, to nudge applications away from path-traversal
pitfalls (threat 4.6)?
3. Should `FormData.add_field` extend its CR/LF rejection to `name` and
`filename` symmetrically with `content_type` (threat 4.12)?
Loading