fix: correctness and security hardening across the HTTP stack #6

Merged: OmarAlJarrah merged 2 commits into main from fix/http-stack-correctness-hardening on Jun 10, 2026

@OmarAlJarrah

Member

Summary

A focused correctness and security pass over the HTTP stack — the pipeline policies, the four transport adapters, and the serde/pagination/streaming layers. The issues here mostly live in cross-layer interactions and transport wire behaviour that the existing unit tests didn't exercise.

Pipeline & auth

  • Credentials no longer follow cross-origin redirects. The credential policies are now origin-aware: after a redirect to a different host they stop re-stamping the bearer/basic/key credential onto the foreign request. The redirect policy strips a caller-set Authorization only on cross-origin hops, so ordinary same-origin redirects (e.g. a trailing-slash 301) keep it.
  • Retry body handling. Single-use body buffering is decided from the effective per-call retry total (not just the constructor default), so a per-call retry_total override no longer crashes a streamed body mid-retry. Non-idempotent methods (POST/PATCH) are no longer retried on read-phase errors, where the request may already have been processed. Async body buffering runs off the event loop.
  • Pipeline ownership. Building a pipeline from a policy instance already wired into another pipeline now raises instead of silently corrupting the first pipeline's chain.

Transports

  • requests: never send Content-Length and Transfer-Encoding: chunked together (the previous behaviour sent an unframed body under both headers). Known-length bodies are framed by length and streamed without buffering into memory; unknown-length bodies are chunked with any stale Content-Length removed. Only a session the client created is closed; repeated response headers (e.g. Set-Cookie) are preserved; the negotiated HTTP version is reported; a Content-Length that would misdescribe a decompressed body is dropped.
  • urllib: stop following redirects inside the transport so 3xx responses reach the pipeline's redirect policy (and its credential-stripping) instead of being followed silently with the original headers. Response-read failures are mapped into the SDK error hierarchy.
  • asyncio: send Host with the port for non-default ports; apply a caller-supplied SSL context only to https URLs; reject chunked responses and read connection-close-framed bodies to EOF instead of fabricating an empty body; detect chunked across multiple Transfer-Encoding lines; buffer the request body off the event loop.
  • httpx / aiohttp: set Content-Length for known-length bodies (instead of always chunking) and pump sync body iterators on a worker thread rather than the event loop.
  • All transports preserve a valid-but-unregistered status code instead of discarding the live response.

Serde, pagination, streaming, multipart

  • Codec decodes Annotated[...] fields (including annotated Tristate) correctly, guards recursion depth, and wraps conversion failures in CodecError.
  • Link-header pagination reads every Link line and tolerates commas inside the target URI; cursor pagination accepts non-string cursors; malformed JSON pages surface DeserializationError.
  • JSONL and SSE decoding map invalid UTF-8 onto the streaming error contract.
  • Multipart rejects CR/LF/NUL in field names, filenames, media types, custom part headers, and the boundary, closing several header-injection vectors.
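
As an illustration of the Link-header bullet, the sketch below reads every Link line and does not split on commas that sit inside the `<...>` target. The function name and regular expressions are assumptions made for this example; the SDK's parser may be structured differently.

```python
import re

# A link-value is "<target>" followed by zero or more ";name=value" parameters.
_LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^",;\s]+))*)'
)
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^",;\s]+))')


def parse_link_headers(values: list[str]) -> dict[str, str]:
    """Map each rel to its target URI across all Link header lines."""
    links: dict[str, str] = {}
    for value in values:
        for match in _LINK_RE.finditer(value):
            target, params = match.group(1), match.group(2)
            for param in _PARAM_RE.finditer(params):
                if param.group(1).lower() != "rel":
                    continue
                quoted, bare = param.group(2), param.group(3)
                rel_value = quoted if quoted is not None else bare
                # A rel parameter may list several space-separated relations.
                for rel in rel_value.split():
                    links.setdefault(rel, target)
    return links
```

Because the target is matched up to the closing `>`, a query string such as `?ids=1,2,3` stays intact instead of being cut at the first comma.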

Also includes the digest nonce-count reset, per-operation (rather than per-attempt) tracing events, fail-closed URL redaction, more robust proxy-configuration parsing, and a number of smaller correctness and documentation fixes.

Public API

One intentional signature change: HttpTracer.request_sent now accepts int | None so an unknown-length upload still emits the event. The committed surface baseline is regenerated accordingly.

Testing

uv run pytest -q: 1398 passed (242 new regression tests); uv run mypy --strict: clean; uv run ruff check / ruff format --check: clean.

🤖 Generated with Claude Code

OmarAlJarrah and others added 2 commits June 10, 2026 15:53
…and serde

Resolve a batch of correctness and security issues found while reviewing the
HTTP stack end to end.

Pipeline & auth:
- Credential policies are origin-aware: after a cross-origin redirect the
  bearer/basic/key credential is no longer re-stamped onto the foreign host,
  and the redirect policy strips a caller-set Authorization only on
  cross-origin hops (same-origin hops keep it).
- Retry decides single-use body buffering from the effective per-call retry
  total and no longer retries non-idempotent methods on read-phase errors;
  async body buffering runs off the event loop.
- Pipeline construction rejects reuse of a policy instance already wired into
  another pipeline.

Transports:
- requests: never emit Content-Length and Transfer-Encoding together; frame
  known-length bodies by length (streamed, not buffered into memory) and chunk
  unknown-length bodies with any stale Content-Length removed. Close only a
  session the client owns; preserve repeated response headers; report the
  negotiated HTTP version; drop a misleading Content-Length under
  Content-Encoding.
- urllib: stop following redirects inside the transport so the 3xx reaches the
  pipeline; map response-read failures into the SDK error hierarchy.
- asyncio: send Host with the port for non-default ports; apply an SSL context
  only to https URLs; reject chunked and read connection-close framing instead
  of fabricating an empty body; detect chunked across multiple
  Transfer-Encoding lines; buffer the request body off the event loop.
- httpx/aiohttp: set Content-Length for known-length bodies and pump sync body
  iterators on a worker thread.
- All transports preserve a valid-but-unregistered status code instead of
  discarding the response.

Serde, pagination, streaming, multipart:
- Codec decodes Annotated[...] fields (including annotated Tristate) correctly,
  guards recursion depth, and wraps conversion failures in CodecError.
- Link-header pagination reads every Link line and tolerates commas inside the
  target URI; cursor pagination accepts non-string cursors; JSON page parsing
  surfaces DeserializationError.
- JSONL and SSE decoding map invalid UTF-8 onto the streaming error contract.
- Multipart rejects CR/LF/NUL in field names, filenames, media types, custom
  part headers, and the boundary to prevent header injection.
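
The multipart check can be pictured as a single validator applied to every value that ends up in a part header or the boundary. This is a sketch only: `check_header_safe` is a hypothetical name, and raising `ValueError` is an assumption, since the commit does not say which exception the SDK raises.

```python
_FORBIDDEN = ("\r", "\n", "\x00")


def check_header_safe(label: str, value: str) -> str:
    """Return value unchanged, or raise if it could break out of a header line."""
    for char in _FORBIDDEN:
        if char in value:
            raise ValueError(f"{label} contains forbidden character {char!r}")
    return value
```

A filename such as `"a.txt\r\nX-Injected: 1"` is rejected before it can be written into a Content-Disposition line.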

Also fixes the digest nonce-count reset, per-operation tracing event emission,
URL-redactor fail-closed behaviour, proxy configuration parsing, and a number
of smaller correctness and documentation issues. Adds regression tests
throughout; the full suite, mypy --strict, and ruff all pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Decoding a PEP 695 generic dataclass (e.g. `Box[int]`, declared `class Box[T]`)
raised `CodecError: name 'T' is not defined` on Python 3.12. The field
annotations reference the class type parameters, and 3.12's `get_type_hints`
does not place those parameters in scope (3.13+ resolves them automatically).

Pass the class `__type_params__` through `get_type_hints`'s `localns` so the
annotations resolve on every supported interpreter. Verified against the full
suite on 3.12, 3.13, and 3.14.
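
A minimal sketch of the approach, assuming a helper named `resolve_field_hints` (hypothetical). So that it also parses on interpreters older than 3.12, the example simulates `class Box[T]` with a function-local `TypeVar`, a string annotation, and a manually assigned `__type_params__`; real PEP 695 classes get that attribute automatically.

```python
import typing
from dataclasses import dataclass


def resolve_field_hints(cls: type) -> dict[str, typing.Any]:
    """Resolve annotations with the class type parameters in scope."""
    # Classes declared without PEP 695 syntax may have no type parameters.
    type_params = getattr(cls, "__type_params__", ())
    localns = {param.__name__: param for param in type_params}
    return typing.get_type_hints(cls, localns=localns)


def _make_box() -> type:
    # Stand-in for `class Box[T]: value: T`. T exists only inside this
    # function, so it is not visible in the module globals.
    T = typing.TypeVar("T")

    @dataclass
    class Box(typing.Generic[T]):
        value: "T"
        label: str = ""

    Box.__type_params__ = (T,)
    return Box


Box = _make_box()
hints = resolve_field_hints(Box)
```

Here `hints["value"]` resolves to the `TypeVar` named `T` even though `T` is not a module-level name, which is the situation the 3.12 failure arose from.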

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@OmarAlJarrah OmarAlJarrah merged commit 7dcca56 into main Jun 10, 2026
3 checks passed