Streaming response consumption validates full event union per event on the loop, even for get_final_message()

`await stream.get_final_message()` runs the full per-event pipeline synchronously on the loop: for every wire event it validates the whole `RawMessageStreamEvent` union, folds the delta into a snapshot, and builds a higher-level event object to yield. \~30-55 µs/event (in part on whether asyncio debug is enabled); on a real batch run that added up to \~100s of event-loop time for stream consumption alone, and a large single response is a few-hundred-ms stall. Much of that is avoidable — validating the whole union per event costs \~60% more than validating just the matched variant, and the event objects are discarded when the caller only wants the final `Message`.

Compared to performance ticket #1195 (on the request side), this is significantly more severe: On the (py-spy-sampled) run from which this ticket's numbers are taken it was \~50x the larger cost (request-side `_transform_recursive` is 1.4s, vs \~70s for consumption).

## Environment

`anthropic` 0.105.2, also reproduced on 0.80.0. Python 3.14, pydantic 2.13.x.

## Where the time goes

Measuring current timing and proposed fixes, see [repro.py](https://github.com/user-attachments/files/28604386/repro.py) — real `messages.stream()` path against an in-process httpx mock transport, no network or spend.

Self-time from a py-spy profile of a real service consuming `messages.stream()` across many concurrent calls, summed across the batch (arm64, asyncio debug enabled):

| function | self s | role |
|---|---|---|
| `accumulate_event` | 18.2 | builds the snapshot per delta; needed for the final message |
| `construct_type` | 8.0 | validates the wire-event union per event |
| `build_events` | 2.9 | builds a per-delta event object every delta |
| `is_union` + `is_type_alias_type` + `is_annotated_type` | \~6.2 | per-event type introspection |

Plus 31.2s of pydantic self-time called by the Anthropic client, 26.7s of it under `construct_type` below — \~100s of loop time total for stream consumption.

The union decode is the largest leg — \~45s of the \~100s — and the one our production profiler flagged (`construct_type` → `validate_python` under `get_final_message()`). `construct_type` (`_models.py:609`) validates the whole `RawMessageStreamEvent` union per event; selecting the variant by its `type` and validating just that returns an identical object for \~60% less (benchmarked, `repro.py`). The SDK's own discriminator fallback (`:629`) doesn't capture this — it routes to the unvalidated `.construct()` path, which measures \~2x slower. The per-call typing introspection (`is_union`/`get_args`, \~1.2 µs/event; the pydantic `TypeAdapter` is already `lru_cache`d) is a separate, independently memoizable slice. Every consumer pays this leg, iteration or final-message-only.

`build_events` (`_messages.py:284`) is smaller but plainer waste: it runs for every event inside `__stream__`, which `get_final_message()` drains through, building a `TextEvent`/`InputJsonEvent`/etc. per delta that a final-message-only caller never reads.

`accumulate_event` is the one genuinely required leg — a caller that wants the final message has to fold in each delta — though it carries the latent tail below.

## Magnitude

The consumer logged `content_block_delta` count per response — the `n` cost scales in, server-chosen, not the output-token count. 1,880 responses:

| statistic | deltas per response |
|---|---|
| min / median / mean | 1 / 476 / 990 |
| p90 / p95 / p99 | 2699 / 3359 / 4183 |
| max | 5671 |

1.86M deltas over the \~100s above is ≈54 µs/event on that host; the max response (5,671 deltas) lands near 0.31s, matching the production stalls that first flagged this.

## Super-linear tail (latent)

`accumulate_event` has two O(n²)-in-block-length shapes. The attribute-target `+=` — `content.text += delta` (`:468`) and the identical `content.thinking += delta` (`:491`, often the longest block) — doesn't get CPython's in-place concat optimization (local targets only), so each delta reallocs the buffer; the `input_json_delta` branch (`:477,:480`) does the same with `json_buf += bytes(...)` and then re-parses the whole buffer with `from_json(json_buf, partial_mode=True)` every delta. Paired micro-benchmarks against linear controls confirm both (local concat slope 1.0, attribute-target 2.05). Crossover with the linear floor is \~50–80k events and the observed max is 5,671, so this isn't biting today — worth fixing because it pads the per-event constant, not for the asymptote.

## Fixes

1. Decode by validating the discriminated variant rather than the whole union — \~60% of the decode leg, benchmarked, same returned object. Validate the variant, don't `.construct()` it: the SDK's current discriminator fallback constructs, measuring \~2x slower. Memoize the per-call typing introspection as a separate, smaller win.
2. Skip or only lazily run `build_events` (keeping `accumulate_event`) when draining for the final message only. The per-delta events it builds are entirely unused on that path.
3. Accumulate text and tool-input into a buffer joined/parsed once instead of per delta — safe only on the final-message drain; on the iterate path it must be a lazy value computed on access, not deferred to `content_block_stop`, or per-delta `snapshot`/`.input` reads regress. Drops the latent tail and trims the constant.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Streaming response consumption validates full event union per event on the loop, even for get_final_message() #1649

Environment

Where the time goes

Magnitude

Super-linear tail (latent)

Fixes

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

function	self s	role
`accumulate_event`	18.2	builds the snapshot per delta; needed for the final message
`construct_type`	8.0	validates the wire-event union per event
`build_events`	2.9	builds a per-delta event object every delta
`is_union` + `is_type_alias_type` + `is_annotated_type`	~6.2	per-event type introspection

statistic	deltas per response
min / median / mean	1 / 476 / 990
p90 / p95 / p99	2699 / 3359 / 4183
max	5671

Streaming response consumption validates full event union per event on the loop, even for get_final_message() #1649

Description

Environment

Where the time goes

Magnitude

Super-linear tail (latent)

Fixes

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions