recorder: don't drop fixtures when SDK closes socket after data: [DONE] #288

Open
linyijie wants to merge 1 commit into CopilotKit:main from linyijie:fix/record-sdk-early-close

linyijie commented Jul 3, 2026

Summary

In record mode, the clientRes.on("close") handler currently calls req.destroy() on the upstream request whenever the client closes its socket before clientRes.writableFinished is true, and downstream logic then skips saving the fixture entirely.

That's a reasonable guard against saving a truncated body when the client really did die mid-stream, but it also fires in a common benign case: SDKs that eagerly close their HTTP response the instant they consume the terminating SSE frame. The OpenAI Python SDK does exactly this — its Stream.__stream__ breaks out of the iterator loop on data: [DONE] and its finally: await response.aclose() closes the socket immediately, often before upstream has fired its own end event.
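The early-close pattern can be modeled in a few lines. The following is a minimal sketch, not the SDK's code: `responseFrames` and `consume` are hypothetical names, and a synchronous generator stands in for the HTTP response. Breaking out of the loop on the terminating frame runs the generator's `finally` block, which plays the role of `response.aclose()`, so the close happens before the producer ever reaches its own "end".

```typescript
// Hypothetical model of an SSE consumer; names are illustrative, not the SDK's API.
function* responseFrames(log: string[], frames: string[]): Generator<string> {
  try {
    for (const frame of frames) {
      yield frame;
    }
    // Only reached if the consumer reads past the last frame.
    log.push("upstream end observed");
  } finally {
    // Runs as soon as the consumer breaks out of its loop,
    // analogous to `finally: await response.aclose()`.
    log.push("socket closed");
  }
}

function consume(frames: string[]): { received: string[]; log: string[] } {
  const log: string[] = [];
  const received: string[] = [];
  for (const frame of responseFrames(log, frames)) {
    if (frame === "data: [DONE]") break; // stop at the terminating SSE frame
    received.push(frame);
  }
  return { received, log };
}
```

With frames `["data: a", "data: b", "data: [DONE]"]`, the consumer receives both payload frames, yet the log contains only "socket closed": the end of the stream is never observed before the close.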

Result: record mode silently produces zero fixtures for any OpenAI-SDK-driven traffic, even though the full response was received and rendered end-to-end by the caller. This is very easy to hit and hard to diagnose (the only signal in the logs is Proxy request failed: aborted, which looks like an upstream problem).

Fix

Two small changes in src/recorder.ts:

  1. In the clientRes.on("close") handler for progressive-stream responses: stop destroying the upstream request and stop discarding the already-buffered chunks. Just record that the client disconnected. Upstream is allowed to run to completion so the fixture body is complete. The existing !clientDisconnected guard in onUpstreamData already prevents any further writes to the closed client socket, so no data is written to a dead peer.

  2. In the post-collapse if (clientDisconnected) branch: remove the return "relayed" — since we no longer destroy upstream on client close, reaching that point means upstream did complete cleanly and the buffered body is intact, so persisting the fixture is safe. The warn message is kept (retitled) so the disconnect is still observable.
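The two changes can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the actual src/recorder.ts code: the `RecordSession` class and its `onClientClose` / `onUpstreamEnd` methods are hypothetical, and only `onUpstreamData` and `clientDisconnected` are names taken from the description above.

```typescript
// Hypothetical stand-in for the record-mode relay after the fix.
class RecordSession {
  private buffered: string[] = [];
  private clientDisconnected = false;
  readonly writtenToClient: string[] = [];
  readonly warnings: string[] = [];
  fixtureBody: string | null = null;

  // Change 1: only note the disconnect. Upstream is not destroyed
  // and the already-buffered chunks are kept.
  onClientClose(): void {
    this.clientDisconnected = true;
  }

  onUpstreamData(chunk: string): void {
    this.buffered.push(chunk);
    // Existing guard: never write to a closed client socket.
    if (!this.clientDisconnected) this.writtenToClient.push(chunk);
  }

  // Change 2: no early return when the client has gone away.
  // Reaching this point means upstream completed, so the body is intact.
  onUpstreamEnd(): void {
    if (this.clientDisconnected) {
      this.warnings.push("client closed before upstream end; recording full fixture");
    }
    this.fixtureBody = this.buffered.join("");
  }
}
```

If the client closes after the `data: [DONE]` frame and upstream then delivers a trailing chunk before ending, the trailing chunk is buffered for the fixture but not written to the client.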

Repro (before fix)

Start aimock in record mode against any OpenAI-compatible upstream:

npx @copilotkit/aimock llmock \
  -p 4010 -f ./fixtures \
  --record --provider-openai https://your-openai-compatible-gw

Then drive it with the OpenAI Python SDK:

from openai import AsyncOpenAI
import asyncio

async def main():
    client = AsyncOpenAI(
        base_url="http://127.0.0.1:4010/v1",
        api_key="sk-...",
        timeout=3600.0,
    )
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "hi"}],
        stream=True,
    )
    async for _ in stream:
        pass

asyncio.run(main())

Observed logs:

[aimock] NO FIXTURE MATCH — proxying to https://.../v1/chat/completions
[aimock] Proxy request failed: aborted

No fixture is written. The SDK, however, receives the full response — the abort is aimock destroying upstream in reaction to the SDK's normal post-[DONE] socket close.

After fix

Same repro produces:

[aimock] NO FIXTURE MATCH — proxying to https://.../v1/chat/completions
[aimock] Streaming response detected (text/event-stream) — collapsing to fixture
[aimock] Client closed connection before upstream end — upstream response completed, recording full fixture
[aimock] Response recorded → ./fixtures/recorded/openai-YYYY-...-.json

Verified locally against a live OpenAI-compatible gateway with the OpenAI Python SDK 2.30.0 + httpx 0.28.1; 714 SSE chunks streamed to the SDK, fixture written to disk, replay works.

Safety notes

  • The "avoid saving truncated data" intent of the original guard is preserved for real mid-stream failures: those manifest as upstream errors (res.on("error")) or upstream timeouts (req.on("timeout") calling req.destroy(Error(...))), and makeUpstreamRequest rejects with that error. Control never reaches the clientDisconnected branch, and no fixture is written.
  • The only behavior change for genuine client aborts is that upstream now runs to completion in the background. In practice this is a few KB of buffered SSE frames per request, and growth remains bounded by the existing maxProxyBufferBytes / maxProxyBufferFrames caps.
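Both safety properties can be sketched as a single pure function. Everything here is illustrative: `collect`, `UpstreamEvent`, and `Outcome` are hypothetical names, byte size is approximated by string length, and the PR does not say what the recorder does when a cap is exceeded, so this sketch simply assumes the fixture is abandoned in that case.

```typescript
// Hypothetical model of upstream collection; not the recorder's real types.
type UpstreamEvent =
  | { type: "data"; chunk: string }
  | { type: "end" }
  | { type: "error"; message: string };

interface Caps {
  maxProxyBufferBytes: number;
  maxProxyBufferFrames: number;
}

type Outcome =
  | { saved: true; body: string }
  | { saved: false; reason: string };

function collect(events: UpstreamEvent[], caps: Caps): Outcome {
  const frames: string[] = [];
  let size = 0;
  for (const ev of events) {
    // Upstream errors and timeouts reject before any fixture is written.
    if (ev.type === "error") return { saved: false, reason: ev.message };
    // A clean end is the only path that persists the body.
    if (ev.type === "end") return { saved: true, body: frames.join("") };
    size += ev.chunk.length; // approximation: UTF-16 code units, not bytes
    frames.push(ev.chunk);
    // Assumption: exceeding either cap abandons the fixture.
    if (size > caps.maxProxyBufferBytes || frames.length > caps.maxProxyBufferFrames) {
      return { saved: false, reason: "buffer cap exceeded" };
    }
  }
  return { saved: false, reason: "upstream never ended" };
}
```

Note that client disconnects do not appear in this model at all, which is the point of the change: only upstream's own outcome decides whether a fixture is saved.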

…NE]`

SDKs like the OpenAI Python SDK close the response socket the moment they consume `data: [DONE]`, before upstream fires its `end` event. The current clientRes.on("close") handler treats that as a mid-stream disconnect, destroys the upstream request, and the downstream `if (clientDisconnected) return "relayed"` branch skips fixture persistence entirely — so record mode silently produces no fixture for any OpenAI-SDK-driven traffic even though the full response was received.

Fix: stop destroying upstream on client close (the existing !clientDisconnected guard in onUpstreamData already prevents writes to the closed peer, so nothing is written to a dead socket), and remove the early return in the clientDisconnected branch — reaching that branch now means upstream ran to completion cleanly, so persisting the fixture is safe.