Outbound WebSocket in Durable Object closed (code 1005) after streaming subrequest body completes #6774

@jmcclanahan

Summary

An outbound WebSocket opened from a Durable Object via resp = await fetch(url, { headers: { Upgrade: 'websocket' } }) followed by resp.webSocket.accept() is reliably torn down by the runtime with close code 1005 and an error event reading "Network connection lost", shortly after a concurrent streaming HTTP subrequest's response body has been fully read via reader.read().

The remote WebSocket server is not initiating the close: there is no proper close frame (code 1005 means "No Status Received"). The remote server is healthy, and the WS was carrying data in both directions until the streaming subrequest's body reader drained.
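
For readers triaging similar logs, here is a minimal sketch of how close codes could be bucketed in the close listener. classifyClose and CloseOrigin are hypothetical names, not part of the Workers runtime; the code ranges follow RFC 6455 section 7.4, where 1005 and 1006 are reserved values that an endpoint never sends in a close frame.

```typescript
// Hypothetical helper (not a Workers API): buckets a CloseEvent code
// according to the status code ranges in RFC 6455 section 7.4.
type CloseOrigin = 'graceful' | 'no-status' | 'abnormal' | 'application' | 'other'

function classifyClose(code: number): CloseOrigin {
  // 1005 is reserved: reported locally when no status code was received.
  if (code === 1005) return 'no-status'
  // 1006 is reserved: reported locally when the connection dropped
  // without any close frame.
  if (code === 1006) return 'abnormal'
  // 1000 (normal closure) and 1001 (going away) are ordinary shutdowns.
  if (code === 1000 || code === 1001) return 'graceful'
  // 4000-4999 is the private-use range for application-defined codes.
  if (code >= 4000 && code <= 4999) return 'application'
  return 'other'
}
```

In this report the close listener would see 'no-status', which is why the close is not attributed to the remote server.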

Observed

[warn]  WS closed unexpectedly: { code: 1005, reason: "(none)" } — reconnecting
[error] WS error event: { message: "Uncaught Error: Network connection lost.",
                          error: "Error: Network connection lost." }

Close fires consistently 1–2 seconds after the streaming reader.read() loop completes (or shortly after done: true is returned). Repeats on every cycle of streaming fetch → drain → other work while the outbound WS is open.
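
One way the 1–2 second gap could be measured is sketched below. DrainCloseTimer is a hypothetical diagnostic helper, not something the repro itself uses; it assumes markDrained() is called right after reader.read() returns done: true and msSinceDrain() is called from the outbound WS close listener. The clock is injectable only so the helper can be exercised without real timers.

```typescript
// Hypothetical diagnostic helper: records when a streaming body finished
// draining and reports how long afterwards the outbound WS closed.
class DrainCloseTimer {
  private lastDrainAt: number | undefined = undefined
  private readonly now: () => number

  // `now` defaults to Date.now; pass a fake clock for testing.
  constructor(now: () => number = () => Date.now()) {
    this.now = now
  }

  // Call right after reader.read() returns { done: true }.
  markDrained(): void {
    this.lastDrainAt = this.now()
  }

  // Call from the outbound WS `close` listener. Returns milliseconds since
  // the last recorded drain, or undefined if no drain was recorded yet.
  msSinceDrain(): number | undefined {
    if (this.lastDrainAt === undefined) return undefined
    return this.now() - this.lastDrainAt
  }
}
```

Logging msSinceDrain() alongside the close code on each cycle would show whether the delay stays in the same range across turns.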

Reproduction (pattern)

Production pattern. Happy to provide a public minimal repro on request.

import { DurableObject } from 'cloudflare:workers'

export class Repro extends DurableObject<Env> {
  private outboundWs?: WebSocket

  async fetch(request: Request): Promise<Response> {
    if (request.headers.get('Upgrade') !== 'websocket') {
      return new Response(null, { status: 426 })
    }
    const pair = new WebSocketPair()
    const [client, server] = Object.values(pair)
    this.ctx.acceptWebSocket(server)

    // Long-lived outbound WS via fetch upgrade.
    const resp = await fetch('wss://api.deepgram.com/v2/listen?model=flux-general-en', {
      headers: { Upgrade: 'websocket', Authorization: 'Token …' },
    })
    const ws = (resp as Response & { webSocket?: WebSocket }).webSocket!
    ws.accept()
    this.outboundWs = ws
    ws.addEventListener('close', (e) =>
      console.warn('outbound closed', { code: e.code, reason: e.reason }))
    ws.addEventListener('error', (e: any) =>
      console.error('outbound error', { message: e.message, error: String(e.error) }))

    return new Response(null, { status: 101, webSocket: client })
  }

  async webSocketMessage(_ws: WebSocket, _msg: string | ArrayBuffer): Promise<void> {
    // Trigger: any streaming-response subrequest whose body we drain to completion.
    const res = await fetch('https://api.cerebras.ai/v1/chat/completions', {
      method: 'POST',
      headers: { Authorization: 'Bearer …', 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: '…', stream: true, messages: [...] }),
    })
    const reader = res.body!.getReader()
    while (true) {
      const { done } = await reader.read()
      if (done) break
    }
    // ~1–2s later, this.outboundWs fires `close` with code 1005
    // and an `error` event with "Network connection lost".
  }
}

The outbound WS is otherwise healthy: data flows continuously in both directions, an application-level keep-alive is sent every 5s, and the remote server's protocol keeps sessions open across many turns.

Expected

The outbound WebSocket should remain open. Completion of an unrelated streaming response body in the same DO should not terminate the outbound WS at the TCP layer.

What we ruled out

  • Server-initiated idle close — remote keeps sessions open per its docs. Code 1005 (no payload) is inconsistent with a graceful server close (1000/1011/4xxx). App keep-alive every 5s, continuous data flowing.
  • Floating-promise / lost I/O context — tested both void p.catch(...) and await p.catch(...) in the inbound message handlers. No change.
  • handle_cross_request_promise_resolution compat flag — already default-on at our compatibility_date (wrangler rejected explicit re-add: [code: 10021] became the default as of 2024-10-14).
  • Hibernation API vs legacy server.accept() — same behavior under both inbound WS patterns.
  • Idle timeout — Twilio (inbound) sends ~50 frames/sec continuously, which feed the outbound WS. No idle window.
  • CPU/eviction budget — each inbound WS message resets the 30s CPU budget. Nowhere near it.

The failure mode disappears when the streaming fetch() is replaced with a non-streaming await res.json(). This isolates the trigger to reading a streaming response body to completion while an outbound WS is open in the same DO.
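
The two read paths being compared can be sketched side by side. drainStreaming and readBuffered are hypothetical names introduced for illustration; the first mirrors the reader loop in webSocketMessage() above (with a byte count added), and the second is the await res.json() variant. This sketch only shows the difference in how the body is consumed and does not reproduce the close by itself.

```typescript
// Streaming drain: pulls chunks one at a time until the reader reports done.
// Returns the total number of bytes read.
async function drainStreaming(res: Response): Promise<number> {
  const reader = res.body!.getReader()
  let bytes = 0
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    bytes += value.byteLength
  }
  return bytes
}

// Buffered read: the runtime collects the whole body, then parses it as JSON.
async function readBuffered(res: Response): Promise<unknown> {
  return res.json()
}
```

Per the observation above, only the first path is followed by the 1005 close on the outbound WS.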

Environment

  • Wrangler: 4.93.0
  • @cloudflare/workers-types: 4.20260519.1
  • compatibility_date: "2026-05-18"
  • compatibility_flags: ["nodejs_compat"]
  • Production deploy (not Miniflare / local dev)
  • DO uses ctx.acceptWebSocket(server) for inbound, fetch().webSocket.accept() for outbound
  • DO is sqlite_classes per [[migrations]]

Impact

Live voice agent (Twilio Media Streams in + Deepgram STT WS out + streaming LLM + streaming TTS): fires once per conversational turn. Reconnect recovers but adds ~200–500 ms latency on the next caller utterance and produces noisy logs.
