feat(proxy): add per-chunk timeout control for SSE streaming responses #40

@coderabbitai

Background

When a model has a timeout configured, it currently acts only as a request timeout: it covers the time until response headers are received. Once the SSE stream is established, the wait for each individual chunk is unbounded.

A separate, configurable per-chunk (or inter-token) timeout is needed to detect stalled upstream streams after the response headers have already been received.

Considerations

  • Some providers under high load may return response headers immediately but queue the actual LLM computation, meaning the first token could arrive much later. An overly aggressive chunk timeout could hurt user experience and waste the cost of upstream work that is already in progress.
  • The timeout should be configurable independently from the request timeout, and should likely apply between chunks (idle timeout) rather than as an absolute deadline from stream open.
  • On chunk timeout expiry, the handler should emit a structured Anthropic-style error SSE event and terminate the stream cleanly, ensuring usage_rx is closed and request_ctx/span_ctx are finalized.
  • Both src/proxy/handlers/chat_completions/mod.rs and src/proxy/handlers/messages/mod.rs are affected.
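
The idle-timeout behaviour described above can be sketched as follows. This is a minimal, std-only illustration and not the project's handler code: the upstream stream is modelled as a `std::sync::mpsc::Receiver<String>`, and the names `relay_with_idle_timeout`, `timeout_error_event`, and `StreamEnd`, as well as the `timeout_error` error type string, are assumptions made for the example. In an async handler the same shape would presumably come from wrapping each `stream.next()` call in `tokio::time::timeout`.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical outcome of relaying one upstream stream.
#[derive(Debug, PartialEq)]
pub enum StreamEnd {
    /// Upstream closed the stream normally.
    Completed,
    /// No chunk arrived within the idle window.
    ChunkTimeout,
}

/// Builds an Anthropic-style `error` SSE event.
/// The `timeout_error` type string and the message wording are placeholders.
pub fn timeout_error_event(idle: Duration) -> String {
    format!(
        "event: error\ndata: {{\"type\":\"error\",\"error\":{{\"type\":\"timeout_error\",\"message\":\"no chunk received from upstream within {} ms\"}}}}\n\n",
        idle.as_millis()
    )
}

/// Forwards chunks from `upstream` to `emit`, applying `idle` as a
/// per-chunk (idle) timeout. The wait restarts after every chunk, so this
/// is not an absolute deadline measured from stream open.
/// `idle = None` disables the check and keeps the unbounded behaviour.
pub fn relay_with_idle_timeout(
    upstream: &Receiver<String>,
    idle: Option<Duration>,
    mut emit: impl FnMut(String),
) -> StreamEnd {
    loop {
        let next = match idle {
            Some(limit) => upstream.recv_timeout(limit),
            // Without a limit, block until a chunk arrives or upstream closes.
            None => upstream.recv().map_err(|_| RecvTimeoutError::Disconnected),
        };
        match next {
            Ok(chunk) => emit(chunk),
            Err(RecvTimeoutError::Disconnected) => return StreamEnd::Completed,
            Err(RecvTimeoutError::Timeout) => {
                // Only reachable when `idle` is `Some`.
                let limit = idle.unwrap_or_default();
                emit(timeout_error_event(limit));
                // A real handler would also close usage_rx and finalize
                // request_ctx/span_ctx here; that cleanup is not modelled.
                return StreamEnd::ChunkTimeout;
            }
        }
    }
}
```

Calling `relay_with_idle_timeout(&rx, Some(Duration::from_secs(30)), |chunk| ...)` forwards every chunk to the closure and, if the upstream goes quiet for longer than the limit, emits one final error event before returning `StreamEnd::ChunkTimeout`. Keeping the limit as a separate `Option<Duration>` parameter mirrors the requirement that it be configurable independently of the request timeout.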

References

/cc @bzp2010
