Background
When a model has a `timeout` configured, it currently acts as a request timeout — i.e., it only covers the time until response headers are received. Once the SSE stream is established, individual chunk delivery is unbounded.
A separate, configurable per-chunk (or inter-token) timeout is needed to detect stalled upstream streams after the response headers have already been received.
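The idle-timeout idea can be sketched with std channels standing in for the SSE stream. This is a minimal illustration, not code from the repo: the names (`drain_with_timeouts`, `ChunkError`) are hypothetical, and the separate, more generous first-chunk allowance reflects the consideration below about providers that queue computation after returning headers.

```rust
use std::sync::mpsc;
use std::time::Duration;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ChunkError {
    IdleTimeout,
}

/// Drain chunks with a per-chunk idle timeout. The first chunk gets its own
/// (typically larger) budget, since some providers return headers immediately
/// but queue the actual computation.
fn drain_with_timeouts(
    rx: mpsc::Receiver<String>,
    first_chunk: Duration, // generous: time until the first token arrives
    idle: Duration,        // tighter: maximum gap between subsequent chunks
) -> Result<Vec<String>, ChunkError> {
    let mut chunks = Vec::new();
    let mut budget = first_chunk;
    loop {
        match rx.recv_timeout(budget) {
            Ok(chunk) => {
                chunks.push(chunk);
                // After the first chunk, switch to the inter-chunk budget.
                budget = idle;
            }
            // Sender dropped: upstream closed the stream normally.
            Err(mpsc::RecvTimeoutError::Disconnected) => return Ok(chunks),
            // No chunk within the budget: the upstream stream has stalled.
            Err(mpsc::RecvTimeoutError::Timeout) => return Err(ChunkError::IdleTimeout),
        }
    }
}
```

In the real handlers this would presumably wrap each `next()` on the SSE stream (e.g. via `tokio::time::timeout`) rather than a blocking channel, but the deadline-reset-per-chunk logic is the same.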
Considerations
- Some providers under high load may return response headers immediately but queue the actual LLM computation, meaning the first token could arrive much later. An overly aggressive chunk timeout could hurt user experience and waste upstream cost.
- The timeout should be configurable independently from the request timeout, and should likely apply between chunks (idle timeout) rather than as an absolute deadline from stream open.
- On chunk timeout expiry, the handler should emit a structured Anthropic-style `error` SSE event and terminate the stream cleanly, ensuring `usage_rx` is closed and `request_ctx`/`span_ctx` are finalized.
- Both `src/proxy/handlers/chat_completions/mod.rs` and `src/proxy/handlers/messages/mod.rs` are affected.
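For reference, the terminal frame on timeout could look like the sketch below. The `event: error` / `data: {"type":"error","error":{...}}` shape follows Anthropic's documented streaming error event; the specific error type string and the function name are assumptions, not the actual implementation:

```rust
/// Hypothetical helper: format the Anthropic-style error SSE frame emitted
/// when the per-chunk timeout fires. The "api_error" type is an assumption;
/// the handler may pick a different error type.
fn chunk_timeout_sse_event(idle_secs: u64) -> String {
    format!(
        "event: error\n\
         data: {{\"type\":\"error\",\"error\":{{\"type\":\"api_error\",\
         \"message\":\"upstream stream stalled: no chunk received for {}s\"}}}}\n\n",
        idle_secs
    )
}
```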
References
/cc @bzp2010