Stream Fake Plugin solves timeout issues with non-streaming requests. When an AI model takes a long time to respond, a non-streaming request may time out while waiting for the complete response. The plugin internally converts the non-streaming request to a streaming request, then reassembles the streamed response into non-streaming format before returning it to the client. This avoids the timeout while maintaining client compatibility.
- Timeout Avoidance: Prevents request timeouts caused by long waits through streaming transmission
- Transparent Conversion: Automatically converts non-streaming requests to streaming format, transparent to clients
- Response Reconstruction: Collects all streaming data chunks and reconstructs them into complete non-streaming responses
- Content Integrity: Ensures all content types are properly processed and aggregated:
  - Regular content
  - Reasoning content (for models that support thinking processes)
  - Tool calls and their proper merging
  - Log probabilities
- Connection Keep-Alive: Maintains active connections through streaming transmission to avoid network timeouts
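
Tool-call merging is the least obvious part of response reconstruction, so here is a minimal Python sketch of it. It is not the plugin's actual code. It assumes OpenAI-style streaming deltas, where each fragment carries an `index` identifying the call it belongs to and `function.arguments` arrives as string pieces. The function name `merge_tool_call_deltas` is hypothetical.

```python
from typing import Any


def merge_tool_call_deltas(deltas: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge streamed tool-call fragments into complete tool calls.

    Assumption: fragments follow the OpenAI streaming shape, keyed by
    "index", with "function.arguments" split across several fragments.
    """
    merged: dict[int, dict[str, Any]] = {}
    for delta in deltas:
        # Create an empty call the first time an index is seen.
        call = merged.setdefault(
            delta["index"],
            {"id": "", "type": "function", "function": {"name": "", "arguments": ""}},
        )
        if delta.get("id"):
            call["id"] = delta["id"]
        if delta.get("type"):
            call["type"] = delta["type"]
        fn = delta.get("function") or {}
        if fn.get("name"):
            call["function"]["name"] = fn["name"]
        # Argument pieces are concatenated in arrival order.
        call["function"]["arguments"] += fn.get("arguments") or ""
    # Return the calls ordered by their stream index.
    return [merged[i] for i in sorted(merged)]
```

Keying on `index` rather than `id` matters under this assumption, because only the first fragment of a call typically carries the `id`.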
- Long Response Timeout: When AI models generate long texts or complex responses, non-streaming requests are prone to timing out
- Network Timeout: In unstable network environments, long waits for complete responses cause connection timeouts
- Proxy Timeout: When going through proxy servers, proxies may disconnect due to prolonged periods without data
Through internal streaming transmission, connections remain active at all times, avoiding various timeout issues while clients still receive the expected non-streaming response format.
- Long Text Generation: Avoiding timeouts when generating long articles, reports, or code
- Complex Reasoning Tasks: Handling complex problems that require extended thinking time
- Unstable Network Environments: Environments with high latency or unstable networks
- Strict Timeout Restrictions: Clients or middleware with strict timeout limitations
- Legacy System Compatibility: Legacy systems where client timeout settings cannot be modified
- Detects non-streaming chat completion requests (`"stream": false` or not set)
- Identifies scenarios with long responses that may cause timeouts
- Modifies the request to streaming format (`"stream": true`)
- Forwards the modified request to upstream API
- Begins receiving streaming response data
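
The request conversion step can be sketched roughly as follows. This is an illustration in Python, not the plugin's implementation. The helper name `to_streaming_request` and the returned `converted` flag are hypothetical; the flag stands in for whatever state the plugin keeps so that it knows to rebuild a non-streaming response later.

```python
import json


def to_streaming_request(body: bytes) -> tuple[bytes, bool]:
    """Return (request body, converted flag).

    A request counts as non-streaming when "stream" is false or missing.
    Requests that already stream are passed through unchanged.
    """
    payload = json.loads(body)
    if payload.get("stream"):
        return body, False
    payload["stream"] = True
    return json.dumps(payload).encode("utf-8"), True
```

For example, `to_streaming_request(b'{"model": "gpt-4"}')` returns a body with `"stream": true` added and a flag of `True`.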
- Receives streaming data chunks in real time, keeping the connection active
- Aggregates all response content
- Processes different types of content fragments
- Reconstructs complete non-streaming response format
- Sets correct response headers and returns to client
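
As a rough sketch of the aggregation steps above, the following Python function folds server-sent event lines into one non-streaming response object. It assumes OpenAI-style chunks (`data: {...}` lines terminated by `data: [DONE]`), a single choice per response, and a `reasoning_content` delta field for reasoning models. All of these are assumptions, and the name `rebuild_response` is hypothetical. Tool calls and log probabilities are left out to keep the example short.

```python
import json
from typing import Any, Iterable


def rebuild_response(sse_lines: Iterable[str]) -> dict[str, Any]:
    """Fold OpenAI-style SSE chunks into one chat.completion object."""
    message: dict[str, Any] = {"role": "assistant", "content": ""}
    response: dict[str, Any] = {"object": "chat.completion"}
    finish_reason = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # Keep envelope fields from the first chunk that carries them.
        for key in ("id", "model", "created"):
            if key in chunk:
                response.setdefault(key, chunk[key])
        if chunk.get("usage"):
            response["usage"] = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            message["content"] += delta.get("content") or ""
            if delta.get("reasoning_content"):
                message["reasoning_content"] = (
                    message.get("reasoning_content", "") + delta["reasoning_content"]
                )
            finish_reason = choice.get("finish_reason") or finish_reason
    response["choices"] = [
        {"index": 0, "message": message, "finish_reason": finish_reason}
    ]
    return response
```

The result would then be serialized as JSON and returned with a non-streaming content type such as `application/json` in place of the event-stream header.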
- Continuous Data Flow: Streaming responses ensure connections always have data transmission
- Connection Keep-Alive: Avoids disconnection due to prolonged periods without response
- Progressive Processing: Processes each chunk as it arrives, so little work remains once the final chunk is received
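
The keep-alive effect can be illustrated with a toy model of an idle timeout, the kind of timer that resets whenever data arrives. The function name, the 60-second limit, and the 90-second generation time below are hypothetical. Real proxies and clients may apply other limits, such as a cap on total request duration, which this sketch does not model.

```python
def survives_idle_timeout(arrival_times: list[float], idle_timeout: float) -> bool:
    """Return True if no silent gap exceeds idle_timeout.

    arrival_times are seconds since the request was sent, in ascending
    order. The first gap is measured from t=0.
    """
    last = 0.0
    for t in arrival_times:
        if t - last > idle_timeout:
            return False  # the connection would have been dropped here
        last = t
    return True


# Non-streaming: a single response after 90 s of silence.
non_streaming = [90.0]
# Streaming: one chunk per second for the same 90 s.
streaming = [float(t) for t in range(1, 91)]

print(survives_idle_timeout(non_streaming, idle_timeout=60.0))
print(survives_idle_timeout(streaming, idle_timeout=60.0))
```

In this model the total generation time is identical in both cases. Only the length of the longest silent gap differs.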
{
  "model": "gpt-4",
  "type": 1,
  "plugin": {
    "stream-fake": {
      "enable": true
    }
  }
}

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `enable` | bool | Yes | false | Whether to enable Stream Fake Plugin to avoid timeout issues |
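
For illustration only, reading this setting from a parsed model configuration might look like the Python sketch below. The function name `stream_fake_enabled` is hypothetical and this is not how the plugin itself loads its configuration. It treats a missing `plugin` or `stream-fake` section as disabled, matching the documented default of `false`.

```python
from typing import Any


def stream_fake_enabled(model_config: dict[str, Any]) -> bool:
    """Return True only when plugin.stream-fake.enable is explicitly true."""
    plugin = (model_config.get("plugin") or {}).get("stream-fake") or {}
    return plugin.get("enable", False) is True
```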
Problem: A non-streaming request to generate a 5000-word technical document times out after 60 seconds
Original Request:
{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "Please write a detailed 5000-word technical document introducing microservice architecture design principles and best practices"
    }
  ],
  "stream": false,
  "max_tokens": 4000
}

Plugin Processing:
- Automatically converts to `"stream": true`
- Receives response fragments in real time, avoiding timeout
- Reconstructs into complete non-streaming response for return
Problem: Complex mathematical problems require long thinking time, causing request timeout
Solution:
- Plugin ensures connection remains active during model thinking process
- No timeout occurs even with extended reasoning time
- Client ultimately receives complete reasoning results
- Eliminates Connection Timeouts: Streaming transmission keeps connections active
- Avoids Proxy Timeouts: Intermediate proxies won't disconnect due to prolonged periods without data
- Reduces Retry Attempts: Avoids request retries caused by timeouts
- Faster Perceived Response: While total time remains essentially the same, timeout retries are avoided
- Better User Experience: Avoids request failures and the need to reinitiate requests
- Improved Resource Utilization: Reduces resource waste caused by timeouts