fix(streaming): resolve architectural flaws in SSE streaming#368
Open
chandan-1427 wants to merge 1 commit intoGetBindu:mainfrom
Open
fix(streaming): resolve architectural flaws in SSE streaming#368chandan-1427 wants to merge 1 commit intoGetBindu:mainfrom
chandan-1427 wants to merge 1 commit intoGetBindu:mainfrom
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Overview
Following a review of the recent message/stream implementation, this PR addresses 4 critical production-grade architectural issues that would prevent the streaming feature from safely scaling under load.
Architectural Fixes
Resolved Database DDoS Risk (Polling Hammer): > * Issue: The while True loop was polling the storage layer every 100ms infinitely, which would instantly max out DB connection pools under multi-user load.
Fix: Implemented an exponential backoff for poll_interval capping at 2.0s to drastically reduce storage queries while maintaining responsiveness.
Fixed Telemetry Memory Leak: > * Issue: track_active_task was incrementing bindu_active_tasks on stream creation but failing to decrement on successful completion, leading to permanently inflated active task metrics.
Fix: Added decrement logic for create_operations upon successful function exit.
Closed Protocol Security Bypass: > * Issue: The isinstance(jsonrpc_response, Response) check was too broad, intercepting any Starlette response and bypassing strict a2a_response_ta validation.
Fix: Narrowed the type check strictly to StreamingResponse.
Fixed Infinite Loading on Stream Crash: > * Issue: If the stream generator crashed (e.g. DB outage) and the subsequent "failed" state update also failed, it yielded the last known state (working).
Fix: Forced the error_event to explicitly yield state: "failed" and final: True to guarantee the client stream closes gracefully.
Testing
Ran pytest suite locally (665 passing). Verified core A2A and message handlers remain stable.