Conversation
… mechanisms

- Added heartbeat functionality to SSEStreamer for improved client connection management.
- Implemented periodic cleanup in RedisEventSubscriber to remove stale subscriptions.
- Updated internal prediction and prediction controllers to handle client disconnects and manage subscriptions more effectively.
- Refactored RedisEventPublisher to streamline event publishing with error handling.

These changes improve the reliability and performance of server-side event streaming and Redis event handling.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses critical reliability issues in SSE streaming, particularly when Flowise operates in QUEUE mode. The changes focus on preventing resource leaks, such as unmanaged Redis Pub/Sub subscriptions and stale SSE client connections, which previously led to progressive degradation and eventual failure of streaming services. By implementing robust cleanup mechanisms, client disconnect detection, and safe communication patterns, the update significantly enhances the stability and longevity of SSE streaming under sustained usage.
Code Review
This pull request provides a comprehensive fix for a critical Redis Pub/Sub subscription leak and improves SSE streaming reliability by introducing proper cleanup for subscriptions, handling client disconnects, and ensuring safe writes to SSE streams.

However, a security audit identified critical and high-severity vulnerabilities in client connection management. First, using the user-controlled `chatId` as a key in a plain JavaScript object exposes the server to prototype pollution, potentially leading to server-wide information disclosure. Second, the lack of uniqueness and ownership checks for `chatId` allows SSE session hijacking. It is strongly recommended to use a `Map` for client management and to generate `chatId`s server-side using a secure random generator. There are also suggestions to further improve code clarity and maintainability by reducing code duplication and adopting safer iteration patterns.
```typescript
private safeWrite(chatId: string, data: string): boolean {
    const client = this.clients[chatId]
```
The `safeWrite` method performs a property lookup on the `this.clients` object using the `chatId` parameter, which is directly controlled by the user. Because `clients` is initialized as a plain JavaScript object (`{}`), an attacker can provide `__proto__` as the `chatId` in a request. This pollutes the prototype of the `clients` object.

Consequently, any subsequent lookup for a non-existent or disconnected `chatId` (e.g., during event streaming or periodic cleanup) will return the attacker's connection from the prototype instead of `undefined`. This leads to a critical server-wide information leak where chat events intended for other users are routed to the attacker's connection. To remediate this, initialize `this.clients` using `Object.create(null)` or, preferably, use a `Map<string, Client>`, which is not susceptible to prototype pollution.
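To make the risk concrete, here is a small stand-alone sketch (not taken from the Flowise codebase) contrasting a plain object with a `Map` when a user-controlled key is `__proto__`:

```typescript
// Demonstration of why a user-controlled key on a plain object is dangerous,
// and why Map is immune to the same trick.
const clients: Record<string, object> = {}
const attacker = { hijacked: true }

// On a plain object, assigning to '__proto__' does not create an own property:
// it rewires the object's prototype to the attacker-supplied value.
;(clients as any)['__proto__'] = attacker

console.log(Object.keys(clients).length) // 0 — no own entry was added
console.log((clients as any).hijacked)   // true — inherited from the attacker object

// A Map treats '__proto__' as an ordinary key with no special behavior.
const safeClients = new Map<string, object>()
safeClients.set('__proto__', attacker)

console.log(safeClients.size)               // 1 — stored as a normal entry
console.log(safeClients.get('nonexistent')) // undefined — no prototype chain to leak through
```

The same reasoning is why `Object.create(null)` also works: it produces an object with no prototype at all, so there is nothing to pollute.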
```typescript
const apiResponse = await predictionsServices.buildChatflow(req)
sseStreamer.streamMetadataEvent(apiResponse.chatId, apiResponse)
```
The `chatId` used to identify and route Server-Sent Events (SSE) is taken directly from the user-supplied request body without any uniqueness or ownership validation. An attacker can provide a `chatId` that is already in use by another user's active session. This will overwrite the victim's connection in the server's internal clients map. As a result, any subsequent events generated for the victim's chat session will be routed to the attacker's connection, allowing session hijacking. The server should be responsible for generating unique, cryptographically secure `chatId`s (e.g., using `uuidv4()`) and returning them to the client, rather than allowing the client to provide them.
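A minimal sketch of the suggested remediation, assuming hypothetical names (`establishStream`, `register` are illustrative, not Flowise APIs): the client-supplied id is ignored for routing and a cryptographically random one is minted server-side with Node's built-in `crypto.randomUUID`.

```typescript
import { randomUUID } from 'crypto'

// Hypothetical helper: never trust requestedChatId for routing; mint a
// server-side id and hand it back to the client in the response metadata.
function establishStream(requestedChatId: string | undefined, register: (id: string) => void): string {
    const chatId = randomUUID() // cryptographically random, collision-resistant
    register(chatId)            // e.g. sseStreamer would key the connection by this id
    return chatId               // client uses this id for the rest of the session
}
```

Because the id is unguessable and never reused from the request body, one user cannot collide with (and therefore overwrite) another user's active stream.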
```typescript
const apiResponse = await utilBuildChatflow(req, true)
sseStreamer.streamMetadataEvent(apiResponse.chatId, apiResponse)
```
Similar to the external predictions controller, the `chatId` here is user-controlled and used to route SSE events. An attacker with access to the internal API can hijack another user's chat stream within the same workspace by providing a colliding `chatId`, receiving real-time chat events intended for that user. Ensure that `chatId` is generated server-side or validated for uniqueness and ownership before being used to establish an SSE connection.
```typescript
if (process.env.MODE === MODE.QUEUE) {
    getRunningExpressApp().redisSubscriber.unsubscribe(chatId)
}
```
For consistency, you can use the `isQueueMode` constant defined on line 45. The cleanup logic is also duplicated from the `res.on('close')` handler; extracting it into a shared function would avoid the duplication and the risk of one copy being updated while the other is not.
```diff
-if (process.env.MODE === MODE.QUEUE) {
+if (isQueueMode) {
     getRunningExpressApp().redisSubscriber.unsubscribe(chatId)
 }
```
```typescript
if (process.env.MODE === MODE.QUEUE) {
    getRunningExpressApp().redisSubscriber.unsubscribe(chatId)
}
```
For consistency, you can use the `isQueueMode` constant defined on line 75. The cleanup logic is also duplicated from the `res.on('close')` handler; extracting it into a shared function would avoid the duplication and the risk of one copy being updated while the other is not.
```diff
-if (process.env.MODE === MODE.QUEUE) {
+if (isQueueMode) {
     getRunningExpressApp().redisSubscriber.unsubscribe(chatId)
 }
```
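One way the shared function could look, sketched with illustrative names (the helper and its parameters are hypothetical, based only on the snippets above): both the `res.on('close')` handler and the `finally` block would call this single function instead of repeating the queue-mode check.

```typescript
// Hypothetical extraction of the duplicated cleanup logic.
function cleanupStreamSubscription(
    chatId: string,
    isQueueMode: boolean,
    unsubscribe: (channel: string) => void
): void {
    if (isQueueMode) {
        // Only QUEUE mode uses Redis Pub/Sub, so only then is there a
        // channel subscription to tear down.
        unsubscribe(chatId)
    }
}

// Example wiring with a stub in place of getRunningExpressApp().redisSubscriber:
const unsubscribed: string[] = []
cleanupStreamSubscription('chat-123', true, (ch) => unsubscribed.push(ch))
cleanupStreamSubscription('chat-456', false, (ch) => unsubscribed.push(ch))
console.log(unsubscribed) // only the queue-mode call unsubscribes
```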
```typescript
let cleaned = 0
for (const channel of this.subscribedChannels) {
    if (!this.sseStreamer.clients[channel]) {
        this.unsubscribe(channel)
        cleaned++
    }
}
if (cleaned > 0) {
    logger.info(
        `[RedisEventSubscriber] Periodic cleanup: removed ${cleaned} stale subscriptions. Remaining: ${this.subscribedChannels.size}`
    )
}
```
Modifying a collection (`this.subscribedChannels`) while iterating over it can be a source of bugs, even though it happens to be safe for a `Set` in this specific case. A clearer and safer pattern is to first collect the items to be removed and then perform the removal in a separate loop. This improves readability and avoids potential issues.
```typescript
const staleChannels = Array.from(this.subscribedChannels).filter(
    (channel) => !this.sseStreamer.clients[channel]
)
if (staleChannels.length > 0) {
    for (const channel of staleChannels) {
        this.unsubscribe(channel)
    }
    logger.info(
        `[RedisEventSubscriber] Periodic cleanup: removed ${staleChannels.length} stale subscriptions. Remaining: ${this.subscribedChannels.size}`
    )
}
```
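For reference, the collect-then-remove pattern in isolation, with stubs standing in for the real `subscribedChannels` Set and `clients` registry (not Flowise code):

```typescript
// Stubbed state: three subscribed channels, but only 'b' has a live SSE client.
const subscribedChannels = new Set(['a', 'b', 'c'])
const clients: Record<string, boolean> = { b: true }

// 1) Collect stale channels without mutating the Set being iterated.
const stale = Array.from(subscribedChannels).filter((ch) => !clients[ch])

// 2) Remove them in a separate pass.
for (const ch of stale) subscribedChannels.delete(ch)

console.log(stale)                   // ['a', 'c']
console.log([...subscribedChannels]) // ['b']
```

The intermediate array also makes the log message trivial to produce: `stale.length` is the exact count removed, with no mutable counter.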
Redis Pub/Sub subscription leak causing SSE streaming failure in QUEUE mode
Summary
Fix a critical Redis Pub/Sub subscription leak and multiple SSE streaming reliability issues that cause streaming to fail after hours of sustained usage in `QUEUE` mode.

Problem

When running Flowise in QUEUE mode, SSE streaming progressively degrades and eventually stops working entirely after approximately one day of usage. Events return empty, and only a fresh deployment resolves the issue temporarily.
Root Cause Analysis
Three compounding issues were identified:
1. Redis Pub/Sub channel subscriptions never unsubscribed (PRIMARY)

Each SSE streaming request subscribes the Redis subscriber client to a unique `chatId` channel via `redisSubscriber.subscribe(chatId)`. When the request completes, `sseStreamer.removeClient(chatId)` cleans up the SSE client — but the Redis channel subscription persists forever. The `subscribedChannels` Set and the underlying Redis SUBSCRIBE list grow without bound.

Over ~24 hours with thousands of requests, the subscriber accumulates thousands of dead channel subscriptions. Redis enforces `client-output-buffer-limit` for pubsub clients (default: `pubsub 32mb 8mb 60`). When messages flowing to dead channels fill this buffer, Redis forcefully disconnects the subscriber client, causing all SSE streaming to fail globally.

2. No client disconnect detection

When SSE clients disconnect prematurely (browser close, network drop, ALB idle timeout at 2000s), the server never learns about it. There are no `res.on('close')` handlers, so both the SSE client object and the Redis subscription leak until the server process restarts.

3. Unsafe writes to closed SSE connections

`SSEStreamer.removeClient()` calls `response.write()` followed by `response.end()`, then `delete this.clients[chatId]`. If the client already disconnected, `response.write()` throws an error, and the `delete` line never executes — causing the client object to leak permanently in the `clients` dictionary.

Changes
packages/server/src/queue/RedisEventSubscriber.ts

- `unsubscribe(channel)` method that calls `this.redisSubscriber.unsubscribe(channel)` and removes the channel from the `subscribedChannels` Set
- `getSubscriptionCount()` for observability
- `startPeriodicCleanup()` — every 60s, unsubscribes channels with no active SSE client as a safety net
- `disconnect()` now also clears the cleanup interval

packages/server/src/controllers/predictions/index.ts

- `res.on('close')` handler to detect client disconnects (browser close, ALB timeout, network drop)
- `redisSubscriber.unsubscribe(chatId)` in both the `close` handler and the `finally` block to ensure cleanup

packages/server/src/controllers/internal-predictions/index.ts

packages/server/src/utils/SSEStreamer.ts

- `safeWrite()` helper that wraps `response.write()` in try/catch and auto-removes dead clients on failure
- `removeClient()` wrapped in try/finally so `delete this.clients[chatId]` always executes even if `response.write()` throws
- `streamTTSAbortEvent()`, which also calls write+end+delete
- `safeWrite()` used in all `streamXxxEvent` methods
- `startHeartbeat()` / `stopHeartbeat()` — sends `:heartbeat\n\n` (an SSE comment) to all active clients every 30s to keep ALB connections alive and detect dead clients early

packages/server/src/queue/RedisEventPublisher.ts

- `safePublish()` helper that checks `isReady` before publishing, `await`s the publish call (was fire-and-forget), and uses `logger` instead of `console.error`
- Publishing routed through `safePublish` — reduces boilerplate significantly

packages/server/src/index.ts

Test plan

- Heartbeat comments (`:heartbeat`) appear in the SSE stream every 30s when inspecting the raw event stream
- `[RedisEventSubscriber]` log lines in CloudWatch — subscription count should stay bounded, proportional to concurrent active streams
- Cleanup log lines (`Periodic cleanup: removed N stale subscriptions`) appear when stale subscriptions exist
- `PUBSUB NUMSUB` and `CLIENT LIST TYPE pubsub` via db-proxy to verify subscription count stays low
- `BytesUsedForCache` trends stable/downward compared to the pre-fix baseline

Diagnostics (for verifying the fix in production)
SSH into the db-proxy and run these commands to monitor Redis pub/sub health:
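For example, based on the commands named in the test plan above (host is a placeholder for your environment; the exact invocations used by the author are not shown here):

```shell
# Count currently subscribed channels — should stay proportional to active streams.
redis-cli -h "$REDIS_HOST" PUBSUB CHANNELS '*'

# Per-channel subscriber counts.
redis-cli -h "$REDIS_HOST" PUBSUB NUMSUB

# Pub/Sub clients and their output-buffer usage (relevant to client-output-buffer-limit).
redis-cli -h "$REDIS_HOST" CLIENT LIST TYPE pubsub
```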