feat: Add Redis-based workspace stream quota for WebRTC sessions by rafel-roboflow · Pull Request #2025 · roboflow/inference

rafel-roboflow · 2026-02-20T14:14:06Z

Limit concurrent WebRTC streams per workspace (default: 10)
Return HTTP 429 when quota exceeded
Add heartbeat endpoint for Modal workers to refresh session TTL

What does this PR do?

Related Issue(s): DG-232

Type of Change

New feature (non-breaking change that adds functionality)

Testing

I have tested this change locally

Test details:
I set max connections to 3.

Easy case: open one, two, three, one after another.
Case 1: open one and two, wait 2 minutes, retry 3 ... is blocked; wait 8 minutes, retry 3 ... is blocked.
Case 2: open one and two, close two, open two again ... the 3rd is blocked.

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code where necessary, particularly in hard-to-understand areas
My changes generate no new warnings or errors
I have updated the documentation accordingly (if applicable)

Additional Context

Note

Medium Risk
Adds Redis-backed quota enforcement to WebRTC session startup and a new heartbeat/end API, which can block new sessions (HTTP 429) and depends on Redis/heartbeat timing for correct slot cleanup.

Overview
Implements a Redis-backed per-workspace concurrent WebRTC stream quota (default 10) enforced when starting Modal-based WebRTC workers; excess sessions raise WorkspaceStreamQuotaError and now return HTTP 429.
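The quota check described above can be pictured with a minimal sketch. The exception class here is a local stand-in that only borrows the name WorkspaceStreamQuotaError; the helper names and the choice of `>=` as the comparison are assumptions for illustration, not the PR's actual code.

```python
class WorkspaceStreamQuotaError(Exception):
    """Local stand-in: raised when a workspace has no free stream slots."""


def ensure_quota_available(active_sessions: int, quota: int = 10) -> None:
    # Assumption: a new session is refused once the number of active
    # sessions has already reached the quota.
    if active_sessions >= quota:
        raise WorkspaceStreamQuotaError(
            f"Workspace already has {active_sessions} of {quota} concurrent streams."
        )


def status_code_for(error: Exception) -> int:
    # Hypothetical mapping from the quota error to the HTTP 429 status.
    return 429 if isinstance(error, WorkspaceStreamQuotaError) else 500
```

With the default quota of 10, a workspace holding 9 sessions may open one more, while a workspace holding 10 gets the quota error and, through the mapping, a 429 response.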

Adds session tracking + TTL refresh via Redis sorted sets (register_webrtc_session, refresh_webrtc_session, deregister_webrtc_session) and introduces new HTTP endpoints POST /webrtc/session/heartbeat and POST /webrtc/session/heartbeat/end for Modal workers to keep sessions alive and free quota slots on shutdown.
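One way to track sessions with a sorted set is to store each session id with its last-heartbeat timestamp as the score, and to treat entries older than the TTL as expired. The sketch below follows that idea under stated assumptions: the key layout, the function names, and the in-memory stand-in client are hypothetical, and only the method names (zadd, zrem, zscore, zremrangebyscore, zcard, expire) mirror the redis-py client.

```python
import time
from typing import Dict, Optional


class InMemorySortedSets:
    """Demo-only stand-in for the handful of sorted-set calls used below."""

    def __init__(self) -> None:
        self._sets: Dict[str, Dict[str, float]] = {}

    def zadd(self, name: str, mapping: Dict[str, float]) -> int:
        members = self._sets.setdefault(name, {})
        added = sum(1 for member in mapping if member not in members)
        members.update(mapping)
        return added

    def zrem(self, name: str, *values: str) -> int:
        members = self._sets.get(name, {})
        return sum(1 for value in values if members.pop(value, None) is not None)

    def zscore(self, name: str, value: str) -> Optional[float]:
        return self._sets.get(name, {}).get(value)

    def zremrangebyscore(self, name: str, low: float, high: float) -> int:
        members = self._sets.get(name, {})
        stale = [m for m, score in members.items() if low <= score <= high]
        for member in stale:
            del members[member]
        return len(stale)

    def zcard(self, name: str) -> int:
        return len(self._sets.get(name, {}))

    def expire(self, name: str, seconds: int) -> bool:
        return name in self._sets  # key-level expiry is not simulated here


def _key(workspace_id: str) -> str:
    return f"webrtc:sessions:{workspace_id}"  # hypothetical key layout


def register_session(client, workspace_id, session_id, ttl_seconds, now=None) -> None:
    now = time.time() if now is None else now
    client.zadd(_key(workspace_id), {session_id: now})
    # Bound the lifetime of the whole key so abandoned workspaces do not pile up.
    client.expire(_key(workspace_id), ttl_seconds * 2)


def refresh_session(client, workspace_id, session_id, ttl_seconds, now=None) -> bool:
    now = time.time() if now is None else now
    if client.zscore(_key(workspace_id), session_id) is None:
        return False  # unknown or already expired session
    client.zadd(_key(workspace_id), {session_id: now})
    client.expire(_key(workspace_id), ttl_seconds * 2)
    return True


def deregister_session(client, workspace_id, session_id) -> bool:
    return client.zrem(_key(workspace_id), session_id) > 0


def count_active_sessions(client, workspace_id, ttl_seconds, now=None) -> int:
    now = time.time() if now is None else now
    # Drop sessions whose last heartbeat is older than the TTL, then count the rest.
    client.zremrangebyscore(_key(workspace_id), 0, now - ttl_seconds)
    return client.zcard(_key(workspace_id))
```

Because expiry is decided by score rather than by a per-member TTL (which Redis sorted sets do not offer), a session that stops sending heartbeats frees its slot the next time the count is taken after the TTL has passed.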

Extends the Modal worker Watchdog to periodically POST heartbeats (and send an end signal on stop) and passes workspace_id/session_id through WebRTCWorkerRequest; also guards heartbeat_callback calls to handle it being None.
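The heartbeat behaviour can be sketched as a small background loop that posts at a fixed interval and sends one end signal when stopped. This is an illustration only: the class name, the injected `post` callable, the payload shape, and the example URL are assumptions, not the Watchdog code from the PR.

```python
import threading
import time
from typing import Callable, List, Optional, Tuple


class HeartbeatLoop:
    """Posts a heartbeat every `interval_seconds` and an end signal on stop."""

    def __init__(
        self,
        post: Callable[[str, dict], None],  # hypothetical HTTP POST helper
        heartbeat_url: str,
        session_id: str,
        interval_seconds: float = 30.0,
    ) -> None:
        self._post = post
        self._url = heartbeat_url
        self._session_id = session_id
        self._interval = interval_seconds
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()
        # Free the quota slot right away instead of waiting for TTL expiry.
        self._safe_post(self._url + "/end")

    def _run(self) -> None:
        # Event.wait returns False on timeout, so this ticks once per interval
        # and exits promptly as soon as stop() sets the event.
        while not self._stop_event.wait(self._interval):
            self._safe_post(self._url)

    def _safe_post(self, url: str) -> None:
        try:
            self._post(url, {"session_id": self._session_id})
        except Exception:
            pass  # a failed heartbeat should not take the worker down


def demo() -> List[Tuple[str, dict]]:
    calls: List[Tuple[str, dict]] = []
    loop = HeartbeatLoop(
        post=lambda url, payload: calls.append((url, payload)),
        heartbeat_url="http://example.invalid/webrtc/session/heartbeat",
        session_id="s1",
        interval_seconds=0.01,
    )
    loop.start()
    time.sleep(0.2)
    loop.stop()
    return calls
```

Injecting `post` keeps the loop testable without a network; in a real worker it would wrap an HTTP client call with a timeout.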

Written by Cursor Bugbot for commit bfd3f93. This will update automatically on new commits.

PawelPeczek-Roboflow · 2026-02-20T15:04:24Z

@rafel-roboflow - sorry, this will not be added to today's release

codeflash-ai · 2026-02-20T19:21:36Z

⚡️ Codeflash found optimizations for this PR

📄 153% (1.53x) speedup for `with_route_exceptions_async` in `inference/core/interfaces/http/error_handlers.py`

⏱️ Runtime : 538 microseconds → 212 microseconds (best of 5 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function with_route_exceptions_async by 153% in PR #2025 (feat/dg-232-set-rate-limit-to-10-concurrent-streams-and-update) #2027

If you approve, it will be merged into this PR (branch feat/dg-232-set-rate-limit-to-10-concurrent-streams-and-update).

cursor · 2026-02-27T10:52:44Z

inference/core/interfaces/webrtc_worker/__init__.py

+                    session_id,
+                    workspace_id,
+                )
+


Missing session cleanup on Modal spawn failure leaks quota

Medium Severity

register_webrtc_session is called before spawn_rtc_peer_connection_modal, but there's no try/finally to call deregister_webrtc_session if the spawn fails. spawn_rtc_peer_connection_modal has many failure points before a watchdog is ever created (plan validation, Modal client auth, app lookup, workflow spec retrieval, etc.). Each failure leaves a phantom session in Redis that occupies a quota slot until TTL expiry (default 60s). With a low quota (e.g. 3 during testing), a few rapid retries of a failing request can lock the workspace out entirely.
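The cleanup the comment asks for amounts to releasing the slot whenever the spawn step raises. A minimal sketch, assuming a plain set as a stand-in for the Redis-backed registry and a hypothetical helper name:

```python
from typing import Callable, Set, TypeVar

T = TypeVar("T")


def spawn_with_slot_cleanup(
    active_sessions: Set[str],
    session_id: str,
    spawn: Callable[[], T],
) -> T:
    # Stand-in for register_webrtc_session: take the quota slot first.
    active_sessions.add(session_id)
    try:
        return spawn()
    except Exception:
        # Stand-in for deregister_webrtc_session: release the slot on any
        # spawn failure so it does not linger until TTL expiry.
        active_sessions.discard(session_id)
        raise
```

The slot is released only on failure; on success it stays registered and is expected to be refreshed by heartbeats and freed by the end signal.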

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Unconditional blocking HTTP call in async function
- start_worker now resolves workspace/session only when quota or heartbeat is enabled and uses await get_roboflow_workspace_async(...) to avoid blocking the event loop.

Or push these changes by commenting:

@cursor push d569c37645

Preview (d569c37645)

diff --git a/inference/core/interfaces/webrtc_worker/__init__.py b/inference/core/interfaces/webrtc_worker/__init__.py
--- a/inference/core/interfaces/webrtc_worker/__init__.py
+++ b/inference/core/interfaces/webrtc_worker/__init__.py
@@ -6,6 +6,7 @@
     WEBRTC_MODAL_TOKEN_ID,
     WEBRTC_MODAL_TOKEN_SECRET,
     WEBRTC_MODAL_USAGE_QUOTA_ENABLED,
+    WEBRTC_SESSION_HEARTBEAT_URL,
     WEBRTC_WORKSPACE_STREAM_QUOTA,
     WEBRTC_WORKSPACE_STREAM_QUOTA_ENABLED,
     WEBRTC_WORKSPACE_STREAM_TTL_SECONDS,
@@ -19,7 +20,7 @@
     WebRTCWorkerResult,
 )
 from inference.core.logger import logger
-from inference.core.roboflow_api import get_roboflow_workspace
+from inference.core.roboflow_api import get_roboflow_workspace_async
 
 
 async def start_worker(
@@ -56,10 +57,14 @@
                 raise CreditsExceededError("API key over quota")
 
         workspace_id = None
-        session_id = str(uuid.uuid4())
-        workspace_id = get_roboflow_workspace(api_key=webrtc_request.api_key)
-        webrtc_request.workspace_id = workspace_id
-        webrtc_request.session_id = session_id
+        session_id = None
+        if WEBRTC_WORKSPACE_STREAM_QUOTA_ENABLED or WEBRTC_SESSION_HEARTBEAT_URL:
+            session_id = str(uuid.uuid4())
+            workspace_id = await get_roboflow_workspace_async(
+                api_key=webrtc_request.api_key
+            )
+            webrtc_request.workspace_id = workspace_id
+            webrtc_request.session_id = session_id
 
         if WEBRTC_WORKSPACE_STREAM_QUOTA_ENABLED:
             if workspace_id and is_over_workspace_session_quota(
@@ -77,7 +82,7 @@
                     f"concurrent streams."
                 )
 
-            if workspace_id:
+            if workspace_id and session_id:
                 register_webrtc_session(
                     workspace_id=workspace_id,
                     session_id=session_id,

cursor · 2026-03-02T13:42:48Z

inference/core/interfaces/webrtc_worker/__init__.py

+        session_id = str(uuid.uuid4())
+        workspace_id = get_roboflow_workspace(api_key=webrtc_request.api_key)
+        webrtc_request.workspace_id = workspace_id
+        webrtc_request.session_id = session_id


Unconditional blocking HTTP call in async function

Medium Severity

get_roboflow_workspace is a synchronous function that makes an HTTP request to the Roboflow API, but it's called directly inside async def start_worker, blocking the event loop. Additionally, this call runs unconditionally for every WebRTC session, even when both WEBRTC_WORKSPACE_STREAM_QUOTA_ENABLED (default: False) and WEBRTC_SESSION_HEARTBEAT_URL (default: None) are disabled — meaning the resulting workspace_id is never actually used. The async counterpart get_roboflow_workspace_async exists and is already used in the heartbeat endpoints.

…endpoints The webrtc_session_heartbeat and webrtc_session_end endpoints were missing the error handler decorator that other endpoints use. This ensures unhandled exceptions are properly logged and return appropriate error responses instead of generic 500 errors.

…dpoints - Add missing @with_route_exceptions_async decorator to webrtc_session_heartbeat and webrtc_session_end endpoints for consistent exception handling - Create WebRTCSessionHeartbeatRequest Pydantic model for request body validation

inference/core/interfaces/webrtc_worker/utils.py

+    count = get_concurrent_session_count(workspace_id, ttl_seconds)
+    logger.info(
+        "Workspace %s has %d concurrent sessions (quota: %d)",
+        workspace_id,


In general, the fix is to ensure that data derived from the API key (or otherwise considered sensitive) is not written directly to logs. Here that means changing log messages which include workspace_id so they no longer output the raw identifier, while preserving the usefulness of the logs.

Concretely, we will adjust logging in inference/core/interfaces/webrtc_worker/utils.py and inference/core/interfaces/webrtc_worker/__init__.py:

In is_over_workspace_session_quota (utils.py), change the logger.info call to remove the workspace_id from the message and arguments. We will keep the count and quota so operators can still see quota usage, but not which specific workspace hit that count.

In start_worker (__init__.py), change the warning when quota is exceeded and the final info log about starting a session so they no longer include workspace_id. They will still report the quota value and the session id, which is sufficient for debugging without exposing workspace identifiers.

No new methods or imports are needed; we only modify existing log message strings and their parameters. Functionality (quotas, control flow, return values) remains unchanged.

cursor · 2026-03-04T12:52:29Z

inference/core/interfaces/http/http_api.py

+                session_refreshed = refresh_webrtc_session(
+                    workspace_id=workspace_id,
+                    session_id=request.session_id,
+                )


Sync blocking Redis calls in async endpoints

Low Severity

refresh_webrtc_session and deregister_webrtc_session are synchronous functions that make multiple blocking Redis calls (zscore, zadd, expire, zrem), but they're called from async def FastAPI endpoints. This blocks the event loop for the duration of each Redis round-trip. Since the heartbeat endpoint is called every 30 seconds per active session, this could accumulate under load.
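One common way to keep a synchronous call from stalling the event loop is to run it in a worker thread with asyncio.to_thread (Python 3.9+). The sketch below uses a sleeping function as a stand-in for the blocking Redis round-trips; the function names are hypothetical, and an async Redis client would be another option.

```python
import asyncio
import time
from typing import Tuple


def refresh_session_blocking(session_id: str) -> bool:
    # Stand-in for a synchronous helper that performs blocking Redis calls.
    time.sleep(0.05)
    return True


async def heartbeat(session_id: str) -> bool:
    # Run the blocking helper in a worker thread so the loop stays responsive.
    return await asyncio.to_thread(refresh_session_blocking, session_id)


async def demo() -> Tuple[bool, int]:
    ticks = 0

    async def ticker() -> None:
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.005)

    task = asyncio.create_task(ticker())
    refreshed = await heartbeat("s1")
    task.cancel()
    # ticks > 1 shows other coroutines kept running during the blocking call.
    return refreshed, ticks
```

Calling `asyncio.run(demo())` returns the refresh result together with the number of ticks the concurrent coroutine managed while the blocking helper was running.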

Additional Locations (1)

inference/core/interfaces/http/http_api.py#L1739-L1743

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

inference/core/interfaces/http/http_api.py

The heartbeat and session-end endpoints were returning plain dicts with {"status": "error"} on auth failure, which FastAPI serialized as HTTP 200. This caused the watchdog client to log these as successful heartbeats, masking authentication failures from operators. Now raises HTTPException with 401 for unauthorized and 404 for session not found.

feat: Add Redis-based workspace stream quota for WebRTC sessions

697b6e6

- Limit concurrent WebRTC streams per workspace (default: 10) - Return HTTP 429 when quota exceeded - Add heartbeat endpoint for Modal workers to refresh session TTL

rafel-roboflow requested review from PawelPeczek-Roboflow, dkosowski87, grzegorz-roboflow, hansent, probicheaux and yeldarby as code owners February 20, 2026 14:14

rafel-roboflow marked this pull request as draft February 20, 2026 14:14

github-advanced-security bot found potential problems Feb 20, 2026

Merge branch 'main' into feat/dg-232-set-rate-limit-to-10-concurrent-…

269d2c4

…streams-and-update

rafel-roboflow added 4 commits February 20, 2026 17:56

added debug

b9cefda

Merge branch 'main' of github.com:roboflow/inference into feat/dg-232…

e1af7f9

…-set-rate-limit-to-10-concurrent-streams-and-update

Merge branch 'feat/dg-232-set-rate-limit-to-10-concurrent-streams-and…

b83f828

…-update' of github.com:roboflow/inference into feat/dg-232-set-rate-limit-to-10-concurrent-streams-and-update

more debug; pass envs to modal env

fd0c661

codeflash-ai bot mentioned this pull request Feb 20, 2026

⚡️ Speed up function with_route_exceptions_async by 153% in PR #2025 (feat/dg-232-set-rate-limit-to-10-concurrent-streams-and-update) #2027

feat: add detailed Redis logging for session tracking debugging"

b7c9a62

github-advanced-security bot found potential problems Feb 23, 2026

PawelPeczek-Roboflow and others added 9 commits February 23, 2026 12:01

Fix issue with RF-Detr model post-processing in TRT

3de18e7

Merge remote-tracking branch 'origin/fix/patch-post-processing-in-rfd…

4362ca0

…etr' into feat/dg-232-set-rate-limit-to-10-concurrent-streams-and-update

clean up

ee50855

Merge branch 'main' into feat/dg-232-set-rate-limit-to-10-concurrent-…

37650bd

…streams-and-update

clean up

e19d8ac

Merge branch 'main' of github.com:roboflow/inference into feat/dg-232…

400f71e

…-set-rate-limit-to-10-concurrent-streams-and-update

Merge branch 'feat/dg-232-set-rate-limit-to-10-concurrent-streams-and…

002e2a3

…-update' of github.com:roboflow/inference into feat/dg-232-set-rate-limit-to-10-concurrent-streams-and-update

set max conn webrtc to 10

aba5acd

isorted

71b3048

rafel-roboflow marked this pull request as ready for review February 24, 2026 08:26

Merge branch 'main' into feat/dg-232-set-rate-limit-to-10-concurrent-…

459bbd5

…streams-and-update

implemented deregister session to avoid keeping sessions forever

8d790fb

github-advanced-security bot found potential problems Feb 27, 2026

cursor bot reviewed Feb 27, 2026

Merge branch 'main' of github.com:roboflow/inference into feat/dg-232…

cd8ebc3

…-set-rate-limit-to-10-concurrent-streams-and-update

rafel-roboflow marked this pull request as draft February 27, 2026 12:42

rafel-roboflow added 2 commits February 27, 2026 13:54

fix wrong endpoint refactor

49f7f15

include always workspace id

b23bd72

github-advanced-security bot found potential problems Feb 27, 2026

set ttl on redis zadd to avoid redis growing without control

3579f5d

rafel-roboflow marked this pull request as ready for review February 27, 2026 17:02

rafel-roboflow changed the title ~~WIP: feat: Add Redis-based workspace stream quota for WebRTC sessions~~ feat: Add Redis-based workspace stream quota for WebRTC sessions Feb 27, 2026

rafel-roboflow added 2 commits March 2, 2026 14:29

prevent calls if not api key to /usage/plan

5f7889f

Revert "prevent calls if not api key to /usage/plan"

bfd3f93

This reverts commit 5f7889f.

cursor bot reviewed Mar 2, 2026

grzegorz-roboflow and others added 7 commits March 4, 2026 10:30

Merge branch 'main' into feat/dg-232-set-rate-limit-to-10-concurrent-…

5a00f91

…streams-and-update

Merge branch 'main' of github.com:roboflow/inference into feat/dg-232…

1fc3b1e

…-set-rate-limit-to-10-concurrent-streams-and-update

clean up unused var

dfe6c22

Merge branch 'feat/dg-232-set-rate-limit-to-10-concurrent-streams-and…

54731ae

…-update' of github.com:roboflow/inference into feat/dg-232-set-rate-limit-to-10-concurrent-streams-and-update

added redis total sessions count

12b50de

grzegorz-roboflow previously approved these changes Mar 4, 2026

github-advanced-security bot found potential problems Mar 4, 2026

cursor bot reviewed Mar 4, 2026

Merge branch 'main' into feat/dg-232-set-rate-limit-to-10-concurrent-…

037206d

…streams-and-update

cursor bot reviewed Mar 4, 2026

rafel-roboflow added 3 commits March 4, 2026 15:08

Merge branch 'main' of github.com:roboflow/inference into feat/dg-232…

c68fa5c

…-set-rate-limit-to-10-concurrent-streams-and-update

Merge branch 'feat/dg-232-set-rate-limit-to-10-concurrent-streams-and…

5bbca17

…-update' of github.com:roboflow/inference into feat/dg-232-set-rate-limit-to-10-concurrent-streams-and-update

rafel-roboflow dismissed grzegorz-roboflow’s stale review via 5bbca17 March 4, 2026 14:11

@@ -68,8 +68,7 @@
                             ttl_seconds=WEBRTC_WORKSPACE_STREAM_TTL_SECONDS,
                         ):
                             logger.warning(
-                                "Workspace %s has exceeded the stream quota of %d",
-                                workspace_id,
+                                "A workspace has exceeded the stream quota of %d",
                                 WEBRTC_WORKSPACE_STREAM_QUOTA,
                             )
                             raise WorkspaceStreamQuotaError(
@@ -92,9 +91,8 @@
                             )
                     logger.info(
-                        "Started WebRTC session %s for workspace %s",
+                        "Started WebRTC session %s",
                         session_id,
-                        workspace_id,
                     )
                     loop = asyncio.get_event_loop()
