
feat: Add Redis-based workspace stream quota for WebRTC sessions #2025

Open
rafel-roboflow wants to merge 40 commits into main from feat/dg-232-set-rate-limit-to-10-concurrent-streams-and-update
Conversation

@rafel-roboflow
Contributor

@rafel-roboflow rafel-roboflow commented Feb 20, 2026

  • Limit concurrent WebRTC streams per workspace (default: 10)
  • Return HTTP 429 when quota exceeded
  • Add heartbeat endpoint for Modal workers to refresh session TTL

What does this PR do?

Related Issue(s): DG-232

Type of Change

  • New feature (non-breaking change that adds functionality)

Testing

  • I have tested this change locally

Test details:
I set max connections to 3:

  • easy case: one, two, three; one after each.
  • case 1: one, two, wait 2 min, retry 3 ... is blocked; wait 8 minutes, retry 3... is blocked.
  • case 2: one, two, close two, open two, ... 3rd is blocked.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code where necessary, particularly in hard-to-understand areas
  • My changes generate no new warnings or errors
  • I have updated the documentation accordingly (if applicable)

Additional Context


Note

Medium Risk
Adds Redis-backed quota enforcement to WebRTC session startup and a new heartbeat/end API, which can block new sessions (HTTP 429) and depends on Redis/heartbeat timing for correct slot cleanup.

Overview
Implements a Redis-backed per-workspace concurrent WebRTC stream quota (default 10) enforced when starting Modal-based WebRTC workers; excess sessions raise WorkspaceStreamQuotaError and now return HTTP 429.

Adds session tracking + TTL refresh via Redis sorted sets (register_webrtc_session, refresh_webrtc_session, deregister_webrtc_session) and introduces new HTTP endpoints POST /webrtc/session/heartbeat and POST /webrtc/session/heartbeat/end for Modal workers to keep sessions alive and free quota slots on shutdown.
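The register/refresh/deregister cycle described above can be sketched without a Redis server. The class below is a hypothetical in-memory stand-in, not the PR's implementation: it models one sorted set per workspace, with the session ID as member and the last-seen timestamp as score. The class name, method names, and the `>=` quota comparison are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict


class InMemorySessionRegistry:
    """Hypothetical stand-in for a Redis sorted set per workspace.

    Member = session_id, score = last-seen timestamp.
    """

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._sessions: Dict[str, Dict[str, float]] = {}

    def _prune(self, workspace_id: str) -> Dict[str, float]:
        # Drop sessions whose last heartbeat is older than the TTL
        # (similar in spirit to ZREMRANGEBYSCORE key -inf <now - ttl>).
        cutoff = self._clock() - self._ttl
        live = {
            session: seen
            for session, seen in self._sessions.get(workspace_id, {}).items()
            if seen > cutoff
        }
        self._sessions[workspace_id] = live
        return live

    def register(self, workspace_id: str, session_id: str) -> None:
        self._prune(workspace_id)[session_id] = self._clock()

    def refresh(self, workspace_id: str, session_id: str) -> bool:
        # Returns False when the session is unknown or already expired.
        live = self._prune(workspace_id)
        if session_id not in live:
            return False
        live[session_id] = self._clock()
        return True

    def deregister(self, workspace_id: str, session_id: str) -> bool:
        sessions = self._sessions.get(workspace_id, {})
        return sessions.pop(session_id, None) is not None

    def count(self, workspace_id: str) -> int:
        return len(self._prune(workspace_id))

    def is_over_quota(self, workspace_id: str, quota: int) -> bool:
        # Assumption: a new session is rejected once `quota` sessions are live.
        return self.count(workspace_id) >= quota
```

Injecting the clock keeps the TTL behaviour testable without sleeping; a Redis-backed version would rely on server-side scores and key expiry instead.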

Extends the Modal worker Watchdog to periodically POST heartbeats (and send an end signal on stop) and passes workspace_id/session_id through WebRTCWorkerRequest; also guards heartbeat_callback calls to handle it being None.
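A minimal sketch of the heartbeat loop described above, assuming a thread-based watchdog with injectable callbacks. `HeartbeatWatchdog`, `send_heartbeat`, and `send_end` are hypothetical names; in the PR the callbacks would POST to the heartbeat and end endpoints, which is left out here.

```python
import threading
from typing import Callable, Optional


class HeartbeatWatchdog:
    """Hypothetical watchdog: calls send_heartbeat periodically, send_end on stop."""

    def __init__(
        self,
        interval_seconds: float,
        send_heartbeat: Optional[Callable[[], None]],
        send_end: Optional[Callable[[], None]] = None,
    ) -> None:
        self._interval = interval_seconds
        self._send_heartbeat = send_heartbeat
        self._send_end = send_end
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        # Event.wait returns False on timeout (send a beat) and
        # True once stop() has set the event (leave the loop).
        while not self._stop.wait(self._interval):
            if self._send_heartbeat is None:
                continue  # guard: the callback may legitimately be None
            try:
                self._send_heartbeat()
            except Exception:
                # A failed heartbeat should not take the worker down.
                pass

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()
        if self._send_end is not None:
            self._send_end()  # free the quota slot without waiting for TTL
```

Waiting on the `Event` rather than sleeping lets `stop()` interrupt the loop immediately instead of after a full interval.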

Written by Cursor Bugbot for commit bfd3f93. This will update automatically on new commits. Configure here.

@PawelPeczek-Roboflow
Collaborator

@rafel-roboflow - sorry, this will not be added to today's release

…-set-rate-limit-to-10-concurrent-streams-and-update
…-update' of github.com:roboflow/inference into feat/dg-232-set-rate-limit-to-10-concurrent-streams-and-update
@codeflash-ai
Contributor

codeflash-ai bot commented Feb 20, 2026

⚡️ Codeflash found optimizations for this PR

📄 153% (1.53x) speedup for with_route_exceptions_async in inference/core/interfaces/http/error_handlers.py

⏱️ Runtime : 538 microseconds → 212 microseconds (best of 5 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch feat/dg-232-set-rate-limit-to-10-concurrent-streams-and-update).


@rafel-roboflow rafel-roboflow marked this pull request as ready for review February 24, 2026 08:26
session_id,
workspace_id,
)

Missing session cleanup on Modal spawn failure leaks quota

Medium Severity

register_webrtc_session is called before spawn_rtc_peer_connection_modal, but there's no try/finally to call deregister_webrtc_session if the spawn fails. spawn_rtc_peer_connection_modal has many failure points before a watchdog is ever created (plan validation, Modal client auth, app lookup, workflow spec retrieval, etc.). Each failure leaves a phantom session in Redis that occupies a quota slot until TTL expiry (default 60s). With a low quota (e.g. 3 during testing), a few rapid retries of a failing request can lock the workspace out entirely.
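One way to close the leak described above is to release the slot whenever the spawn step raises. The sketch below uses hypothetical callables (`register`, `spawn`, `deregister`) in place of the PR's functions, so it illustrates the pattern rather than the actual fix.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def spawn_with_quota_slot(
    register: Callable[[], None],
    spawn: Callable[[], T],
    deregister: Callable[[], None],
) -> T:
    """Hypothetical helper: hold a quota slot only if the spawn succeeds."""
    register()
    try:
        return spawn()
    except BaseException:
        # Spawn failed before any watchdog existed, so nothing else will
        # release the slot; free it now instead of waiting for TTL expiry.
        deregister()
        raise
```

On success the slot stays registered and is expected to be released later by the worker's end signal or by TTL expiry.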


…-set-rate-limit-to-10-concurrent-streams-and-update
@rafel-roboflow rafel-roboflow marked this pull request as draft February 27, 2026 12:42
@rafel-roboflow rafel-roboflow marked this pull request as ready for review February 27, 2026 17:02
@rafel-roboflow rafel-roboflow changed the title from "WIP: feat: Add Redis-based workspace stream quota for WebRTC sessions" to "feat: Add Redis-based workspace stream quota for WebRTC sessions" Feb 27, 2026
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Unconditional blocking HTTP call in async function
    • start_worker now resolves workspace/session only when quota or heartbeat is enabled and uses await get_roboflow_workspace_async(...) to avoid blocking the event loop.


Or push these changes by commenting:

@cursor push d569c37645
Preview (d569c37645)
diff --git a/inference/core/interfaces/webrtc_worker/__init__.py b/inference/core/interfaces/webrtc_worker/__init__.py
--- a/inference/core/interfaces/webrtc_worker/__init__.py
+++ b/inference/core/interfaces/webrtc_worker/__init__.py
@@ -6,6 +6,7 @@
     WEBRTC_MODAL_TOKEN_ID,
     WEBRTC_MODAL_TOKEN_SECRET,
     WEBRTC_MODAL_USAGE_QUOTA_ENABLED,
+    WEBRTC_SESSION_HEARTBEAT_URL,
     WEBRTC_WORKSPACE_STREAM_QUOTA,
     WEBRTC_WORKSPACE_STREAM_QUOTA_ENABLED,
     WEBRTC_WORKSPACE_STREAM_TTL_SECONDS,
@@ -19,7 +20,7 @@
     WebRTCWorkerResult,
 )
 from inference.core.logger import logger
-from inference.core.roboflow_api import get_roboflow_workspace
+from inference.core.roboflow_api import get_roboflow_workspace_async
 
 
 async def start_worker(
@@ -56,10 +57,14 @@
                 raise CreditsExceededError("API key over quota")
 
         workspace_id = None
-        session_id = str(uuid.uuid4())
-        workspace_id = get_roboflow_workspace(api_key=webrtc_request.api_key)
-        webrtc_request.workspace_id = workspace_id
-        webrtc_request.session_id = session_id
+        session_id = None
+        if WEBRTC_WORKSPACE_STREAM_QUOTA_ENABLED or WEBRTC_SESSION_HEARTBEAT_URL:
+            session_id = str(uuid.uuid4())
+            workspace_id = await get_roboflow_workspace_async(
+                api_key=webrtc_request.api_key
+            )
+            webrtc_request.workspace_id = workspace_id
+            webrtc_request.session_id = session_id
 
         if WEBRTC_WORKSPACE_STREAM_QUOTA_ENABLED:
             if workspace_id and is_over_workspace_session_quota(
@@ -77,7 +82,7 @@
                     f"concurrent streams."
                 )
 
-            if workspace_id:
+            if workspace_id and session_id:
                 register_webrtc_session(
                     workspace_id=workspace_id,
                     session_id=session_id,

session_id = str(uuid.uuid4())
workspace_id = get_roboflow_workspace(api_key=webrtc_request.api_key)
webrtc_request.workspace_id = workspace_id
webrtc_request.session_id = session_id

Unconditional blocking HTTP call in async function

Medium Severity

get_roboflow_workspace is a synchronous function that makes an HTTP request to the Roboflow API, but it's called directly inside async def start_worker, blocking the event loop. Additionally, this call runs unconditionally for every WebRTC session, even when both WEBRTC_WORKSPACE_STREAM_QUOTA_ENABLED (default: False) and WEBRTC_SESSION_HEARTBEAT_URL (default: None) are disabled — meaning the resulting workspace_id is never actually used. The async counterpart get_roboflow_workspace_async exists and is already used in the heartbeat endpoints.


grzegorz-roboflow and others added 7 commits March 4, 2026 10:30
…-set-rate-limit-to-10-concurrent-streams-and-update
…endpoints

The webrtc_session_heartbeat and webrtc_session_end endpoints were missing
the error handler decorator that other endpoints use. This ensures unhandled
exceptions are properly logged and return appropriate error responses instead
of generic 500 errors.
…dpoints

- Add missing @with_route_exceptions_async decorator to webrtc_session_heartbeat
  and webrtc_session_end endpoints for consistent exception handling
- Create WebRTCSessionHeartbeatRequest Pydantic model for request body validation
…-update' of github.com:roboflow/inference into feat/dg-232-set-rate-limit-to-10-concurrent-streams-and-update
count = get_concurrent_session_count(workspace_id, ttl_seconds)
logger.info(
"Workspace %s has %d concurrent sessions (quota: %d)",
workspace_id,

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (password) as clear text.
This expression logs sensitive data (password) as clear text.
This expression logs sensitive data (password) as clear text.

Copilot Autofix

AI 3 days ago

In general, the fix is to ensure that data derived from the API key (or otherwise considered sensitive) is not written directly to logs. Here that means changing log messages which include workspace_id so they no longer output the raw identifier, while preserving the usefulness of the logs.

Concretely, we will adjust logging in inference/core/interfaces/webrtc_worker/utils.py and inference/core/interfaces/webrtc_worker/__init__.py:

  1. In is_over_workspace_session_quota (utils.py), change the logger.info call to remove the workspace_id from the message and arguments. We will keep the count and quota so operators can still see quota usage, but not which specific workspace hit that count.
  2. In start_worker (__init__.py), change the warning when quota is exceeded and the final info log about starting a session so they no longer include workspace_id. They will still report the quota value and the session id, which is sufficient for debugging without exposing workspace identifiers.

No new methods or imports are needed; we only modify existing log message strings and their parameters. Functionality (quotas, control flow, return values) remains unchanged.


Suggested changeset 2
inference/core/interfaces/webrtc_worker/utils.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/inference/core/interfaces/webrtc_worker/utils.py b/inference/core/interfaces/webrtc_worker/utils.py
--- a/inference/core/interfaces/webrtc_worker/utils.py
+++ b/inference/core/interfaces/webrtc_worker/utils.py
@@ -410,8 +410,7 @@
     """
     count = get_concurrent_session_count(workspace_id, ttl_seconds)
     logger.info(
-        "Workspace %s has %d concurrent sessions (quota: %d)",
-        workspace_id,
+        "Workspace has %d concurrent sessions (quota: %d)",
         count,
         quota,
     )
EOF
inference/core/interfaces/webrtc_worker/__init__.py
Outside changed files

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/inference/core/interfaces/webrtc_worker/__init__.py b/inference/core/interfaces/webrtc_worker/__init__.py
--- a/inference/core/interfaces/webrtc_worker/__init__.py
+++ b/inference/core/interfaces/webrtc_worker/__init__.py
@@ -68,8 +68,7 @@
                 ttl_seconds=WEBRTC_WORKSPACE_STREAM_TTL_SECONDS,
             ):
                 logger.warning(
-                    "Workspace %s has exceeded the stream quota of %d",
-                    workspace_id,
+                    "A workspace has exceeded the stream quota of %d",
                     WEBRTC_WORKSPACE_STREAM_QUOTA,
                 )
                 raise WorkspaceStreamQuotaError(
@@ -92,9 +91,8 @@
                 )
 
         logger.info(
-            "Started WebRTC session %s for workspace %s",
+            "Started WebRTC session %s",
             session_id,
-            workspace_id,
         )
 
         loop = asyncio.get_event_loop()
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
session_refreshed = refresh_webrtc_session(
workspace_id=workspace_id,
session_id=request.session_id,
)

Sync blocking Redis calls in async endpoints

Low Severity

refresh_webrtc_session and deregister_webrtc_session are synchronous functions that make multiple blocking Redis calls (zscore, zadd, expire, zrem), but they're called from async def FastAPI endpoints. This blocks the event loop for the duration of each Redis round-trip. Since the heartbeat endpoint is called every 30 seconds per active session, this could accumulate under load.
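One common way to keep a synchronous helper off the event loop is `asyncio.to_thread` (Python 3.9+). The sketch below is an assumption about how that could look, not the PR's code: `refresh_session_blocking` is a hypothetical stand-in that sleeps instead of talking to Redis.

```python
import asyncio
import time


def refresh_session_blocking(workspace_id: str, session_id: str) -> bool:
    """Hypothetical stand-in for a synchronous Redis-backed refresh."""
    time.sleep(0.05)  # simulates blocking network round-trips
    return True


async def heartbeat(workspace_id: str, session_id: str) -> bool:
    # Run the blocking call in a worker thread so the event loop
    # stays free to serve other requests while it waits.
    return await asyncio.to_thread(
        refresh_session_blocking, workspace_id, session_id
    )
```

An alternative with the same effect is an async Redis client, which avoids the thread hop entirely; which option fits better depends on how the rest of the code base accesses Redis.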


@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


The heartbeat and session-end endpoints were returning plain dicts with
{"status": "error"} on auth failure, which FastAPI serialized as HTTP 200.
This caused the watchdog client to log these as successful heartbeats,
masking authentication failures from operators.

Now raises HTTPException with 401 for unauthorized and 404 for session
not found.
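
The behaviour change in this commit can be illustrated with a framework-agnostic sketch. `EndpointError` is a hypothetical stand-in for FastAPI's `HTTPException`, and the `{"status": "ok"}` success body is an assumption; only the 401 and 404 codes come from the commit message.

```python
from http import HTTPStatus


class EndpointError(Exception):
    """Hypothetical stand-in for a framework exception carrying a status code."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = int(status_code)
        self.detail = detail


def handle_heartbeat(authorized: bool, session_refreshed: bool) -> dict:
    # Raising makes the failure visible as a non-2xx response, instead of
    # an error payload serialized with HTTP 200.
    if not authorized:
        raise EndpointError(HTTPStatus.UNAUTHORIZED, "Unauthorized")
    if not session_refreshed:
        raise EndpointError(HTTPStatus.NOT_FOUND, "Session not found")
    return {"status": "ok"}
```

With this shape, a client that treats any non-2xx status as a failed heartbeat no longer records authentication failures as successes.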
…-set-rate-limit-to-10-concurrent-streams-and-update
…-update' of github.com:roboflow/inference into feat/dg-232-set-rate-limit-to-10-concurrent-streams-and-update


3 participants