Load test WebSocket broadcast with 10+ concurrent sessions #3

@amahpour

Description

Problem

The WebSocket broadcast in server/routes/ws.py sends every session update to every connected dashboard client sequentially:

# Lines 19-33
async def broadcast_session_update(session: dict):
    if not _dashboard_clients:
        return

    message = json.dumps({"type": "session_update", "session": session})
    disconnected = set()
    for ws in _dashboard_clients:
        try:
            await ws.send_text(message)
        except Exception:
            disconnected.add(ws)

    for ws in disconnected:
        _dashboard_clients.discard(ws)

Connected clients are tracked in a simple set:

# Lines 12-13
_dashboard_clients: set[WebSocket] = set()

Potential issues at scale

1. Broadcast is sequential, not concurrent
Each ws.send_text() is awaited one at a time. If one client has a slow connection or full send buffer, it blocks delivery to all subsequent clients. With 10+ sessions generating frequent updates and multiple dashboard clients, this could create visible lag.

2. No message batching or throttling
Every session update triggers an immediate broadcast. Consider a scenario:

  • 10 active Claude Code sessions
  • Each session generates ~2-5 hook events per second (PreToolUse, PostToolUse, etc.)
  • That's 20-50 WebSocket messages per second to each dashboard client
  • The JSONL watcher adds additional updates on top of hook events

The dashboard may be receiving more updates than it can meaningfully render, creating unnecessary network and CPU load on both server and client.

3. No backpressure handling
If a client can't keep up with the message rate, messages queue up in the WebSocket send buffer. There's no mechanism to detect slow clients, drop stale updates, or disconnect clients that are too far behind.

4. JSON serialization per message
json.dumps() is called once per broadcast (good), but each message includes the full session object. If sessions carry large task_description fields or other metadata, message sizes add up.

What needs to happen

1. Load test to establish baseline

Before optimizing, measure the actual behavior:

  • Test setup: Script that simulates N concurrent sessions sending hook events at realistic rates, with M connected dashboard WebSocket clients.
  • Metrics to capture:
    • End-to-end latency: time from hook event arrival to WebSocket message received by client
    • Message throughput: messages/second delivered to each client
    • Server memory usage under load
    • CPU usage of the broadcast loop
    • Client-side message processing lag (if using a browser-based test)

Test scenarios:

| Sessions | Events/sec/session | Dashboard clients | Expected msgs/sec/client |
| --- | --- | --- | --- |
| 5 | 2 | 1 | 10 |
| 10 | 3 | 1 | 30 |
| 15 | 5 | 1 | 75 |
| 10 | 3 | 3 | 30 (×3 clients) |
| 20 | 5 | 5 | 100 (×5 clients) |
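
Before writing the full network harness, the sequential-vs-concurrent question can be sanity-checked with a pure in-process simulation. Everything below (`FakeClient`, the delay values) is hypothetical scaffolding, not project code:

```python
import asyncio
import time

class FakeClient:
    """Models a dashboard WebSocket whose send takes `delay` seconds."""
    def __init__(self, delay: float):
        self.delay = delay

    async def send_text(self, message: str) -> None:
        await asyncio.sleep(self.delay)

async def sequential_broadcast(clients, message: str) -> float:
    """Current ws.py pattern: await each client's send one at a time."""
    start = time.monotonic()
    for ws in clients:
        await ws.send_text(message)
    return time.monotonic() - start

async def concurrent_broadcast(clients, message: str) -> float:
    """Proposed pattern: fan out all sends with asyncio.gather."""
    start = time.monotonic()
    await asyncio.gather(*(ws.send_text(message) for ws in clients))
    return time.monotonic() - start

async def compare() -> tuple[float, float]:
    # Four fast clients (10 ms sends) plus one slow client (100 ms sends).
    clients = [FakeClient(0.01) for _ in range(4)] + [FakeClient(0.1)]
    seq = await sequential_broadcast(clients, "x")
    conc = await concurrent_broadcast(clients, "x")
    return seq, conc
```

Sequentially, the slow client's latency adds to everyone else's (~140 ms total here), while `gather` is bounded by the slowest single send (~100 ms); the same comparison against a real server would inform the 50 ms threshold in the acceptance criteria.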

2. Concurrent broadcast (if sequential proves slow)

Replace the sequential send loop with asyncio.gather:

import asyncio
import json

async def broadcast_session_update(session: dict):
    if not _dashboard_clients:
        return

    message = json.dumps({"type": "session_update", "session": session})

    async def _send(ws):
        try:
            await asyncio.wait_for(ws.send_text(message), timeout=5.0)
        except Exception:
            return ws  # surface the failed socket for cleanup
        return None

    results = await asyncio.gather(*[_send(ws) for ws in _dashboard_clients])
    for ws in results:
        if ws is not None:
            _dashboard_clients.discard(ws)

This ensures one slow client doesn't block others. The wait_for timeout prevents indefinite blocking on a dead connection.

3. Message batching/throttling (if message rate proves excessive)

Implement a batching layer that collects updates and flushes at a fixed interval:

  • Collect session updates in a dict keyed by session_id (latest update wins)
  • Flush every 200-500ms, sending a single batch_update message with all changed sessions
  • This reduces 50 msgs/sec down to 2-5 msgs/sec with the same information density
  • Dashboard client needs to handle batch_update message type

4. Client-side throttling (quick win)

Regardless of server-side changes, the dashboard frontend should throttle DOM updates:

  • Use requestAnimationFrame to batch UI updates
  • Only re-render a session card if its data actually changed
  • Consider virtual scrolling if the session list grows large

5. Slow client detection

Add monitoring for client health:

  • Track the last successful send time per client
  • If a client hasn't acknowledged a ping in >30 seconds, disconnect it
  • Log warning when broadcast takes >100ms (indicating a slow client is blocking)

A ping/pong mechanism already exists (ws.py:55-57):

if data == "ping":
    await websocket.send_text(json.dumps({"type": "pong"}))

This could be extended to track client responsiveness.

Acceptance criteria

  • Load test script that simulates N sessions and M dashboard clients with configurable parameters
  • Baseline latency and throughput measurements documented for 5, 10, 15, and 20 concurrent sessions
  • If sequential broadcast latency exceeds 50ms at 10+ sessions, switch to concurrent asyncio.gather approach
  • If message rate exceeds 30 msgs/sec/client, implement server-side batching with a configurable flush interval
  • Client-side requestAnimationFrame throttling for DOM updates
  • Slow client detection and auto-disconnect after timeout

Technical context

  • WebSocket endpoint: /ws/dashboard (ws.py:36-63)
  • Broadcast is triggered from two sources:
    1. Hook events → hooks.py → _notify_update() → broadcast_session_update()
    2. JSONL watcher → watcher.py → DB update → broadcast (via the same callback wired in main.py:23)
  • The watcher already has a 1-second debounce (watcher.py:19), so JSONL-sourced updates are naturally throttled
  • Hook events have no debounce — they fire immediately on each Claude Code tool invocation
  • Dashboard clients receive initial_state on connect (ws.py:49-53) with all active sessions
