Load test WebSocket broadcast with 10+ concurrent sessions #3

@amahpour

Description

Problem

The WebSocket broadcast in server/routes/ws.py sends every session update to every connected dashboard client sequentially:

# Lines 19-33
async def broadcast_session_update(session: dict):
    if not _dashboard_clients:
        return

    message = json.dumps({"type": "session_update", "session": session})
    disconnected = set()
    for ws in _dashboard_clients:
        try:
            await ws.send_text(message)
        except Exception:
            disconnected.add(ws)

    for ws in disconnected:
        _dashboard_clients.discard(ws)

Connected clients are tracked in a simple set:

# Lines 12-13
_dashboard_clients: set[WebSocket] = set()

Potential issues at scale

1. Broadcast is sequential, not concurrent
Each ws.send_text() is awaited one at a time. If one client has a slow connection or full send buffer, it blocks delivery to all subsequent clients. With 10+ sessions generating frequent updates and multiple dashboard clients, this could create visible lag.

2. No message batching or throttling
Every session update triggers an immediate broadcast. Consider a scenario:

  • 10 active Claude Code sessions
  • Each session generates ~2-5 hook events per second (PreToolUse, PostToolUse, etc.)
  • That's 20-50 WebSocket messages per second to each dashboard client
  • The JSONL watcher adds additional updates on top of hook events

The dashboard may be receiving more updates than it can meaningfully render, creating unnecessary network and CPU load on both server and client.

3. No backpressure handling
If a client can't keep up with the message rate, messages queue up in the WebSocket send buffer. There's no mechanism to detect slow clients, drop stale updates, or disconnect clients that are too far behind.

4. JSON serialization per message
json.dumps() is called once per broadcast (good), but each message includes the full session object. If sessions carry large task_description fields or other metadata, message sizes add up.

What needs to happen

1. Load test to establish baseline

Before optimizing, measure the actual behavior:

  • Test setup: Script that simulates N concurrent sessions sending hook events at realistic rates, with M connected dashboard WebSocket clients.
  • Metrics to capture:
    • End-to-end latency: time from hook event arrival to WebSocket message received by client
    • Message throughput: messages/second delivered to each client
    • Server memory usage under load
    • CPU usage of the broadcast loop
    • Client-side message processing lag (if using a browser-based test)

Test scenarios:

| Sessions | Events/sec/session | Dashboard clients | Expected msgs/sec/client |
| --- | --- | --- | --- |
| 5 | 2 | 1 | 10 |
| 10 | 3 | 1 | 30 |
| 15 | 5 | 1 | 75 |
| 10 | 3 | 3 | 30 (×3 clients) |
| 20 | 5 | 5 | 100 (×5 clients) |
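
Before writing the full network harness, the sequential-vs-concurrent question can be sanity-checked with a pure in-process simulation. Everything below (`FakeClient`, the delay values) is hypothetical scaffolding, not project code:

```python
import asyncio
import time

class FakeClient:
    """Models a dashboard WebSocket whose send takes `delay` seconds."""
    def __init__(self, delay: float):
        self.delay = delay

    async def send_text(self, message: str) -> None:
        await asyncio.sleep(self.delay)

async def sequential_broadcast(clients, message: str) -> float:
    """Current ws.py pattern: await each client's send one at a time."""
    start = time.monotonic()
    for ws in clients:
        await ws.send_text(message)
    return time.monotonic() - start

async def concurrent_broadcast(clients, message: str) -> float:
    """Proposed pattern: fan out all sends with asyncio.gather."""
    start = time.monotonic()
    await asyncio.gather(*(ws.send_text(message) for ws in clients))
    return time.monotonic() - start

async def compare() -> tuple[float, float]:
    # Four fast clients (10 ms sends) plus one slow client (100 ms sends).
    clients = [FakeClient(0.01) for _ in range(4)] + [FakeClient(0.1)]
    seq = await sequential_broadcast(clients, "x")
    conc = await concurrent_broadcast(clients, "x")
    return seq, conc
```

Sequentially, the slow client's latency adds to everyone else's (~140 ms total here), while `gather` is bounded by the slowest single send (~100 ms); the same comparison against a real server would inform the 50 ms threshold in the acceptance criteria.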

2. Concurrent broadcast (if sequential proves slow)

Replace the sequential send loop with asyncio.gather:

import asyncio
import json

async def broadcast_session_update(session: dict):
    if not _dashboard_clients:
        return

    message = json.dumps({"type": "session_update", "session": session})

    async def _send(ws):
        try:
            await asyncio.wait_for(ws.send_text(message), timeout=5.0)
        except Exception:
            return ws  # surface the failed socket for cleanup
        return None

    results = await asyncio.gather(*[_send(ws) for ws in _dashboard_clients])
    for ws in results:
        if ws is not None:
            _dashboard_clients.discard(ws)

This ensures one slow client doesn't block others. The wait_for timeout prevents indefinite blocking on a dead connection.

3. Message batching/throttling (if message rate proves excessive)

Implement a batching layer that collects updates and flushes at a fixed interval:

  • Collect session updates in a dict keyed by session_id (latest update wins)
  • Flush every 200-500ms, sending a single batch_update message with all changed sessions
  • This reduces 50 msgs/sec down to 2-5 msgs/sec with the same information density
  • Dashboard client needs to handle batch_update message type

4. Client-side throttling (quick win)

Regardless of server-side changes, the dashboard frontend should throttle DOM updates:

  • Use requestAnimationFrame to batch UI updates
  • Only re-render a session card if its data actually changed
  • Consider virtual scrolling if the session list grows large

5. Slow client detection

Add monitoring for client health:

  • Track the last successful send time per client
  • If a client hasn't acknowledged a ping in >30 seconds, disconnect it
  • Log warning when broadcast takes >100ms (indicating a slow client is blocking)

A ping/pong mechanism already exists (ws.py:55-57):

if data == "ping":
    await websocket.send_text(json.dumps({"type": "pong"}))

This could be extended to track client responsiveness.

Acceptance criteria

  • Load test script that simulates N sessions and M dashboard clients with configurable parameters
  • Baseline latency and throughput measurements documented for 5, 10, 15, and 20 concurrent sessions
  • If sequential broadcast latency exceeds 50ms at 10+ sessions, switch to concurrent asyncio.gather approach
  • If message rate exceeds 30 msgs/sec/client, implement server-side batching with a configurable flush interval
  • Client-side requestAnimationFrame throttling for DOM updates
  • Slow client detection and auto-disconnect after timeout

Technical context

  • WebSocket endpoint: /ws/dashboard (ws.py:36-63)
  • Broadcast is triggered from two sources:
    1. Hook events → hooks.py → _notify_update() → broadcast_session_update()
    2. JSONL watcher → watcher.py → DB update → broadcast (via the same callback wired in main.py:23)
  • The watcher already has a 1-second debounce (watcher.py:19), so JSONL-sourced updates are naturally throttled
  • Hook events have no debounce — they fire immediately on each Claude Code tool invocation
  • Dashboard clients receive initial_state on connect (ws.py:49-53) with all active sessions
