perf(runtime): unblock event loop + bypass server for switch action#41
Open
panosAthDBX wants to merge 11 commits into
Open
perf(runtime): unblock event loop + bypass server for switch action#41panosAthDBX wants to merge 11 commits into
panosAthDBX wants to merge 11 commits into
Conversation
Adds an opt-in background poller that reads the macOS Appearance setting (`defaults read -g AppleInterfaceStyle`) every few seconds and flips the active theme between a configured dark and light pair when the user toggles System Settings → Appearance. Config: autoThemeFollowsSystem: true darkTheme: "catppuccin-mocha" (default) lightTheme: "catppuccin-latte" (default) macOS-gated — no-op on Linux/Windows. Uses the same broadcast path as the manual theme picker so every connected sidebar re-renders in sync. Poller is a 3s interval cleaned up on shutdown. Extracted pure mapping and side-effectful read into `system-theme.ts` for testability. Covered by 5 new tests.
When `autoThemeFollowsSystem` is on and the user manually picks a theme via
`set-theme`, route the persistence to `darkTheme` or `lightTheme` based on the
current macOS appearance instead of `theme`. Previously the choice was written
to `theme`, which the poll loop ignored, so the manual override was silently
overwritten on the next 3s tick.
Also re-load `darkTheme` / `lightTheme` from disk inside the poll cycle so an
override made via `set-theme` is picked up immediately on the next poll.
Drops the `saveConfig({ theme: desired })` write inside the poll loop — that
was clobbering the user's static-mode `theme` field whenever auto-follow ran.
Co-authored-by: Isaac
Two server hot-path optimizations targeting steady-state CPU and WS fan-out:
1. Replace the 3s `defaults read` polling loop with a kqueue-based file
watch on `~/Library/Preferences/.GlobalPreferences.plist`. macOS rewrites
that plist on any global-preference change, so we let the kernel push us
the event and re-read appearance only when the value differs. A 60s
safety poll covers the atomic-rename case where kqueue loses the inode.
Removes ~28,800 subprocess spawns per server per day.
2. Hash the serialized state in `broadcastStateImmediate()` and skip the
`server.publish("sidebar", msg)` when the payload is byte-identical to
the previous send. Many call sites trigger broadcastState() but most do
not actually change observable state (theme polls, focus moves to the
same session, agent updates with identical metadata). Wire protocol is
unchanged; new clients still receive `lastState` directly via the WS
`open` handler.
Tests: 400/400 passing (was 397; +3 covering the watcher's no-op fallback,
idempotent stop, and darwin initial-fire).
`watcherBroadcastTimer` was referenced in `cleanup()` but never declared, so every cleanup invocation (SIGINT/SIGTERM/30s idle timeout) crashed with `ReferenceError: watcherBroadcastTimer is not defined` instead of shutting down gracefully. Process exited anyway, so the bug was latent. Just drop the dead line.
The ensure-sidebar HTTP handler currently triggers two `tmux list-panes -a` calls per request: 1. ensureSidebarInWindow() lists panes to check for an existing sidebar (line 1206). Result cached for 300ms. 2. enforceSidebarWidth() invalidates the cache and lists again (line 1485) to walk panes for width enforcement. The second list is redundant — the cache is at most ~30ms old when enforce runs synchronously after the spawn check, and `tmux list-panes -a` is the single most expensive call on a busy tmux (50-200ms with 30+ panes). Add `reuseCache` to enforceSidebarWidth() and pass it from ensureSidebarInWindow(). Halves the list-panes work on every session switch. The standalone enforceSidebarWidth() callers (terminal resize, sidebar toggle, width sync) keep the existing default behaviour.
eventTimestamps is a server-internal diagnostic — the tracker uses it for
stale/active heuristics — but it was being shipped to every WS client on
every state broadcast. The TUI never reads the field.
Each agent-emit appends a fresh timestamp, so back-to-back identical
opencode/claude-code/etc. status pings ("running" → "running" with the same
threadId) produced different state hashes and bypassed the broadcast dedup
introduced in a733f28. With chatty agents this fanned out one full state
JSON to every TUI client at ~1Hz per active session, visibly stalling the
foreground client when switching to an agent-heavy session.
Make the field optional in the SessionData shape and stop populating it in
computeState(). Tracker still maintains the timestamps server-side; they
just no longer escape onto the wire. Hash-dedup now correctly suppresses
no-op agent emits.
When the idle-timeout cleanup runs and a manual `toggle.sh ensure_server` fires within the kernel's TIME_WAIT window (~30s on macOS), the new process hits EADDRINUSE on port 7391 even though `lsof -iTCP:7391` returns nothing — the socket is still reserved. Setting `reusePort: true` on Bun.serve lets the new process bind immediately. Single-server invariant is still enforced by PID file.
30s was too aggressive on restart paths. After any manual respawn or plugin update, the TUI clients live inside sidebar panes that don't exist yet — the user has to press the toggle key to spawn one, and that exceeded the previous 30s window in normal flow. The server would self-terminate before it could be useful, leading to a frustrating loop where every M-s hit a stale server and re-triggered ensure_server. 5 min gives a comfortable usable window without leaving zombie servers running indefinitely. Active servers with WS clients clear the timer the instant the first TUI connects, so this change only affects the no-clients case.
reusePort: true was the wrong fix for the EADDRINUSE-on-respawn issue. On macOS Bun's reusePort lets multiple processes bind the same port and share connections at the kernel level, but the opensessions runtime keeps all session/agent/sidebar state in process-local memory. Allowing duplicate servers meant clients round-robined across disjoint state and session switches behaved erratically. Revert reusePort and replace with a proper singleton guard: read PID_FILE, probe the recorded pid with `process.kill(pid, 0)`, and exit cleanly if that process is still alive. Stale PID files (process gone) are ignored and the new server takes over normally. This makes `toggle.sh ensure_server` idempotent — repeated invocations during the kernel's TIME_WAIT window collapse to no-ops instead of racing to spawn a second server.
Captures before/after numbers for the seven commits landed this session: push-based theme watch, broadcast hash-dedup, enforce cache reuse, eventTimestamps removal from the wire, idle-timeout bump, and the singleton-guard / reusePort revert. Headline: ~3.7× faster enforce dance on session switch, ~76% of agent-emit broadcasts suppressed under chatty agents, theme polling subprocess work eliminated, no more EADDRINUSE on respawn.
Three layered changes to bring rapid M-N session switching from
~1.5s/switch under load to ~60ms/switch:
1. Keybinding & script overhead
- Bind keys via `run-shell -b` so tmux's input loop doesn't block
while the shell script runs. Without -b, rapid presses serialise
on tmux's command queue and reentrant tmux calls inside the
script compound the wait.
- Pass `#{client_tty}|#{session_name}|#{window_id}` as a positional
arg from the bind-key, skipping a `tmux display-message` fork
inside the script (~16ms saved).
- server-common.sh: alive-check tmpfile (5s TTL) skips the
~80ms healthcheck curl on the hot path; plugin-dir / bun lookup
deferred to the cold-start branch only.
- Bump curl `-m 0.2` -> `-m 1.5`, `--connect-timeout 0.1` -> `0.3`,
and `|| true; exit 0` so tmux's run-shell never surfaces curl
exit 28 (timeout) in the status line during transient spikes.
2. Server-side event loop
- Drop `invalidateCurrentSessionCache()` from broadcastStateImmediate.
The 500ms TTL cache was being wiped on every broadcast (per
agent-emit, dozens/sec under chatty watchers like a remote
hermes plugin), defeating its purpose. handleFocus() still
invalidates on real focus changes.
- Swap raw getCurrentSession() -> getCachedCurrentSession() at
the hot computeState/ensure-sidebar/toggle sites.
- Debounce broadcastState (30ms) and refreshPaneAgents (250ms).
Sub-perception windows; coalesce bursts that previously each
triggered a fresh tmux subprocess (or a `tmux list-panes -a` +
ps + lsof in the case of refreshPaneAgents).
- Add AsyncReadCapable interface + isAsyncReadCapable guard.
TmuxClient gains `runAsync` (Bun.spawn(...).exited) and async
siblings: listSessionsAsync, listPanesAsync, listClientsAsync,
getCurrentSessionAsync, getActiveSessionDirsAsync,
getAllPaneCountsAsync. TmuxProvider implements them; its async
listSessions runs the two underlying tmux subprocesses
(list-sessions + active-session-dirs) in parallel via Promise.all.
- computeState() and broadcastStateImmediate() are now async.
Provider list calls in computeState run through Promise.all on
the async API where available; legacy providers fall back to
sync. broadcastStateImmediate uses an in-flight + trailing flag
so concurrent triggers coalesce instead of stacking.
3. Bypass server for the switch action
- Server writes ORDERING_FILE (/tmp/opensessions.<KEY>.ordering)
on every state broadcast, one visible session name per line.
- switch-index.sh fast-path: read line N from the file, call
`tmux switch-client -t <name>` directly. The server-side POST
still goes out (backgrounded) so the server can update unseen
flags / sidebar focus / custom ordering, but the user-perceived
latency is just one tmux fork. The existing tmux hook
(client-session-changed -> POST /focus) keeps the server in sync.
- Cold-path fallback: if the ordering file is missing (boot, or
ordering not yet broadcast), behave as before.
Measured on an interactive setup with 4 sessions, 3 attached sidebars,
and a chatty remote hermes watcher pushing events 3-5/sec:
before after
GET / healthcheck 218ms 20ms
POST /switch-index isolated 1033ms 54ms
5 parallel POSTs total timeout 32ms
switch-index.sh end-to-end (warm) 270ms 60ms
5 rapid-fire switches ~1.5s ~280ms
Stacked on PR Ataraxy-Labs#32 (perf+themes); merge order does not strictly matter
but the runtime/server index.ts changes overlap.
Co-authored-by: Isaac
There was a problem hiding this comment.
inspect review
Triage: 75 entities analyzed | 0 critical, 0 high, 25 medium, 50 low
Verdict: standard_review
Findings (1)
- [low] cleanup() does not clear the new broadcast debounce timer, so broadcastStateImmediate() can still run after shutdown. Evidence: broadcastState() sets broadcastDebounceTimer via setTimeout(..., BROADCAST_DEBOUNCE_MS) in packages/runtime/src/server/index.ts, but cleanup() clears debounceTimer/sidebarEnforceTimer/etc. and never clears broadcastDebounceTimer.
Reviewed by inspect | Entity-level triage found 0 high-risk changes
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Brings rapid M-N session switching from ~1.5s/switch under load to ~60ms/switch on an interactive setup with 4 sessions, 3 attached sidebars, and a chatty remote hermes watcher pushing 3-5 events/sec.
Three layers, ordered by where they unblock:
1. Keybinding & script overhead (~30%)
run-shell -bon all bindings so tmux's input loop doesn't block while shells run. Without -b, rapid presses serialise on tmux's command queue and reentrant tmux calls inside the script compound the wait.#{client_tty}|#{session_name}|#{window_id}as a positional arg from the bind-key, skipping atmux display-messagefork (~16ms).server-common.sh: alive-check tmpfile (5s TTL) skips the ~80ms healthcheck curl on the hot path; plugin-dir / bun lookup deferred to the cold-start branch only.-m 0.2→-m 1.5,--connect-timeout 0.1→0.3, end with|| true; exit 0so tmux's run-shell never surfaces curl exit 28 (timeout) in the status line during transient spikes.2. Server-side event loop (~40%)
invalidateCurrentSessionCache()frombroadcastStateImmediate. The 500ms TTL cache was being wiped on every broadcast (per agent-emit, dozens/sec under chatty watchers), defeating its purpose.handleFocus()still invalidates on real focus changes.getCurrentSession()→getCachedCurrentSession()at the hot computeState / ensure-sidebar / toggle sites.broadcastState(30ms) andrefreshPaneAgents(250ms). Sub-perception; coalesce bursts that previously each triggered a fresh tmux subprocess (ortmux list-panes -a+ps+lsofin the case of refreshPaneAgents).AsyncReadCapableinterface +isAsyncReadCapableguard.TmuxClientgainsrunAsync(usesBun.spawn(...).exited) and async siblings:listSessionsAsync,listPanesAsync,listClientsAsync,getCurrentSessionAsync,getActiveSessionDirsAsync,getAllPaneCountsAsync.TmuxProviderimplements them; its async listSessions runs the two underlying tmux subprocesses (list-sessions + active-session-dirs) in parallel viaPromise.all.computeState()andbroadcastStateImmediate()are nowasync. Provider list calls in computeState run throughPromise.allon the async API where available; legacy providers fall back to sync.broadcastStateImmediateuses an in-flight + trailing flag so concurrent triggers coalesce instead of stacking.3. Bypass server for the switch action (the rest)
ORDERING_FILE(/tmp/opensessions.<KEY>.ordering) on every state broadcast — one visible session name per line.switch-index.shfast-path: read line N from the file, calltmux switch-client -t <name>directly. The server-side POST still goes out (backgrounded) so the server can update unseen flags / sidebar focus / custom ordering, but user-perceived latency is just one tmux fork. The existing tmux hook (client-session-changed→/focus) keeps the server in sync.Numbers
GET /healthcheckPOST /switch-indexisolatedswitch-index.shend-to-end (warm)Test plan
bun build apps/server/src/main.ts— cleanpackages/runtime/test/to confirmbroadcastStateImmediateasync migration doesn't regress sidebar coordinator state machineNotes
index.tslines touched in both. Strictly speaking this can land independently if feat(themes): follow macOS Appearance and auto-switch dark/light #32 merges first, otherwise will need a small rebase.This pull request and its description were written by Isaac.