
fix: less eager "Reconnecting..." modal #106

Merged: wmeddie merged 1 commit into main from fix/connection-health-check on May 13, 2026


wmeddie commented on May 13, 2026

Summary

The "Reconnecting..." overlay was appearing too easily — particularly inside the Tauri desktop app, where users often see it briefly and then it disappears on its own.

Two reasons it false-fires:

  1. Wrong endpoint. The health check polls /api/agents, which does real work (registry lookup + per-agent Docker observe_with(...)). Under load — mid-stream agent response, Docker reconciliation — that handler can miss its 3s timeout even though the server is fine.
  2. No retry. A single failed poll flips serverConnected = false and pops the modal. Any one-off hiccup is enough.

Tauri makes it worse for two reasons: WebKitGTK seems to abort or time out fetches more aggressively than desktop browsers do, and the Tauri build serves the frontend through the same axum runtime that handles API requests.
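
For readers unfamiliar with how a timeout-bounded poll behaves, here is a minimal sketch of one probe, assuming the check is built on `fetch` with an `AbortController`. The names `FetchLike` and `probeOnce` are hypothetical and are not taken from this PR; the actual frontend code may be structured differently.

```typescript
// Hypothetical names: FetchLike and probeOnce are illustrative, not from this PR.
// FetchLike is the small subset of the fetch signature this sketch relies on.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ ok: boolean }>;

// Resolves to true if the endpoint answers with a 2xx status within timeoutMs.
// Resolves to false on a non-2xx status, a network error, or a timeout.
async function probeOnce(
  url: string,
  timeoutMs: number,
  fetchImpl: FetchLike,
): Promise<boolean> {
  const controller = new AbortController();
  // Abort the in-flight request once the budget is spent.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    return res.ok;
  } catch {
    // An aborted or failed request is indistinguishable from "server down" here,
    // which is why a slow handler can look like a lost connection.
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

In a browser or webview this could be called as `probeOnce("/api/agents", 3000, fetch)`. A handler that takes longer than the budget produces the same `false` as a stopped server, which matches the false positive described above.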

Changes

  • Poll /api/health (a stateless JSON response, no DB or Docker) instead of /api/agents.
  • Raise the timeout from 3s → 8s.
  • Require two consecutive failures before showing the modal.

Auto-reload-on-recover behavior is unchanged.
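
The "two consecutive failures" rule can be sketched as a small counter that resets on any successful poll. This is an illustration under assumptions, not the code in this PR: `ConnectionGate` and `record` are hypothetical names, and the real implementation may keep this state differently (for example alongside `serverConnected`).

```typescript
// Hypothetical helper; the class and method names are not taken from this PR.
class ConnectionGate {
  private consecutiveFailures = 0;
  private readonly threshold: number;

  // threshold = number of back-to-back failed polls required to show the modal.
  constructor(threshold: number = 2) {
    this.threshold = threshold;
  }

  // Feed in the outcome of one poll.
  // Returns true when the "Reconnecting..." modal should be visible.
  record(pollSucceeded: boolean): boolean {
    this.consecutiveFailures = pollSucceeded ? 0 : this.consecutiveFailures + 1;
    return this.consecutiveFailures >= this.threshold;
  }
}

// Example sequence: one isolated failure does not trip the gate, two in a row do.
const exampleGate = new ConnectionGate(2);
const exampleStates = [false, true, false, false, true].map((ok) =>
  exampleGate.record(ok),
);
// exampleStates is [false, false, false, true, false]
```

With a threshold of 2, an isolated failed poll followed by a success never shows the modal, while a stopped server trips it on the second failed poll.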

Test plan

  • Open the Tauri app, leave it running, kick off a long agent response, and confirm the modal does not flash.
  • Stop the server (xpressclaw down) with the app open — modal should appear within ~10s.
  • Restart the server — page should auto-reload once /api/health returns 200.
  • Same checks in a desktop browser.

The connection health check polled /api/agents — a real handler that
queries the DB and Docker — with a 3s timeout. Under load (mid-stream
agent response, Docker reconciliation) the request could miss the
timeout even though the server was fine, flashing the modal.

The modal also appeared on a single failed poll, so any one-off network
hiccup would trigger it. The behavior was reported as common in Tauri's
WebKitGTK webview and rare in desktop browsers, consistent with the
webview being more aggressive about aborts and the bundled-frontend
build sharing more runtime with the API handlers.

Changes:
- Poll /api/health instead — pure JSON response, no DB / Docker.
- Raise timeout from 3s to 8s.
- Require two consecutive failures before showing the modal.

wmeddie merged commit f8341ab into main on May 13, 2026
4 checks passed
wmeddie deleted the fix/connection-health-check branch on May 13, 2026, 16:56