fix(sdk): query actual relay state in RelayManager.isConnected#101

Open
variablefate wants to merge 4 commits into main from claude/relay-isconnected-truthful

@variablefate
Owner

Summary

  • RelayManager.isConnected checked only Swift-side state (client != nil && !connectedRelayURLs.isEmpty && _handlerAlive), none of which tracks WebSocket health. The Client object stays alive through airplane-mode toggles, and the notification handler task dies only on hard errors, so the getter has always reported "connected" once any relay had been reached on launch.
  • Discovered on-device while verifying #99 (feat(sdk): rethrow publishProfileAndMark + publishAndMark for eager onboarding banner) and #100 (feat(ui): global offline pill at top of RootView). With airplane mode on, the new connectivity pill never appeared and ConnectivitySheet showed all 3 relay dots green even though driver pictures clearly weren't loading.
  • Fix: query client.relays() and check each Relay.isConnected() (rust-nostr binding's live per-relay status). "At least one relay connected" → true.
  • The _handlerAlive flag stays — it's still useful for reconnectIfNeeded's liveness gate, but that's a separate concern from "is a relay reachable right now."
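
The difference between the old and new getters can be sketched as follows. This is a minimal illustration, not the PR's actual diff: `RelayStatus`, `RelayClient`, and the stub types are hypothetical stand-ins for the rust-nostr binding's `Relay` and `Client`, whose real signatures may differ.

```swift
// Hypothetical stand-ins for the rust-nostr binding types named above.
// The real `Client.relays()` / `Relay.isConnected()` signatures may differ.
protocol RelayStatus {
    func isConnected() -> Bool
}

protocol RelayClient {
    func relays() -> [String: any RelayStatus]
}

struct StubRelay: RelayStatus {
    let connected: Bool
    func isConnected() -> Bool { connected }
}

struct StubClient: RelayClient {
    let table: [String: any RelayStatus]
    func relays() -> [String: any RelayStatus] { table }
}

/// Old behavior: derived purely from cached Swift-side flags.
func cachedIsConnected(client: (any RelayClient)?,
                       connectedRelayURLs: Set<String>,
                       handlerAlive: Bool) -> Bool {
    client != nil && !connectedRelayURLs.isEmpty && handlerAlive
}

/// New behavior: true when at least one relay reports a live connection.
func liveIsConnected(client: (any RelayClient)?) -> Bool {
    guard let client = client else { return false }
    return client.relays().values.contains { $0.isConnected() }
}
```

With every relay reporting disconnected, `cachedIsConnected` can still return true as long as the cached flags look healthy, while `liveIsConnected` returns false.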

Impact

Pre-existing bug, not introduced by #99 or #100, but it prevents both from working as intended on real devices.

Tests use FakeRelayManager, which maintains its own truthful _isConnected state, so the bug never surfaced in CI.

Sequencing

This should land before #99 and #100; both PRs need to be rebased on top so they actually work on-device. Happy to do the rebase once this merges.

Test plan

🤖 Generated with Claude Code

variablefate and others added 3 commits May 6, 2026 19:03
`isConnected` returned `client != nil && !connectedRelayURLs.isEmpty &&
_handlerAlive` — none of which reflect WebSocket health. The `Client`
object stays alive through airplane-mode toggles; the notification
handler task only dies on hard error, not on transient drops. So the
getter always reported "connected" once any relay had ever been reached
on launch.

Side effect: every consumer of the lying signal — the per-tab inline
offline UI in DriversTab/HistoryTab/RideTab, the ConnectivitySheet
relay dots, the new ConnectivityPill from #100, the `guard await
isConnected` short-circuits in AppState reconnect / sync flows — was
silently broken on-device. Visible on real device only because tests
use FakeRelayManager which has its own truthful state.

Fix: query `client.relays()` and check each `Relay.isConnected()` (the
rust-nostr binding's live per-relay status). "At least one relay
connected" → true. The `_handlerAlive` flag remains relevant for
`reconnectIfNeeded`'s liveness gate; that's a separate concern from
"is a relay reachable right now."

Found during on-device verification of #99 + #100. Pre-existing bug,
not introduced by either PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… state

Discovered while retesting #101 on-device: even with the truthful
isConnected getter, plain background→foreground resume never reflected
the offline state. The handler's reconnect path was gated by
`guard client == nil || !_handlerAlive else { return }` — and on iOS
suspend, the Swift Client object stays alive and `_handlerAlive` stays
true (rust-nostr only flips it on a hard error from the notification
handler, not on a silently-killed socket).

So the foreground handler bailed out without rebuilding, the existing
client kept reporting its old per-relay statuses, and `isConnected`
(now truthful per HEAD~1) reflected that stale state.

Drop the guard entirely. `reconnectIfNeeded` is only called from
foreground handlers and explicit user reconnect actions — both are
exactly the moments when cached state can't be trusted. Cost is one
~1s handshake per foreground transition; gain is correctness.

Force-quit launch already worked because there was no existing client
to lie about; this fix extends that correctness to resume paths.
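
A minimal model of the gate change, under stated assumptions: `ReconnectModel` and `rebuild()` are hypothetical names, and `rebuild()` only stands in for tearing down and recreating the real client. The actual `reconnectIfNeeded` is presumably async and does more work.

```swift
/// Hypothetical model of the reconnect gate described in this commit.
final class ReconnectModel {
    private(set) var hasClient = false
    private(set) var handlerAlive = false
    private(set) var rebuildCount = 0

    /// Stand-in for rebuilding the client and its subscriptions.
    private func rebuild() {
        hasClient = true
        handlerAlive = true
        rebuildCount += 1
    }

    /// Old gate: skips the rebuild whenever cached state looks healthy,
    /// which is exactly the case after an iOS suspend/resume cycle.
    func reconnectIfNeededGuarded() {
        guard !hasClient || !handlerAlive else { return }
        rebuild()
    }

    /// New behavior: every call rebuilds, because callers only invoke
    /// this at moments when cached state cannot be trusted.
    func reconnectIfNeeded() {
        rebuild()
    }
}
```

In this model, a second call to the guarded variant after a cold launch is a no-op, which mirrors the resume path that never rebuilt.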

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Even with reconnectIfNeeded force-rebuilding (HEAD~1) and isConnected
querying truthful per-relay state (HEAD~2), the offline pill still took
~2 minutes to appear after toggling airplane mode mid-foreground —
because nothing was triggering reconnectAndRestoreSession during that
window. The 10s connection watchdog kept polling isConnected, which
read rust-nostr's cached per-relay state, which stayed stale until
rust-nostr's internal heartbeat eventually noticed the dead socket.

Add an NWPathMonitor inside ConnectionCoordinator. Any iOS-observed
network path change (Wi-Fi/cellular swap, airplane toggle, captive
portal) immediately fires reconnect, bypassing the isConnected gate.
rust-nostr's stale state can't fool the OS-level signal.

The very first path update fires on monitor start with the current
path — skipped to avoid a spurious rebuild on launch.
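
The skip-first behavior can be sketched like this. `SkipFirstGate` and `startPathMonitor` are hypothetical names, not the PR's code, and `reconnect` stands in for the closure injected into ConnectionCoordinator. The `NWPathMonitor` wiring is wrapped in `canImport(Network)` because the Network framework exists only on Apple platforms.

```swift
import Foundation
#if canImport(Network)
import Network
#endif

/// Drops the first event and forwards every later one.
/// Not thread-safe on its own: call `handle()` from a single serial queue.
final class SkipFirstGate {
    private var seenInitial = false
    private let onChange: () -> Void

    init(onChange: @escaping () -> Void) {
        self.onChange = onChange
    }

    func handle() {
        if !seenInitial {
            // The monitor delivers the current path once on start; ignore it.
            seenInitial = true
            return
        }
        onChange()
    }
}

#if canImport(Network)
/// Hypothetical wiring: every path update after the initial one fires `reconnect`.
/// The caller keeps the returned monitor alive and calls `cancel()` on teardown.
func startPathMonitor(reconnect: @escaping () -> Void) -> NWPathMonitor {
    let gate = SkipFirstGate(onChange: reconnect)
    let monitor = NWPathMonitor()
    // All updates arrive on the serial queue below, so the gate needs no lock.
    monitor.pathUpdateHandler = { _ in gate.handle() }
    monitor.start(queue: DispatchQueue(label: "path-monitor"))
    return monitor
}
#endif
```

Keeping the gate separate from the monitor makes the skip logic testable without a real network path change.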

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@variablefate
Owner Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

- Remove `_handlerAlive` flag entirely. The PR's reconnectIfNeeded
  rewrite dropped its only reader (the cached-state guard), leaving
  three writes and zero reads. The new isConnected doc-comment was
  also factually wrong about the flag "remaining relevant for
  reconnectIfNeeded's liveness gate" — that gate was deleted in the
  same PR. Removing the flag, its writes, the markHandlerDead helper,
  and the monitor Task that called it leaves a cleaner RelayManager
  whose connection state is purely the rust-nostr per-relay query.
- ConnectionCoordinator: track path-monitor reconnect Tasks so stop()
  can cancel any in-flight reconnect. Without this, a fire-and-forget
  Task spawned by a path-update event could continue calling the
  injected `reconnect` closure after the coordinator was torn down
  (e.g. on logout / identity replacement). Mirrors PR #95's
  tracked-Task pattern for the onboarding-publish watchdog.
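
  One way to picture the tracked-Task pattern is the sketch below. `ReconnectTaskTracker` is a hypothetical name and the lock-based bookkeeping is an assumption; the real ConnectionCoordinator may rely on actor isolation instead. Cancellation in Swift is cooperative, so an in-flight `reconnect` stops early only if it checks for cancellation itself.

```swift
import Foundation

/// Hypothetical stand-in for the coordinator's task bookkeeping.
/// `@unchecked Sendable` is justified here only because every access to
/// mutable state goes through `lock`.
final class ReconnectTaskTracker: @unchecked Sendable {
    private let lock = NSLock()   // guards `tasks` and `stopped`
    private var tasks: [UUID: Task<Void, Never>] = [:]
    private var stopped = false
    private let reconnect: @Sendable () async -> Void

    init(reconnect: @escaping @Sendable () async -> Void) {
        self.reconnect = reconnect
    }

    var inFlightCount: Int {
        lock.lock(); defer { lock.unlock() }
        return tasks.count
    }

    /// Called for each path update; spawns a tracked reconnect Task.
    func pathDidChange() {
        lock.lock(); defer { lock.unlock() }
        guard !stopped else { return }   // torn down: spawn nothing
        let id = UUID()
        let reconnect = self.reconnect
        tasks[id] = Task { [weak self] in
            if !Task.isCancelled { await reconnect() }
            self?.finish(id)
        }
    }

    /// Removes a finished Task from the tracked set.
    private func finish(_ id: UUID) {
        lock.lock(); defer { lock.unlock() }
        tasks[id] = nil
    }

    /// Cancels every in-flight reconnect and refuses new ones.
    func stop() {
        lock.lock(); defer { lock.unlock() }
        stopped = true
        tasks.values.forEach { $0.cancel() }
        tasks.removeAll()
    }
}
```

  After `stop()`, later path updates spawn nothing, so the injected closure is never started for a torn-down coordinator.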

Code-review follow-up. No user-visible behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>