From aacc7eafc9ab922dcb90018f66597f4575f8bb7e Mon Sep 17 00:00:00 2001 From: variablefate Date: Wed, 6 May 2026 19:03:28 -0700 Subject: [PATCH 1/7] fix(sdk): query actual relay state in RelayManager.isConnected MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `isConnected` returned `client != nil && !connectedRelayURLs.isEmpty && _handlerAlive` — none of which reflect WebSocket health. The `Client` object stays alive through airplane-mode toggles; the notification handler task only dies on hard error, not on transient drops. So the getter always reported "connected" once any relay had ever been reached on launch. Side effect: every consumer of the lying signal — the per-tab inline offline UI in DriversTab/HistoryTab/RideTab, the ConnectivitySheet relay dots, the new ConnectivityPill from #100, the `guard await isConnected` short-circuits in AppState reconnect / sync flows — was silently broken on-device. Visible on real device only because tests use FakeRelayManager which has its own truthful state. Fix: query `client.relays()` and check each `Relay.isConnected()` (the rust-nostr binding's live per-relay status). "At least one relay connected" → true. The `_handlerAlive` flag remains relevant for `reconnectIfNeeded`'s liveness gate; that's a separate concern from "is a relay reachable right now." Found during on-device verification of #99 + #100. Pre-existing bug, not introduced by either PR. 
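The shape of the patch-1 fix can be sketched in isolation. `FakeRelay` and `FakeClient` below are hypothetical stand-ins for the rust-nostr `Relay`/`Client` bindings (the real `relays()` call is async on the RelayManager actor); only the query pattern is the point:

```swift
// Truthful "at least one relay connected" check, as in the patch above.
// FakeRelay / FakeClient are hypothetical stand-ins for the rust-nostr
// bindings; the real query is async and lives on the RelayManager actor.
struct FakeRelay {
    let connected: Bool
    func isConnected() -> Bool { connected }
}

struct FakeClient {
    let relayMap: [String: FakeRelay]
    func relays() -> [String: FakeRelay] { relayMap }
}

// No client yet -> false; otherwise true iff any relay reports a live
// WebSocket right now, instead of trusting Swift-side cached flags.
func isAnyRelayConnected(client: FakeClient?) -> Bool {
    guard let client else { return false }
    for relay in client.relays().values where relay.isConnected() {
        return true
    }
    return false
}
```

An empty relay map and an all-disconnected map both report false, which is exactly the case the old cached-state getter got wrong.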
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../Sources/RidestrSDK/Nostr/RelayManager.swift | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift b/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift index 35a191c..3d769d5 100644 --- a/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift +++ b/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift @@ -31,8 +31,23 @@ public actor RelayManager: RelayManagerProtocol { await teardownConnection(clearRelayURLs: true) } + /// At least one relay is currently connected. Queries the underlying + /// rust-nostr `Client` for the live per-relay status — the previous impl + /// returned `client != nil && !connectedRelayURLs.isEmpty && _handlerAlive` + /// which all stay true through airplane-mode toggles and other transient + /// network drops, so it lied about the actual WebSocket state. The + /// `_handlerAlive` flag remains relevant for `reconnectIfNeeded`'s + /// liveness gate; it's a separate concern from "is a relay reachable + /// right now." public var isConnected: Bool { - client != nil && !connectedRelayURLs.isEmpty && _handlerAlive + get async { + guard let client else { return false } + let relays = await client.relays() + for relay in relays.values where relay.isConnected() { + return true + } + return false + } } private func markHandlerDead() { From 6e5099698dcadf39fd0c7ab8575c46dc98ea73f3 Mon Sep 17 00:00:00 2001 From: variablefate Date: Wed, 6 May 2026 19:12:11 -0700 Subject: [PATCH 2/7] fix(sdk): reconnectIfNeeded force-rebuilds instead of trusting cached state MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Discovered while retesting #101 on-device: even with the truthful isConnected getter, plain background→foreground resume never reflected the offline state. 
The handler's reconnect path was gated by `guard client == nil || !_handlerAlive else { return }` — and on iOS suspend, the Swift Client object stays alive and `_handlerAlive` stays true (rust-nostr only flips it on a hard error from the notification handler, not on a silently-killed socket). So the foreground handler bailed out without rebuilding, the existing client kept reporting its old per-relay statuses, and `isConnected` (now truthful per HEAD~1) reflected that stale state. Drop the guard entirely. `reconnectIfNeeded` is only called from foreground handlers and explicit user reconnect actions — both are exactly the moments when cached state can't be trusted. Cost is one ~1s handshake per foreground transition; gain is correctness. Force-quit launch already worked because there was no existing client to lie about; this fix extends that correctness to resume paths. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../RidestrSDK/Nostr/RelayManager.swift | 20 +++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift b/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift index 3d769d5..37262a5 100644 --- a/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift +++ b/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift @@ -54,13 +54,25 @@ public actor RelayManager: RelayManagerProtocol { _handlerAlive = false } - /// Reconnect to relays if disconnected. Call from app foreground handler. - /// Does NOT restart subscriptions — callers must re-subscribe after this returns. + /// Force-rebuild the relay client. Call from app foreground handler. 
+ /// + /// Always replaces the client rather than short-circuiting on cached + /// state because the cached state lies on iOS background→foreground: + /// the OS suspends WebSockets on background, rust-nostr's per-relay + /// status doesn't update until the next read/write attempt fails, and + /// the `_handlerAlive` flag only flips false on a hard error from the + /// notification handler (which doesn't fire on a silently-killed + /// socket). Tearing down and rebuilding is the only way to get + /// truthful state — the alternative is letting the user sit in + /// "everything looks fine" while their relays are dead. + /// + /// Does NOT restart subscriptions — callers must re-subscribe after + /// this returns. Cheap when relays are reachable (~1s handshake); + /// the trade-off vs. per-call cost is correctness on every foreground. public func reconnectIfNeeded() async { guard !connectedRelayURLs.isEmpty else { return } - guard client == nil || !_handlerAlive else { return } - RidestrLogger.error("[RelayManager] Reconnecting relay client") + RidestrLogger.info("[RelayManager] Force-rebuilding relay client (foreground / explicit reconnect)") do { try await replaceClient(with: connectedRelayURLs) } catch { From f309bd572ca42d979972cd1a886427632294c8e9 Mon Sep 17 00:00:00 2001 From: variablefate Date: Wed, 6 May 2026 19:26:46 -0700 Subject: [PATCH 3/7] fix(connection): NWPathMonitor reactive signal for offline detection MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Even with reconnectIfNeeded force-rebuilding (HEAD~1) and isConnected querying truthful per-relay state (HEAD~2), the offline pill still took ~2 minutes to appear after toggling airplane mode mid-foreground — because nothing was triggering reconnectAndRestoreSession during that window. 
The 10s connection watchdog kept polling isConnected, which read rust-nostr's cached per-relay state, which stayed stale until rust-nostr's internal heartbeat eventually noticed the dead socket. Add an NWPathMonitor inside ConnectionCoordinator. Any iOS-observed network path change (Wi-Fi/cellular swap, airplane toggle, captive portal) immediately fires reconnect, bypassing the isConnected gate. rust-nostr's stale state can't fool the OS-level signal. The very first path update fires on monitor start with the current path — skipped to avoid a spurious rebuild on launch. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../ViewModels/ConnectionCoordinator.swift | 58 +++++++++++++++++-- 1 file changed, 52 insertions(+), 6 deletions(-) diff --git a/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift b/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift index 99c7749..2bb1954 100644 --- a/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift +++ b/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift @@ -1,16 +1,35 @@ import Foundation +import Network -/// Owns the periodic relay connection watchdog task. +/// Owns the periodic relay connection watchdog task and a path-change +/// reactive signal. /// -/// Checks connectivity on a fixed interval and triggers reconnection -/// when the relay drops. All behavior is injected via closures so the -/// coordinator has no direct dependencies on AppState or SDK services. +/// Two complementary mechanisms drive reconnects: +/// +/// 1. **Periodic watchdog** — polls `isConnected` on a fixed interval and +/// triggers reconnect when it returns false. Bounded latency for any +/// drop the relay manager has noticed. +/// +/// 2. **`NWPathMonitor` reactive signal** — fires whenever iOS observes a +/// network path change (Wi-Fi/cellular swap, airplane mode toggle, +/// captive portal). 
Forces a reconnect immediately, bypassing the +/// `isConnected` gate, because the previous connection's cached state +/// is by definition suspect after a path change. rust-nostr's +/// per-relay status doesn't update until its next read/write fails, +/// which can take minutes on a silently-killed socket — the OS-level +/// signal is much faster. +/// +/// All behavior is injected via closures so the coordinator has no +/// direct dependencies on AppState or SDK services. @MainActor final class ConnectionCoordinator { private var watchdogTask: Task<Void, Never>? + private var pathMonitor: NWPathMonitor? + private var pathMonitorQueue: DispatchQueue? + private var hasReceivedFirstPath = false private var isReconnecting = false - /// Start the periodic watchdog. + /// Start the periodic watchdog and the path-change reactive signal. /// /// - Parameters: /// - interval: Time between connectivity checks. @@ -34,12 +53,39 @@ final class ConnectionCoordinator { await reconnect() } } + + let monitor = NWPathMonitor() + let queue = DispatchQueue(label: "com.roadflare.connection-coordinator.path-monitor") + monitor.pathUpdateHandler = { [weak self] _ in + // The very first path update fires on monitor start with the + // current path — not a real change. Skip that to avoid a + // spurious rebuild on app launch when `setupServices` has just + // established the initial connection. + Task { @MainActor [weak self] in + guard let self else { return } + if !self.hasReceivedFirstPath { + self.hasReceivedFirstPath = true + return + } + guard !self.isReconnecting, shouldReconnect() else { return } + self.isReconnecting = true + defer { self.isReconnecting = false } + await reconnect() + } + } + monitor.start(queue: queue) + self.pathMonitor = monitor + self.pathMonitorQueue = queue } - /// Stop the watchdog and cancel any in-flight reconnection. + /// Stop the watchdog, cancel the path monitor, and any in-flight reconnection.
func stop() { watchdogTask?.cancel() watchdogTask = nil + pathMonitor?.cancel() + pathMonitor = nil + pathMonitorQueue = nil + hasReceivedFirstPath = false isReconnecting = false } } From f397a4a5970fd152a14b4d9a3a1c0fcb1df918e4 Mon Sep 17 00:00:00 2001 From: variablefate Date: Wed, 6 May 2026 20:29:24 -0700 Subject: [PATCH 4/7] chore: address sub-80 code-review polish MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Remove `_handlerAlive` flag entirely. The PR's reconnectIfNeeded rewrite dropped its only reader (the cached-state guard), leaving three writes and zero reads. The new isConnected doc-comment was also factually wrong about the flag "remaining relevant for reconnectIfNeeded's liveness gate" — that gate was deleted in the same PR. Removing the flag, its writes, the markHandlerDead helper, and the monitor Task that called it leaves a cleaner RelayManager whose connection state is purely the rust-nostr per-relay query. - ConnectionCoordinator: track path-monitor reconnect Tasks so stop() can cancel any in-flight reconnect. Without this, a fire-and-forget Task spawned by a path-update event could continue calling the injected `reconnect` closure after the coordinator was torn down (e.g. on logout / identity replacement). Mirrors PR #95's tracked-Task pattern for the onboarding-publish watchdog. Code-review follow-up. No user-visible behavior change. 
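The tracked-Task teardown pattern described above reduces to a few lines. `TrackedReconnect` and its fixed 50 ms debounce are illustrative assumptions, not the app's real API; the point is that keeping the Task handle lets `stop()` cancel work that is still in flight:

```swift
import Foundation

// Illustrative reduction of the tracked-Task pattern: keep the handle of
// the fire-and-forget reconnect Task so teardown can cancel it before it
// calls back into a coordinator that no longer exists.
final class TrackedReconnect {
    private var reconnectTask: Task<Void, Never>?

    func scheduleReconnect(_ work: @escaping @Sendable () async -> Void) {
        reconnectTask = Task {
            // Task.sleep throws CancellationError if the task is
            // cancelled, so stop() during this window aborts the work.
            guard (try? await Task.sleep(nanoseconds: 50_000_000)) != nil else { return }
            await work()
        }
    }

    func stop() {
        reconnectTask?.cancel()
        reconnectTask = nil
    }
}
```

Without the stored handle, `stop()` has nothing to cancel and the closure still fires after teardown, which is the bug this commit closes.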
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../RidestrSDK/Nostr/RelayManager.swift | 32 +++++------------ .../ViewModels/ConnectionCoordinator.swift | 34 +++++++++++++++---- 2 files changed, 36 insertions(+), 30 deletions(-) diff --git a/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift b/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift index 37262a5..fc4bcdb 100644 --- a/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift +++ b/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift @@ -14,7 +14,6 @@ public actor RelayManager: RelayManagerProtocol { private var connectedRelayURLs: [URL] = [] private var notificationHandler: NotificationRouter? private var notificationTask: Task<Void, Never>? - private var _handlerAlive = false // Explicit liveness flag — Task.isCancelled doesn't detect normal completion public init(keypair: NostrKeypair) { self.keypair = keypair @@ -33,12 +32,9 @@ public actor RelayManager: RelayManagerProtocol { /// At least one relay is currently connected. Queries the underlying /// rust-nostr `Client` for the live per-relay status — the previous impl - /// returned `client != nil && !connectedRelayURLs.isEmpty && _handlerAlive` + /// returned `client != nil && !connectedRelayURLs.isEmpty && handlerAlive` /// which all stay true through airplane-mode toggles and other transient - /// network drops, so it lied about the actual WebSocket state. The - /// `_handlerAlive` flag remains relevant for `reconnectIfNeeded`'s - /// liveness gate; it's a separate concern from "is a relay reachable - /// right now." + /// network drops, so it lied about the actual WebSocket state. public var isConnected: Bool { get async { guard let client else { return false } @@ -50,21 +46,16 @@ public actor RelayManager: RelayManagerProtocol { } } - private func markHandlerDead() { - _handlerAlive = false - } - /// Force-rebuild the relay client. Call from app foreground handler.
/// /// Always replaces the client rather than short-circuiting on cached /// state because the cached state lies on iOS background→foreground: - /// the OS suspends WebSockets on background, rust-nostr's per-relay - /// status doesn't update until the next read/write attempt fails, and - /// the `_handlerAlive` flag only flips false on a hard error from the - /// notification handler (which doesn't fire on a silently-killed - /// socket). Tearing down and rebuilding is the only way to get - /// truthful state — the alternative is letting the user sit in - /// "everything looks fine" while their relays are dead. + /// the OS suspends WebSockets on background, and rust-nostr's + /// per-relay status doesn't update until the next read/write attempt + /// fails (which can take minutes on a silently-killed socket). + /// Tearing down and rebuilding is the only way to get truthful state — + /// the alternative is letting the user sit in "everything looks fine" + /// while their relays are dead. /// /// Does NOT restart subscriptions — callers must re-subscribe after /// this returns. 
Cheap when relays are reachable (~1s handshake); @@ -107,7 +98,6 @@ private func startNotificationHandler(for client: Client) { let router = NotificationRouter() self.notificationHandler = router - self._handlerAlive = true let clientRef = client notificationTask = Task.detached { [router] in do { @@ -120,15 +110,9 @@ router.removeAll() RidestrLogger.error("[RelayManager] All subscription streams terminated due to disconnect") } - // Monitor handler liveness from a separate task - Task { [weak self] in - await self?.notificationTask?.value // Suspends until task completes - await self?.markHandlerDead() - } } private func teardownConnection(clearRelayURLs: Bool) async { - _handlerAlive = false notificationHandler?.removeAll() activeStreams.removeAll() subscriptionGenerations.removeAll() diff --git a/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift b/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift index 2bb1954..bca5178 100644 --- a/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift +++ b/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift @@ -26,6 +26,13 @@ final class ConnectionCoordinator { private var watchdogTask: Task<Void, Never>? private var pathMonitor: NWPathMonitor? private var pathMonitorQueue: DispatchQueue? + /// Reconnect Tasks spawned by path-update events. Tracked so `stop()` + /// can cancel any in-flight reconnect that landed mid-tear-down (e.g. + /// on logout / identity replacement). Without this, a fire-and-forget + /// path Task could continue calling the injected `reconnect` closure + /// after the coordinator has been torn down. Mirrors the tracked-Task + /// pattern from PR #95's onboarding-publish watchdog.
+ private var pathReconnectTasks: [Task<Void, Never>] = [] private var hasReceivedFirstPath = false private var isReconnecting = false @@ -58,9 +65,12 @@ let queue = DispatchQueue(label: "com.roadflare.connection-coordinator.path-monitor") monitor.pathUpdateHandler = { [weak self] _ in // The very first path update fires on monitor start with the - // current path — not a real change. Skip that to avoid a - // spurious rebuild on app launch when `setupServices` has just - // established the initial connection. + // current path — not a real change in production paths. Skip + // it to avoid a spurious rebuild on app launch right after + // `setupServices` has established the initial connection. + // (Apple's `NWPathMonitor` documentation does not strictly + // guarantee this, but it is the consistently observed behavior + // for `start(queue:)` and is widely relied on across iOS apps.) Task { @MainActor [weak self] in guard let self else { return } if !self.hasReceivedFirstPath { @@ -68,9 +78,17 @@ return } guard !self.isReconnecting, shouldReconnect() else { return } - self.isReconnecting = true - defer { self.isReconnecting = false } - await reconnect() + let reconnectTask = Task { @MainActor [weak self] in + guard let self else { return } + guard !Task.isCancelled else { return } + self.isReconnecting = true + defer { self.isReconnecting = false } + await reconnect() + } + self.pathReconnectTasks.append(reconnectTask) + // Best-effort cleanup of finished Tasks so the array doesn't + // grow unboundedly across many path transitions.
+ self.pathReconnectTasks.removeAll(where: { $0.isCancelled }) } } monitor.start(queue: queue) @@ -85,6 +103,10 @@ pathMonitor?.cancel() pathMonitor = nil pathMonitorQueue = nil + for task in pathReconnectTasks { + task.cancel() + } + pathReconnectTasks.removeAll() hasReceivedFirstPath = false isReconnecting = false } From 541d84cee5953e23c1a696279914e53bb3d41ac4 Mon Sep 17 00:00:00 2001 From: variablefate Date: Thu, 7 May 2026 14:42:26 -0700 Subject: [PATCH 5/7] chore(connection): single pathReconnectTask instead of unbounded array Self-noticed defect in the previous polish commit: the `pathReconnectTasks: [Task<Void, Never>]` array's "best-effort cleanup" line only removed Tasks that had been explicitly cancelled, not Tasks that completed naturally. So the array grew with every path event during a session, bounded only by `stop()`. Tiny leak in practice but the cleanup comment was misleading. `isReconnecting` already serializes path-spawned reconnects to at most one in flight at any moment, so a single Task var is sufficient for the original goal of "let stop() cancel any in-flight reconnect." Co-Authored-By: Claude Opus 4.7 (1M context) --- .../ViewModels/ConnectionCoordinator.swift | 27 ++++++++----------- 1 file changed, 11 insertions(+), 16 deletions(-) diff --git a/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift b/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift index bca5178..35c6d3f 100644 --- a/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift +++ b/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift @@ -26,13 +26,14 @@ final class ConnectionCoordinator { private var watchdogTask: Task<Void, Never>? private var pathMonitor: NWPathMonitor? private var pathMonitorQueue: DispatchQueue? - /// Reconnect Tasks spawned by path-update events. Tracked so `stop()` - /// can cancel any in-flight reconnect that landed mid-tear-down (e.g. - /// on logout / identity replacement).
Without this, a fire-and-forget - /// path Task could continue calling the injected `reconnect` closure - /// after the coordinator has been torn down. Mirrors the tracked-Task - /// pattern from PR #95's onboarding-publish watchdog. - private var pathReconnectTasks: [Task<Void, Never>] = [] + /// Most recently spawned path-event reconnect Task. `isReconnecting` + /// already serializes path-spawned reconnects (a second event during an + /// in-flight reconnect short-circuits via the guard), so at most one + /// such Task is doing real work at any moment — tracking the latest + /// is sufficient to cancel an in-flight reconnect on `stop()` (e.g. + /// logout / identity replacement). Mirrors the tracked-Task pattern + /// from PR #95's onboarding-publish watchdog. + private var pathReconnectTask: Task<Void, Never>? private var hasReceivedFirstPath = false private var isReconnecting = false @@ -78,17 +79,13 @@ return } guard !self.isReconnecting, shouldReconnect() else { return } - let reconnectTask = Task { @MainActor [weak self] in + self.pathReconnectTask = Task { @MainActor [weak self] in guard let self else { return } guard !Task.isCancelled else { return } self.isReconnecting = true defer { self.isReconnecting = false } await reconnect() } - self.pathReconnectTasks.append(reconnectTask) - // Best-effort cleanup of finished Tasks so the array doesn't - // grow unboundedly across many path transitions.
- self.pathReconnectTasks.removeAll(where: { $0.isCancelled }) } } monitor.start(queue: queue) @@ -103,10 +100,8 @@ final class ConnectionCoordinator { pathMonitor?.cancel() pathMonitor = nil pathMonitorQueue = nil - for task in pathReconnectTasks { - task.cancel() - } - pathReconnectTasks.removeAll() + pathReconnectTask?.cancel() + pathReconnectTask = nil hasReceivedFirstPath = false isReconnecting = false } From 66e0843d6f2e1f3ad09199205c64d3076118e9de Mon Sep 17 00:00:00 2001 From: variablefate Date: Thu, 7 May 2026 14:51:54 -0700 Subject: [PATCH 6/7] fix(connection): set isReconnecting synchronously in outer path Task MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes a race surfaced during re-review of PR #101. The previous structure spawned an inner Task and set `isReconnecting = true` inside its body, leaving a window where two rapid path events could both pass the outer `guard !isReconnecting` (since inner A hadn't yet set the flag), spawn overlapping inner Tasks, and orphan the first Task's handle in `pathReconnectTask`. NWPathMonitor can fire repeatedly on airplane-mode toggles or Wi-Fi flap, so this isn't theoretical. Setting `isReconnecting = true` in the outer Task body — same critical section as the guard check — closes the race. The inner Task's defer clears it via `self?.isReconnecting = false` (which is also the only clear, removing the duplicate `isReconnecting = true` assignment that was left over from the prior structure). Also tightens three doc-comments that became stale after the SDK changes: - isConnected doc: stop spelling the removed flag as `handlerAlive` (the actual name was `_handlerAlive`); generalise to "Swift-side cached state." - reconnectIfNeeded doc: qualify "always replaces" with "(when relay URLs are configured)" to match the `guard !connectedRelayURLs.isEmpty` early-exit. 
- ConnectionCoordinator class doc: drop "drop the relay manager has noticed" — leftover from the `_handlerAlive` semantics; isConnected now queries live state directly. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../RidestrSDK/Nostr/RelayManager.swift | 26 ++++++++++--------- .../ViewModels/ConnectionCoordinator.swift | 20 +++++++++----- 2 files changed, 28 insertions(+), 18 deletions(-) diff --git a/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift b/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift index fc4bcdb..7fe6cad 100644 --- a/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift +++ b/RidestrSDK/Sources/RidestrSDK/Nostr/RelayManager.swift @@ -31,10 +31,11 @@ public actor RelayManager: RelayManagerProtocol { } /// At least one relay is currently connected. Queries the underlying - /// rust-nostr `Client` for the live per-relay status — the previous impl - /// returned `client != nil && !connectedRelayURLs.isEmpty && handlerAlive` - /// which all stay true through airplane-mode toggles and other transient - /// network drops, so it lied about the actual WebSocket state. + /// rust-nostr `Client` for the live per-relay status — the previous + /// impl checked only Swift-side cached state (the client pointer, a + /// configured-URLs list, and a handler-liveness flag), all of which + /// stay true through airplane-mode toggles and other transient network + /// drops, so it lied about the actual WebSocket state. public var isConnected: Bool { get async { guard let client else { return false } @@ -48,18 +49,19 @@ public actor RelayManager: RelayManagerProtocol { /// Force-rebuild the relay client. Call from app foreground handler. 
/// - /// Always replaces the client rather than short-circuiting on cached - /// state because the cached state lies on iOS background→foreground: - /// the OS suspends WebSockets on background, and rust-nostr's - /// per-relay status doesn't update until the next read/write attempt - /// fails (which can take minutes on a silently-killed socket). - /// Tearing down and rebuilding is the only way to get truthful state — - /// the alternative is letting the user sit in "everything looks fine" - /// while their relays are dead. + /// Always replaces the client (when relay URLs are configured) rather + /// than short-circuiting on cached state because the cached state + /// lies on iOS background→foreground: the OS suspends WebSockets on + /// background, and rust-nostr's per-relay status doesn't update until + /// the next read/write attempt fails (which can take minutes on a + /// silently-killed socket). Tearing down and rebuilding is the only + /// way to get truthful state — the alternative is letting the user + /// sit in "everything looks fine" while their relays are dead. /// /// Does NOT restart subscriptions — callers must re-subscribe after /// this returns. Cheap when relays are reachable (~1s handshake); /// the trade-off vs. per-call cost is correctness on every foreground. + /// No-op if `connect(to:)` has not been called yet (no URLs to reach). public func reconnectIfNeeded() async { guard !connectedRelayURLs.isEmpty else { return } diff --git a/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift b/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift index 35c6d3f..02b1de2 100644 --- a/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift +++ b/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift @@ -6,9 +6,11 @@ import Network /// /// Two complementary mechanisms drive reconnects: /// -/// 1. **Periodic watchdog** — polls `isConnected` on a fixed interval and -/// triggers reconnect when it returns false. 
Bounded latency for any -/// drop the relay manager has noticed. +/// 1. **Periodic watchdog** — polls the live `isConnected` query (which, +/// after PR #101, asks rust-nostr for per-relay status rather than +/// reading a cached flag) on a fixed interval and triggers reconnect +/// when it returns false. Bounded latency for any drop visible to +/// rust-nostr. /// /// 2. **`NWPathMonitor` reactive signal** — fires whenever iOS observes a /// network path change (Wi-Fi/cellular swap, airplane mode toggle, @@ -79,11 +81,17 @@ final class ConnectionCoordinator { return } guard !self.isReconnecting, shouldReconnect() else { return } + // Set the busy flag synchronously, in the same critical + // section as the guard check, so a second path event + // landing between this Task and the inner reconnect Task + // observes the busy state and short-circuits. (Setting it + // inside the inner Task left a race window where two + // rapid path events both passed the guard and spawned + // overlapping reconnects.) + self.isReconnecting = true self.pathReconnectTask = Task { @MainActor [weak self] in - guard let self else { return } + defer { self?.isReconnecting = false } guard !Task.isCancelled else { return } - self.isReconnecting = true - defer { self.isReconnecting = false } await reconnect() } } From 1e939d9d39d05a998dba2ad47cb37565b9bc2355 Mon Sep 17 00:00:00 2001 From: variablefate Date: Thu, 7 May 2026 15:10:40 -0700 Subject: [PATCH 7/7] fix(connection): re-check isReconnecting after watchdog's isConnected await MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes the same class of race we just fixed in the path-handler. The watchdog's body has an `await isConnected()` between its `!isReconnecting` guard and the `isReconnecting = true` assignment. 
During that suspension, an NWPathMonitor event can land, set `isReconnecting = true`, and start its own reconnect — when the watchdog resumes it unconditionally sets the flag (already true) and calls `await reconnect()` concurrently with the in-flight one. Two back-to-back replaceClient rebuilds, second one tearing down what the first just built. One-line fix: re-check `!self.isReconnecting` after the connectivity await, before claiming the busy slot. Surfaced by the third /code-review pass on this PR (score 92). Co-Authored-By: Claude Opus 4.7 (1M context) --- .../RoadFlareCore/ViewModels/ConnectionCoordinator.swift | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift b/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift index 02b1de2..81e740f 100644 --- a/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift +++ b/RoadFlare/RoadFlareCore/ViewModels/ConnectionCoordinator.swift @@ -58,6 +58,12 @@ final class ConnectionCoordinator { try? await Task.sleep(for: interval) guard let self, !self.isReconnecting, shouldReconnect() else { continue } guard !(await isConnected()) else { continue } + // Re-check after the connectivity await: a path-monitor + // event landing during the suspension can have set + // `isReconnecting = true` and started its own reconnect. + // Without this guard we'd kick off a second concurrent + // reconnect, tearing down the in-flight rebuild's client. + guard !self.isReconnecting else { continue } self.isReconnecting = true defer { self.isReconnecting = false } await reconnect()
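The last two commits converge on one discipline: the busy flag is claimed in the same synchronous section that checks it, and re-checked after every suspension point. `ReconnectGate` below is an illustrative reduction of that discipline, not app code:

```swift
// Check-and-claim in one synchronous step: any await between "check the
// flag" and "set the flag" is a window for a second caller to slip in,
// so callers resuming from an await must go through tryBegin() again.
final class ReconnectGate {
    private(set) var isReconnecting = false
    private(set) var reconnectCount = 0

    /// Claim the busy slot; false means a reconnect is already in
    /// flight and the caller must bail instead of starting another.
    func tryBegin() -> Bool {
        guard !isReconnecting else { return false }
        isReconnecting = true
        return true
    }

    func finish() { isReconnecting = false }

    func performReconnect() { reconnectCount += 1 }
}
```

In the watchdog this corresponds to the re-check guard after `await isConnected()`: a path-monitor event may have claimed the slot during the suspension, and the resuming watchdog must observe that rather than trust its pre-await check.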