Tear down RPC subscriptions on RN context reload #51
gmaclennan wants to merge 5 commits into main
Adds NodeJSIPC.close() — a synchronous terminal teardown — and calls it from ComapeoCoreModule.OnDestroy in place of the fire-and-forget disconnect().

Why: rpc-reflector subscriptions registered by the JS client during a session are stored on the backend keyed by event name and bound to the underlying socket connection; the cleanup path runs only when the backend observes the socket closing. The previous wiring closed the socket from a launched coroutine, so the new OnCreate (which fires immediately on JS context rebuild — Expo modules are bound to the AppContext, whose lifetime is the JS runtime, not the Activity) could open a fresh connection before the old socket's FD was released. The backend's per-connection cleanup then ran against the wrong (or no) connection, leaving the prior session's listeners attached to MapeoManager and emitting events to a peer that no longer existed.

close() closes the LocalSocket on the calling thread and cancels the coroutine scope so receive/send jobs don't linger across reloads. It's purely additive: disconnect() is unchanged and still used by the FGS side (NodeJSService) and by internal IOException recovery.

The OnCreate / OnDestroy block now carries a comment documenting that these hooks bind to the JS runtime lifetime (verified via Expo's android-lifecycle-listeners docs), distinguishing them from the activity-level OnActivityEntersForeground hook which is kept for FGS-respawn-while-backgrounded recovery.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three coupled fixes for the same underlying issue: rpc-reflector
event-listener subscriptions registered against MapeoManager during a
JS session must be torn down when that session ends, on every
platform.
backend: SocketMessagePort.close() now emits 'close'
The constructor wires `framedStream.on('close')` to call
`SocketMessagePort.close()`, and ComapeoRpcServer's connection handler
attaches `messagePort.on('close', () => server.close())` to drive
rpc-reflector's per-connection cleanup. close() never emitted the
event, so the server.close() listener was dead — every reconnect
leaked the previous session's listeners onto MapeoManager. One-line
fix; the rest of the wiring was already correct.
ios: JSReloadObserver
Expo's `OnDestroy` doesn't fire on iOS JS reload (expo/expo#33655),
so the lifecycle-only path that works on Android leaves stale sockets
attached on every reload here. JSReloadObserver subscribes to
`RCTBridgeWillReloadNotification` (classic arch) and
`RCTJavaScriptWillStartLoadingNotification` (bridgeless / new arch)
and runs a callback synchronously on the posting thread.
ComapeoCoreModule installs one in OnCreate that calls
`ipc?.disconnect()` — synchronous on iOS, so the backend observes EOF
before the notification post returns and the rpc-reflector cleanup
runs against the right connection. The notification names are
hard-coded as strings so the file builds in the macOS-only swift-test
target (React isn't a dependency there).
Tests
backend/test/message-port.test.js (node --test, 4 tests):
- SocketMessagePort emits 'close' once when the underlying socket
closes (regression-fails without the emit)
- close() is idempotent
- end-to-end: rpc-reflector listener attached to a fake handler is
removed when the socket closes — the contract that was silently
broken before this commit
- smoke test for the in-test pair() helper
ios/Tests/JSReloadObserverTests.swift (Swift Package, macOS):
- fires for every observed name; ignores others
- deinit detaches (so leaked module instances on iOS reload don't
keep firing into a torn-down ipc reference)
- callback runs synchronously on the posting thread
- default names cover both classic and bridgeless reload signals
- integration test wires JSReloadObserver to a live NodeJSIPC +
MockNodeServer and verifies posting the notification disconnects
the client and the server reads EOF
android NodeJSIPCTest.closeReleasesSocketSynchronously (instrumented):
- blocks the mock server on `read()` of the connection, calls
`ipc.close()`, and asserts the read unblocks with EOF inside 1 s.
Pins the synchronous-close contract and would catch a regression
to the fire-and-forget disconnect path that the previous commit
removed from OnDestroy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Direct verification of the installed expo-modules-core@55.0.23 source (via `npm pack` + extract) confirms PR #33760's fix is present:

- package/ios/Core/MainValueConverter.swift:7 — `private(set) weak var appContext: AppContext?`
- package/ios/Core/ModuleHolder.swift:140 — `deinit { post(event: .moduleDestroy) }`
- package/ios/Core/AppContext.swift — `weak var reactBridge: RCTBridge?`; `_runtime` `didSet` → `releaseRuntimeObjects()` on nil; `deinit` posts `.appContextDestroys`
With the strong-ref cycle through MainValueConverter broken, the old
AppContext is no longer pinned across reloads. On every JS reload it
deinits, releases the module registry, every ModuleHolder deinits,
each fires .moduleDestroy, every module's OnDestroy hook runs.
The previous commit's JSReloadObserver was correct for pre-#33760
SDKs but is redundant against the version actually shipped here.
Removing it reduces the number of teardown paths to one per platform
(OnDestroy on both Android and iOS) and avoids confusing future
maintainers with two mechanisms that mean the same thing.
The OnDestroy comment now documents the verification and points at the
RCT-notification fallback that should be reintroduced if a future SDK
upgrade or other strong-ref leak reintroduces the bug.
What stays from the previous commit:
- backend SocketMessagePort.emit('close') fix (load-bearing —
rpc-reflector cleanup runs from this event)
- backend tests
- android NodeJSIPC.close() instrumented test
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previous wording said the strong-reference cycle was "broken" — which read as "buggy / non-functional" rather than the intended "severed / no longer exists". Reworded to spell out the cause-and-effect in plain English: cycle → AppContext pinned → ModuleHolders pinned → deinit chain never runs → OnDestroy never fires; PR #33760 makes the reference weak so the cycle goes away and the chain runs as expected. No code change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the `sys.boot_completed`-only readiness check with a probe that actually exercises the services we're about to call. On the flaky run that motivated this:

    ##[command] adb wait-for-device shell 'while [[ -z $(getprop sys.boot_completed) ]]; do sleep 1; done'
    ##[command] adb shell settings put global window_animation_scale 0.0
    cmd: Failure calling service settings: Broken pipe (32)
    ##[error] The process '/usr/bin/sh' failed with exit code 224

`sys.boot_completed` flipped as soon as zygote was alive — which on a snapshot restore happens before `system_server` finishes binding the `settings`, `package`, and `activity` services. The wait-for-device loop returned in ~14 ms because the property was already set, and the very next `settings put` crashed with `Broken pipe (32)`.

The new `wait_for_emulator` helper polls until all three of:

- `sys.boot_completed == 1`
- `pm path android` returns successfully (package manager bound)
- `cmd settings list global` returns successfully (settings service bound)

are true, with a 120 s hard ceiling and a debug dump of `getprop | grep boot|svc` if it times out. One retry on each `settings put` covers a residual race where the probe sees `list` succeed but a write transaction loses to first-time initialisation.

`disable-animations: false` is unchanged — emulator-runner's own `input keyevent 82` path still crashes the emulator on this image, which is the original reason the workflow drives the settings directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
When React Native reloads the JS bundle (dev reload, `DevSettings.reload()`, fast-refresh full reload) — or, on Android, when the main app process restarts — the previous JS session's `MapeoClient` is gone but its rpc-reflector event-listener subscriptions are still attached to the long-lived `MapeoManager` on the backend. The backend then keeps emitting events to a peer that no longer has anyone listening, and on every reload the listener set grows. The fix is to make sure the AF_UNIX socket is actually closed on reload, and that the backend's per-connection cleanup actually runs when the socket closes.
Changes
Backend — `SocketMessagePort.close()` now emits `'close'` (load-bearing)

`backend/lib/message-port.js`. The constructor wires `framedStream.on('close')` to call `SocketMessagePort.close()`, and `ComapeoRpcServer` already attached `messagePort.on('close', () => server.close())` to drive rpc-reflector's per-connection cleanup. But `close()` never emitted the event, so the `server.close()` listener was dead and every reconnect leaked the previous session's listeners onto `MapeoManager`. One-line behaviour fix; the rest of the wiring was already correct.

Android — synchronous IPC teardown on `OnDestroy`

`android/.../NodeJSIPC.kt`, `android/.../ComapeoCoreModule.kt`. Adds `NodeJSIPC.close()` — a synchronous terminal teardown that closes the `LocalSocket` on the calling thread and cancels the coroutine scope. `ComapeoCoreModule.OnDestroy` now calls `close()` instead of the fire-and-forget `disconnect()`. Why: Expo modules on Android are bound to the `AppContext`, whose lifetime is the JS runtime, so `OnDestroy` fires on every JS context tear-down (android-lifecycle-listeners docs). The previous wiring closed the socket from a launched coroutine, so the new `OnCreate` could open a fresh connection before the old socket's FD was released — the backend's per-connection cleanup then ran against the wrong (or no) connection. `close()` closes the FD on the calling thread and cancels the scope so receive/send coroutines don't linger. `disconnect()` is unchanged and still used by the FGS side (NodeJSService) and by internal IOException recovery.

iOS — relies on `OnDestroy` in SDK 55+

`ios/ComapeoCoreModule.swift`. No mechanism change — only a comment documenting that `OnDestroy` fires reliably on iOS JS reload as of `expo-modules-core` PR #33760 (merged Dec 2024, shipped in SDK 53+). Verified directly against the installed `expo-modules-core@55.0.23`:

- `ios/Core/MainValueConverter.swift:7` — `private(set) weak var appContext: AppContext?`
- `ios/Core/ModuleHolder.swift:140` — `deinit { post(event: .moduleDestroy) }`

Previously a strong-reference cycle through `MainValueConverter` pinned `AppContext` across reloads. With it weak, `AppContext` deinits on reload, the module registry releases each `ModuleHolder`, every `ModuleHolder.deinit` fires `.moduleDestroy`, and `OnDestroy` runs. iOS's `disconnect()` is already synchronous (`shutdown(2)` → join receive loop → `close(2)`), so no new mechanism is needed. The comment also points at the `RCTBridgeWillReloadNotification` / `RCTJavaScriptWillStartLoadingNotification` fallback to add if a future SDK upgrade reintroduces a leak.

Test plan
Backend (runnable now: `cd backend && npm test`):

- `SocketMessagePort` emits `'close'` exactly once when the underlying socket closes
- `close()` is idempotent (no double-fire)
- `pair()` helper smoke test

Android (instrumented test, requires device/emulator):

- `closeReleasesSocketSynchronously` — blocks the mock server on `read()`, calls `ipc.close()`, asserts the read unblocks with EOF inside 1 s. Pins the synchronous-close contract.
- `NodeJSIPCTest` suite (regression check)

iOS (Swift Package + example-app integration tests, require Xcode):

- `swift test` from `ios/` (ComapeoCoreTests) — regression check; no new tests added since the iOS change is comment-only
- `example/tests/ios` integration suite — regression check that `OnDestroy` actually runs on real device JS reload (couldn't verify on Linux CI)

Manual: subscribe to an event (e.g. `comapeo.invite$.on('invite-received', ...)`) in the app, trigger an RN reload, observe that the new JS session's listener fires once per event (not N+1 times for each prior session) and that backend logs no longer show the listener count growing across reloads.

Notes for review

- Backend tests run with `node --test`; no new dependencies.
- `main` after "refactor: per-component lifecycle state with derived ComapeoState" (#47) landed; no merge conflicts.

🤖 Generated with Claude Code