
Tear down RPC subscriptions on RN context reload #51

Open
gmaclennan wants to merge 5 commits into main from
claude/handle-rn-reload-rpc-Qa65u

Conversation

@gmaclennan

Summary

When React Native reloads the JS bundle (dev reload, DevSettings.reload(), fast-refresh full reload) — or, on Android, when the main app process restarts — the previous JS session's MapeoClient is gone but its rpc-reflector event-listener subscriptions are still attached to the long-lived MapeoManager on the backend. The backend then keeps emitting events to a peer that no longer has anyone listening, and on every reload the listener set grows.

The fix ensures the AF_UNIX socket is actually closed on reload, and that the backend's per-connection cleanup runs when the socket closes.

Changes

Backend — SocketMessagePort.close() now emits 'close' (load-bearing)

backend/lib/message-port.js. The constructor wires framedStream.on('close') to call SocketMessagePort.close(), and ComapeoRpcServer already attached messagePort.on('close', () => server.close()) to drive rpc-reflector's per-connection cleanup. But close() never emitted the event, so the server.close() listener was dead and every reconnect leaked the previous session's listeners onto MapeoManager. One-line behaviour fix; the rest of the wiring was already correct.

Android — synchronous IPC teardown on OnDestroy

android/.../NodeJSIPC.kt, android/.../ComapeoCoreModule.kt. Adds NodeJSIPC.close() — a synchronous terminal teardown that closes the LocalSocket on the calling thread and cancels the coroutine scope. ComapeoCoreModule.OnDestroy now calls close() instead of the fire-and-forget disconnect().

Why: Expo modules on Android are bound to the AppContext, whose lifetime is the JS runtime, so OnDestroy fires on every JS context tear-down (android-lifecycle-listeners docs). The previous wiring closed the socket from a launched coroutine, so the new OnCreate could open a fresh connection before the old socket's FD was released — the backend's per-connection cleanup then ran against the wrong (or no) connection. close() closes the FD on the calling thread and cancels the scope so receive/send coroutines don't linger.

disconnect() is unchanged and still used by the FGS side (NodeJSService) and by internal IOException recovery.

iOS — relies on OnDestroy in SDK 55+

ios/ComapeoCoreModule.swift. No mechanism change — only a comment documenting that OnDestroy fires reliably on iOS JS reload as of expo-modules-core PR #33760 (merged Dec 2024, shipped in SDK 53+). Verified directly against the installed expo-modules-core@55.0.23:

  • ios/Core/MainValueConverter.swift:7 `private(set) weak var appContext: AppContext?`
  • ios/Core/ModuleHolder.swift:140 `deinit { post(event: .moduleDestroy) }`

Previously a strong-reference cycle through MainValueConverter pinned AppContext across reloads. With it weak, AppContext deinits on reload, the module registry releases each ModuleHolder, every ModuleHolder.deinit fires .moduleDestroy, and OnDestroy runs. iOS's disconnect() is already synchronous (shutdown(2) → join receive loop → close(2)), so no new mechanism is needed. The comment also points at the RCTBridgeWillReloadNotification / RCTJavaScriptWillStartLoadingNotification fallback to add if a future SDK upgrade reintroduces a leak.

Test plan

Backend (runnable now: cd backend && npm test):

  • SocketMessagePort emits 'close' exactly once when the underlying socket closes
  • close() is idempotent (no double-fire)
  • End-to-end: rpc-reflector listener attached to a fake handler is removed when the socket closes — the contract that was silently broken before this PR
  • Smoke test for the in-test pair() helper

Android (instrumented test, requires device/emulator):

  • closeReleasesSocketSynchronously — blocks the mock server on read(), calls ipc.close(), asserts the read unblocks with EOF inside 1 s. Pins the synchronous-close contract.
  • Existing NodeJSIPCTest suite (regression check)

iOS (Swift Package + example-app integration tests, require Xcode):

  • swift test from ios/ (ComapeoCoreTests) — regression check; no new tests added since the iOS change is comment-only
  • example/tests/ios integration suite — regression check that OnDestroy actually runs on real device JS reload (couldn't verify on Linux CI)

Manual:

  • Subscribe to a backend event (e.g. comapeo.invite$.on('invite-received', ...)) in the app, trigger an RN reload, observe that the new JS session's listener fires once per event (not N+1 times for each prior session) and that backend logs no longer show the listener count growing across reloads.

Notes for review

🤖 Generated with Claude Code



claude added 5 commits April 30, 2026 14:27
Adds NodeJSIPC.close() — a synchronous terminal teardown — and calls
it from ComapeoCoreModule.OnDestroy in place of the fire-and-forget
disconnect().

Why: rpc-reflector subscriptions registered by the JS client during a
session are stored on the backend keyed by event name and bound to the
underlying socket connection; the cleanup path runs only when the
backend observes the socket closing. The previous wiring closed the
socket from a launched coroutine, so the new OnCreate (which fires
immediately on JS context rebuild — Expo modules are bound to the
AppContext, whose lifetime is the JS runtime, not the Activity) could
open a fresh connection before the old socket's FD was released. The
backend's per-connection cleanup then ran against the wrong (or no)
connection, leaving the prior session's listeners attached to
MapeoManager and emitting events to a peer that no longer existed.

close() closes the LocalSocket on the calling thread and cancels the
coroutine scope so receive/send jobs don't linger across reloads. It's
purely additive: disconnect() is unchanged and still used by the FGS
side (NodeJSService) and by internal IOException recovery.

The OnCreate / OnDestroy block now carries a comment documenting that
these hooks bind to the JS runtime lifetime (verified via Expo's
android-lifecycle-listeners docs), distinguishing them from the
activity-level OnActivityEntersForeground hook which is kept for
FGS-respawn-while-backgrounded recovery.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three coupled fixes for the same underlying issue: rpc-reflector
event-listener subscriptions registered against MapeoManager during a
JS session must be torn down when that session ends, on every
platform.

backend: SocketMessagePort.close() now emits 'close'

The constructor wires `framedStream.on('close')` to call
`SocketMessagePort.close()`, and ComapeoRpcServer's connection handler
attaches `messagePort.on('close', () => server.close())` to drive
rpc-reflector's per-connection cleanup. close() never emitted the
event, so the server.close() listener was dead — every reconnect
leaked the previous session's listeners onto MapeoManager. One-line
fix; the rest of the wiring was already correct.

ios: JSReloadObserver

Expo's `OnDestroy` doesn't fire on iOS JS reload (expo/expo#33655),
so the lifecycle-only path that works on Android leaves stale sockets
attached on every reload here. JSReloadObserver subscribes to
`RCTBridgeWillReloadNotification` (classic arch) and
`RCTJavaScriptWillStartLoadingNotification` (bridgeless / new arch)
and runs a callback synchronously on the posting thread.
ComapeoCoreModule installs one in OnCreate that calls
`ipc?.disconnect()` — synchronous on iOS, so the backend observes EOF
before the notification post returns and the rpc-reflector cleanup
runs against the right connection. The notification names are
hard-coded as strings so the file builds in the macOS-only swift-test
target (React isn't a dependency there).

Tests

backend/test/message-port.test.js (node --test, 4 tests):
  - SocketMessagePort emits 'close' once when the underlying socket
    closes (regression-fails without the emit)
  - close() is idempotent
  - end-to-end: rpc-reflector listener attached to a fake handler is
    removed when the socket closes — the contract that was silently
    broken before this commit
  - smoke test for the in-test pair() helper

ios/Tests/JSReloadObserverTests.swift (Swift Package, macOS):
  - fires for every observed name; ignores others
  - deinit detaches (so leaked module instances on iOS reload don't
    keep firing into a torn-down ipc reference)
  - callback runs synchronously on the posting thread
  - default names cover both classic and bridgeless reload signals
  - integration test wires JSReloadObserver to a live NodeJSIPC +
    MockNodeServer and verifies posting the notification disconnects
    the client and the server reads EOF

android NodeJSIPCTest.closeReleasesSocketSynchronously (instrumented):
  - blocks the mock server on `read()` of the connection, calls
    `ipc.close()`, and asserts the read unblocks with EOF inside 1 s.
    Pins the synchronous-close contract and would catch a regression
    to the fire-and-forget disconnect path that the previous commit
    removed from OnDestroy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Direct verification of the installed expo-modules-core@55.0.23 source
(via `npm pack` + extract) confirms PR #33760's fix is present:

  package/ios/Core/MainValueConverter.swift:7
    private(set) weak var appContext: AppContext?
  package/ios/Core/ModuleHolder.swift:140
    deinit { post(event: .moduleDestroy) }
  package/ios/Core/AppContext.swift
    weak var reactBridge: RCTBridge?
    _runtime didSet → releaseRuntimeObjects() on nil
    deinit posts .appContextDestroys

With the strong-ref cycle through MainValueConverter broken, the old
AppContext is no longer pinned across reloads. On every JS reload it
deinits, releases the module registry, every ModuleHolder deinits,
each fires .moduleDestroy, every module's OnDestroy hook runs.

The previous commit's JSReloadObserver was correct for pre-#33760
SDKs but is redundant against the version actually shipped here.
Removing it reduces the number of teardown paths to one per platform
(OnDestroy on both Android and iOS) and avoids confusing future
maintainers with two mechanisms that mean the same thing.

The OnDestroy comment now documents the verification and points at the
RCT-notification fallback that should be reintroduced if a future SDK
upgrade or other strong-ref leak reintroduces the bug.

What stays from the previous commit:
  - backend SocketMessagePort.emit('close') fix (load-bearing —
    rpc-reflector cleanup runs from this event)
  - backend tests
  - android NodeJSIPC.close() instrumented test

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previous wording said the strong-reference cycle was "broken" — which
read as "buggy / non-functional" rather than the intended "severed /
no longer exists". Reworded to spell out the cause-and-effect in plain
English: cycle → AppContext pinned → ModuleHolders pinned → deinit
chain never runs → OnDestroy never fires; PR #33760 makes the
reference weak so the cycle goes away and the chain runs as expected.

No code change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the `sys.boot_completed`-only readiness check with a probe
that actually exercises the services we're about to call. On the
flaky run that motivated this:

  ##[command] adb wait-for-device shell 'while [[ -z $(getprop sys.boot_completed) ]]; do sleep 1; done'
  ##[command] adb shell settings put global window_animation_scale 0.0
  cmd: Failure calling service settings: Broken pipe (32)
  ##[error] The process '/usr/bin/sh' failed with exit code 224

`sys.boot_completed` flipped as soon as zygote was alive — which on a
snapshot restore happens before `system_server` finishes binding the
`settings`, `package`, and `activity` services. The wait-for-device
loop returned in ~14ms because the property was already set, and the
very next `settings put` crashed with `Broken pipe (32)`.

The new `wait_for_emulator` helper polls until all three of:
  - `sys.boot_completed == 1`
  - `pm path android` returns successfully (package manager bound)
  - `cmd settings list global` returns successfully (settings service bound)

are true, with a 120s hard ceiling and a debug dump of `getprop |
grep boot|svc` if it times out. One retry on each `settings put`
covers a residual race where the probe sees `list` succeed but a
write transaction loses to first-time initialisation.

`disable-animations: false` is unchanged — emulator-runner's own
`input keyevent 82` path still crashes the emulator on this image,
which is the original reason the workflow drives the settings
directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>