Skip to content

Test coverage follow-ups from per-component lifecycle refactor (#47) #50

@gmaclennan

Description

@gmaclennan

Umbrella tracking issue for test-coverage gaps identified during the audit of #47 (per-component lifecycle state with derived ComapeoState). Each item below is independent; they're grouped here so a single agent (or a human picking up the work) has the full picture and can break them into sub-issues sized to whoever takes them.

Context

The branch under #47 introduces:

  • A pure deriveState(NodeRuntime, BackendState, stopRequested) → State function on both platforms.
  • A new stopping control frame (Node → native) for graceful-shutdown attribution.
  • A new error-native control frame (Android FGS → backend) for cross-process error attribution.
  • SimpleRpcServer.broadcast(msg) (generalised from broadcastError).
  • Per-instance terminal ERROR (start() / stop() refused once entered).
  • lastError cleared on fresh start(), preserved across cleanup().

What's tested in #47 (post-audit additions):

  • Pure derivation truth-table on both platforms (DeriveStateTests/Test).
  • Stopping frame parse on both platforms.
  • iOS: unexpected node exit → ERROR; backend error frame → ERROR; rootkey load failure → ERROR; stopping frame → STOPPING → STOPPED.
  • iOS: start() from ERROR refused (existing testStartFromErrorStateIsRejected).

What's NOT tested — the items in this issue.

Architectural reference: docs/ARCHITECTURE.md §5 (lifecycle state machine), §5.5 (errors), §5.7 (timeout topology).


Item 1: Backend (Node) test harness

The backend (backend/) has zero test infrastructure today (npm test runs expo-module test, which doesn't cover the Node-side code). Several behaviours introduced in #47 live entirely on the Node side and have no tests:

1a. error-native handler — malformed-input rejection (backend/index.js:140-152)
Should ignore (with a console.warn) frames where phase or message is not a string. Expected: no call to handleFatal, no process.exit, no error broadcast.

1b. error-native handler — valid input → re-broadcast + exit (backend/index.js:140-152)
A well-formed {type:\"error-native\", phase, message} frame should:

  1. Call handleFatal(phase, new Error(message)).
  2. handleFatal calls controlIpcServer.broadcastError({phase, message, stack}) — every connected client receives an error frame.
  3. After a 100ms flush window, process.exit(1) runs.

1c. shutdown handler — ordering invariant (backend/index.js:104-118)
The first line of the shutdown handler is controlIpcServer.broadcast({ type: \"stopping\" }). The contract is that this frame reaches all connected clients before the socket close, because native's exit-classification logic uses the presence of stopping to distinguish graceful from unexpected disconnects (see ARCHITECTURE.md §5.4).

1d. SimpleRpcServer.broadcast(msg) and broadcastError(payload) (backend/lib/simple-rpc.js:131-165)

  • broadcast: posts to every connected client, doesn't throw if a single client errors (logs and continues).
  • broadcastError: same, with the {type:\"error\", ...payload} shape.
  • Neither replays to late-connecting clients (intentional — see ARCHITECTURE.md §3.1).
  • setReadinessPhase DOES replay (already implicitly covered by the iOS handshake tests, but a direct unit test would be tighter).

Infrastructure needed

A Node-side test runner. Suggested approach:

  • Use node:test (built into Node 24, which is the dev runtime per package.json's devEngines).
  • Add a backend:test npm script.
  • For 1a/1d: in-process tests using SimpleRpcServer directly with a pair of MessagePort-style mocks or a real AF_UNIX socket pair on /tmp.
  • For 1b/1c: subprocess tests — spawn backend/index.js as a child process, feed it framed messages over a real socket, observe broadcasts and exit code. The existing MockNodeServer shape on iOS is a useful reference for the wire format.

Acceptance criteria

  • npm run backend:test exists and runs in CI (extend .github/workflows/).
  • Coverage for 1a–1d above.
  • Tests are fast (<5s total) so they can run on every PR.

Item 2: Android start() / stop() refused from ERROR

NodeJSService.start() (android/src/main/java/com/comapeo/core/NodeJSService.kt:345-362) and stop() (:559-575) both check getState() and refuse to proceed from ERROR. iOS has equivalent guards and they're tested by testStartFromErrorStateIsRejected and the body of testStopWhenAlreadyStoppedIsIgnored. Android has no equivalent test.

Constraints

  • JVM unit tests can't construct NodeJSService because the companion-object init block calls System.loadLibrary(\"comapeo-core-react-native\"), which fails outside a device/emulator.
  • The LifecycleStateDerivation extraction in refactor: per-component lifecycle state with derived ComapeoState #47 was specifically to keep deriveLifecycleState JVM-testable; the rest of the class isn't.

Suggested approach

Either:

  • Instrumented test in apps/example/tests/android/. Construct NodeJSService, drive it to ERROR (e.g. via the watchdog by passing a tiny startupTimeoutMs and never sending started), assert subsequent start() throws and stop() no-ops. Cost: +1 instrumented test, runs only on the emulator job in CI.
  • Refactor the guards out into a small testable function that takes state: State and returns a sealed StartDecision / StopDecision. Pure, JVM-testable. Lower coverage than the instrumented option (doesn't verify the call site uses the function correctly) but vastly cheaper.

Acceptance criteria

  • A failing test would catch "someone removed the ERROR guard from start() / stop()".
  • Whichever approach is chosen, it covers both methods on Android.

Item 3: Android ComapeoCoreModule disconnect-reason logic

ComapeoCoreModule.kt's Disconnected handler maps the IPC's pre-disconnect JS state to a post-disconnect outcome (see ARCHITECTURE.md §5.4 table):

when (current) {
    JsState.ERROR -> {}                          // terminal
    JsState.STOPPING, JsState.STOPPED -> setState(JsState.STOPPED)
    JsState.STARTING, JsState.STARTED -> setState(JsState.ERROR, mapOf(
        \"errorPhase\" to \"node-runtime-unexpected\",
        \"errorMessage\" to \"Backend disconnected unexpectedly\",
    ))
}

This is the cross-process equivalent of the iOS nodeRuntime.exited(_, .unexpected) rule. It's the linchpin of correct ERROR detection on Android: if it's wrong, an unannounced backend crash silently lands in STOPPED, which is the bug #47 was designed to prevent.

Constraints

Same as Item 2 — JVM unit tests can't construct the module class (it derives from Expo's Module which depends on Android framework classes).

Suggested approach

  • Instrumented test that:
    1. Boots the FGS in the test process or relies on it being up.
    2. Connects a real NodeJSIPC to control.sock.
    3. For each pre-disconnect state, drives it (via control frames or by waiting for natural transitions), then closes the FGS-side socket abruptly.
    4. Asserts the JS state observer fires with the expected outcome.

The harder bit is step 1 — needs the FGS to actually be running with a known initial state. Easier path: extract the disconnect-reason mapping into a pure disconnectReason(JsState): JsState function and test that directly. Pair with an integration test that just spot-checks one path end-to-end.

Acceptance criteria

  • The three branches (ERROR / STOPPING-or-STOPPED / STARTING-or-STARTED) are each verified.

Item 4: Android sendErrorNativeFrame (success + 2s timeout)

NodeJSService.sendErrorNativeFrame(phase, message) (android/.../NodeJSService.kt:545-585):

serviceScope.launch {
    try {
        val ipc = withTimeoutOrNull(SEND_ERROR_NATIVE_TIMEOUT_MS) {
            ipcDeferred.await()
        }
        if (ipc == null) {
            Log.w(TAG, \"Dropping error-native frame: ...\")
            return@launch
        }
        ipc.sendMessage(payload)
    } catch (e: Exception) { ... }
}

Two paths to verify:

  • Success path: ipcDeferred completes with a mock IPC, sendMessage is called with the JSON-serialised {type:\"error-native\", phase, message} payload.
  • Timeout path: ipcDeferred never completes; after 2s the frame is dropped and a warning is logged. No exception thrown.

Constraints

Same JNI loading constraint. Plus serviceScope and ipcDeferred are private.

Suggested approach

Two options:

  1. Extract the body into a top-level suspend fun sendErrorNativeFrame(ipcDeferred: Deferred<NodeJSIPC>, phase: String, message: String, json: Json, timeoutMs: Long): Boolean. JVM-testable with a mock NodeJSIPC interface. The class method becomes a one-liner.
  2. Instrumented test that drives NodeJSService through both paths.

Option 1 is cleaner — the function is logically a free helper; nothing it does requires class state.

Acceptance criteria

  • Both paths covered, including the 2s timeout via runTest's virtual time (advanceTimeBy(2_001)).

Item 5: Android runNode() catch branch

The catch branch in runNode() (android/.../NodeJSService.kt:442-462) sets both backendState = BackendState.Error(...) and nodeRuntime = NodeRuntimeState.Exited(_, REQUESTED) to keep the component triple consistent after the thread unwinds.

Constraints

Hard to reach in a test. Requires startNodeWithArguments (a JNI external) to throw, which is impossible in a JVM unit test without an alternate stub-loaded library.

Priority

Low. Visible state still derives correctly via the backend-error rule (rule 1) even if nodeRuntime were left stale; the consistency fix from PR #48 is defensive against future code that might re-derive after another mutation. Concretely: this branch is only ever entered if startNodeWithArguments itself throws, which in production means the JNI bridge died — at that point most of the rest of the test infra is also broken.

Suggested approach (if pursued)

  • Instrumented test where the JS index file is deliberately malformed in a way that causes the native runtime to throw at startup. Brittle and platform-version-sensitive.
  • Or accept that this branch is exercised only in production and skip a test.

Acceptance criteria

  • Either: a test that causes runNode's catch branch to fire and verifies the component triple ends consistent.
  • Or: explicit decision (in this issue or its sub-issue's resolution) that the path is too costly to test and we accept the static-analysis-only coverage.

Suggested sub-issue layout

If splitting:

  • Sub-issue A: Item 1 (backend test harness). Largest piece — sets up new infrastructure, then implements 1a–1d on top.
  • Sub-issue B: Items 2 + 3 + 4 grouped ("Android testability"). They share the JNI-loading constraint and benefit from a coherent decision on instrumented-vs-extract-and-unit-test.
  • Sub-issue C: Item 5 (or close as wontfix with rationale).

Each sub-issue should reference back here and link to the relevant ARCHITECTURE.md sections.


Pointers for whoever picks this up

  • ARCHITECTURE.md §5.5 (Errors) is the design reference for which paths exist and what phase strings each uses.
  • ARCHITECTURE.md §5.7 (Timeout topology) is the reference for the 2s error-native timeout (Item 4) and other timeout interactions.
  • The iOS MockBackend (ios/Tests/Helpers/MockBackend.swift) is a useful pattern for what a backend-side test fixture looks like — wire format, framing, handshake sequencing.
  • LifecycleStateDerivation.kt is the model for "extract the testable bit out of the JNI-loading class" if you go that route on the Android side.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions