Test coverage follow-ups from per-component lifecycle refactor (#47)

Umbrella tracking issue for test-coverage gaps identified during the audit of #47 (per-component lifecycle state with derived `ComapeoState`). Each item below is independent; they're grouped here so a single agent (or a human picking up the work) has the full picture and can break them into sub-issues sized to whoever takes them.

## Context

The branch under #47 introduces:
- A pure `deriveState(NodeRuntime, BackendState, stopRequested) → State` function on both platforms.
- A new `stopping` control frame (Node → native) for graceful-shutdown attribution.
- A new `error-native` control frame (Android FGS → backend) for cross-process error attribution.
- `SimpleRpcServer.broadcast(msg)` (generalised from `broadcastError`).
- Per-instance terminal `ERROR` (`start()` / `stop()` refused once entered).
- `lastError` cleared on fresh `start()`, preserved across `cleanup()`.

What's tested in #47 (post-audit additions):
- Pure derivation truth-table on both platforms (`DeriveStateTests`/`Test`).
- `Stopping` frame parse on both platforms.
- iOS: unexpected node exit → ERROR; backend `error` frame → ERROR; rootkey load failure → ERROR; `stopping` frame → STOPPING → STOPPED.
- iOS: `start()` from ERROR refused (existing `testStartFromErrorStateIsRejected`).

What's NOT tested — the items in this issue.

Architectural reference: `docs/ARCHITECTURE.md` §5 (lifecycle state machine), §5.5 (errors), §5.7 (timeout topology).

---

## Item 1: Backend (Node) test harness

The backend (`backend/`) has zero test infrastructure today (`npm test` runs `expo-module test`, which doesn't cover the Node-side code). Several behaviours introduced in #47 live entirely on the Node side and have no tests:

**1a. `error-native` handler — malformed-input rejection** (`backend/index.js:140-152`)
Should ignore (with a `console.warn`) frames where `phase` or `message` is not a string. Expected: no call to `handleFatal`, no `process.exit`, no `error` broadcast.

**1b. `error-native` handler — valid input → re-broadcast + exit** (`backend/index.js:140-152`)
A well-formed `{type:\"error-native\", phase, message}` frame should:
1. Call `handleFatal(phase, new Error(message))`.
2. `handleFatal` calls `controlIpcServer.broadcastError({phase, message, stack})` — every connected client receives an `error` frame.
3. After a 100ms flush window, `process.exit(1)` runs.

**1c. `shutdown` handler — ordering invariant** (`backend/index.js:104-118`)
The first line of the shutdown handler is `controlIpcServer.broadcast({ type: \"stopping\" })`. The contract is that this frame reaches all connected clients **before** the socket close, because native's exit-classification logic uses the presence of `stopping` to distinguish graceful from unexpected disconnects (see ARCHITECTURE.md §5.4).

**1d. `SimpleRpcServer.broadcast(msg)` and `broadcastError(payload)`** (`backend/lib/simple-rpc.js:131-165`)
- `broadcast`: posts to every connected client, doesn't throw if a single client errors (logs and continues).
- `broadcastError`: same, with the `{type:\"error\", ...payload}` shape.
- Neither replays to late-connecting clients (intentional — see ARCHITECTURE.md §3.1).
- `setReadinessPhase` DOES replay (already implicitly covered by the iOS handshake tests, but a direct unit test would be tighter).

### Infrastructure needed

A Node-side test runner. Suggested approach:
- Use `node:test` (built into Node 24, which is the dev runtime per `package.json`'s `devEngines`).
- Add a `backend:test` npm script.
- For 1a/1d: in-process tests using `SimpleRpcServer` directly with a pair of `MessagePort`-style mocks or a real AF_UNIX socket pair on `/tmp`.
- For 1b/1c: subprocess tests — spawn `backend/index.js` as a child process, feed it framed messages over a real socket, observe broadcasts and exit code. The existing `MockNodeServer` shape on iOS is a useful reference for the wire format.

### Acceptance criteria
- `npm run backend:test` exists and runs in CI (extend `.github/workflows/`).
- Coverage for 1a–1d above.
- Tests are fast (<5s total) so they can run on every PR.

---

## Item 2: Android `start()` / `stop()` refused from ERROR

`NodeJSService.start()` (`android/src/main/java/com/comapeo/core/NodeJSService.kt:345-362`) and `stop()` (`:559-575`) both check `getState()` and refuse to proceed from ERROR. iOS has equivalent guards and they're tested by `testStartFromErrorStateIsRejected` and the body of `testStopWhenAlreadyStoppedIsIgnored`. Android has no equivalent test.

### Constraints
- JVM unit tests can't construct `NodeJSService` because the companion-object `init` block calls `System.loadLibrary(\"comapeo-core-react-native\")`, which fails outside a device/emulator.
- The `LifecycleStateDerivation` extraction in #47 was specifically to keep `deriveLifecycleState` JVM-testable; the rest of the class isn't.

### Suggested approach
Either:
- **Instrumented test** in `apps/example/tests/android/`. Construct `NodeJSService`, drive it to ERROR (e.g. via the watchdog by passing a tiny `startupTimeoutMs` and never sending `started`), assert subsequent `start()` throws and `stop()` no-ops. Cost: +1 instrumented test, runs only on the emulator job in CI.
- **Refactor the guards** out into a small testable function that takes `state: State` and returns a sealed `StartDecision` / `StopDecision`. Pure, JVM-testable. Lower coverage than the instrumented option (doesn't verify the call site uses the function correctly) but vastly cheaper.

### Acceptance criteria
- A failing test would catch \"someone removed the ERROR guard from start() / stop()\".
- Whichever approach is chosen, it covers both methods on Android.

---

## Item 3: Android `ComapeoCoreModule` disconnect-reason logic

`ComapeoCoreModule.kt`'s `Disconnected` handler maps the IPC's pre-disconnect JS state to a post-disconnect outcome (see ARCHITECTURE.md §5.4 table):

```kotlin
when (current) {
    JsState.ERROR -> {}                          // terminal
    JsState.STOPPING, JsState.STOPPED -> setState(JsState.STOPPED)
    JsState.STARTING, JsState.STARTED -> setState(JsState.ERROR, mapOf(
        \"errorPhase\" to \"node-runtime-unexpected\",
        \"errorMessage\" to \"Backend disconnected unexpectedly\",
    ))
}
```

This is the cross-process equivalent of the iOS `nodeRuntime.exited(_, .unexpected)` rule. It's the linchpin of correct ERROR detection on Android: if it's wrong, an unannounced backend crash silently lands in STOPPED, which is the bug #47 was designed to prevent.

### Constraints
Same as Item 2 — JVM unit tests can't construct the module class (it derives from Expo's `Module` which depends on Android framework classes).

### Suggested approach
- **Instrumented test** that:
  1. Boots the FGS in the test process or relies on it being up.
  2. Connects a real `NodeJSIPC` to `control.sock`.
  3. For each pre-disconnect state, drives it (via control frames or by waiting for natural transitions), then closes the FGS-side socket abruptly.
  4. Asserts the JS state observer fires with the expected outcome.

The harder bit is step 1 — needs the FGS to actually be running with a known initial state. Easier path: extract the disconnect-reason mapping into a pure `disconnectReason(JsState): JsState` function and test that directly. Pair with an integration test that just spot-checks one path end-to-end.

### Acceptance criteria
- The three branches (ERROR / STOPPING-or-STOPPED / STARTING-or-STARTED) are each verified.

---

## Item 4: Android `sendErrorNativeFrame` (success + 2s timeout)

`NodeJSService.sendErrorNativeFrame(phase, message)` (`android/.../NodeJSService.kt:545-585`):

```kotlin
serviceScope.launch {
    try {
        val ipc = withTimeoutOrNull(SEND_ERROR_NATIVE_TIMEOUT_MS) {
            ipcDeferred.await()
        }
        if (ipc == null) {
            Log.w(TAG, \"Dropping error-native frame: ...\")
            return@launch
        }
        ipc.sendMessage(payload)
    } catch (e: Exception) { ... }
}
```

Two paths to verify:
- **Success path**: `ipcDeferred` completes with a mock IPC, `sendMessage` is called with the JSON-serialised `{type:\"error-native\", phase, message}` payload.
- **Timeout path**: `ipcDeferred` never completes; after 2s the frame is dropped and a warning is logged. No exception thrown.

### Constraints
Same JNI loading constraint. Plus `serviceScope` and `ipcDeferred` are private.

### Suggested approach
Two options:
1. **Extract** the body into a top-level `suspend fun sendErrorNativeFrame(ipcDeferred: Deferred<NodeJSIPC>, phase: String, message: String, json: Json, timeoutMs: Long): Boolean`. JVM-testable with a mock `NodeJSIPC` interface. The class method becomes a one-liner.
2. **Instrumented test** that drives `NodeJSService` through both paths.

Option 1 is cleaner — the function is logically a free helper; nothing it does requires class state.

### Acceptance criteria
- Both paths covered, including the 2s timeout via `runTest`'s virtual time (`advanceTimeBy(2_001)`).

---

## Item 5: Android `runNode()` catch branch

The catch branch in `runNode()` (`android/.../NodeJSService.kt:442-462`) sets both `backendState = BackendState.Error(...)` and `nodeRuntime = NodeRuntimeState.Exited(_, REQUESTED)` to keep the component triple consistent after the thread unwinds.

### Constraints
Hard to reach in a test. Requires `startNodeWithArguments` (a JNI external) to throw, which is impossible in a JVM unit test without an alternate stub-loaded library.

### Priority
**Low.** Visible state still derives correctly via the backend-error rule (rule 1) even if `nodeRuntime` were left stale; the consistency fix from PR #48 is defensive against future code that might re-derive after another mutation. Concretely: this branch is only ever entered if `startNodeWithArguments` itself throws, which in production means the JNI bridge died — at that point most of the rest of the test infra is also broken.

### Suggested approach (if pursued)
- Instrumented test where the JS index file is deliberately malformed in a way that causes the native runtime to throw at startup. Brittle and platform-version-sensitive.
- Or accept that this branch is exercised only in production and skip a test.

### Acceptance criteria
- Either: a test that causes `runNode`'s catch branch to fire and verifies the component triple ends consistent.
- Or: explicit decision (in this issue or its sub-issue's resolution) that the path is too costly to test and we accept the static-analysis-only coverage.

---

## Suggested sub-issue layout

If splitting:
- **Sub-issue A**: Item 1 (backend test harness). Largest piece — sets up new infrastructure, then implements 1a–1d on top.
- **Sub-issue B**: Items 2 + 3 + 4 grouped (\"Android testability\"). They share the JNI-loading constraint and benefit from a coherent decision on instrumented-vs-extract-and-unit-test.
- **Sub-issue C**: Item 5 (or close as wontfix with rationale).

Each sub-issue should reference back here and link to the relevant ARCHITECTURE.md sections.

---

## Pointers for whoever picks this up

- ARCHITECTURE.md §5.5 (Errors) is the design reference for which paths exist and what phase strings each uses.
- ARCHITECTURE.md §5.7 (Timeout topology) is the reference for the 2s `error-native` timeout (Item 4) and other timeout interactions.
- The iOS `MockBackend` (`ios/Tests/Helpers/MockBackend.swift`) is a useful pattern for what a backend-side test fixture looks like — wire format, framing, handshake sequencing.
- `LifecycleStateDerivation.kt` is the model for \"extract the testable bit out of the JNI-loading class\" if you go that route on the Android side.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test coverage follow-ups from per-component lifecycle refactor (#47) #50

Context

Item 1: Backend (Node) test harness

Infrastructure needed

Acceptance criteria

Item 2: Android `start()` / `stop()` refused from ERROR

Constraints

Suggested approach

Acceptance criteria

Item 3: Android `ComapeoCoreModule` disconnect-reason logic

Constraints

Suggested approach

Acceptance criteria

Item 4: Android `sendErrorNativeFrame` (success + 2s timeout)

Constraints

Suggested approach

Acceptance criteria

Item 5: Android `runNode()` catch branch

Constraints

Priority

Suggested approach (if pursued)

Acceptance criteria

Suggested sub-issue layout

Pointers for whoever picks this up

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Test coverage follow-ups from per-component lifecycle refactor (#47) #50

Description

Context

Item 1: Backend (Node) test harness

Infrastructure needed

Acceptance criteria

Item 2: Android start() / stop() refused from ERROR

Constraints

Suggested approach

Acceptance criteria

Item 3: Android ComapeoCoreModule disconnect-reason logic

Constraints

Suggested approach

Acceptance criteria

Item 4: Android sendErrorNativeFrame (success + 2s timeout)

Constraints

Suggested approach

Acceptance criteria

Item 5: Android runNode() catch branch

Constraints

Priority

Suggested approach (if pursued)

Acceptance criteria

Suggested sub-issue layout

Pointers for whoever picks this up

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Item 2: Android `start()` / `stop()` refused from ERROR

Item 3: Android `ComapeoCoreModule` disconnect-reason logic

Item 4: Android `sendErrorNativeFrame` (success + 2s timeout)

Item 5: Android `runNode()` catch branch