Merged
32 changes: 25 additions & 7 deletions docs/adr/0005-ios-runner-interaction-lifecycle.md
@@ -30,10 +30,24 @@ uses selectors or text queries to find the semantic `XCUIElement`, but when the
activation taps the resolved center point instead of calling `XCUIElement.tap()`. tvOS remains
focus/remote-driven because tvOS does not support normal coordinate input.

-Ready runner sessions are probed with a short `uptime` preflight before command send. The daemon
-does not keep or consult a "recent success" health cache. Read-only startup commands still skip that
-preflight because the first successful command is the readiness proof for a newly launched runner.
-Readiness probe commands skip preflight to avoid recursion.
+Ready runner sessions are probed with a short `uptime` preflight before command send. Read-only
+startup commands still skip that preflight because the first successful command is the readiness
+proof for a newly launched runner. Readiness probe commands skip preflight to avoid recursion.
+
+The daemon may additionally skip the ready-session `uptime` preflight for an explicit allowlist of
+mutating interactions (`tap`, `tapSeries`, `longPress`, `drag`, `dragSeries`, `swipe`) when the same
+session produced a healthy mutating response (parsed ok and not carrying `runnerFatal`) for the
+same `appBundleId` within 5 seconds. This recency lives only on the `RunnerSession` object as
+`lastHealthyMutation`, so it dies with every invalidation/restart, and it is recorded only after the
+`runnerFatal` check, so sparse AX-fallback snapshots and `runnerFatal` payloads never refresh it.
+Snapshots and other read-only responses never count as a health signal. This narrow skip is
+permitted now because the future-work precondition below is met: coordinate-first activation removed
+the command-induced teardown trigger, and the lifecycle status journal plus the status-before-
+invalidate recovery is the teardown-surviving status surface that resolves any ambiguous post-send
+failure before invalidation. A transport failure after a skip clears the recency record and is marked
+with the skip context; connection-shaped failures (refused, reset, hung up) run status recovery
+instead of a blind replay, while timeout-shaped failures propagate with the skip context (the same
+classification preflighted sends use).
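
The skip rule described in the added paragraph can be read as a pure decision function. The following is a minimal sketch under stated assumptions, not the daemon's implementation: only `lastHealthyMutation`, `appBundleId`, the allowlisted command names, the 5-second window, and the `recent_healthy_mutation` reason appear in this PR. The type names, the function name, the other reason strings, and the inclusive window boundary are hypothetical.

```typescript
// Sketch only: names other than `lastHealthyMutation`, `appBundleId`,
// `recent_healthy_mutation`, and the allowlisted commands are hypothetical.
interface HealthyMutationRecord {
  appBundleId: string;
  recordedAtMs: number;
}

interface RunnerSessionSketch {
  lastHealthyMutation?: HealthyMutationRecord;
}

const PREFLIGHT_SKIP_ALLOWLIST: ReadonlySet<string> = new Set([
  'tap',
  'tapSeries',
  'longPress',
  'drag',
  'dragSeries',
  'swipe',
]);

// The ADR says "within 5 seconds"; an inclusive boundary is an assumption.
const FRESHNESS_WINDOW_MS = 5_000;

type PreflightDecision =
  | { skip: true; reason: 'recent_healthy_mutation'; ageMs: number }
  | { skip: false; reason: 'not_allowlisted' | 'no_record' | 'app_changed' | 'stale_record' };

function decidePreflight(
  session: RunnerSessionSketch,
  command: string,
  appBundleId: string,
  nowMs: number,
): PreflightDecision {
  // Non-allowlisted commands always keep the conservative preflight.
  if (!PREFLIGHT_SKIP_ALLOWLIST.has(command)) return { skip: false, reason: 'not_allowlisted' };
  const record = session.lastHealthyMutation;
  if (!record) return { skip: false, reason: 'no_record' };
  if (record.appBundleId !== appBundleId) return { skip: false, reason: 'app_changed' };
  const ageMs = nowMs - record.recordedAtMs;
  // A negative age (clock moved backwards) is treated as stale, not fresh.
  if (ageMs < 0 || ageMs > FRESHNESS_WINDOW_MS) return { skip: false, reason: 'stale_record' };
  return { skip: true, reason: 'recent_healthy_mutation', ageMs };
}
```

Every branch that is not a positive proof falls through to `skip: false`, which mirrors the ADR's intent that uncertainty always preflights.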

`uptime` is a direct runner listener probe. It is answered before command journaling, the serial
command execution queue, app activation, and main-thread XCTest dispatch. It should measure only
@@ -63,9 +77,13 @@ If xcodebuild still exits for another reason, the next command detects the stale
process/liveness checks and avoids the old 15-second graceful-shutdown wait. The remaining latency is
fresh xcodebuild runner startup, not a stale transport stall.

-The daemon no longer models recent success as a runner-health signal. That adds one cheap `uptime`
-request before ready-session commands, but it removes a false health signal that was observed to be
-unsafe.
+The daemon no longer models a generic "recent success" cache as a runner-health signal. A proven
+healthy mutating response for the same app (recorded only after the `runnerFatal` check and only
+for allowlisted interactions) is now a real end-to-end liveness proof (HTTP listener through to the
+app target), so a hot loop of allowlisted interactions skips the per-command `uptime` request while
+still re-earning each skip from another healthy mutation. The earlier unconditional `uptime` before
+every ready-session command remains the default for non-allowlisted commands and after any
+invalidation, stale record, app-bundle change, or absent record.
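
How each skip is "re-earned" depends on when the recency record is written. A hedged sketch of that recording step follows. The response shape (`ok`, optional `runnerFatal`), the function names, and the session type are assumptions made for illustration; the PR only states that the record is written after the `runnerFatal` check, only for allowlisted mutations, and that it lives on the session object.

```typescript
// Sketch only: response and session shapes are hypothetical stand-ins.
interface RunnerResponseSketch {
  ok: boolean;
  runnerFatal?: unknown;
}

interface SessionWithRecency {
  lastHealthyMutation?: { appBundleId: string; recordedAtMs: number };
}

const MUTATING_ALLOWLIST: ReadonlySet<string> = new Set([
  'tap',
  'tapSeries',
  'longPress',
  'drag',
  'dragSeries',
  'swipe',
]);

// Returns true when the response refreshed the recency record.
function recordIfHealthyMutation(
  session: SessionWithRecency,
  command: string,
  appBundleId: string,
  response: RunnerResponseSketch,
  nowMs: number,
): boolean {
  // The runnerFatal check comes first, so fatal payloads never refresh recency.
  if (!response.ok || response.runnerFatal !== undefined) return false;
  // Snapshots and other read-only responses are never a health signal.
  if (!MUTATING_ALLOWLIST.has(command)) return false;
  session.lastHealthyMutation = { appBundleId, recordedAtMs: nowMs };
  return true;
}

// Invalidation or restart yields a new session object, which has no record.
function freshSession(): SessionWithRecency {
  return {};
}
```

Because the record is a plain field on the session, nothing has to remember to clear it on restart; the replacement session simply starts without one.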

Apps with broken accessibility trees may still be impossible for XCTest to inspect deeply, but one
failed snapshot no longer teaches the runner to keep using a suspect cached app target or to amplify
43 changes: 28 additions & 15 deletions docs/ios-runner-protocol-optimizations.md
@@ -41,21 +41,34 @@ iOS simulator validation:

### 2. Adaptive `uptime` preflight policy

-Status: superseded by ADR 0005 for ready-session command execution.
-
-Goal: reduce unnecessary readiness probes only when another health signal proves the runner is still
-serving new requests. A recent successful command response is not sufficient proof: React Navigation
-dogfood showed XCTest can return a successful tap response and then immediately fail the test runner
-while re-resolving a navigation-disappeared element.
-
-Acceptance criteria:
-
-- Existing first-command/startup readiness behavior is preserved.
-- Existing failed-preflight stale-session recovery is preserved.
-- Repeated hot interactions do not skip `uptime` based on cached recent-success state.
-- Commands that still need conservative readiness checks remain preflighted until measured.
-- A transport failure after skipping preflight runs status recovery before invalidation.
-- Diagnostics expose whether a command used, skipped, or recovered from a readiness preflight.
+Status: implemented with guardrails (see ADR 0005). The earlier blanket "recent success" cache was
+shipped and then reverted in #702 because XCTest could return a successful tap response and then fail
+the runner while re-resolving a navigation-disappeared element, and because sparse AX-fallback
+snapshots were cached as healthy state. #702's coordinate-first activation removed that teardown
+trigger, so the skip is reintroduced as a structurally narrower "healthy mutation recency" signal.
+
+Goal: skip the per-command `uptime` for hot allowlisted interaction loops only when a proven healthy
+mutating response makes the runner's liveness already known, while every uncertain path keeps
+preflighting.
+
+Acceptance criteria (as shipped):
+
+- First-command/startup, no-record, stale-record, app-activation-uncertain, and non-allowlisted
+  (conservative) commands still preflight; readiness probes and read-only startup commands keep
+  their existing skips.
+- Recency is derived only from healthy (parsed ok, non-`runnerFatal`) responses of an explicit
+  mutating allowlist (`tap`, `tapSeries`, `longPress`, `drag`, `dragSeries`, `swipe`) for the same
+  `appBundleId`, within a 5s freshness window, and lives only on the session object so it dies with
+  every invalidation/restart. Snapshots and read-only responses never refresh it.
+- A transport failure after a skipped preflight clears the recency record and marks the error with
+  the skip context (`runnerReadinessPreflightSkipped`, distinct from the restart predicate's
+  `runnerReadinessPreflightFailed`). Connection-shaped failures run status recovery before
+  invalidation, never a replay; timeout-shaped failures propagate with the skip context, matching
+  the existing classification for preflighted sends.
+- Diagnostics expose whether a command used, skipped, or recovered from a readiness preflight,
+  including command type, skip reason, and recency age.
+- Measured threshold: 1 runner request per hot allowlisted command after the first, with no increase
+  in invalidation or failure rate.
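
The transport-failure criterion in the list above can be sketched as a small planning step. This sketch takes the failure shape as an input rather than guessing how the daemon classifies errors, since the PR does not show that classifier. The three `runnerReadinessPreflightSkip*` detail keys are taken from the tests in this PR; the type names, function name, and `nextStep` values are hypothetical.

```typescript
// Sketch only: how an error is classified as connection- or timeout-shaped
// is not shown in this PR, so the shape is passed in by the caller.
type FailureShape = 'connection' | 'timeout';

interface SkipContext {
  reason: 'recent_healthy_mutation';
  ageMs: number;
}

interface SessionSketch {
  lastHealthyMutation?: { appBundleId: string; recordedAtMs: number };
}

interface FailurePlan {
  nextStep: 'run_status_recovery' | 'propagate';
  replayCommand: false;
  details: {
    runnerReadinessPreflightSkipped: true;
    runnerReadinessPreflightSkipReason: string;
    runnerReadinessPreflightSkippedAgeMs: number;
  };
}

function planAfterSkippedPreflightFailure(
  session: SessionSketch,
  shape: FailureShape,
  skip: SkipContext,
): FailurePlan {
  // Any transport failure after a skip clears the recency record,
  // so the next command preflights again.
  session.lastHealthyMutation = undefined;
  return {
    // Connection-shaped failures consult runner status before invalidation;
    // timeout-shaped failures propagate to the caller.
    nextStep: shape === 'connection' ? 'run_status_recovery' : 'propagate',
    // A mutating command is never blindly replayed in either case.
    replayCommand: false,
    details: {
      runnerReadinessPreflightSkipped: true,
      runnerReadinessPreflightSkipReason: skip.reason,
      runnerReadinessPreflightSkippedAgeMs: skip.ageMs,
    },
  };
}
```

Both shapes carry the same skip context, which is what lets diagnostics distinguish a skipped preflight from the restart predicate's `runnerReadinessPreflightFailed`.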

iOS simulator validation:

88 changes: 88 additions & 0 deletions src/platforms/ios/__tests__/runner-command-retry.test.ts
@@ -628,6 +628,94 @@ test('mutating commands report recovery guidance when completed status has no re
});
});

+test('mutating commands run status recovery after transport failure when readiness preflight was skipped', async () => {
+  const session = makeRunnerSession({ port: 8100, ready: true });
+
+  mockEnsureRunnerSession.mockResolvedValueOnce(session);
+  mockExecuteRunnerCommandWithSession
+    .mockRejectedValueOnce(
+      new AppError('COMMAND_FAILED', 'fetch failed', {
+        runnerReadinessPreflightSkipped: true,
+        runnerReadinessPreflightSkipReason: 'recent_healthy_mutation',
+        runnerReadinessPreflightSkippedAgeMs: 1_200,
+      }),
+    )
+    .mockResolvedValueOnce({
+      lifecycleState: 'completed',
+      lifecycleResponseJson: JSON.stringify({ ok: true, data: { message: 'tapped' } }),
+    });
+
+  const result = await runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 });
+
+  assert.deepEqual(result, { message: 'tapped' });
+  assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
+  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
+  const recoveryDiagnostic = mockEmitDiagnostic.mock.calls.find(
+    ([event]) => event.phase === 'ios_runner_command_status_recovery',
+  )?.[0];
+  assert.ok(recoveryDiagnostic);
+  assert.equal(recoveryDiagnostic.data?.readinessPreflightSkipped, true);
+  assert.equal(recoveryDiagnostic.data?.readinessPreflightSkipReason, 'recent_healthy_mutation');
+  assert.equal(recoveryDiagnostic.data?.readinessPreflightSkippedAgeMs, 1_200);
+});
+
+test('mutating commands include skipped readiness context in lost-response guidance', async () => {
+  const session = makeRunnerSession({ port: 8100, ready: true });
+
+  mockEnsureRunnerSession.mockResolvedValueOnce(session);
+  mockExecuteRunnerCommandWithSession
+    .mockRejectedValueOnce(
+      new AppError('COMMAND_FAILED', 'fetch failed', {
+        runnerReadinessPreflightSkipped: true,
+        runnerReadinessPreflightSkipReason: 'recent_healthy_mutation',
+        runnerReadinessPreflightSkippedAgeMs: 1_200,
+      }),
+    )
+    .mockResolvedValueOnce({ lifecycleState: 'completed' });
+
+  await assert.rejects(
+    () => runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
+    (error: unknown) => {
+      assert.ok(error instanceof AppError);
+      assert.match(String(error.details?.hint), /^This hot command skipped the uptime preflight/);
+      assert.equal(error.details?.readinessPreflightSkipped, true);
+      assert.equal(error.details?.readinessPreflightSkipReason, 'recent_healthy_mutation');
+      assert.equal(error.details?.readinessPreflightSkippedAgeMs, 1_200);
+      return true;
+    },
+  );
+
+  assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
+});
+
+test('mutating commands keep conservative invalidation for skipped-preflight failures with unknown lifecycle', async () => {
+  const session = makeRunnerSession({ port: 8100, ready: true });
+
+  mockEnsureRunnerSession.mockResolvedValueOnce(session);
+  mockExecuteRunnerCommandWithSession
+    .mockRejectedValueOnce(
+      new AppError('COMMAND_FAILED', 'fetch failed', {
+        runnerReadinessPreflightSkipped: true,
+        runnerReadinessPreflightSkipReason: 'recent_healthy_mutation',
+        runnerReadinessPreflightSkippedAgeMs: 1_200,
+      }),
+    )
+    .mockResolvedValueOnce({ lifecycleState: 'paused' });
+
+  await assert.rejects(() =>
+    runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
+  );
+
+  assert.deepEqual(mockInvalidateRunnerSession.mock.calls, [
+    [session, 'transport_error_after_command_send'],
+  ]);
+  assertDiagnosticDecision({
+    decision: 'retained',
+    reason: 'unknown_lifecycle_state',
+    lifecycleState: 'paused',
+  });
+});

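Taken together, the three tests above pin down how a status-recovery result is resolved after a skipped-preflight transport failure. The sketch below restates only what those tests assert; it is not the code under test. The `lifecycleState` values, `lifecycleResponseJson`, and the invalidation reason come from the tests, while the type names, outcome labels, and the handling of a non-ok or unparsable payload are assumptions.

```typescript
// Sketch only: outcome labels are hypothetical; the inputs mirror the mocked
// status responses used in the tests above.
interface LifecycleStatusSketch {
  lifecycleState: string;
  lifecycleResponseJson?: string;
}

type RecoveryOutcome =
  | { kind: 'return_result'; data: unknown }
  | { kind: 'reject_lost_response'; invalidate: false }
  | { kind: 'reject_runner_failure'; invalidate: false }
  | { kind: 'reject_and_invalidate'; invalidateReason: 'transport_error_after_command_send' };

function resolveStatusRecovery(status: LifecycleStatusSketch): RecoveryOutcome {
  // Only 'paused' is exercised by the tests; treating every other
  // non-completed state the same way is an assumption.
  if (status.lifecycleState !== 'completed') {
    return { kind: 'reject_and_invalidate', invalidateReason: 'transport_error_after_command_send' };
  }
  // Completed but no journaled response: the command ran, the reply was lost.
  if (status.lifecycleResponseJson === undefined) {
    return { kind: 'reject_lost_response', invalidate: false };
  }
  let parsed: { ok?: boolean; data?: unknown };
  try {
    parsed = JSON.parse(status.lifecycleResponseJson) as { ok?: boolean; data?: unknown };
  } catch {
    // Assumption: an unparsable journal entry is handled like a lost response.
    return { kind: 'reject_lost_response', invalidate: false };
  }
  // Assumption: a journaled failure is surfaced without invalidating.
  if (parsed.ok !== true) return { kind: 'reject_runner_failure', invalidate: false };
  return { kind: 'return_result', data: parsed.data };
}
```

For example, `resolveStatusRecovery({ lifecycleState: 'paused' })` yields the conservative invalidation outcome, matching the third test.
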
test('mutating commands preserve runner failure details from status recovery', async () => {
const session = makeRunnerSession({ port: 8100, ready: true });
