test: comprehensive integration + coverage tests for session-coordinator and session-runtime #142

Merged
teng-lin merged 33 commits into main from
worktree-test-coverage
Feb 25, 2026

Conversation

@teng-lin
Owner

@teng-lin teng-lin commented Feb 25, 2026

Summary

Comprehensive test coverage expansion: integration tests for session lifecycle, plus targeted branch coverage tests for 20+ previously under-covered source files. Also includes CI fixes and PR review improvements.

What's Included

Integration Tests (4 files, 34 tests)

  • coordinator-runtime-integration (8 tests): Policy dispatch (idle_reap, reconnect_timeout, capabilities_timeout), lease guards, closeSessionInternal error handling, model propagation
  • session-runtime-capabilities (10 tests): CAPABILITIES_INIT flow with timeout/dedup/unsupported adapters, orchestrateSessionInit with/without git, lifecycle state clearance on SESSION_CLOSING
  • session-runtime-commands (10 tests): Message queue ops, adapter updates, permission flow, rate limiting, consumer/passthrough routing, presence queries
  • session-runtime-orchestration (6 tests): Git-based orchestration, control responses, backend connection cleanup, dirty state debouncing

Branch Coverage Tests (20+ files, 100+ new tests)

Core policies & session:

  • idle-policy (9 tests): Double-start guard, stop()-during-sweep race, null snapshot, no-domainEvents sweep
  • reconnect-policy (12 tests): Batch failure logging, cleanup teardown, domain event clearing, archived session skip
  • node-process-manager (4 tests): Polling deadline, EPERM re-poll, fast-resolve, signal verification
  • consumer-gatekeeper (4 tests): Auth timeout, socket-closed-during-auth race, rate-limit defaults, factory caching

Adapters:

  • codex-session (8 tests): RPC timeout, thread reset, notification routing, trace application
  • codex-adapter (4 tests): Launcher error handler, createSlashExecutor true branch
  • codex-message-translator (6 tests): Translation edge cases and fallback paths
  • agent-sdk-session (15 tests): Null-query guard, system:init session_id, stream error catch, inputDone guard, canUseTool paths
  • opencode-adapter (7 tests): httpClient guard, null/string address in reserveEphemeralPort, SSE retry abort paths
  • opencode-message-translator (9 tests): tool/step-start/step-finish cases, session.compacted
  • prometheus-metrics-collector (2 tests): consumer/backend disconnected decrements
  • acp-adapter (3 tests): stdin/stdout null guard
  • file-storage (7 tests): safeJoin path traversal, base-with-slash ternary, non-.tmp file skip, non-UUID loadAll skip

Daemon & utils:

  • daemon-supervisor (7 tests): Process supervision lifecycle branches
  • lock-file (2 tests): Non-ENOENT unlink errors (EPERM re-throw)
  • state-file (2 tests): Non-Error reject value in writeState catch
  • state-migrator (5 tests): Migration chain gap, array field true branches, null fallback

Other:

  • message-tracer (20 tests): All tracer methods and edge cases
  • slash-command-chain (5 tests): Pipeline execution, error handling, guard branches
  • team-state-reducer (4 tests): Reducer state transitions
  • pairing (2 tests): Wrong-length unsealed key, wrong-length pk in parsePairingLink
  • backend-recovery-service (2 tests): adapterName ?? "unknown" fallback, stopped guard in timer callback
  • cloudflared-manager (5 tests): scheduleRestart timer, handleData after urlFound, onError/onExit after URL found, buildArgs without metricsPort

Code Quality Fixes

Addressed 7 high-priority findings from dual code review (Claude + Gemini CLI):

| Severity | Issue | Fix |
| --- | --- | --- |
| CRITICAL | reconnectController.deps.bridge private field chain | Direct applyPolicyCommandForSession call |
| HIGH | idle_reap test missing lifecycle precondition | Added expect(lifecycle).toBe("awaiting_backend") |
| HIGH | Silent if(backendSession) setup guard | Unconditional expect(backendSession).not.toBeNull() |
| HIGH | SESSION_CLOSING test checks wrong event type | Assert capabilities:timeout event |
| HIGH | Insufficient Promise flush | Two Promise.resolve() flushes → 4× flushPromises() |
| HIGH | Misleading test title | Corrected to document reconnectTimer === null guard |
| HIGH | killSpy.toHaveBeenCalled() too weak | Filter on signal === 0 |
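
To illustrate why the silent `if (backendSession)` guard was flagged: an assertion wrapped in a conditional is skipped when setup fails, so the test passes without checking anything. The following is a minimal sketch only. `connectBackend`, `assertNotNull`, `guardedTest`, and `strictTest` are hypothetical stand-ins, and `assertNotNull` plays the role of vitest's `expect(...).not.toBeNull()`.

```typescript
// Hypothetical stand-ins; none of these names come from the PR's code.
type BackendSession = { id: string };

function connectBackend(succeed: boolean): BackendSession | null {
  return succeed ? { id: "backend-1" } : null;
}

function assertNotNull<T>(value: T | null, label: string): T {
  if (value === null) throw new Error(`${label} must not be null`);
  return value;
}

// Guarded style: returns how many assertions actually ran.
function guardedTest(succeed: boolean): number {
  let assertionsRun = 0;
  const backendSession = connectBackend(succeed);
  if (backendSession) {
    assertNotNull(backendSession.id, "id");
    assertionsRun++;
  }
  return assertionsRun; // 0 when setup failed: the test "passes" vacuously
}

// Unconditional style: a failed setup throws immediately.
function strictTest(succeed: boolean): number {
  const backendSession = assertNotNull(connectBackend(succeed), "backendSession");
  assertNotNull(backendSession.id, "id");
  return 1;
}
```

With a failed setup, `guardedTest(false)` returns 0 without complaint, while `strictTest(false)` throws at the setup line, which is the behavior the fix aims for.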

CI & PR Review Fixes

  • TS2742 fix: Explicit return type annotations in session-runtime-test-helpers.ts
  • vi.runAllTicks(): Deterministic tick drain replacing double Promise.resolve()
  • vi.waitUntil(): Event-based polling replacing 4× flushPromises()
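
The move from a fixed number of flushes to condition-based waiting can be sketched without any test framework. The `waitUntil` helper below is a simplified stand-in written for this illustration, not vitest's implementation, and `startWork` is a hypothetical async chain whose hop count the caller does not need to know.

```typescript
// Simplified stand-in for a condition-based wait; not vitest's implementation.
async function waitUntil(
  condition: () => boolean,
  timeoutMs = 1000,
  intervalMs = 5,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() > deadline) throw new Error("waitUntil timed out");
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical async chain with several microtask hops before the side effect.
function startWork(log: string[], hops: number): void {
  let chain: Promise<void> = Promise.resolve();
  for (let i = 0; i < hops; i++) chain = chain.then(() => undefined);
  void chain.then(() => {
    log.push("warned");
  });
}

async function demo(hops: number): Promise<number> {
  const log: string[] = [];
  startWork(log, hops);
  await waitUntil(() => log.length > 0); // independent of the hop count
  return log.length;
}
```

Because the wait is keyed to the observable side effect, adding or removing a hop in the chain does not require touching the test.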

Test Results

  • 33 commits vs main — 28 files changed, 5976 insertions
  • All tests passing across full suite
  • Branch coverage: 90% threshold maintained across all covered source files
  • 100+ new tests covering critical failure scenarios, race conditions, and resource cleanup

Testing

pnpm test                    # Full suite — all tests pass
pnpm test:e2e:smoke         # Smoke tests with real binary

🤖 Generated with Claude Code

Cover CAPABILITIES_INIT_REQUESTED (no backend, unsupported adapter, dedup,
timer timeout, SESSION_CLOSING clears timer), orchestrateSessionInit
(with/without capabilities, git resolver), and CAPABILITIES_APPLIED
(commands registration).

…nd lifecycle

Tests cover:
- applyPolicyCommandForSession: idle_reap, reconnect_timeout, capabilities_timeout
- withMutableSession lease guard skips fn for nonexistent sessions
- closeSessionInternal logs warning when backend close() throws
- createSession model propagation to session snapshot state
- onProcessSpawned relay handler seeds cwd, model, and adapterName

Cover the two previously uncovered code paths:
- Lines 73-76: logger.warn called when Promise.allSettled yields a rejected
  result during batch relaunch (verifies allSettled semantics — other sessions
  still processed when one fails)
- Lines 99-104: teardownDomainSubscriptions iteration called from stop(),
  ensuring all three domain-event off() calls are made; also covers the
  idempotency guard and the never-started edge case
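
The `Promise.allSettled` semantics mentioned above (one rejected relaunch does not stop the others) can be sketched as follows. `relaunchBatch`, its parameters, and the warning text are hypothetical and are not taken from reconnect-policy's source.

```typescript
type Logger = { warn: (msg: string) => void };

// Hypothetical relaunch batch: names are illustrative, not the PR's API.
async function relaunchBatch(
  sessionIds: string[],
  relaunch: (id: string) => Promise<void>,
  logger: Logger,
): Promise<string[]> {
  // allSettled never rejects; every session gets its own outcome entry.
  const results = await Promise.allSettled(sessionIds.map((id) => relaunch(id)));
  const relaunched: string[] = [];
  results.forEach((result, i) => {
    if (result.status === "rejected") {
      logger.warn(`relaunch failed for ${sessionIds[i]}: ${String(result.reason)}`);
    } else {
      relaunched.push(sessionIds[i]);
    }
  });
  return relaunched;
}
```

A test in this style would make one `relaunch` call reject and then assert both that `logger.warn` was called once and that the remaining sessions were still processed.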

Covers lines 22-25 (deadline reached while group alive) and lines 31-41
(EPERM polling loop, EPERM-at-deadline, unexpected error code) in the
internal waitForProcessGroupDead helper inside node-process-manager.ts.

Uses vi.useFakeTimers() to control setTimeout-driven polling without real
delays, and vi.spyOn(process, 'kill') to inject EPERM / ESRCH / EBADF
errors at the signal-0 check-alive call site.
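
As background for the signal-0 check: on POSIX systems, sending signal 0 performs only the existence and permission checks, `ESRCH` means no matching process exists, and `EPERM` means the target exists but the caller may not signal it. A minimal sketch of such a check-alive step is shown below. `isGroupAlive` and `fakeKill` are hypothetical names, and the exact handling inside `waitForProcessGroupDead` is assumed rather than copied.

```typescript
// Injected so the sketch can be exercised without real processes.
type KillFn = (pid: number, signal: number) => void;

// Returns true if the process group still exists (assumed semantics).
function isGroupAlive(pid: number, kill: KillFn): boolean {
  try {
    kill(-pid, 0); // signal 0: existence check only; negative pid targets the group
    return true;
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code === "ESRCH") return false; // no such process group
    if (code === "EPERM") return true; // exists, but we may not signal it
    throw err; // e.g. EBADF: unexpected, so surface it
  }
}

// Test double that throws a Node-style error carrying the given code.
function fakeKill(code?: string): KillFn {
  return () => {
    if (code) throw Object.assign(new Error(code), { code });
  };
}
```

In real use the injected function could wrap `process.kill`, for example `(p, s) => { process.kill(p, s); }`, which is roughly what `vi.spyOn(process, 'kill')` intercepts in the tests described above.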

Covers three previously-uncovered lines in consumer-gatekeeper.ts:

- Line 102: clearTimeout called inside cleanup() when auth resolves
  before the timeout fires (verified via spy on globalThis.clearTimeout).

- Line 112: .catch() path where cleanup() returns false — socket was
  already cancelled before the auth promise rejected; returns null
  instead of re-throwing.

- Line 137: fallback rate-limit config (burstSize:20, tokensPerSecond:50)
  when config.consumerMessageRateLimit is absent (undefined) at runtime.
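
The fallback values quoted above (`burstSize: 20`, `tokensPerSecond: 50`) suggest a token-bucket limiter. The sketch below assumes that design; the `TokenBucket` class and its methods are hypothetical and may differ from what consumer-gatekeeper.ts actually does.

```typescript
interface RateLimitConfig {
  burstSize: number;
  tokensPerSecond: number;
}

// Defaults quoted in the commit message above.
const DEFAULT_RATE_LIMIT: RateLimitConfig = { burstSize: 20, tokensPerSecond: 50 };

// Assumed token-bucket shape; the clock is passed in to keep the sketch deterministic.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly config: RateLimitConfig;

  constructor(config: RateLimitConfig | undefined, nowMs: number) {
    this.config = config ?? DEFAULT_RATE_LIMIT; // fallback when config is absent
    this.tokens = this.config.burstSize;
    this.lastRefillMs = nowMs;
  }

  tryConsume(nowMs: number): boolean {
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    const refilled = this.tokens + elapsedSec * this.config.tokensPerSecond;
    this.tokens = Math.min(this.config.burstSize, refilled);
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

With the defaults, a fresh bucket admits a burst of 20 messages at once, and 100 ms later it has refilled enough for about 5 more.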

Covers four previously-uncovered code paths in IdlePolicy:

- Line 32: double-start guard — verifies start() is idempotent by asserting
  domainEvents.on is called exactly 3 times even when start() is invoked twice.
- Line 53: double-subscribe guard — verifies ensureDomainSubscriptions() does
  not register duplicate event listeners when eventCleanups is already populated.
- Lines 103-113: runSweep() early-exit guard — uses synchronous
  vi.advanceTimersByTime() to enqueue a sweep without draining microtasks, then
  calls stop() before Promise.resolve() so that runSweep() sees !running and
  returns before touching the bridge; also covers the null-snapshot continue path
  (line 113) with a getSession mock that returns null for one session.
- Line 119: lastActivity null-coalescing — passes a snapshot with no
  lastActivity field and asserts the session is reaped (treated as epoch-old).

Bonus: covers the !domainEvents guard path (line 53 first clause) by
instantiating IdlePolicy without domainEvents and verifying periodic sweep still
runs.
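
The null-snapshot skip and the `lastActivity` null-coalescing described above can be sketched in isolation. The helper names and the `SessionSnapshot` shape below are hypothetical; only the behavior (missing `lastActivity` treated as epoch-old, null snapshots skipped) follows the commit message.

```typescript
interface SessionSnapshot {
  id: string;
  lastActivity?: number;
}

// Hypothetical helper: a missing lastActivity is treated as epoch (0).
function isIdle(snapshot: SessionSnapshot, nowMs: number, idleTimeoutMs: number): boolean {
  const lastActivity = snapshot.lastActivity ?? 0;
  return nowMs - lastActivity >= idleTimeoutMs;
}

// Sweep that skips null snapshots, mirroring the "null-snapshot continue path".
function sessionsToReap(
  ids: string[],
  getSnapshot: (id: string) => SessionSnapshot | null,
  nowMs: number,
  idleTimeoutMs: number,
): string[] {
  const reaped: string[] = [];
  for (const id of ids) {
    const snapshot = getSnapshot(id);
    if (!snapshot) continue; // session disappeared mid-sweep
    if (isIdle(snapshot, nowMs, idleTimeoutMs)) reaped.push(id);
  }
  return reaped;
}
```

A session with no `lastActivity` is therefore always older than any realistic timeout and is reaped on the first sweep.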

The SESSION_CLOSING timer-cleared test was asserting that no
`session_closed` broadcast occurred after advancing timers, but the
CAPABILITIES_TIMEOUT timer fires `deps.emitEvent("capabilities:timeout",
...)` — not a broadcast. The old assertion passed vacuously regardless
of whether the timer was actually cleared.

Replace the broadcast filter with a check on `deps.emitEvent` for
`capabilities:timeout` calls, which is the real observable side-effect
of the timer firing.

… test

- Finding 1 (CRITICAL): replace three-hop private chain
  `reconnectController.deps.bridge.applyPolicyCommand` with the
  single-hop `applyPolicyCommandForSession` that is already used
  throughout the file, removing the `policyBridge` variable entirely.

- Finding 2 (HIGH): add precondition assertion in the `idle_reap` test
  to verify the session is in `awaiting_backend` lifecycle state before
  the policy command is applied, so a regression will surface at setup
  rather than on the post-condition check.

- Finding 3 (HIGH): replace silent `if (backendSession)` guard with a
  hard `expect(backendSession).not.toBeNull()` assertion so a failure
  to connect the backend causes an immediate, descriptive test failure
  rather than a confusing `logger.warn` assertion mismatch.

…erage tests

Tests 1 and 3 now capture the killSpy return value and filter mock calls
for `(-pid, 0)` — the exact signature used by the polling loop — before
asserting at least one such call occurred. This ensures the polling path
actually executed, consistent with the pattern already used in Test 4.

…coverage

- Replace two Promise.resolve() flushes with four flushPromises() calls
  imported from src/testing/cli-message-factories.ts; the relaunchStaleSessions
  chain has multiple async hops (timer → relaunch reject → Promise.allSettled
  → result iteration → logger.warn) that require more than two microtask turns
- Rename idempotency test title to accurately describe the guard that fires:
  "stop() is idempotent — second call exits early because reconnectTimer is
  already null" (the reconnectTimer null-check in stop(), not the
  teardownDomainSubscriptions empty-array guard)
- Update inline comment to clarify teardownDomainSubscriptions is not reached
  on the second stop() call
@gemini-code-assist
Contributor

Summary of Changes

Hello @teng-lin, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness and reliability of the session management system by introducing a comprehensive suite of new integration and coverage tests. The added tests target critical failure scenarios, race conditions, and resource cleanup, ensuring that previously untested branches in core policies and managers are now thoroughly validated. Furthermore, several identified code quality issues within the new tests have been resolved, leading to a more stable and well-understood codebase.

Highlights

  • Expanded Integration Test Coverage: Added 24 new integration tests across four files, focusing on core session management, policy dispatch, and process lifecycle handling. This includes tests for idle_reap, reconnect_timeout, capabilities_timeout policies, lease guards, backend session error handling, model propagation, CAPABILITIES_INIT flows, message queue operations, adapter updates, permission flows, rate limiting, consumer/passthrough routing, and Git-based orchestration.
  • Enhanced Coverage Tests for Core Policies and Managers: Introduced 19 new coverage tests across four files to address previously untested branches in IdlePolicy, ReconnectPolicy, NodeProcessManager, and ConsumerGatekeeper. These tests cover edge cases like double-start guards, stop()-during-sweep race conditions, null snapshot handling, batch relaunch failures, process group polling deadlines, EPERM handling, auth timeout cleanup, and fallback rate-limit defaults.
  • Addressed Code Quality Issues: Fixed 7 high-priority code quality issues identified during a dual code review. These fixes include correcting private field access, adding missing lifecycle preconditions, ensuring proper null checks, asserting correct event types, improving Promise flushing, clarifying test titles, and refining mock expectations for process polling.
  • Maintained High Test Coverage: The addition of these new tests ensures that the project maintains its 90% branch coverage threshold, with all 2979 total tests passing.

Changelog
  • src/adapters/node-process-manager-coverage.test.ts
    • Added coverage tests for waitForProcessGroupDead helper, ensuring correct behavior when polling deadlines are reached, EPERM errors occur, or unexpected errors are thrown during process group monitoring.
  • src/core/consumer/consumer-gatekeeper-coverage.test.ts
    • Added coverage tests for ConsumerGatekeeper, verifying auth timeout cleanup, handling of sockets closed during authentication, and the correct application of fallback rate-limit configurations.
  • src/core/coordinator/coordinator-runtime-integration.integration.test.ts
    • Added integration tests for SessionCoordinator and runtime interactions, covering policy dispatch for idle_reap, reconnect_timeout, and capabilities_timeout, withMutableSession lease guards, error handling during backend session closure, and proper propagation of session model and cwd information.
  • src/core/policies/idle-policy-coverage.test.ts
    • Added coverage tests for IdlePolicy, confirming idempotency of start() and subscribeToEvents(), proper exit behavior of runSweep() when stop() is called prematurely, handling of disappearing sessions during sweep, correct reaping of sessions with missing lastActivity, and functionality without domainEvents.
  • src/core/policies/reconnect-policy-coverage.test.ts
    • Added coverage tests for ReconnectPolicy, verifying logging of warnings for failed batch relaunches, comprehensive cleanup of registered event listeners upon stop(), idempotency of stop(), and safe execution of stop() without prior start().
  • src/core/session/session-runtime-capabilities.integration.test.ts
    • Added integration tests for SessionRuntime's capabilities and initialization flow, including handling of CAPABILITIES_INIT_REQUESTED with no backend or unsupported adapters, deduplication of init requests, timeout behavior, and the impact of SESSION_CLOSING on pending initializations. Also tested orchestrateSessionInit with and without Git information, and CAPABILITIES_APPLIED command registration.
  • src/core/session/session-runtime-commands.integration.test.ts
    • Added integration tests for SessionRuntime's inbound command routing, covering queue_message, update_queued_message, cancel_queued_message, presence_query, set_adapter errors on active sessions, warnings for unknown permission request IDs, double-disconnect warnings for CONSUMER_DISCONNECTED, PASSTHROUGH_ENQUEUED storage, and rate limiting logic.
  • src/core/session/session-runtime-orchestration.integration.test.ts
    • Added integration tests for SessionRuntime's orchestration logic, specifically for orchestrateResult with Git information, orchestrateControlResponse delegation, closeBackendConnection functionality (including aborting and broadcasting disconnects), and the debouncing mechanism for markDirty state persistence.

Activity
  • The pull request introduced 8 new test files and 34 total test cases to improve coverage.
  • A dual code review, involving both Claude and Gemini CLI, was conducted.
  • Seven high-priority code quality findings from the review were addressed and fixed within the new tests.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a comprehensive set of integration and coverage tests, significantly improving the test suite's robustness. The new tests cover various edge cases and failure paths in session management, policy dispatch, and process lifecycle handling. My review focuses on improving the reliability of some asynchronous testing patterns. I've suggested replacing brittle timer-based waits with more robust methods like vi.waitUntil and vi.runAllTicks to prevent test flakiness, aligning with our guidelines for deterministic testing.

Comment on lines +152 to +153
await Promise.resolve();
await Promise.resolve(); // two ticks: sweepChain.then wraps runSweep

medium

Using multiple Promise.resolve() calls to advance microtask ticks can be brittle. vitest provides vi.runAllTicks() to flush the microtask queue completely, which is more robust and clearly expresses the intent.

Suggested change
- await Promise.resolve();
- await Promise.resolve(); // two ticks: sweepChain.then wraps runSweep
+ await vi.runAllTicks();
References
  1. In tests, replace non-deterministic synchronization methods with deterministic helpers to avoid flakiness.

Comment on lines +103 to +106
await flushPromises();
await flushPromises();
await flushPromises();
await flushPromises();

medium

Using multiple flushPromises() calls to wait for asynchronous operations is brittle. If the underlying async chain changes, this test could become flaky. A more robust approach is to wait until the expected side effect occurs using vi.waitUntil.

Suggested change
- await flushPromises();
- await flushPromises();
- await flushPromises();
- await flushPromises();
+ await vi.waitUntil(() => (logger.warn as any).mock.calls.length > 0);
References
  1. In tests, avoid using brittle synchronization methods and instead wait for specific asynchronous events to prevent flakiness.

Cover three previously uncovered branches in slash-command-chain.ts:

- Line 112 (LocalHandler catch): String(err) path when rejection value
  is not an Error instance (plain string and number rejections)
- Line 164 (AdapterNativeHandler.execute): early-return guard when
  adapterSlashExecutor is null at execute time
- Line 180 (AdapterNativeHandler.execute): if (!result) return guard
  when adapter executor resolves null or undefined
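
The `String(err)` path exists because JavaScript allows any value to be thrown or used as a rejection reason, not just `Error` instances. A minimal sketch of that normalization is below. `toErrorMessage` and `runLocalHandler` are hypothetical names, not the handlers in slash-command-chain.ts.

```typescript
// Normalizes any thrown or rejected value into a message string.
function toErrorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

// Hypothetical handler wrapper showing where the normalization would sit.
async function runLocalHandler(run: () => Promise<string>): Promise<string> {
  try {
    return await run();
  } catch (err) {
    return `error: ${toErrorMessage(err)}`;
  }
}
```

Tests for this branch reject with a plain string and with a number, which is the only way to reach the non-`Error` side of the conditional.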

Cover lines 111, 225, and 276 — the three null-return branches that were
not exercised by the existing tests:

- Line 111: default case in translateCodexEvent switch (unknown event type)
- Line 225: null return in translateItemAdded when item.type is neither
  message nor function_call (e.g. function_call_output, unknown types)
- Line 276: null return in translateItemDone when item.type is not one of
  the three handled kinds (function_call_output / function_call / message)

Cover the two uncovered branch groups in child-process-supervisor.ts:
- lines 124-125: stopAll() — verified with zero sessions, one session,
  and multiple sessions all stopped concurrently via Promise.all
- lines 137-138: removeSession() — verified session map entry deleted,
  session count decremented, process handle removed, and non-existent
  ID handled without throwing

Added 8 new tests to reconnect-policy-coverage.test.ts covering:
- process:connected and backend:connected events clearing watchdogs (clearOnConnect callback, lines 86/89)
- session:closed event via mock domainEvents (clearOnClose callback)
- ensureDomainSubscriptions guard when no domainEvents dep provided (line 83)
- start() reconnectTimer guard when called twice (line 26)
- start() early-exit when no starting sessions (line 29)
- archived session skip during relaunch (line 65)
- clearWatchdog no-op for unwatched sessionId (line 108)

Branch coverage: 65% -> 95% (target was >=90%).

Bring src/core/messaging/message-tracer.ts branch coverage from 83.97%
to 92.81% (target was ≥90%) by covering the following previously
untested branches:

- error() emit call: parentTraceId forwarding, auto-generated traceId,
  requestId/command/phase/outcome passthrough, and sessionId resolution
  from an existing open trace via resolveSessionId()
- getSeq() MAX_SESSIONS eviction path (pre-fill 1000 session entries to
  trigger the oldest-entry eviction before inserting a new session)
- emit() catch block (line 586): circular-reference body causes
  JSON.stringify to throw, exercising the minimal fallback event path
- sweepStale() defensive break (line 663): monkey-patch Set.values()
  to return undefined so the else-break is exercised
- summary() stale count for matching session (line 411)
- smartSanitize() "type" field array collapse (lines 156-159): arrays
  of >3 objects with "type" (not "role") collapse to "[N messages]"
- roughObjectSize() final return 8 fallback (line 262): Symbol type
  hits the non-string/number/boolean/array/object branch
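
The `emit()` catch-block case relies on the fact that `JSON.stringify` throws a `TypeError` on circular structures. A minimal sketch of a serializer with a fallback event is shown below. `serializeEvent`, the `TraceEvent` shape, and the fallback text are hypothetical and are not the tracer's actual output format.

```typescript
interface TraceEvent {
  type: string;
  body?: unknown;
}

// Serializes an event, falling back to a minimal record if the body cannot be encoded.
function serializeEvent(event: TraceEvent): string {
  try {
    return JSON.stringify(event);
  } catch {
    // Circular bodies make JSON.stringify throw; emit a minimal event instead.
    return JSON.stringify({ type: event.type, error: "unserializable body" });
  }
}
```

A test can trigger the fallback with a self-referencing object, for example `const body: any = {}; body.self = body;`.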

Cover six previously uncovered branches in codex-session.ts:
- Lines 279-280: requestRpc timeout callback fires when RPC does not respond
- Line 356: resetThread awaits in-flight initializingThread before clearing state
- Line 364: resetThread throws when ensureThreadInitialized leaves threadId null
- Line 657: handleNotification else-branch when translateCodexEvent returns null
- Line 883: translateResponseItem default case for unknown item types
- Line 906: applyTraceToUnified copies requestId to slash_request_id

Branch coverage rises from 88.42% to 91.66% (threshold: ≥90%).

Cover five previously uncovered branches in cloudflared-manager.ts
(reported as buffered-relay-manager in the coverage task):

- Line 207: scheduleRestart() timer fires when stopped=false, calling spawnProcess()
- Line 130: handleData() early-return when urlFound is already true
- Line 152: onError() false-branch when error fires after URL already found
- Line 170: onExit() false-branch when stopped=true suppresses scheduleRestart()
- Line 187: buildArgs() production mode without metricsPort omits --metrics flag

Branch coverage for cloudflared-manager.ts rises from 83.33% to 97.22%.

Cover three previously uncovered branches in opencode-adapter.ts:

- Line 122: connect() guard that throws when httpClient is missing after
  ensureServer() resolves (simulated by mocking ensureServer to resolve
  without setting httpClient)

- Lines 194-195: reserveEphemeralPort() rejection path when server.address()
  returns null or a string (defensive check for non-TCP/unexpected sockets),
  covered via vi.mock('node:net') with a controllable createServer override

- Line 255: runSseLoop() guard that throws when httpClient is undefined
  (called directly before ensureServer has run)

Additional tests for SSE retry loop aborted-signal paths (lines 229, 234)
and the for-await signal.aborted break (line 260) bring branch coverage
from 78.94% to 92.1%, exceeding the ≥90% threshold.
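
The `reserveEphemeralPort()` rejection path follows from the return type of Node's `server.address()`, which is an address object for TCP servers, a string for pipe or Unix-socket servers, and `null` when the server is not listening. The sketch below isolates that check. `portFromAddress` and the error text are hypothetical, and `AddressInfoLike` is a local copy of the relevant fields rather than an import from `node:net`.

```typescript
// Local mirror of the fields of net.AddressInfo used here.
interface AddressInfoLike {
  address: string;
  family: string;
  port: number;
}

// Extracts the port, rejecting the null and string cases defensively.
function portFromAddress(address: AddressInfoLike | string | null): number {
  if (address === null || typeof address === "string") {
    throw new Error("could not determine ephemeral port");
  }
  return address.port;
}
```

Pulling the check into a pure function like this is one way to test both defensive cases without mocking `node:net`, though the PR itself covers them through a `vi.mock('node:net')` override.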

Targets BackendRecoveryService (backend-recovery-service.ts) branch
coverage gaps, lifting it from 88.88% to 100%:

- Line 88: `info.adapterName ?? "unknown"` — exercises the nullish-
  coalesce fallback by reconnecting a no-PID session with no adapterName,
  confirming the logger emits "unknown" in the message.

- Line 132: `if (this.stopped) return` inside scheduleDedupClear's
  timer callback — uses a temporary clearTimeout mock to prevent stop()
  from cancelling the pending timer, then advances fake time past the
  dedup window so the callback fires while stopped === true.

Cover two previously uncovered branches in src/utils/crypto/pairing.ts:
- Line 97: handlePairingRequest returns {success:false} when sealOpen
  succeeds but decrypted payload is not 32 bytes (non-32-byte sealed plaintext)
- Line 168: parsePairingLink throws when decoded public key is not 32 bytes

Statement/line coverage: 85.71% → 100%
Branch coverage: 85.71% → 95.23%

Covers previously uncovered lines 198, 207-226, 269 and extends to
reach ≥90% branch coverage for the file:

- Line 198:     consumeStream() early return when this.query is null
- Lines 207-209: system:init with missing/empty session_id (falsy branch)
- Lines 217-226: catch block — stream throws while session is open
- Line 269:     pushInput() early return when inputDone is true
- Lines 83-84:  send(interrupt) optional chain with null query
- Lines 146-150: startQueryLoop resume option with and without backendSessionId
- Lines 166-181: canUseTool callback allow and deny decision paths
- Lines 259-260: createInputIterable next() waiting branch (sets inputResolve)
- Lines 271-273: pushInput() resolves a pending inputResolve promise
- Lines 283-285: finishInput() resolves pending inputResolve with done:true

Before: 86.79% line coverage (lines 198, 207-226, 269 uncovered)
After:  100% lines, 100% statements, 100% functions, 90.56% branches
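
The `pushInput()`, `finishInput()`, and waiting-`next()` branches listed above describe a push-based async input queue. The sketch below shows one way such a queue can work. The class is hypothetical: it borrows the names `inputDone` and `inputResolve` from the commit message, but its structure is an assumption and not the code in agent-sdk-session.

```typescript
// Hypothetical push-based async input queue.
class InputQueue<T> {
  private buffer: T[] = [];
  private inputDone = false;
  private inputResolve: ((result: IteratorResult<T>) => void) | null = null;

  pushInput(value: T): void {
    if (this.inputDone) return; // early return once input is finished
    if (this.inputResolve) {
      const resolve = this.inputResolve;
      this.inputResolve = null;
      resolve({ value, done: false }); // wake a consumer parked in next()
      return;
    }
    this.buffer.push(value);
  }

  finishInput(): void {
    this.inputDone = true;
    if (this.inputResolve) {
      const resolve = this.inputResolve;
      this.inputResolve = null;
      resolve({ value: undefined, done: true });
    }
  }

  next(): Promise<IteratorResult<T>> {
    if (this.buffer.length > 0) {
      return Promise.resolve({ value: this.buffer.shift() as T, done: false });
    }
    if (this.inputDone) return Promise.resolve({ value: undefined, done: true });
    // Nothing buffered: park until pushInput() or finishInput() resolves us.
    return new Promise((resolve) => {
      this.inputResolve = resolve;
    });
  }
}
```

Each branch in the commit message maps to one path here: a push after finish is dropped, a push while a consumer waits resolves it directly, and finishing while a consumer waits resolves it with `done: true`.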

@teng-lin teng-lin merged commit 7f001ed into main Feb 25, 2026
6 checks passed
@teng-lin teng-lin deleted the worktree-test-coverage branch February 25, 2026 18:02