Investigate: does ResumeSessionAsync destroy in-flight tool execution? #473

@PureWeen

Question

When PolyPilot calls ResumeSessionAsync on a session that has tools actively executing on the headless CLI server, does the resume command destroy the running tools?

PR #472 assumes yes and works around it with a poll-then-resume pattern (never calling resume on active sessions). But we have not definitively proven causation, only correlation.

Correlation evidence

Session 4f4f2380 (worker-1, April 1 2026):

06:32:23.027Z tool.execution_start    <- tool running on CLI
06:32:50.928Z session.resume          <- PolyPilot called ResumeSessionAsync
07:03:13.653Z session.shutdown        <- no tool.execution_complete ever arrived

The tool started, resume was called about 28s later (27.9s by the timestamps), and the tool never completed. The session shut down roughly 30 minutes after the resume.

Context: This happened during the ResumeOrchestrationIfPendingAsync flow which called EnsureSessionConnectedAsync then ResumeSessionAsync on a session that was still actively processing.
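
The timeline above can be checked mechanically. The following is a minimal sketch, assuming each event carries a `type` and an ISO-8601 `timestamp` (the real events.jsonl schema may differ; the `SessionEvent` shape and the helper names are hypothetical). It computes the gaps between the three logged events and confirms that no tool.execution_complete is present.

```typescript
// Assumed event shape; the field names are illustrative, not the confirmed schema.
interface SessionEvent {
  timestamp: string;
  type: string;
}

// The three events quoted above, dated April 1 2026 as stated in the issue.
const session4f4f2380: SessionEvent[] = [
  { timestamp: "2026-04-01T06:32:23.027Z", type: "tool.execution_start" },
  { timestamp: "2026-04-01T06:32:50.928Z", type: "session.resume" },
  { timestamp: "2026-04-01T07:03:13.653Z", type: "session.shutdown" },
];

// Returns the first event of the given type, or throws if it is absent.
function requireEvent(evts: SessionEvent[], type: string): SessionEvent {
  const found = evts.find((e) => e.type === type);
  if (found === undefined) throw new Error(`missing event: ${type}`);
  return found;
}

interface TimelineSummary {
  startToResumeMs: number;
  resumeToShutdownMs: number;
  toolCompleted: boolean;
}

function summarizeTimeline(evts: SessionEvent[]): TimelineSummary {
  const start = Date.parse(requireEvent(evts, "tool.execution_start").timestamp);
  const resume = Date.parse(requireEvent(evts, "session.resume").timestamp);
  const shutdown = Date.parse(requireEvent(evts, "session.shutdown").timestamp);
  return {
    startToResumeMs: resume - start,
    resumeToShutdownMs: shutdown - resume,
    toolCompleted: evts.some((e) => e.type === "tool.execution_complete"),
  };
}

console.log(summarizeTimeline(session4f4f2380));
```

This only restates the correlation: it shows the ordering and the missing completion event, not that the resume caused it.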

What we could NOT prove

A controlled CLI-only repro was attempted (April 2) but was inconclusive:

  • Started a copilot CLI session, sent a prompt intended to trigger sleep 120
  • The agent interpreted it differently (ran read_bash with delay instead)
  • Stopped the first CLI, resumed with copilot --resume=<id>
  • The tool completed before the resume (at 81s, not 120s) -- unclear if it was interrupted or timed out naturally

The interactive CLI TUI makes controlled repros difficult. A proper test needs to drive the SDK programmatically (create a session, inject a known long-running tool, call resume, check for completion).

Current workaround (PR #472)

The poll-then-resume pattern avoids calling ResumeSessionAsync on active sessions entirely:

  1. IsSessionStillProcessing() detects active sessions via events.jsonl
  2. PollEventsAndResumeWhenIdleAsync polls every 5s for session.shutdown
  3. Only calls ResumeSessionAsync after the CLI finishes
  4. 600s watchdog as safety net
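
The decision made on each poll can be sketched as a pure function. This is an illustration of the pattern described above, not the code in PR #472: the events.jsonl line format (one JSON object per line with a `type` field), the "active until session.shutdown appears" heuristic, and the names `parseEventTypes` and `decidePollStep` are all assumptions. What PolyPilot does when the watchdog fires is not stated here, so the sketch only reports that it expired.

```typescript
const POLL_INTERVAL_MS = 5000; // "polls every 5s"
const WATCHDOG_MS = 600000;    // "600s watchdog"

type PollDecision = "keep-polling" | "resume" | "watchdog-expired";

// Extracts event types from events.jsonl content, skipping blank or
// half-written lines (the CLI may be mid-append when we read the file).
function parseEventTypes(jsonl: string): string[] {
  const types: string[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const parsed: unknown = JSON.parse(line);
      if (typeof parsed === "object" && parsed !== null && "type" in parsed) {
        const t = (parsed as { type: unknown }).type;
        if (typeof t === "string") types.push(t);
      }
    } catch {
      // Ignore a partially written trailing line.
    }
  }
  return types;
}

// Assumed heuristic: the session counts as active until session.shutdown is logged.
function isSessionStillProcessing(eventTypes: string[]): boolean {
  return !eventTypes.some((t) => t === "session.shutdown");
}

// One poll tick: resume only once the CLI has finished; otherwise keep
// waiting until the watchdog budget is used up.
function decidePollStep(eventTypes: string[], elapsedMs: number): PollDecision {
  if (!isSessionStillProcessing(eventTypes)) return "resume";
  if (elapsedMs >= WATCHDOG_MS) return "watchdog-expired";
  return "keep-polling";
}

const active = '{"type":"tool.execution_start"}\n';
console.log(decidePollStep(parseEventTypes(active), 2 * POLL_INTERVAL_MS));
```

A caller would re-read events.jsonl every `POLL_INTERVAL_MS`, pass the elapsed time in, and call ResumeSessionAsync only when the result is "resume".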

What needs to happen

  1. Write an SDK-level test (Node.js or .NET) that:
    • Creates a session on the headless server
    • Sends a prompt that triggers a long bash tool (sleep 120)
    • Waits for tool.execution_start in the event stream
    • Calls session.resume() while the tool is running
    • Checks whether tool.execution_complete arrives with the correct output
  2. If confirmed: file upstream on github/copilot-cli -- headless mode should support non-destructive reconnect
  3. If NOT confirmed: simplify PR #472 (fix: session relaunch resilience + model selection fixes) by removing the poll-then-resume pattern and resuming eagerly
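
The pass/fail logic for the test in step 1 can be sketched independently of the SDK. The sketch below makes no claim about the real SDK surface: it takes the ordered events a test harness would have collected and classifies the run. The `ObservedEvent` shape, the `output` field, and the verdict names are hypothetical. It treats a run as inconclusive when the tool finishes before the resume lands, which is what happened in the April 2 attempt.

```typescript
type Verdict = "tool-survived" | "tool-destroyed" | "inconclusive";

// Assumed shape of a collected event; `output` is only expected on completion.
interface ObservedEvent {
  type: string;
  output?: string;
}

function classifyResumeOutcome(events: ObservedEvent[], expectedOutput: string): Verdict {
  const startIdx = events.findIndex((e) => e.type === "tool.execution_start");
  const resumeIdx = events.findIndex((e) => e.type === "session.resume");
  // The resume must be observed after the tool has started.
  if (startIdx < 0 || resumeIdx < startIdx) return "inconclusive";

  const completeIdx = events.findIndex(
    (e, i) => i > startIdx && e.type === "tool.execution_complete",
  );
  // Tool finished before the resume: the resume never hit a running tool.
  if (completeIdx >= 0 && completeIdx < resumeIdx) return "inconclusive";

  if (completeIdx > resumeIdx) {
    const done = events[completeIdx];
    // Wrong output is left as inconclusive here; that is a judgment call.
    return done !== undefined && done.output === expectedOutput
      ? "tool-survived"
      : "inconclusive";
  }

  // No completion at all: only call it destroyed once the session has ended.
  const ended = events.some((e) => e.type === "session.shutdown");
  return ended ? "tool-destroyed" : "inconclusive";
}

// The April 1 timeline, reduced to event types.
const april1: ObservedEvent[] = [
  { type: "tool.execution_start" },
  { type: "session.resume" },
  { type: "session.shutdown" },
];
console.log(classifyResumeOutcome(april1, "done"));
```

A "tool-destroyed" verdict from a controlled run would support filing upstream (step 2); a "tool-survived" verdict would support simplifying PR #472 (step 3).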

Labels: bug
