Context
The Copilot SDK/CLI emits premature session.idle events during long tool-executing turns, causing multi-agent workers to be collected with truncated responses. PR #375 added an elaborate workaround (RecoverFromPrematureIdleIfNeededAsync) using:
- ManualResetEventSlim signal set by re-arm events after premature idle
- events.jsonl mtime freshness detection (< 15s = CLI still writing)
- Multi-round recovery loop with bestResponse accumulation
- 120s recovery timeout
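For readers unfamiliar with the freshness check, here is a minimal sketch of the idea in Python. It is not the code from PR #375: the function name, the `now` parameter, and the treatment of a missing file are assumptions made for illustration. Only the 15-second window comes from the description above.

```python
import os
import time
from typing import Optional

# From the description above: an mtime younger than 15s is read as "CLI still writing".
FRESHNESS_WINDOW_S = 15.0


def events_file_is_fresh(path: str, now: Optional[float] = None,
                         window_s: float = FRESHNESS_WINDOW_S) -> bool:
    """Return True if `path` was modified less than `window_s` seconds ago.

    `now` is injectable so the check can be tested without sleeping
    (an assumption of this sketch, not something the PR is known to do).
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # Missing or unreadable file: assume nothing is being written.
        return False
    return (now - mtime) < window_s
```

Because the signal is a timestamp on disk rather than an SDK event, a consumer that wants to be sure the file has gone quiet has to wait out the whole window, which is where the stall on normal completions comes from.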
Why this is a hack
- Relies on filesystem side-channel (file mtime) to determine SDK state
- Normal completions can stall ~15s while freshness detection runs
- Multi-round loop adds complexity and edge cases (OCE handling, bestResponse scoping)
- The root cause is the SDK emitting session.idle before the turn is actually complete
Additional finding: backgroundTasks field is inconsistently populated (Mar 2025)
The SessionIdleEvent.Data.BackgroundTasks field (agents/shells arrays) is not reliably populated when sub-agents are running. Within the same worker session, some session.idle events arrive with empty backgroundTasks while others correctly list running agents. This causes:
- session.idle arrives with empty backgroundTasks → IDLE-DEFER logic does NOT trigger → CompleteResponse fires → IsProcessing=false (premature)
- Sub-agent finishes → new TurnStartEvent arrives → EVT-REARM re-sets IsProcessing=true
- Cycle repeats multiple times per worker turn (observed 3x in a single turn)
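The cycle above can be modeled as a small state machine. This is a sketch in Python, not the actual handler code: the event names `"turn_start"` and `"session.idle"`, the dictionary shape of the idle payload, and the `WorkerState` type are assumptions chosen to mirror the log tags used in this issue.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkerState:
    is_processing: bool = False
    log: List[str] = field(default_factory=list)


def has_background_tasks(idle_data: Dict) -> bool:
    """True if the idle payload lists any running agents or shells."""
    tasks = idle_data.get("backgroundTasks") or {}
    return bool(tasks.get("agents")) or bool(tasks.get("shells"))


def handle_event(state: WorkerState, event_type: str,
                 data: Optional[Dict] = None) -> None:
    data = data or {}
    if event_type == "turn_start":
        # A turn starting while we believe the worker is done means the
        # previous idle was premature, so processing is re-armed.
        if not state.is_processing:
            state.log.append("EVT-REARM")
        state.is_processing = True
    elif event_type == "session.idle":
        if has_background_tasks(data):
            # Field populated: completion is deferred, state is unchanged.
            state.log.append("IDLE-DEFER")
        else:
            # Field empty: indistinguishable from a real completion.
            state.log.append("COMPLETE")
            state.is_processing = False
```

Feeding it an idle event with empty arrays, then a turn start, then an idle event that lists an agent leaves the log as COMPLETE, EVT-REARM, IDLE-DEFER. The consumer's logic is the same in both idle cases; only the payload differs.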
Evidence from PR Review Squad worker-5 (2026-03-25):
13:26:02 [EVT] worker-5 TurnStart (IsProcessing=True) ← sub-agents launched
13:29:08 [EVT] worker-5 TurnEnd (IsProcessing=False) ← premature idle fired, no backgroundTasks
13:29:08 [EVT-REARM] worker-5 re-arming IsProcessing ← sub-agent finished, new turn
13:31:18 [EVT] worker-5 TurnEnd (IsProcessing=False) ← premature idle AGAIN
13:31:18 [EVT-REARM] worker-5 re-arming IsProcessing
13:34:23 [EVT] worker-5 TurnEnd (IsProcessing=False) ← premature idle AGAIN
13:34:23 [EVT-REARM] worker-5 re-arming IsProcessing
13:35:08 [IDLE-DEFER] worker-5 session.idle with active background tasks — deferring completion ← THIS time backgroundTasks IS populated
The EVT-REARM mechanism recovers correctly every time, so no data is lost — but the UI spinner flickers (see #395) and diagnostics are confusing.
Proposed SDK fixes (either would resolve this)
1. Don't emit session.idle while background tasks are active. The SDK clearly tracks them (it populates the field sometimes). Hold the idle event until agents/shells are truly empty.
2. Add a turn ID / correlation token to session.idle. Consumers could match idle events to the prompt that triggered them, distinguishing real completion from stale/premature idle.
Option 1 eliminates the entire class of premature idle bugs. Option 2 is more general and helps with other edge cases too.
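To make Option 1 concrete, here is a rough sketch in Python of what "hold the idle event" could look like on the emitting side. Nothing here reflects the SDK's internals: `IdleGate`, its method names, and the `emit` callback are all hypothetical.

```python
from typing import Callable, Set


class IdleGate:
    """Holds back session.idle until no background tasks remain (hypothetical)."""

    def __init__(self, emit: Callable[[str], None]) -> None:
        self._emit = emit
        self._active: Set[str] = set()   # ids of running agents/shells
        self._idle_pending = False

    def task_started(self, task_id: str) -> None:
        self._active.add(task_id)

    def task_finished(self, task_id: str) -> None:
        self._active.discard(task_id)
        self._flush()

    def request_idle(self) -> None:
        # Called where session.idle would be emitted today.
        self._idle_pending = True
        self._flush()

    def _flush(self) -> None:
        # Emit only when an idle is waiting AND nothing is still running.
        if self._idle_pending and not self._active:
            self._idle_pending = False
            self._emit("session.idle")
```

This sketch does not model the follow-on turn that starts when a sub-agent finishes. A real fix would also have to decide whether a held idle should be dropped when that turn begins.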
Proposed investigation
- File upstream issue with Copilot SDK team documenting the premature idle behavior
- Request turn-scoped terminal events or turn IDs so consumers can distinguish real vs premature idle
- If upstream fix is not forthcoming, evaluate whether the workaround can be simplified (e.g., just use turn ID matching)
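If turn IDs became available, the simplified workaround might reduce to something like the Python sketch below. The `turn_id` field on the idle event does not exist today; it is the thing being requested. `TurnTracker` and `IdleEvent` are hypothetical names, and the sketch assumes the SDK would echo the consumer's token only on the idle that really ends that prompt's work.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdleEvent:
    # Hypothetical field: the correlation token requested from the SDK.
    turn_id: Optional[str]


class TurnTracker:
    """Accepts an idle event only if it matches the prompt in flight."""

    def __init__(self) -> None:
        self.current_turn_id: Optional[str] = None
        self.is_processing = False

    def begin_prompt(self) -> str:
        # Issue a fresh token for each prompt sent to the worker.
        self.current_turn_id = uuid.uuid4().hex
        self.is_processing = True
        return self.current_turn_id

    def on_idle(self, event: IdleEvent) -> bool:
        """Return True if this idle completes the current prompt."""
        if not self.is_processing or event.turn_id != self.current_turn_id:
            return False  # stale, premature, or untagged idle: ignore it
        self.is_processing = False
        return True
```

With this shape there is no file mtime polling, no recovery loop, and no timeout. An idle event either matches the outstanding prompt or it is ignored.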
Priority
Medium — the workaround works but adds resilience debt in the most critical orchestration path.