[External] Copilot CLI: session.idle should be persisted to events.jsonl

## Summary

When PolyPilot resumes or reconnects after missing the live event stream, there is no durable, authoritative way to determine whether the previous turn completed successfully.

`session.idle` appears to be the canonical live "turn is done / session is now idle" signal, but it is ephemeral and not written to `events.jsonl`. That leaves restart/reconnect logic relying on inference instead of a persisted completion marker.

## Why this is a real API/CLI gap

A resumed client that missed the live stream can currently only combine coarse session metadata with persisted artifacts on disk. The relevant signals are:

| Event / signal | Persisted? | Reliably means the turn is done? |
|---|---:|---:|
| `session.idle` | ❌ No | ✅ Yes |
| `assistant.turn_end` | ✅ Yes | ❌ No |
| `assistant.message` | ✅ Yes | ❌ No |
| `session.error` | ✅ Yes | ✅ Yes, but only for failures |
| `session.shutdown` | ✅ Yes | ✅ Yes, but only for shutdown |
| `SessionMetadata.ModifiedTime` | ✅ Yes | ❌ No, only a coarse hint |

The key issue is that **the authoritative success-path completion signal is not durably available after reconnect**.

`assistant.turn_end` is not a substitute: it can occur between tool rounds and before subsequent reasoning/tool activity. `ModifiedTime` is useful as a hint, but it does not answer whether the turn is actually finished.
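
To make the table concrete, here is a minimal sketch of what a resumed client can conclude from the persisted log alone today. It assumes each line of `events.jsonl` is a JSON object with a string `type` field; that schema and the function name `infer_turn_state` are illustrative assumptions, not the CLI's documented format.

```python
import json
from typing import Iterable, Optional

# Assumption: every persisted event is a JSON object with a "type" field.
# Only these persisted types reliably end a turn, and only on non-success paths.
TERMINAL_TYPES = {"session.error", "session.shutdown"}


def infer_turn_state(lines: Iterable[str]) -> str:
    """Return "terminated" or "unknown" based on the last parseable event."""
    last_type: Optional[str] = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate a partially written trailing line
        if isinstance(event, dict) and isinstance(event.get("type"), str):
            last_type = event["type"]
    if last_type in TERMINAL_TYPES:
        return "terminated"
    # assistant.turn_end and assistant.message land here: neither proves
    # that the turn finished, so the success path stays undecidable.
    return "unknown"
```

On the success path the function can only ever answer `"unknown"`, which is the gap this issue describes.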

## Why `session.idle` is special among ephemeral events

Most ephemeral events can reasonably stay live-only because each has a persisted counterpart:

- `assistant.message_delta` → persisted `assistant.message`
- `assistant.reasoning_delta` → persisted `assistant.reasoning`
- `tool.execution_partial_result` / `tool.execution_progress` → persisted `tool.execution_complete`

`session.idle` is different: **it has no persisted counterpart at all**. That is what makes it problematic for restart/reconnect flows.

## What PolyPilot has had to do to work around this

Some past stuck-session bugs were genuinely in PolyPilot and have been fixed. This issue tracks the remaining external gap that still forces a large workaround stack even after those local fixes.

Today PolyPilot has to do all of the following just to approximate "did this turn finish?":

1. **Tail-scan `events.jsonl` and analyze sub-turn structure.** `IsSessionStillProcessing()` cannot trust `assistant.turn_end`, so it scans the event tail and walks backward within the current sub-turn (`IsCleanNoToolSubturn`) to decide whether more tool rounds are likely coming.
2. **Run a multi-tier processing watchdog.** `RunProcessingWatchdogAsync()` uses different timeout regimes for resumed sessions, normal inactivity, tool-heavy turns, and deferred-idle/background-task cases. This exists largely to compensate for the missing durable completion marker.
3. **Handle `session.idle` with active `backgroundTasks` as a special deferral state.** PolyPilot has to defer completion when idle arrives with active agents/shells, track carryover/zombie background tasks, and sometimes re-arm `IsProcessing` if a later idle arrives after state was already cleared.
4. **Use file-growth and mtime heuristics.** For multi-agent sessions, PolyPilot checks whether `events.jsonl` is still growing. If mtime stays fresh but file size stops increasing for multiple checks, it assumes the connection is dead and moves to recovery.
5. **Force-complete sessions to prevent infinite spinners.** When the heuristics say the event stream is dead, PolyPilot flushes any partial response, adds a system warning, and force-completes the session rather than leaving the UI stuck in "Thinking…" forever.
6. **Scan external session directories and lock files.** A background `ExternalSessionScanner` polls session-state folders and lock PIDs to infer whether orphaned sessions are probably still active after restart.
7. **Maintain extensive regression coverage.** A nontrivial amount of test coverage now exists purely to keep these resume/watchdog/idle heuristics from regressing.

This is the downstream cost of the missing persisted completion signal: a basic resume question becomes a mix of log parsing, timeout tuning, file-system probing, and recovery heuristics.
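
As an illustration of the kind of heuristic involved, the file-growth check in item 4 can be sketched roughly as follows. This is not PolyPilot's actual implementation: the class name `GrowthTracker` and the threshold of three stalled checks are hypothetical, and the sketch models only the file-size half of the check (the mtime freshness test is omitted).

```python
from dataclasses import dataclass


@dataclass
class GrowthTracker:
    """Tracks successive size samples of events.jsonl (hypothetical helper)."""

    max_stalled_checks: int = 3  # assumed threshold, not a PolyPilot value
    last_size: int = -1
    stalled_checks: int = 0

    def observe(self, size: int) -> bool:
        """Record one size sample; return True once the stream looks dead."""
        if size > self.last_size:
            # The log grew, so something is still writing events.
            self.last_size = size
            self.stalled_checks = 0
        else:
            self.stalled_checks += 1
        return self.stalled_checks >= self.max_stalled_checks
```

A caller would feed it `os.path.getsize(path)` on each watchdog tick and move to recovery once `observe` returns `True`. The point is that this only guesses at liveness; it never establishes that the turn completed.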

## Proposed fix

The simplest fix is:

- **Persist `session.idle` to `events.jsonl`.**

## Acceptable alternative fixes

If there is a strong product reason to keep `session.idle` ephemeral, then an equivalent persisted signal is still needed. Any of these would address the underlying problem:

1. add a persisted `session.turn_complete` / `session.ready` event,
2. include explicit completion state in `session.resume` / resume metadata,
3. persist a final session-status snapshot that can be read after reconnect.

The important point is not the event name; it is that **there must be some persisted, authoritative completion marker** for resumed clients.
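
For comparison, here is a sketch of how small the resume check could become if a completion marker were persisted. Everything in it is hypothetical: it assumes one JSON object per line with a `type` field, that one of the marker names discussed above is written to `events.jsonl`, and that any event after the marker means a new turn has started.

```python
import json
from typing import Iterable

# Hypothetical: none of these are persisted by the CLI today.
COMPLETION_TYPES = {"session.idle", "session.turn_complete", "session.ready"}
# Persisted today, but they only cover the non-success paths.
FAILURE_TYPES = {"session.error", "session.shutdown"}


def resume_state(lines: Iterable[str]) -> str:
    """Return "idle", "terminated", "processing", or "unknown" (empty log)."""
    state = "unknown"
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip a partially written trailing line
        if not isinstance(event, dict):
            continue
        event_type = event.get("type")
        if event_type in COMPLETION_TYPES:
            state = "idle"
        elif event_type in FAILURE_TYPES:
            state = "terminated"
        else:
            # Assumption: any other event after a marker is new activity.
            state = "processing"
    return state
```

With a marker like this in the log, the resumed client reads the answer directly instead of inferring it from timeouts and file-system probes.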

## Why this seems worth addressing upstream

There is already prior evidence that this lifecycle edge is fragile: SDK workarounds have had to synthesize `session.idle` when the CLI omits it after `assistant.turn_end` in some flows. Persisting a definitive completion marker would remove a whole class of resume/reconnect heuristics from downstream clients.

## Scope

- Affects the CLI event log written under `~/.copilot/session-state/<session-id>/events.jsonl`
- Affects any SDK/app that restores sessions across restart, reconnect, crash recovery, or transport recreation

---

**Upstream tracking issue:** [github/copilot-cli#2596](https://github.com/github/copilot-cli/issues/2596)

