feat: add GitHub Copilot CLI as first-class agent#701
feat: add GitHub Copilot CLI as first-class agent#701dwizzzle wants to merge 10 commits intoRunMaestro:mainfrom
Conversation
Add copilot-cli agent support with full JSONL output parsing, session storage browsing, error detection, and UI integration. Agent definition: - Binary: copilot, batch mode via -p flag - JSON output: --output-format json (JSONL) - Session resume: --resume SESSION-ID - YOLO mode: --allow-all - Model selection: --model flag Output parser (verified against actual CLI output): - 11 event types: session lifecycle, streaming deltas, tool execution, assistant messages, and result with session ID - Accumulates outputTokens from assistant.message events Session storage: - Reads ~/.copilot/session-state/<uuid>/workspace.yaml for metadata - Parses events.jsonl for message history - Supports pagination, search, and project filtering Also includes: - Error patterns (auth, rate limit, network, token exhaustion) - UI: added to SUPPORTED_AGENTS and wizard agent tiles - copilot-instructions.md for Copilot sessions in this repo Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Greptile SummaryThis PR integrates GitHub Copilot CLI ( Key changes:
Confidence Score: 4/5Safe to merge after fixing the two P1 token-accumulation bugs in the output parser. Two P1 issues exist in the new output parser: accumulatedOutputTokens is never reset between sessions (singleton parser instance), and the accumulated token count is silently discarded when the result event has no usage field. Both produce incorrect usage stats from the second session onward. The session storage and UI changes are clean. Fixing the two accumulator issues is straightforward and does not require architectural changes. src/main/parsers/copilot-cli-output-parser.ts — token accumulator reset and unconditional usage reporting; src/main/storage/copilot-cli-session-storage.ts — sessionId validation before path join. Important Files Changed
Sequence DiagramsequenceDiagram
participant UI as Renderer (UI)
participant Main as Main Process
participant Parser as CopilotCliOutputParser
participant Storage as CopilotCliSessionStorage
participant CLI as copilot binary
UI->>Main: Launch agent (copilot -p "..." --output-format json --allow-all)
Main->>CLI: spawn()
CLI-->>Parser: session.tools_updated (JSONL)
Parser-->>Main: ParsedEvent { type: 'init' }
CLI-->>Parser: assistant.message_delta (streaming)
Parser-->>Main: ParsedEvent { type: 'text', isPartial: true }
CLI-->>Parser: assistant.message (with toolRequests)
Parser-->>Main: ParsedEvent { type: 'tool_use' }
CLI-->>Parser: tool.execution_start / tool.execution_complete
Parser-->>Main: ParsedEvent { type: 'tool_use', toolState }
CLI-->>Parser: assistant.message (text only)
Parser-->>Main: ParsedEvent { type: 'result', text }
Note over Parser: accumulatedOutputTokens += outputTokens
CLI-->>Parser: result (sessionId, usage)
Parser-->>Main: ParsedEvent { type: 'usage', sessionId, usage }
Note over Parser: ⚠ tokens not reset after this point
Main->>UI: Session complete, sessionId stored
UI->>Main: Browse past sessions
Main->>Storage: listSessions(projectPath)
Storage->>Storage: readdir ~/.copilot/session-state/
Storage->>Storage: parseWorkspaceYaml per UUID dir
Storage-->>Main: AgentSessionInfo[]
Main-->>UI: Session list
UI->>Main: Resume session (--resume SESSION-ID)
Main->>CLI: spawn with resumeArgs
|
| readonly agentId: ToolType = 'copilot-cli'; | ||
|
|
||
| // Accumulate output tokens from assistant.message events for usage reporting | ||
| private accumulatedOutputTokens = 0; |
There was a problem hiding this comment.
accumulatedOutputTokens never resets between sessions
CopilotCliOutputParser is registered as a singleton (one instance for the lifetime of the process). The accumulatedOutputTokens field is never reset, so after the first Copilot CLI session ends and a second session starts, the token count from session 1 will be added to session 2's tally. Every subsequent session will report an inflated (and incorrect) output-token count.
Other parsers avoid this by reading token values directly from each event rather than accumulating across sessions. The fix is to reset the counter when the top-level result event is emitted:
case 'result': {
const event: ParsedEvent = {
type: 'usage',
sessionId: msg.sessionId,
raw: msg,
};
event.usage = {
inputTokens: 0,
outputTokens: this.accumulatedOutputTokens,
};
// Reset for next session
this.accumulatedOutputTokens = 0;
return event;
}|
|
||
| // ---- Result (session complete) ---- | ||
|
|
||
| case 'result': { | ||
| const event: ParsedEvent = { | ||
| type: 'usage', | ||
| sessionId: msg.sessionId, | ||
| raw: msg, | ||
| }; | ||
|
|
||
| // Extract usage stats | ||
| if (msg.usage) { | ||
| event.usage = { | ||
| inputTokens: 0, // Copilot CLI doesn't report input tokens | ||
| outputTokens: this.accumulatedOutputTokens, | ||
| // No per-token cost — Copilot uses premium requests model | ||
| }; | ||
| } | ||
|
|
There was a problem hiding this comment.
Accumulated tokens silently dropped when
result event has no usage field
accumulatedOutputTokens is only written into event.usage when msg.usage is truthy. If the result event is missing the top-level usage object (network truncation, CLI version difference, empty session, etc.), the entire per-turn accumulation is discarded and the caller receives a ParsedEvent with type: 'usage' but no event.usage set.
Because extractUsage returns event.usage || null, the caller will receive null, meaning all output-token data for the session is silently lost. Report the accumulated tokens unconditionally:
case 'result': {
const event: ParsedEvent = {
type: 'usage',
sessionId: msg.sessionId,
raw: msg,
};
// Report accumulated output tokens regardless of whether msg.usage exists
event.usage = {
inputTokens: 0,
outputTokens: this.accumulatedOutputTokens,
};
this.accumulatedOutputTokens = 0;
return event;
}| * | ||
| * Phase 1 capabilities are conservative — advanced features (session storage | ||
| * browsing, cost tracking, thinking display) will be enabled in Phase 2 | ||
| * after verifying the JSON output schema. | ||
| */ | ||
| 'copilot-cli': { | ||
| supportsResume: true, // --resume SESSION-ID | ||
| supportsReadOnlyMode: false, // No explicit CLI flag; may use --deny-tool in future | ||
| supportsJsonOutput: true, // --output-format json (JSONL format) | ||
| supportsSessionId: true, // sessionId in 'result' event - Verified | ||
| supportsImageInput: false, // Not documented in CLI reference | ||
| supportsImageInputOnResume: false, | ||
| supportsSlashCommands: true, // /help, /compact, /model, /resume, /usage, etc. | ||
| supportsSessionStorage: true, // ~/.copilot/session-state/<uuid>/ - Verified | ||
| supportsCostTracking: false, // Uses premium requests model, not per-token cost | ||
| supportsUsageStats: true, // outputTokens in assistant.message events - Verified | ||
| supportsBatchMode: true, // -p flag for programmatic execution | ||
| requiresPromptToStart: true, // Requires -p prompt in batch mode |
There was a problem hiding this comment.
Capabilities contradict the Phase 2 parity matrix
supportsSessionStorage: true and supportsResultMessages: true are set here, but the parity matrix in docs/Copilot-CLI-Phase2-Plan.md explicitly marks both as ❌ for "Copilot CLI (Phase 1)".
supportsSessionStorage is fine because CopilotCliSessionStorage is actually implemented in this PR — the plan doc is simply stale on that point. However supportsResultMessages needs a second look: the plan doc marks it as unsupported in Phase 1 and lists it as a Phase 2 TODO. If it truly is unsupported, the flag should remain false to avoid enabling Auto Run prematurely. If it is supported as implemented, the plan doc should be updated to reflect that.
41 tests covering all 11 JSONL event types verified against actual copilot CLI output. Includes: - Session lifecycle: mcp_server_status_changed, mcp_servers_loaded, tools_updated - Conversation: user.message, assistant.turn_start/end, message_delta, message - Tool execution: tool.execution_start, tool.execution_complete - Completion: result with sessionId and usage - Error detection: auth, rate limit, network, exit codes - End-to-end: full session simulations (simple + tool use) - Edge cases: empty deltas, long output truncation, token accumulation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ssues On Windows, long prompts passed as PowerShell CLI args get garbled due to escaping issues with special characters. Changed copilot-cli to use '-p -' (read from stdin) and sendPromptViaStdinRaw=true. Also added sendPromptViaStdinRaw as an agent definition field so other agents can opt into stdin-based prompt delivery. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When an output parser is registered (JSONL agents like copilot-cli, codex, opencode, factory-droid), non-JSON lines from stdout are now suppressed instead of being displayed to the user as raw text. This prevents PowerShell profile banners and MCP server startup messages from cluttering the agent output. Only agents WITHOUT an output parser (terminal, legacy mode) continue to emit raw non-JSON lines, which is correct for terminal sessions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Had a similar PR if you wanted to build on it or take anything from it. :) #566 |
The promptArgs function was being skipped when sendPromptViaStdinRaw=true because the spawner's promptViaStdin guard prevents adding prompt args. But '-p -' isn't prompt text — it's a flag telling copilot to read stdin. Moved to batchModeArgs so it's always present in batch mode. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
copilot -p - means 'prompt is the literal dash character', not 'read from stdin'. When stdin is piped (sendPromptViaStdinRaw), copilot reads it automatically without -p. Removed -p - from batchModeArgs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot CLI outputs MCP server startup messages, PowerShell profile banners, and initialization noise to stderr. These were being displayed via the onStderr renderer handler. Now suppressed for all JSONL agents with output parsers (except Codex which has special stderr handling). Error detection still runs first, so real errors are captured. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two fixes for the raw JSONL display issue: 1. detectErrorFromParsed now handles session.error events (Copilot CLI format: data.message, data.errorType). Previously, session.error wasn't caught because it has no top-level 'error' field, causing the error to fall through to detectErrorFromExit which dumps the full stdoutBuffer in raw. 2. detectErrorFromExit no longer includes stderr/stdout in raw — for JSONL agents the stdoutBuffer contains all parsed JSON events which is noise, not useful error context. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The root cause of raw JSONL display: copilot-cli's --output-format json args didn't match the isStreamJsonMode heuristic check, so the process was treated as batch-JSON (single JSON blob) instead of streaming JSONL. On exit, handleBatchModeExit tried to JSON.parse the entire buffer, failed, and dumped the raw content to the display. Fixed by adding outputParser presence as a signal for stream-json mode. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
82f1fa8 to
be67a81
Compare
P1: Reset accumulatedOutputTokens on result event (singleton parser was carrying tokens across sessions). Report tokens unconditionally even when result event has no usage field. P2: Add UUID validation to session storage public methods to prevent path traversal. Update Phase 2 parity matrix to reflect shipped features. Derive GRID_ROWS from AGENT_TILES.length. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review Feedback Addressed (commit 05ae36d)Thanks for the thorough review. All 5 issues fixed: P1 - Token accumulator bugs (both fixed)
P2 - Session storage, docs, UI (all fixed)
Tests updated to cover token reset behavior and unconditional usage reporting. All 41 tests pass. |
|
@coderabbitai analyze this PR |
Summary
Adds \copilot-cli\ as a first-class agent in Maestro, on par with Claude Code, Codex, OpenCode, and Factory Droid.
What's included
Core integration (8 modified files, 2 new files):
Output parser — verified against actual Copilot CLI JSONL output:
esult\
Session storage browser — reads ~/.copilot/session-state//:
Error patterns — auth failures, rate limiting, network errors, token exhaustion
UI — Added to \SUPPORTED_AGENTS\ and wizard agent tiles (no more 'Coming Soon')
Testing
Follow-up (Phase 2)
\docs/Copilot-CLI-Phase2-Plan.md\ covers remaining parity items: read-only mode, wizard, group chat moderation.