feat: add GitHub Copilot CLI as first-class agent #701

Open
dwizzzle wants to merge 10 commits into RunMaestro:main from dwizzzle:feat/copilot-cli-agent
@dwizzzle dwizzzle commented Apr 1, 2026

Summary

Adds copilot-cli as a first-class agent in Maestro, on par with Claude Code, Codex, OpenCode, and Factory Droid.

What's included

Core integration (8 modified files, 2 new files):

  • Agent ID, definition, capabilities, display name, beta badge, context window
  • CLI args: -p (batch), --output-format json, --resume, --allow-all, --model
  • Config options for model selection and context window size

Output parser — verified against actual Copilot CLI JSONL output:

  • 11 event types: session.*, assistant.message_delta, assistant.message, tool.execution_start/complete, result
  • Streaming text display, tool use tracking, session ID extraction, output token accumulation
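
As a rough illustration of the parser bullets above, the sketch below maps one JSONL line to a normalized event. The event type names and the init/text/usage event kinds appear in this PR; the payload field data.deltaContent and the helper name parseLine are assumptions made for illustration, not the verified Copilot CLI schema or Maestro's actual code.

```typescript
// Hypothetical sketch: the field names under `data` are assumed, not the verified schema.
type ParsedEvent =
    | { type: 'init' }
    | { type: 'text'; text: string; isPartial: boolean }
    | { type: 'usage'; sessionId?: string };

function parseLine(line: string): ParsedEvent | null {
    let msg: { type?: string; sessionId?: string; data?: { deltaContent?: string } };
    try {
        msg = JSON.parse(line);
    } catch {
        return null; // not a JSON line (banner text, startup noise, ...)
    }
    if (msg === null || typeof msg !== 'object') return null;

    switch (msg.type) {
        case 'session.tools_updated':
            return { type: 'init' };
        case 'assistant.message_delta':
            // Streaming chunk: surfaced as partial text
            return { type: 'text', text: msg.data?.deltaContent ?? '', isPartial: true };
        case 'result':
            // Session complete: carries the session ID used for --resume
            return { type: 'usage', sessionId: msg.sessionId };
        default:
            return null; // event types not modeled in this sketch
    }
}
```

Returning null for anything unrecognized keeps the sketch tolerant of event types it does not model.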

Session storage browser (reads ~/.copilot/session-state/<uuid>/):

  • Parses workspace.yaml for metadata (summary, cwd, timestamps)
  • Reads events.jsonl for message history
  • Supports pagination, search, and project path filtering
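
The pagination mentioned above can be pictured with a small sketch. The result shape (messages, total, hasMore) mirrors the empty result returned elsewhere in this PR; the paginate helper and its offset and limit parameters are hypothetical names, not the storage class's real signature.

```typescript
// Hypothetical sketch: `paginate`, `offset`, and `limit` are assumed names.
interface Page<T> {
    messages: T[];
    total: number;
    hasMore: boolean;
}

function paginate<T>(all: T[], offset = 0, limit = 20): Page<T> {
    const start = Math.max(0, offset);
    const messages = all.slice(start, start + Math.max(0, limit));
    return {
        messages,
        total: all.length,
        // More pages exist if the returned window stops short of the end
        hasMore: start + messages.length < all.length,
    };
}
```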

Error patterns — auth failures, rate limiting, network errors, token exhaustion

UI: added to SUPPORTED_AGENTS and wizard agent tiles (no more 'Coming Soon')

Testing

  • TypeScript compiles cleanly across all 3 configs
  • Maestro launches, detects copilot binary, registers parser
  • Ran copilot -p ... --output-format json to verify JSONL schema matches parser

Follow-up (Phase 2)

docs/Copilot-CLI-Phase2-Plan.md covers remaining parity items: read-only mode, wizard, group chat moderation.

Add copilot-cli agent support with full JSONL output parsing, session
storage browsing, error detection, and UI integration.

Agent definition:
- Binary: copilot, batch mode via -p flag
- JSON output: --output-format json (JSONL)
- Session resume: --resume SESSION-ID
- YOLO mode: --allow-all
- Model selection: --model flag

Output parser (verified against actual CLI output):
- 11 event types: session lifecycle, streaming deltas, tool execution,
  assistant messages, and result with session ID
- Accumulates outputTokens from assistant.message events

Session storage:
- Reads ~/.copilot/session-state/<uuid>/workspace.yaml for metadata
- Parses events.jsonl for message history
- Supports pagination, search, and project filtering

Also includes:
- Error patterns (auth, rate limit, network, token exhaustion)
- UI: added to SUPPORTED_AGENTS and wizard agent tiles
- copilot-instructions.md for Copilot sessions in this repo

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

greptile-apps bot commented Apr 1, 2026

Greptile Summary

This PR integrates GitHub Copilot CLI (copilot) as a first-class agent in Maestro, following the established patterns used by Claude Code, Codex, OpenCode, and Factory Droid. The integration spans all required layers: agent ID/definition/capabilities, a JSONL output parser, a session storage browser, error-pattern matching, and UI registration (wizard tile + SUPPORTED_AGENTS). A Phase 2 roadmap document is included for future parity work.

Key changes:

  • src/main/parsers/copilot-cli-output-parser.ts — New parser handling 11 Copilot CLI JSONL event types; contains two bugs around accumulatedOutputTokens never resetting between sessions and tokens being silently dropped when the result event lacks a usage field.
  • src/main/storage/copilot-cli-session-storage.ts — New session storage reading ~/.copilot/session-state/<uuid>/; unvalidated sessionId in public methods creates a path traversal risk.
  • src/main/agents/capabilities.ts — Capability flags for Copilot CLI; supportsSessionStorage and supportsResultMessages are set true but the Phase 2 plan parity matrix marks both as Phase 1 unsupported.
  • All shared/registry wiring (agent IDs, metadata, constants, parser index, storage index) is clean and consistent with existing agent patterns.

Confidence Score: 4/5

Safe to merge after fixing the two P1 token-accumulation bugs in the output parser.

Two P1 issues exist in the new output parser: accumulatedOutputTokens is never reset between sessions (singleton parser instance), and the accumulated token count is silently discarded when the result event has no usage field. Both produce incorrect usage stats from the second session onward. The session storage and UI changes are clean. Fixing the two accumulator issues is straightforward and does not require architectural changes.

src/main/parsers/copilot-cli-output-parser.ts — token accumulator reset and unconditional usage reporting; src/main/storage/copilot-cli-session-storage.ts — sessionId validation before path join.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/main/parsers/copilot-cli-output-parser.ts | New JSONL parser for Copilot CLI; two bugs: accumulatedOutputTokens never resets between sessions (singleton parser), and accumulated tokens are silently discarded when the result event lacks a usage field. |
| src/main/storage/copilot-cli-session-storage.ts | New session storage reading ~/.copilot/session-state/; logic is solid but readSessionMessages/getSessionPath accept an unvalidated sessionId that is directly concatenated into a file path, enabling potential path traversal. |
| src/main/agents/capabilities.ts | Copilot CLI capability block added; supportsSessionStorage and supportsResultMessages are both set true but the Phase 2 plan parity matrix marks both as Phase 1 unsupported, requiring clarification. |
| src/main/agents/definitions.ts | Copilot CLI agent definition added with correct CLI flags (-p, --output-format json, --resume, --allow-all, --model) and UI config options for model and context window. |
| src/main/parsers/error-patterns.ts | Copilot CLI error patterns added covering auth failures, rate limiting, network errors, and token exhaustion; patterns are reasonable and registered correctly. |
| src/renderer/components/Wizard/screens/AgentSelectionScreen.tsx | Copilot CLI tile added to the wizard; GRID_ROWS bumped to 3 for 7 items (correct), but the constant is hardcoded rather than derived from tile count. |
| docs/Copilot-CLI-Phase2-Plan.md | Phase 2 roadmap doc; useful, but the parity matrix is already stale: session storage shipped in this PR but is still shown as Phase 1 unsupported. |

Sequence Diagram

sequenceDiagram
    participant UI as Renderer (UI)
    participant Main as Main Process
    participant Parser as CopilotCliOutputParser
    participant Storage as CopilotCliSessionStorage
    participant CLI as copilot binary

    UI->>Main: Launch agent (copilot -p "..." --output-format json --allow-all)
    Main->>CLI: spawn()

    CLI-->>Parser: session.tools_updated (JSONL)
    Parser-->>Main: ParsedEvent { type: 'init' }

    CLI-->>Parser: assistant.message_delta (streaming)
    Parser-->>Main: ParsedEvent { type: 'text', isPartial: true }

    CLI-->>Parser: assistant.message (with toolRequests)
    Parser-->>Main: ParsedEvent { type: 'tool_use' }

    CLI-->>Parser: tool.execution_start / tool.execution_complete
    Parser-->>Main: ParsedEvent { type: 'tool_use', toolState }

    CLI-->>Parser: assistant.message (text only)
    Parser-->>Main: ParsedEvent { type: 'result', text }
    Note over Parser: accumulatedOutputTokens += outputTokens

    CLI-->>Parser: result (sessionId, usage)
    Parser-->>Main: ParsedEvent { type: 'usage', sessionId, usage }
    Note over Parser: ⚠ tokens not reset after this point

    Main->>UI: Session complete, sessionId stored

    UI->>Main: Browse past sessions
    Main->>Storage: listSessions(projectPath)
    Storage->>Storage: readdir ~/.copilot/session-state/
    Storage->>Storage: parseWorkspaceYaml per UUID dir
    Storage-->>Main: AgentSessionInfo[]
    Main-->>UI: Session list

    UI->>Main: Resume session (--resume SESSION-ID)
    Main->>CLI: spawn with resumeArgs

Comments Outside Diff (2)

  1. src/main/storage/copilot-cli-session-storage.ts, line 1304-1311 (link)

    P2 sessionId not validated before path construction

    readSessionMessages (and getSessionPath, getSearchableMessages) construct a file path by directly concatenating the caller-supplied sessionId into the session base directory:

    const sessionDir = path.join(getCopilotSessionDir(), sessionId);

    If sessionId contains path traversal sequences (e.g. ../../../etc/passwd), path.join will resolve them and the code will attempt to read files outside ~/.copilot/session-state/. listSessions generates clean UUIDs from directory reads, but readSessionMessages is a public interface that accepts arbitrary strings. Adding a UUID-format guard before path construction would eliminate this risk:

    if (!/^[0-9a-f-]{36}$/i.test(sessionId)) {
        return { messages: [], total: 0, hasMore: false };
    }
  2. src/renderer/components/Wizard/screens/AgentSelectionScreen.tsx, line 1476-1478 (link)

    P2 Hardcoded GRID_ROWS will need manual bumping for future agents

    The change from GRID_ROWS = 2 to GRID_ROWS = 3 is correct for 7 items, but this constant will need to be bumped again when a future tile is added. Consider deriving it programmatically:

    const GRID_ROWS = Math.ceil(AGENT_TILES.length / GRID_COLS);
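
A slightly stricter variant of the guard in item 1, offered as a sketch rather than as the change that was merged. The suggested character-class pattern also accepts inputs such as 36 consecutive hyphens, which cannot escape the directory but are not UUIDs either. The names UUID_PATTERN and isValidSessionId are hypothetical, and the sketch assumes session directory names are canonical hyphenated UUIDs.

```typescript
// Assumption: session directories use the canonical 8-4-4-4-12 hyphenated UUID layout.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: returns true only for strings shaped like a UUID,
// so callers can bail out before any path is built from the value.
function isValidSessionId(sessionId: unknown): sessionId is string {
    return typeof sessionId === 'string' && UUID_PATTERN.test(sessionId);
}
```

A caller would check isValidSessionId(sessionId) first and return the empty result when it fails, as in the guard above.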

Reviews (1): Last reviewed commit: "feat: add GitHub Copilot CLI as first-cl..." | Re-trigger Greptile

readonly agentId: ToolType = 'copilot-cli';

// Accumulate output tokens from assistant.message events for usage reporting
private accumulatedOutputTokens = 0;

P1 accumulatedOutputTokens never resets between sessions

CopilotCliOutputParser is registered as a singleton (one instance for the lifetime of the process). The accumulatedOutputTokens field is never reset, so after the first Copilot CLI session ends and a second session starts, the token count from session 1 will be added to session 2's tally. Every subsequent session will report an inflated (and incorrect) output-token count.

Other parsers avoid this by reading token values directly from each event rather than accumulating across sessions. The fix is to reset the counter when the top-level result event is emitted:

case 'result': {
    const event: ParsedEvent = {
        type: 'usage',
        sessionId: msg.sessionId,
        raw: msg,
    };

    event.usage = {
        inputTokens: 0,
        outputTokens: this.accumulatedOutputTokens,
    };

    // Reset for next session
    this.accumulatedOutputTokens = 0;

    return event;
}
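
To make the cross-session effect concrete, here is a minimal stand-alone model of the counter. TokenAccumulator and its method names are hypothetical stand-ins for the singleton parser; only the accumulate-then-reset logic is taken from the suggestion above.

```typescript
// Hypothetical stand-in for the singleton parser; only the counter logic is modeled.
class TokenAccumulator {
    private accumulatedOutputTokens = 0;

    // Called once per assistant.message event
    onAssistantMessage(outputTokens: number): void {
        this.accumulatedOutputTokens += outputTokens;
    }

    // Called for the top-level result event: report, then reset for the next session
    onResult(): { inputTokens: number; outputTokens: number } {
        const usage = { inputTokens: 0, outputTokens: this.accumulatedOutputTokens };
        this.accumulatedOutputTokens = 0;
        return usage;
    }
}

const parser = new TokenAccumulator(); // one instance shared by every session

// Session 1
parser.onAssistantMessage(120);
parser.onAssistantMessage(30);
const first = parser.onResult(); // outputTokens: 150

// Session 2 starts from zero because of the reset
parser.onAssistantMessage(40);
const second = parser.onResult(); // outputTokens: 40, not 190
```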

Comment on lines +262 to +280

// ---- Result (session complete) ----

case 'result': {
    const event: ParsedEvent = {
        type: 'usage',
        sessionId: msg.sessionId,
        raw: msg,
    };

    // Extract usage stats
    if (msg.usage) {
        event.usage = {
            inputTokens: 0, // Copilot CLI doesn't report input tokens
            outputTokens: this.accumulatedOutputTokens,
            // No per-token cost: Copilot uses premium requests model
        };
    }


P1 Accumulated tokens silently dropped when result event has no usage field

accumulatedOutputTokens is only written into event.usage when msg.usage is truthy. If the result event is missing the top-level usage object (network truncation, CLI version difference, empty session, etc.), the entire per-turn accumulation is discarded and the caller receives a ParsedEvent with type: 'usage' but no event.usage set.

Because extractUsage returns event.usage || null, the caller will receive null, meaning all output-token data for the session is silently lost. Report the accumulated tokens unconditionally:

case 'result': {
    const event: ParsedEvent = {
        type: 'usage',
        sessionId: msg.sessionId,
        raw: msg,
    };

    // Report accumulated output tokens regardless of whether msg.usage exists
    event.usage = {
        inputTokens: 0,
        outputTokens: this.accumulatedOutputTokens,
    };

    this.accumulatedOutputTokens = 0;
    return event;
}
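
The difference between the two behaviors can be shown side by side. This is only a sketch: usageConditional and usageUnconditional are hypothetical helper names, and the ResultMessage shape is reduced to the two fields discussed in this comment.

```typescript
interface Usage {
    inputTokens: number;
    outputTokens: number;
}

// Reduced, assumed shape of the top-level result event
interface ResultMessage {
    sessionId?: string;
    usage?: unknown;
}

// Behavior under review: usage is reported only when the result event has a usage object
function usageConditional(msg: ResultMessage, accumulated: number): Usage | null {
    return msg.usage ? { inputTokens: 0, outputTokens: accumulated } : null;
}

// Suggested behavior: always report whatever was accumulated during the session
function usageUnconditional(_msg: ResultMessage, accumulated: number): Usage {
    return { inputTokens: 0, outputTokens: accumulated };
}

// A result event without a usage field, e.g. from a different CLI version
const truncated: ResultMessage = { sessionId: 'abc' };
```

With truncated as input, the conditional version yields null while the unconditional version still reports the accumulated count.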

Comment on lines +396 to +413
*
* Phase 1 capabilities are conservative — advanced features (session storage
* browsing, cost tracking, thinking display) will be enabled in Phase 2
* after verifying the JSON output schema.
*/
'copilot-cli': {
    supportsResume: true, // --resume SESSION-ID
    supportsReadOnlyMode: false, // No explicit CLI flag; may use --deny-tool in future
    supportsJsonOutput: true, // --output-format json (JSONL format)
    supportsSessionId: true, // sessionId in 'result' event - Verified
    supportsImageInput: false, // Not documented in CLI reference
    supportsImageInputOnResume: false,
    supportsSlashCommands: true, // /help, /compact, /model, /resume, /usage, etc.
    supportsSessionStorage: true, // ~/.copilot/session-state/<uuid>/ - Verified
    supportsCostTracking: false, // Uses premium requests model, not per-token cost
    supportsUsageStats: true, // outputTokens in assistant.message events - Verified
    supportsBatchMode: true, // -p flag for programmatic execution
    requiresPromptToStart: true, // Requires -p prompt in batch mode

P2 Capabilities contradict the Phase 2 parity matrix

supportsSessionStorage: true and supportsResultMessages: true are set here, but the parity matrix in docs/Copilot-CLI-Phase2-Plan.md explicitly marks both as unsupported for "Copilot CLI (Phase 1)".

supportsSessionStorage is fine because CopilotCliSessionStorage is actually implemented in this PR — the plan doc is simply stale on that point. However supportsResultMessages needs a second look: the plan doc marks it as unsupported in Phase 1 and lists it as a Phase 2 TODO. If it truly is unsupported, the flag should remain false to avoid enabling Auto Run prematurely. If it is supported as implemented, the plan doc should be updated to reflect that.

dwizzzle and others added 3 commits March 31, 2026 20:37
41 tests covering all 11 JSONL event types verified against actual
copilot CLI output. Includes:

- Session lifecycle: mcp_server_status_changed, mcp_servers_loaded, tools_updated
- Conversation: user.message, assistant.turn_start/end, message_delta, message
- Tool execution: tool.execution_start, tool.execution_complete
- Completion: result with sessionId and usage
- Error detection: auth, rate limit, network, exit codes
- End-to-end: full session simulations (simple + tool use)
- Edge cases: empty deltas, long output truncation, token accumulation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ssues

On Windows, long prompts passed as PowerShell CLI args get garbled due
to escaping issues with special characters. Changed copilot-cli to use
'-p -' (read from stdin) and sendPromptViaStdinRaw=true.

Also added sendPromptViaStdinRaw as an agent definition field so other
agents can opt into stdin-based prompt delivery.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When an output parser is registered (JSONL agents like copilot-cli, codex,
opencode, factory-droid), non-JSON lines from stdout are now suppressed
instead of being displayed to the user as raw text. This prevents
PowerShell profile banners and MCP server startup messages from cluttering
the agent output.

Only agents WITHOUT an output parser (terminal, legacy mode) continue to
emit raw non-JSON lines, which is correct for terminal sessions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
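
The routing rule described in this commit can be sketched as a small decision function. routeStdoutLine and its return labels are hypothetical names for illustration; the actual handling in Maestro's process manager may be structured differently.

```typescript
type LineRoute = 'parse' | 'emit-raw' | 'suppress';

// Hypothetical sketch of the rule: agents with an output parser never show non-JSON lines.
function routeStdoutLine(line: string, hasOutputParser: boolean): LineRoute {
    // Terminal / legacy sessions have no parser: everything is shown as-is
    if (!hasOutputParser) return 'emit-raw';

    try {
        const value: unknown = JSON.parse(line.trim());
        if (value !== null && typeof value === 'object') return 'parse';
    } catch {
        // fall through: not JSON
    }
    // Profile banners, MCP startup messages, and other noise
    return 'suppress';
}
```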
@nolanmclark

Had a similar PR if you wanted to build on it or take anything from it. :) #566

dwizzzle and others added 5 commits April 1, 2026 08:00
The promptArgs function was being skipped when sendPromptViaStdinRaw=true
because the spawner's promptViaStdin guard prevents adding prompt args.
But '-p -' isn't prompt text — it's a flag telling copilot to read stdin.
Moved to batchModeArgs so it's always present in batch mode.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
copilot -p - means 'prompt is the literal dash character', not 'read
from stdin'. When stdin is piped (sendPromptViaStdinRaw), copilot reads
it automatically without -p. Removed -p - from batchModeArgs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot CLI outputs MCP server startup messages, PowerShell profile
banners, and initialization noise to stderr. These were being displayed
via the onStderr renderer handler. Now suppressed for all JSONL agents
with output parsers (except Codex which has special stderr handling).

Error detection still runs first, so real errors are captured.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two fixes for the raw JSONL display issue:

1. detectErrorFromParsed now handles session.error events (Copilot CLI
   format: data.message, data.errorType). Previously, session.error
   wasn't caught because it has no top-level 'error' field, causing
   the error to fall through to detectErrorFromExit which dumps the
   full stdoutBuffer in raw.

2. detectErrorFromExit no longer includes stderr/stdout in raw — for
   JSONL agents the stdoutBuffer contains all parsed JSON events which
   is noise, not useful error context.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
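
A sketch of the first fix, under stated assumptions: only the event name session.error and the fields data.message and data.errorType come from the commit message. The AgentError shape, the 'unknown' fallback, and the form of the top-level error check are hypothetical.

```typescript
// Hypothetical result shape for this sketch
interface AgentError {
    message: string;
    errorType: string;
}

function detectErrorFromParsed(parsed: unknown): AgentError | null {
    if (parsed === null || typeof parsed !== 'object') return null;
    const msg = parsed as {
        type?: unknown;
        error?: unknown;
        data?: { message?: unknown; errorType?: unknown } | null;
    };

    // Pre-existing style of check (assumed): a top-level `error` field
    if (typeof msg.error === 'string') {
        return { message: msg.error, errorType: 'unknown' };
    }

    // Copilot CLI format: session.error carries its details under `data`
    if (msg.type === 'session.error' && msg.data && typeof msg.data.message === 'string') {
        const errorType = typeof msg.data.errorType === 'string' ? msg.data.errorType : 'unknown';
        return { message: msg.data.message, errorType };
    }
    return null;
}
```

Catching the event here means the exit-code fallback never has to run for this case.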
The root cause of raw JSONL display: copilot-cli's --output-format json
args didn't match the isStreamJsonMode heuristic check, so the process
was treated as batch-JSON (single JSON blob) instead of streaming JSONL.
On exit, handleBatchModeExit tried to JSON.parse the entire buffer,
failed, and dumped the raw content to the display.

Fixed by adding outputParser presence as a signal for stream-json mode.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
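
The fix amounts to adding one more signal to the mode check. In the sketch below, the argument test (looking for a stream-json value) is an assumed stand-in for the existing heuristic, which this commit message does not spell out; only the use of outputParser presence as an extra signal comes from the commit.

```typescript
// Hypothetical sketch: the args heuristic is assumed, not Maestro's actual check.
function isStreamJsonMode(args: string[], hasOutputParser: boolean): boolean {
    const argsLookStreaming = args.includes('stream-json');
    // New signal: any agent with a registered output parser emits JSONL
    return argsLookStreaming || hasOutputParser;
}
```

With copilot-cli's --output-format json arguments, the args test alone returns false, so the parser signal is what routes the process to line-by-line handling.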
@dwizzzle dwizzzle force-pushed the feat/copilot-cli-agent branch from 82f1fa8 to be67a81 Compare April 1, 2026 18:41
P1: Reset accumulatedOutputTokens on result event (singleton parser
was carrying tokens across sessions). Report tokens unconditionally
even when result event has no usage field.

P2: Add UUID validation to session storage public methods to prevent
path traversal. Update Phase 2 parity matrix to reflect shipped
features. Derive GRID_ROWS from AGENT_TILES.length.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Author

dwizzzle commented Apr 1, 2026

Review Feedback Addressed (commit 05ae36d)

Thanks for the thorough review. All 5 issues fixed:

P1 - Token accumulator bugs (both fixed)

  1. accumulatedOutputTokens never resets - Now resets to 0 after the result event. The parser is a singleton so this was carrying tokens across sessions.

  2. Tokens silently dropped when result has no usage field - Usage stats are now reported unconditionally on result, regardless of whether msg.usage exists.

P2 - Session storage, docs, UI (all fixed)

  1. Path traversal in session storage - Added UUID format validation to readSessionMessages, getSessionPath, and getSearchableMessages before path construction.

  2. Capabilities vs Phase 2 parity matrix - Updated docs/Copilot-CLI-Phase2-Plan.md to mark session storage, result messages, and usage stats as shipped.

  3. Hardcoded GRID_ROWS - Changed to Math.ceil(AGENT_TILES.length / GRID_COLS) so it auto-adjusts when future agents are added.

Tests updated to cover token reset behavior and unconditional usage reporting. All 41 tests pass.

@pedramamini
Collaborator

@coderabbitai analyze this PR


3 participants