[Bugs] agency_mode per-request spawn times out with many MCP servers (~2.5s deadline too tight) #1012

@dantebarbieri

Description

Mood: 😔
Category: Bugs

Summary

When agency_mode is enabled (server-side feature flag), the Copilot App spawns a fresh agency.exe process for every request (list_models, list_global_agents, get_session_state, telemetry, etc.). Each spawn must initialize the Copilot SDK and establish connections to all configured MCP servers before a hard ~2.5-second deadline. With 14 MCP servers configured in ~/.copilot/mcp-config.json, initialization consistently exceeds this deadline, causing every request to fail with "request cancelled". The app enters a retry loop and never recovers.

The Copilot CLI reads the same mcp-config.json and works perfectly — it initializes once as a long-lived process (takes ~10-15s on first launch) and stays ready.

Environment

Field        Value
App version  0.2.32
OS           Windows 10.0.26100
Theme        GitHub
Path         /chat
Tenure       Week 4

MCP Configuration (14 servers total)

  • 10× type: "http" — pre-hosted MCP servers on localhost (already running, only need TCP handshake)
  • type: "local" — spawned via npx/pnpm
  • type: "http" — remote endpoint

Reproduction Steps

  1. Configure ~/.copilot/mcp-config.json with 14 MCP servers (mix of http and local types)
  2. Launch the Copilot App (v0.2.32)
  3. Observe the app UI shows errors / never loads models or agents

Expected Behavior

App initializes and connects to all configured MCPs (as the CLI does successfully with the same config).

Actual Behavior

Every agency.exe spawn times out at ~2.5s. The app enters a retry loop:

01:28:19.816 INFO  agency_mode feature flag enabled; spawning copilot via agency wrapper
01:28:22.299 WARN  Failed to initialize runtime telemetry client attempt=1 error=request cancelled
01:28:22.315 WARN  resume_session failed (retry 1/3), retrying in 1s: request cancelled
01:28:22.325 ERROR failed to list models error=request cancelled
01:28:25.296 WARN  resume_session failed (retry 2/3), retrying in 2s: request cancelled
01:28:29.720 WARN  resume_session failed (retry 3/3), retrying in 4s: request cancelled
01:28:36.442 ERROR getSessionState: failed error=request cancelled

Timing: the spawn is logged at 01:28:19.816 and the first cancellation at 01:28:22.299, an elapsed 2.48 seconds, which points to a hard deadline of roughly 2.5 seconds. The pattern repeats for every request type: each request spawns a new agency.exe that independently times out.
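The elapsed time can be checked directly from the log timestamps. The sketch below parses the leading timestamp of the spawn line and of the first cancellation line from the excerpt above and prints the difference. The helper names `log_time` and `elapsed_seconds` are made up for illustration.

```python
from datetime import datetime

# Two lines copied from the log excerpt in this issue.
SPAWN = "01:28:19.816 INFO  agency_mode feature flag enabled; spawning copilot via agency wrapper"
FIRST_CANCEL = "01:28:22.299 WARN  Failed to initialize runtime telemetry client attempt=1 error=request cancelled"


def log_time(line: str) -> datetime:
    """Parse the leading HH:MM:SS.mmm timestamp of a log line."""
    return datetime.strptime(line.split()[0], "%H:%M:%S.%f")


def elapsed_seconds(start_line: str, end_line: str) -> float:
    """Seconds between the timestamps of two log lines from the same day."""
    return (log_time(end_line) - log_time(start_line)).total_seconds()


print(f"{elapsed_seconds(SPAWN, FIRST_CANCEL):.3f}s")  # prints 2.483s
```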

Workaround

Reducing mcp-config.json to ≤4 servers allows the app to start successfully — agency.exe can initialize within the deadline with fewer MCPs.

Suggested Fixes

  1. Keep agency alive between requests — spawn once, reuse for subsequent requests (like the CLI architecture). This is the root architectural difference.
  2. Increase the deadline — 2.5s is too tight for power users with many MCPs. Even pre-hosted HTTP MCPs need connection handshakes.
  3. Lazy-init MCP connections — don't block the initial response on all MCP connections being established. Initialize MCPs in the background.
  4. Separate MCP init from request readiness — allow basic requests (list_models, telemetry) to succeed before all MCPs are connected.
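To make fixes 1, 3, and 4 concrete, here is a minimal asyncio sketch of a long-lived runtime that starts MCP connections in the background and answers basic requests without waiting for them. Everything in it is hypothetical: `AgencyRuntime`, `fake_connect`, the placeholder model names, and the simulated 0.2s connection time are assumptions for illustration, not the actual agency.exe or Copilot SDK design.

```python
import asyncio


class AgencyRuntime:
    """Hypothetical long-lived runtime (illustration only)."""

    def __init__(self, server_names, connect):
        self._server_names = list(server_names)
        self._connect = connect  # async callable: server name -> connection
        self._tasks = {}

    def start(self):
        """Start every MCP connection in the background; do not wait."""
        for name in self._server_names:
            self._tasks[name] = asyncio.create_task(self._connect(name))

    async def list_models(self):
        """Basic request that does not depend on any MCP connection."""
        return ["model-a", "model-b"]  # placeholder data

    def ready_servers(self):
        """Names of servers whose connection finished without error."""
        return sorted(
            name
            for name, task in self._tasks.items()
            if task.done() and not task.cancelled() and task.exception() is None
        )

    async def wait_all(self):
        """Wait for all connection attempts, successful or not."""
        await asyncio.gather(*self._tasks.values(), return_exceptions=True)


async def fake_connect(name):
    """Stand-in for a real MCP handshake (assumed to take 0.2s)."""
    await asyncio.sleep(0.2)
    return f"connected:{name}"


async def demo():
    runtime = AgencyRuntime([f"mcp-{i}" for i in range(14)], fake_connect)
    runtime.start()
    # The basic request completes well inside a 2.5s deadline because it
    # is not blocked on the 14 connections still in flight.
    models = await asyncio.wait_for(runtime.list_models(), timeout=2.5)
    ready_before = runtime.ready_servers()
    await runtime.wait_all()
    ready_after = runtime.ready_servers()
    return models, ready_before, ready_after


if __name__ == "__main__":
    models, before, after = asyncio.run(demo())
    print(models, len(before), len(after))
```

With this shape, requests that actually need an MCP server could wait on that one server's task instead of on all of them, and the connection cost is paid once per process rather than once per request.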

Logs

Full log available at ~/.copilot/logs/github-app.{pid}.log. The pattern shows 10+ agency.exe spawns in 40 seconds, all failing identically with the same ~2.5s timeout.


