Describe the bug
When a configured stdio MCP server is slow or its upstream is unreachable, Copilot CLI spawns its child process but never reaps it, then repeatedly re-spawns on restart/reconnect. Child processes accumulate without bound, pinning CPU and degrading the whole machine. Observed in an autopilot session: a single copilot.exe had spawned 180+ children and was still climbing; the system reached 1,135 total processes. Killing the process tree dropped it to 416 processes and CPU from 66% to 12%.
Affected version
CLI 1.0.59
Steps to reproduce the behavior
- Configure a stdio MCP server whose upstream is slow/unreachable but still emits progress during initialize/tools/list (e.g. a kusto-style proxy pointed at an unreachable cluster).
- Run a long autopilot session so restart/reconnect fires repeatedly.
- Watch the child process count of copilot.exe climb without bound; CPU rises and the machine lags.
Expected behavior
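A stalled or failed stdio MCP server has its child process reaped before any re-spawn, so the number of copilot.exe children stays bounded no matter how often restart/reconnect fires.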
Additional context
I diagnosed the bug in a separate CLI session. Results and suggested fix:
Root cause (from source, src/mcp-client/)
- mcp-registry.ts, getConnectOptions(): sets resetTimeoutOnProgress: true with no maxTotalTimeout. Progress keeps resetting the per-request timer, so client.connect() never rejects and the SDK's reap-on-failure (void this.close()) never fires.
- mcp-host.ts, isServerRunning(): returns name in transports && name in clients. A server stuck mid-connect has a transport (and a child) but no client, so it reports "not running", startServer skips stopServer, and the next start overwrites the transport ref and orphans the prior child.
- mcp-registry.ts, reconnect (attemptReconnect): isRemote-only. Local stdio servers get a raw re-spawn with no bounded backoff.
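To make the orphaning path concrete, here is a minimal toy model of the bookkeeping described above. It is not the CLI's source: HostModel, startServerStuck, and the pid/closed fields are hypothetical stand-ins, and only the isServerRunning condition is taken from the diagnosis.

```typescript
// Toy model only. Names mirror the report; the bodies are assumptions.
type Transport = { pid: number; closed: boolean };

class HostModel {
  transports: Record<string, Transport> = {};
  clients: Record<string, object> = {};
  spawned: Transport[] = []; // every child ever spawned, for counting
  private nextPid = 1;

  // Condition as reported: needs both a transport and a client.
  isServerRunning(name: string): boolean {
    return name in this.transports && name in this.clients;
  }

  stopServer(name: string): void {
    const t = this.transports[name];
    if (t) t.closed = true; // stands in for killing the child
    delete this.transports[name];
    delete this.clients[name];
  }

  // Models a start whose connect never settles:
  // the transport is registered, the client never is.
  startServerStuck(name: string): void {
    if (this.isServerRunning(name)) this.stopServer(name);
    const t: Transport = { pid: this.nextPid++, closed: false };
    this.spawned.push(t);
    this.transports[name] = t; // overwrites the previous ref
  }
}

const host = new HostModel();
for (let i = 0; i < 3; i++) host.startServerStuck("kusto");

const live = host.spawned.filter((t) => !t.closed).length;
const tracked = Object.keys(host.transports).length;
console.log(live, tracked); // 3 1
```

Three children are alive but only the last one is still referenced, which would match the unbounded growth seen in the session if the real code behaves like this model.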
Suggested fix
- Add a maxTotalTimeout to getConnectOptions() so a stalled connect eventually rejects and the child is reaped.
- Treat pending/stuck connections as reapable: close the transport (kill the child) for a server that is mid-connect before re-spawning.
- Apply bounded backoff to local stdio reconnects, not just remote.
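The three suggestions above can be sketched as a small self-contained model. Everything here is hypothetical (timeoutAt, startServer, backoffDelayMs, and the default values are illustrative, not the CLI's or the SDK's API), and the timeout model is a simplification of how a progress-reset timer with a total cap behaves.

```typescript
// Illustrative sketch only; none of these names come from the CLI source.

// 1) Total cap: time (ms) at which a request times out, given ascending
// progress event times. Progress arriving before the current deadline
// pushes it out by timeoutMs; maxTotalMs, when set, caps the deadline.
function timeoutAt(progressTimes: number[], timeoutMs: number, maxTotalMs?: number): number {
  let deadline = timeoutMs;
  for (const t of progressTimes) {
    if (t >= deadline) break; // already timed out before this event
    deadline = t + timeoutMs;
  }
  return maxTotalMs === undefined ? deadline : Math.min(deadline, maxTotalMs);
}

// 2) Reap pending connections: kill whatever child is registered for the
// name, even if no client ever attached, before spawning a new one.
type Child = { pid: number; killed: boolean };
const transports = new Map<string, Child>();
const allChildren: Child[] = [];
let nextPid = 1;

function startServer(name: string): Child {
  const prior = transports.get(name);
  if (prior) prior.killed = true; // stands in for closing the transport
  const child: Child = { pid: nextPid++, killed: false };
  allChildren.push(child);
  transports.set(name, child);
  return child;
}

// 3) Bounded backoff: exponential delay capped at capMs,
// null once maxAttempts is exhausted.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30000, maxAttempts = 5): number | null {
  if (attempt >= maxAttempts) return null;
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Progress every 500 ms for 50 s, per-request timeout of 1 s.
const progress = Array.from({ length: 100 }, (_, i) => (i + 1) * 500);
const uncapped = timeoutAt(progress, 1000);
const capped = timeoutAt(progress, 1000, 10000);

for (let i = 0; i < 3; i++) startServer("kusto");
const alive = allChildren.filter((c) => !c.killed).length;
const delays = [0, 1, 2, 3, 4, 5].map((a) => backoffDelayMs(a));

console.log(uncapped, capped, alive, delays);
```

In this model the uncapped deadline trails the last progress event indefinitely, while the capped one fires at 10 s; repeated starts leave one live child instead of three; and retries stop after five attempts.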
OS: Windows (AMD EPYC 7763, 16 cores, 64 GB). CLI version: 1.0.59 (latest 1.0.60). Full per-spawn logs available: 216 kusto / 118 msft-learn / 114 icm proxy spawns in a single session; only ~79/216 logged shutdown. Happy to attach session logs.