[Bug] MCP server connect leak: stuck stdio servers spawn unbounded child processes (CPU/lag) #3698

### Describe the bug

When a configured stdio MCP server is slow or its upstream is unreachable, Copilot CLI spawns its child process but never reaps it, then re-spawns it on every restart/reconnect. Child processes accumulate without bound, pinning CPU and degrading the whole machine. Observed in an autopilot session: a single `copilot.exe` had spawned 180+ children and was still climbing; the system reached 1,135 total processes. Killing the process tree dropped the system total to 416 processes and CPU usage from 66% to 12%.

### Affected version

CLI 1.0.59

### Steps to reproduce the behavior

1. Configure a stdio MCP server whose upstream is slow/unreachable but still emits progress during `initialize`/`tools/list` (e.g. a kusto-style proxy pointed at an unreachable cluster).
2. Run a long autopilot session so restart/reconnect fires repeatedly.
3. Watch child process count of `copilot.exe` climb without bound; CPU rises and the machine lags.

### Expected behavior

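A stalled connect eventually times out and its child process is reaped. Restart/reconnect does not leave earlier children running, so the child process count of `copilot.exe` stays bounded and CPU usage stays normal.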

### Additional context

I diagnosed the bug in a separate CLI session; the results and a suggested fix follow:

**Root cause (from source, `src/mcp-client/`)**
- `mcp-registry.ts` `getConnectOptions()` → `resetTimeoutOnProgress: true` with **no `maxTotalTimeout`**: progress keeps resetting the per-request timer, so `client.connect()` never rejects and the SDK's reap-on-failure (`void this.close()`) never fires.
- `mcp-host.ts` `isServerRunning()` returns `name in transports && name in clients`: a server stuck mid-connect has a transport (+ child) but no client → reports "not running" → `startServer` skips `stopServer` → next start overwrites the transport ref and orphans the prior child.
- `mcp-registry.ts` reconnect (`attemptReconnect`) is `isRemote`-only; local stdio servers get raw re-spawn with no bounded backoff.
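
The first bullet can be illustrated with a small simulation. This is a simplified model of a request timer that resets on progress, not the SDK's code: `timeoutAt` and `TimeoutOptions` are hypothetical names, the 60 s timeout, 30 s progress interval, and 120 s cap are made-up values, and a simulated clock stands in for real timers. It also assumes the cap rejects at exactly `maxTotalTimeout`, which may differ from when the real SDK performs that check.

```typescript
// Simplified model (not the SDK's code) of a request timer that resets on progress.
interface TimeoutOptions {
  timeout: number;                 // per-request timeout in ms
  resetTimeoutOnProgress: boolean; // restart the timer on each progress event
  maxTotalTimeout?: number;        // hard cap in ms, regardless of progress
}

// Returns the simulated time (ms) at which the request is rejected,
// or null if it is still pending at `horizonMs`.
function timeoutAt(
  progressAtMs: number[],
  opts: TimeoutOptions,
  horizonMs: number
): number | null {
  let deadline = opts.timeout;
  const cap = opts.maxTotalTimeout ?? Infinity;
  for (const t of [...progressAtMs].sort((a, b) => a - b)) {
    // The timer already fired before this progress event arrived.
    if (t >= Math.min(deadline, cap)) break;
    if (opts.resetTimeoutOnProgress) deadline = t + opts.timeout;
  }
  const firesAt = Math.min(deadline, cap);
  return firesAt <= horizonMs ? firesAt : null;
}

// A stalled server that emits progress every 30 s for 10 minutes.
const progress = Array.from({ length: 20 }, (_, i) => (i + 1) * 30000);
const horizon = 600000;

// No cap: every progress event pushes the deadline out, so nothing rejects.
const uncapped = timeoutAt(
  progress,
  { timeout: 60000, resetTimeoutOnProgress: true },
  horizon
);

// With a cap: the request is rejected once the total budget is spent.
const capped = timeoutAt(
  progress,
  { timeout: 60000, resetTimeoutOnProgress: true, maxTotalTimeout: 120000 },
  horizon
);

console.log({ uncapped, capped });
```

In this model the uncapped request is still pending at the end of the 10-minute window, while the capped one is rejected at the 120 s mark, which is the point where a reap-on-failure path could run.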

**Suggested fix**
- Add a `maxTotalTimeout` to `getConnectOptions()` so a stalled connect eventually rejects and the child is reaped.
- Treat pending/stuck connections as reapable: close the transport (kill the child) for a server that is mid-connect before re-spawning.
- Apply bounded backoff to local stdio reconnects, not just remote.
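
A minimal sketch of the second and third suggestions, under stated assumptions: `Transport`, `ServerTable`, `backoffDelayMs`, and `spawnChild` are hypothetical stand-ins rather than names from the Copilot CLI source, and the backoff constants are arbitrary. The idea is to reap based on "has a transport" rather than "has a transport and a client", and to cap both the delay and the number of re-spawn attempts.

```typescript
// Hypothetical stand-ins; none of these names come from the Copilot CLI source.
interface Transport {
  close(): void; // assumed to terminate the child process behind the transport
}

class ServerTable {
  private transports = new Map<string, Transport>();
  private clients = new Set<string>();

  // Reap any existing transport before spawning, even if the server never
  // finished connecting (transport present, client absent).
  start(name: string, spawn: () => Transport): void {
    const previous = this.transports.get(name);
    if (previous) {
      previous.close(); // kill the prior child instead of orphaning it
      this.clients.delete(name);
    }
    this.transports.set(name, spawn());
  }

  markConnected(name: string): void {
    this.clients.add(name);
  }
}

// Exponential backoff with a delay ceiling and an attempt limit.
// Returns null once the attempt budget is exhausted (stop re-spawning).
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  maxMs = 30000,
  maxAttempts = 5
): number | null {
  if (attempt >= maxAttempts) return null;
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Demo: a server that never finishes connecting is restarted ten times.
let live = 0;
const spawnChild = (): Transport => {
  live += 1;
  let closed = false;
  return {
    close() {
      if (!closed) {
        closed = true;
        live -= 1;
      }
    },
  };
};

const table = new ServerTable();
for (let i = 0; i < 10; i++) table.start("kusto", spawnChild);
const liveAfterRestarts = live;

const delays = [0, 1, 2, 3, 4, 5].map((attempt) => backoffDelayMs(attempt));
console.log({ liveAfterRestarts, delays });
```

In the demo, ten restarts of a server that never reaches `markConnected` leave one live child rather than ten, and the sixth reconnect attempt yields `null`, so re-spawning stops instead of continuing indefinitely.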

OS: Windows (AMD EPYC 7763, 16 cores, 64 GB). CLI version: 1.0.59 (latest 1.0.60). Full per-spawn logs available: 216 kusto / 118 msft-learn / 114 icm proxy spawns in a single session; only ~79/216 logged shutdown. Happy to attach session logs.
