Description
KubeConfig.createAgent() is called on every invocation of applyToFetchOptions() (used by Watch.watch()) and applySecurityAuthentication() (used by generated API clients like DiscoveryV1Api). Each call creates a new https.Agent instance, but the old agents are never destroyed or reused.
When using Watch with automatic reconnection (the typical pattern — the done callback triggers a new watch() call), every reconnection cycle creates:
- A new https.Agent via the Watch.watch() → applyToFetchOptions() → createAgent() path
Over time, these orphaned agents and their sockets accumulate, causing file descriptor exhaustion.
Root Cause
In config.ts, createAgent() unconditionally creates a new agent:
createAgent(cluster: Cluster, agentOptions: AgentOptions): Agent {
    // ...
    agent = new https.Agent(agentOptions); // ← new Agent every time
    // ...
    return agent;
}
This is called from two paths:
Path 1 — Watch.watch():
Watch.watch() → this.config.applyToFetchOptions({}) → this.config.applyToHTTPSOptions(opts) → opts.agent = this.createAgent(cluster, agentOptions)
Path 2 — Generated API clients:
SomeApi.method() → makeRequestContext() → applySecurityAuthentication() → context.setAgent(this.createAgent(cluster, agentOptions))
Every watch() reconnection and every API call creates a fresh https.Agent with its own socket pool.
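To illustrate the effect in isolation, here is a minimal sketch that uses only Node's https module. createAgentUncached is a hypothetical stand-in for the pattern described above, not the library's actual code; it only shows that each call yields a distinct agent with its own socket pool, so no connection can be shared between calls.

```typescript
import * as https from 'node:https';

// Hypothetical stand-in for the pattern described above: a fresh agent per call.
function createAgentUncached(options: https.AgentOptions): https.Agent {
  return new https.Agent(options);
}

// Simulate five reconnect cycles, each asking for an agent.
const agents: https.Agent[] = [];
for (let i = 0; i < 5; i++) {
  agents.push(createAgentUncached({ keepAlive: true }));
}

// Every call produced a different object, each with its own `sockets` map.
const distinctAgents = new Set(agents).size;
const sharedPools = agents[0].sockets === agents[1].sockets;
console.log({ distinctAgents, sharedPools });

// Release the agents so the sketch itself does not hold resources.
agents.forEach((agent) => agent.destroy());
```

No real connections are opened here; the point is only that nothing ties the five pools together.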
Why the old agents are not GC'd promptly
In theory, once an https.Agent is no longer referenced, it should be garbage collected. In practice, the node-fetch response body stream holds a reference chain that delays GC:
response.body stream
→ event listeners (on 'close', 'finish', 'error')
→ doneCallOnce closure
→ AbortController
→ AbortSignal
→ requestInit
→ agent
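Until node-fetch fully destroys the response body stream, the agent remains reachable. Additionally, garbage collection of an agent does not by itself close its sockets: a socket that is still open (for example, an idle keep-alive connection) generally keeps its file descriptor until it is destroyed or times out. Sockets in TIME_WAIT (typically 60s on Linux) are kernel state left over after close and no longer hold a file descriptor in the process.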
Impact
Programs that use Watch with automatic reconnection will leak sockets and file descriptors. This manifests as:
- Steadily growing FD count (observable via lsof -p <pid> | wc -l)
- Eventually EMFILE errors ("too many open files")
- Process instability or crash
The leak rate depends on how often the watch reconnects (server-side timeouts, network disruptions, etc.).
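For tracking the FD count from inside the process rather than with lsof, a small helper along these lines may be useful. It is a sketch that assumes Linux, since it reads /proc/self/fd; countOpenFds and fdGrowth are illustrative names, not part of any library.

```typescript
import * as fs from 'node:fs';

// Count this process's open file descriptors via /proc (Linux only).
// Returns null where /proc/self/fd is unavailable (e.g. macOS, Windows).
function countOpenFds(): number | null {
  try {
    return fs.readdirSync('/proc/self/fd').length;
  } catch {
    return null;
  }
}

// Hypothetical helper: growth relative to a previously recorded baseline.
function fdGrowth(baseline: number | null): number | null {
  const current = countOpenFds();
  return baseline === null || current === null ? null : current - baseline;
}

const baseline = countOpenFds();
console.log('open FDs:', baseline, 'growth:', fdGrowth(baseline));
```

Recording a baseline once the process has stabilized and logging fdGrowth(baseline) periodically makes a steady upward trend easier to spot than raw counts.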
Reproduction
import * as fs from 'node:fs';
import * as k8s from '@kubernetes/client-node';

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const watch = new k8s.Watch(kc);
// The context's namespace is optional, so fall back to "default".
const ns = kc.contexts[0].namespace ?? 'default';

async function startWatch(): Promise<void> {
    await watch.watch(
        `/apis/discovery.k8s.io/v1/namespaces/${ns}/endpointslices`,
        { resourceVersion: undefined },
        (type, obj) => {}, // events are ignored; only the reconnects matter here
        (err) => {
            // Reconnect: each cycle creates a new https.Agent
            setTimeout(startWatch, 1000);
        },
    );
}

startWatch();

// Monitor the FD count (Linux only, relies on /proc):
setInterval(() => {
    const fdCount = fs.readdirSync('/proc/self/fd').length;
    console.log('open FDs:', fdCount);
}, 5000);
Expected: FD count stays roughly constant after initial stabilization.
Actual: FD count grows continuously with each watch reconnection.
Suggested Fix
Option A — Cache and reuse the agent within KubeConfig:
class KubeConfig {
    private cachedAgent: Agent | undefined;

    createAgent(cluster: Cluster, agentOptions: AgentOptions): Agent {
        if (this.cachedAgent) {
            return this.cachedAgent;
        }
        // _createAgent stands for the existing agent-construction logic.
        this.cachedAgent = this._createAgent(cluster, agentOptions);
        return this.cachedAgent;
    }
}
This is the simplest fix and aligns with how https.Agent is designed to be used — as a shared connection pool.
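One caveat with a single cached field is that it ignores the cluster and agentOptions arguments, so a config that switches clusters or options would keep getting the first agent. A keyed cache is one possible refinement. The sketch below uses only Node's https module; AgentCache, its JSON-based key, and destroyAll are illustrative assumptions rather than anything in the library, and the key only distinguishes options that survive JSON.stringify (function-valued options are dropped, and property order matters).

```typescript
import * as https from 'node:https';

// Illustrative cache: one agent per (server, options) combination.
class AgentCache {
  private readonly agents = new Map<string, https.Agent>();

  get(server: string, options: https.AgentOptions): https.Agent {
    // Assumption: a JSON string is a good-enough identity for the options.
    const key = JSON.stringify([server, options]);
    let agent = this.agents.get(key);
    if (agent === undefined) {
      agent = new https.Agent(options);
      this.agents.set(key, agent);
    }
    return agent;
  }

  // Close all pooled sockets, e.g. on shutdown or when the config is reloaded.
  destroyAll(): void {
    this.agents.forEach((agent) => agent.destroy());
    this.agents.clear();
  }
}

const cache = new AgentCache();
const a = cache.get('https://cluster-a.example:6443', { keepAlive: true });
const b = cache.get('https://cluster-a.example:6443', { keepAlive: true });
const c = cache.get('https://cluster-b.example:6443', { keepAlive: true });
console.log('same cluster reuses agent:', a === b);
console.log('different cluster gets its own agent:', a !== c);
cache.destroyAll();
```

A cache like this would also need invalidating whenever credentials or certificates in the config change, which a real fix would have to account for.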
Option B — Allow users to provide a custom agent via Watch and API client options, so consumers can manage agent lifecycle themselves.
Workaround
Consumers can override createAgent on the KubeConfig instance to return a cached agent:
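import * as https from 'node:https';
import * as k8s from '@kubernetes/client-node';

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
// Create one agent up front, then make every later createAgent() call return it.
const agent = (kc as any).createAgent(kc.getCurrentCluster(), {}) as https.Agent;
(kc as any).createAgent = () => agent;
const watch = new k8s.Watch(kc);
const api = kc.makeApiClient(k8s.DiscoveryV1Api);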
This works but requires casting because createAgent is private.
Environment
- @kubernetes/client-node: 1.3.0
- Node.js: 22.x
- OS: Linux