
KubeConfig.createAgent() creates a new https.Agent per request, causing socket/FD leaks on watch reconnection #2893

@okhowang

Description

KubeConfig.createAgent() is called on every invocation of applyToFetchOptions() (used by Watch.watch()) and applySecurityAuthentication() (used by generated API clients like DiscoveryV1Api). Each call creates a new https.Agent instance, but the old agents are never destroyed or reused.

When using Watch with automatic reconnection (the typical pattern, in which the done callback triggers a new watch() call), every reconnection cycle creates:

  1. A new https.Agent via the Watch.watch() → applyToFetchOptions() → createAgent() path

Over time, these orphaned agents and their sockets accumulate, causing file descriptor exhaustion.

Root Cause

In config.ts, createAgent() unconditionally creates a new agent:

createAgent(cluster: Cluster, agentOptions: AgentOptions): Agent {
    // ...
    agent = new https.Agent(agentOptions);  // ← new Agent every time
    // ...
    return agent;
}

This is called from two paths:

Path 1 — Watch.watch():

Watch.watch() → this.config.applyToFetchOptions({}) → this.config.applyToHTTPSOptions(opts) → opts.agent = this.createAgent(cluster, agentOptions)

Path 2 — Generated API clients:

SomeApi.method() → makeRequestContext() → applySecurityAuthentication() → context.setAgent(this.createAgent(cluster, agentOptions))

Every watch() reconnection and every API call creates a fresh https.Agent with its own socket pool.

Why the old agents are not GC'd promptly

In theory, once an https.Agent is no longer referenced, it should be garbage collected. In practice, the node-fetch response body stream holds a reference chain that delays GC:

response.body stream
  → event listeners (on 'close', 'finish', 'error')
    → doneCallOnce closure
      → AbortController
        → AbortSignal
          → requestInit
            → agent

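Until node-fetch fully destroys the response body stream, the agent remains reachable. Additionally, even after a socket is closed, the kernel keeps the TCP connection in TIME_WAIT (60s on Linux) on the side that closed first. That state no longer holds a file descriptor in the process, but it does occupy an ephemeral port and kernel memory until it expires.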
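Because the reference chain above can keep an orphaned agent reachable for an unpredictable time, one way to stop depending on GC is to tie each agent's lifetime to the request that uses it and call agent.destroy() (a documented http.Agent method) when that request settles. A minimal sketch; withScopedAgent is a hypothetical helper, not part of @kubernetes/client-node:

```typescript
import * as https from 'node:https';

// Hypothetical helper (not a client-node API): gives `fn` a dedicated agent
// and destroys it when `fn` settles (resolve or reject). destroy() closes the
// agent's pooled sockets right away instead of waiting for the agent to
// become unreachable and be garbage-collected.
async function withScopedAgent<T>(
  fn: (agent: https.Agent) => Promise<T>,
  options: https.AgentOptions = {},
): Promise<T> {
  const agent = new https.Agent(options);
  try {
    return await fn(agent);
  } finally {
    agent.destroy();
  }
}
```

This only helps when the callback's promise resolves once the request (or watch) is really finished; for a reconnect loop, sharing one agent across cycles is usually simpler.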

Impact

Programs that use Watch with automatic reconnection will leak sockets and file descriptors. This manifests as:

  • Steadily growing FD count (observable via lsof -p <pid> | wc -l)
  • Eventually EMFILE errors ("too many open files")
  • Process instability or crash

The leak rate depends on how often the watch reconnects (server-side timeouts, network disruptions, etc.).
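The reconnect rate translates directly into a time-to-failure. A back-of-the-envelope sketch, assuming the worst case that no leaked descriptor is ever reclaimed; the scenario numbers below are hypothetical, not measurements:

```typescript
// Worst-case estimate: assumes no leaked descriptor is ever reclaimed.
interface LeakScenario {
  fdLimit: number; // soft RLIMIT_NOFILE (`ulimit -n`), commonly 1024 on Linux
  baselineFds: number; // descriptors open once start-up has stabilised
  reconnectsPerHour: number; // watch reconnections per hour
  leakedFdsPerReconnect: number; // descriptors left behind per reconnection (assumed)
}

function hoursUntilEmfile(s: LeakScenario): number {
  const headroom = Math.max(0, s.fdLimit - s.baselineFds);
  const leakPerHour = s.reconnectsPerHour * s.leakedFdsPerReconnect;
  if (leakPerHour <= 0) return Infinity; // no leak, never exhausts
  return headroom / leakPerHour;
}

// Hypothetical: one reconnect every 5 minutes, leaking one socket each time.
console.log(
  hoursUntilEmfile({ fdLimit: 1024, baselineFds: 64, reconnectsPerHour: 12, leakedFdsPerReconnect: 1 }),
); // 80
```

With these assumed numbers the process hits EMFILE after (1024 − 64) / 12 = 80 hours, which is why the problem tends to show up only in long-running controllers.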

Reproduction

import { readdirSync } from 'node:fs';
import * as k8s from '@kubernetes/client-node';

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const watch = new k8s.Watch(kc);
// Namespace of the current context, falling back to "default".
const ns = kc.getContextObject(kc.getCurrentContext())?.namespace ?? 'default';

async function startWatch() {
  await watch.watch(
    `/apis/discovery.k8s.io/v1/namespaces/${ns}/endpointslices`,
    { resourceVersion: undefined },
    (type, obj) => {},
    (err) => {
      // Reconnect: each cycle creates a new https.Agent
      setTimeout(startWatch, 1000);
    },
  );
}

startWatch();

// Monitor FD count (Linux only, reads /proc):
setInterval(() => {
  const fdCount = readdirSync('/proc/self/fd').length;
  console.log('open FDs:', fdCount);
}, 5000);

Expected: FD count stays roughly constant after initial stabilization.
Actual: FD count grows continuously with each watch reconnection.
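The counter above lumps every descriptor together. Since the claim is specifically about sockets, a slightly finer Linux-only counter can separate socket descriptors from files by reading each /proc/self/fd symlink (socket targets look like "socket:[12345]"). A sketch; countFds is a hypothetical helper:

```typescript
import { readdirSync, readlinkSync } from 'node:fs';

interface FdCounts {
  total: number;
  sockets: number;
}

// Linux-only diagnostic: counts this process's open descriptors and how many
// of them are sockets. Returns undefined when /proc is unavailable.
function countFds(): FdCounts | undefined {
  let entries: string[];
  try {
    entries = readdirSync('/proc/self/fd');
  } catch {
    return undefined; // no procfs (e.g. macOS or Windows)
  }
  let sockets = 0;
  for (const fd of entries) {
    try {
      if (readlinkSync(`/proc/self/fd/${fd}`).startsWith('socket:')) sockets++;
    } catch {
      // The descriptor used to list the directory is closed by now; skip it.
    }
  }
  return { total: entries.length, sockets };
}
```

Logging countFds() from the reproduction's setInterval shows whether the growth is in socket descriptors rather than, say, open files.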

Suggested Fix

Option A — Cache and reuse the agent within KubeConfig:

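class KubeConfig {
    private cachedAgent: Agent | undefined;

    createAgent(cluster: Cluster, agentOptions: AgentOptions): Agent {
        // Reuse the agent (and its socket pool) built on the first call.
        if (this.cachedAgent) {
            return this.cachedAgent;
        }
        // _createAgent: the current body of createAgent, moved into a helper.
        this.cachedAgent = this._createAgent(cluster, agentOptions);
        return this.cachedAgent;
    }
}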

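This is the simplest fix and aligns with how https.Agent is designed to be used: as a shared connection pool. One caveat is that a single cached agent must be keyed or invalidated per cluster and user, because the TLS material (CA, client certificate and key) can differ between them and setCurrentContext() can switch clusters at runtime.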
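A variant of Option A that stays correct when the context changes keys the cache instead of holding a single agent. A minimal sketch under that assumption; AgentCache and the "cluster/user" key format are hypothetical, and a patched createAgent could call get() with its existing construction logic as the create callback:

```typescript
import * as https from 'node:https';

// Hypothetical keyed agent cache. Callers with the same key share one
// connection pool; a new key (different cluster or user) gets its own agent.
class AgentCache {
  private readonly agents = new Map<string, https.Agent>();

  get(key: string, create: () => https.Agent): https.Agent {
    let agent = this.agents.get(key);
    if (agent === undefined) {
      agent = create();
      this.agents.set(key, agent);
    }
    return agent;
  }

  // Close and forget the agent for one key, e.g. after credentials rotate.
  invalidate(key: string): void {
    this.agents.get(key)?.destroy();
    this.agents.delete(key);
  }

  // Close every pooled socket, e.g. on shutdown.
  destroyAll(): void {
    for (const agent of this.agents.values()) agent.destroy();
    this.agents.clear();
  }
}

const cache = new AgentCache();
const key = 'my-cluster/my-user'; // hypothetical cluster/user names
const a1 = cache.get(key, () => new https.Agent({ keepAlive: true }));
const a2 = cache.get(key, () => new https.Agent({ keepAlive: true }));
console.log(a1 === a2); // true
```

Keying on identity rather than on the full agentOptions object avoids comparing certificate buffers on every request, at the cost of needing an explicit invalidate() when credentials change under the same name.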

Option B — Allow users to provide a custom agent via Watch and API client options, so consumers can manage agent lifecycle themselves.
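Option B could look roughly like the sketch below. AgentOverride and resolveAgent are hypothetical names used only for illustration; nothing like them exists in @kubernetes/client-node today:

```typescript
import * as https from 'node:https';

// Hypothetical option shape for Option B; NOT an existing client-node API.
interface AgentOverride {
  agent?: https.Agent;
}

// Prefer a caller-supplied agent and only fall back to the library's own
// agent factory when none was given.
function resolveAgent(
  override: AgentOverride | undefined,
  createDefault: () => https.Agent,
): https.Agent {
  return override?.agent ?? createDefault();
}

// The caller owns this agent: one keep-alive pool shared by every watch and
// API call, created once by the application.
const sharedAgent = new https.Agent({ keepAlive: true });
```

Because the application owns sharedAgent, it also decides when to call sharedAgent.destroy(), typically once during shutdown.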

Workaround

Consumers can override createAgent on the KubeConfig instance to return a cached agent:

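import * as https from 'node:https';
import * as k8s from '@kubernetes/client-node';

const kc = new k8s.KubeConfig();
kc.loadFromDefault();

// Let the first call build the agent with the real TLS agentOptions (CA, client cert/key), then reuse it.
const originalCreateAgent = (kc as any).createAgent.bind(kc);
let sharedAgent: https.Agent | undefined;
(kc as any).createAgent = (cluster: unknown, agentOptions: https.AgentOptions) => {
  sharedAgent ??= originalCreateAgent(cluster, agentOptions);
  return sharedAgent;
};

const watch = new k8s.Watch(kc);
const api = kc.makeApiClient(k8s.DiscoveryV1Api);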

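This works but requires casting because createAgent is private. It also assumes a single cluster: the shared agent keeps the TLS settings of the cluster that was current when it was created, so it has to be replaced if the context changes.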

Environment

  • @kubernetes/client-node: 1.3.0
  • Node.js: 22.x
  • OS: Linux
