
fix(cli): close litellm async clients after each compile/index (CLOSE-WAIT/FD leak)#91

Open
cnndabbler wants to merge 1 commit into VectifyAI:main from cnndabbler:fix/litellm-async-client-leak

Problem

add_single_file compiles each document in a fresh asyncio.run() event loop. LiteLLM caches its async (aiohttp) clients per event loop, so when each loop ends the previous doc's clients are abandoned without being closed. Their HTTP connections sit in CLOSE-WAIT and accumulate sockets/file descriptors across a long ingest.
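The loop-per-call behaviour behind this can be checked with the standard library alone. The sketch below is only an illustration (`record_loop` and `seen_loops` are hypothetical names, not part of this PR): two back-to-back `asyncio.run()` calls each get their own event loop, and each loop is already closed by the time `asyncio.run()` returns, so anything cached against the first loop has no live loop left to clean it up on.

```python
import asyncio

# Hypothetical helper names, used only for this illustration.
seen_loops = []


async def record_loop() -> None:
    # Remember which event loop this coroutine ran on.
    seen_loops.append(asyncio.get_running_loop())


# Two back-to-back asyncio.run() calls, like compiling two documents in a row.
asyncio.run(record_loop())
asyncio.run(record_loop())

# Each call created its own loop and closed it on exit.
distinct = seen_loops[0] is not seen_loops[1]
all_closed = all(loop.is_closed() for loop in seen_loops)
print(f"distinct loops: {distinct}, all closed: {all_closed}")
```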

Observed on a 165-document ingest against a remote API: the process held 200+ sockets in CLOSE-WAIT, climbing per doc. (On a box with a low ulimit -n this would eventually exhaust FDs and start failing compilations.)
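For anyone wanting to reproduce a count like this, one rough option on Linux is to read the kernel's TCP tables. This is a sketch under assumptions, not necessarily how the number above was measured: it assumes `/proc/net/tcp` and `/proc/net/tcp6` are available, relies on the kernel's state code `08` for CLOSE-WAIT, and counts sockets for the whole network namespace rather than for one process. The function names are hypothetical.

```python
from pathlib import Path

CLOSE_WAIT = "08"  # TCP_CLOSE_WAIT in the Linux kernel's TCP state table


def count_close_wait(proc_net_tcp_text: str) -> int:
    """Count CLOSE-WAIT rows in text formatted like /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and fields[3] == CLOSE_WAIT:
            count += 1
    return count


def system_close_wait() -> int:
    """Sum CLOSE-WAIT sockets over IPv4 and IPv6 (namespace-wide, Linux only)."""
    total = 0
    for name in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(name)
        if path.exists():
            total += count_close_wait(path.read_text())
    return total
```

Calling `system_close_wait()` before and after each document gives a coarse trend line; a per-process view would need a tool that maps sockets to PIDs.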

Fix

Add a best-effort _close_litellm_async_clients() (calls litellm's own close_litellm_async_clients(), never raises) and invoke it in try/finally around the three async entry points in add_single_file:

  • index_long_document(...)
  • asyncio.run(compile_long_doc(...))
  • asyncio.run(compile_short_doc(...))

So cached clients are closed after each doc whether it succeeds or fails.

Verification

Ingested a document end-to-end after the change: CLOSE-WAIT returns to ~0 after each doc instead of accumulating. Also updated test_add_short_doc_runs_compiler. The compile path now drives asyncio.run for both the compile and the cleanup, so the test asserts that the compile_short_doc coroutine was run rather than that asyncio.run was called exactly once.
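The updated assertion style can be illustrated with `unittest.mock`. The sketch below is not the repository's test: `compile_short_doc`, `_cleanup`, `add_single_file`, and `coroutines_run_by` are hypothetical stand-ins. It wraps `asyncio.run` so the real function still executes, then checks which coroutines were passed to it instead of how many times it was called.

```python
import asyncio
from unittest import mock


async def compile_short_doc(path: str) -> str:
    # Stand-in for the real compiler coroutine.
    return f"compiled:{path}"


async def _cleanup() -> None:
    # Stand-in for closing cached async clients.
    await asyncio.sleep(0)


def add_single_file(path: str) -> str:
    try:
        return asyncio.run(compile_short_doc(path))
    finally:
        asyncio.run(_cleanup())


def coroutines_run_by(func, *args):
    """Call func; return its result and the names of coroutines given to asyncio.run."""
    with mock.patch("asyncio.run", wraps=asyncio.run) as run_mock:
        result = func(*args)
    names = [call.args[0].__name__ for call in run_mock.call_args_list]
    return result, names


result, names = coroutines_run_by(add_single_file, "a.md")
# Assert on what was run, not on an exact call count.
assert "compile_short_doc" in names
```

Checking for the coroutine by name keeps the test stable if further `asyncio.run` calls are added around the compile step later.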

Relation to #44

#44 carried this same intent but is now ~23 commits behind main and conflicts with the current indexer.py/cli.py. This is a minimal reimplementation on current main, so it supersedes #44.

litellm caches aiohttp clients per event loop. add_single_file runs each doc
via a fresh asyncio.run() loop, so the previous loop's clients are abandoned
and their HTTP connections linger in CLOSE-WAIT, accumulating sockets/FDs over
a long ingest (observed 200+ against a remote API on a 165-doc run).

Add _close_litellm_async_clients() (best-effort, never raises) and call it in
try/finally around index_long_document and both compile_short_doc /
compile_long_doc calls. Verified: CLOSE-WAIT returns to ~0 after each doc.

Supersedes the now-stale VectifyAI#44 (which carried the same intent on an old base).