fix(cli): close litellm async clients after each compile/index (CLOSE-WAIT/FD leak)#91
Open
cnndabbler wants to merge 1 commit into
litellm caches aiohttp clients per event loop. add_single_file runs each doc via a fresh asyncio.run() loop, so the previous loop's clients are abandoned and their HTTP connections linger in CLOSE-WAIT, accumulating sockets/FDs over a long ingest (observed 200+ against a remote API on a 165-doc run).

Add _close_litellm_async_clients() (best-effort, never raises) and call it in try/finally around index_long_document and both compile_short_doc / compile_long_doc calls. Verified: CLOSE-WAIT returns to ~0 after each doc.

Supersedes the now-stale VectifyAI#44 (which carried the same intent on an old base).
Problem
add_single_file compiles each document in a fresh asyncio.run() event loop. LiteLLM caches its async (aiohttp) clients per event loop, so when each loop ends the previous doc's clients are abandoned without being closed. Their HTTP connections sit in CLOSE-WAIT and accumulate sockets/file descriptors across a long ingest.

Observed on a 165-document ingest against a remote API: the process held 200+ sockets in CLOSE-WAIT, climbing per doc. (On a box with a low ulimit -n this would eventually exhaust FDs and start failing compilations.)

Fix
Add a best-effort _close_litellm_async_clients() (calls litellm's own close_litellm_async_clients(), never raises) and invoke it in try/finally around the three async entry points in add_single_file:

- index_long_document(...)
- asyncio.run(compile_long_doc(...))
- asyncio.run(compile_short_doc(...))

This way, cached clients are closed after each doc, whether it succeeds or fails.
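The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names close_litellm_async_clients_best_effort and run_with_cleanup are hypothetical, and looking up close_litellm_async_clients as a top-level litellm attribute is an assumption about the installed litellm version (the sketch simply does nothing if litellm or that attribute is missing).

```python
import asyncio
import inspect


def close_litellm_async_clients_best_effort() -> bool:
    """Try to close LiteLLM's cached async clients; never raise.

    Returns True only if a cleanup hook was found and ran without error.
    (Hypothetical helper name, used here for illustration.)
    """
    try:
        import litellm  # may not be installed; handled by the except below

        # Assumption: the cleanup coroutine is exposed at the top level.
        closer = getattr(litellm, "close_litellm_async_clients", None)
        if closer is None:
            return False
        result = closer()
        if inspect.iscoroutine(result):
            # Drive the async cleanup in its own short-lived event loop.
            asyncio.run(result)
        return True
    except Exception:
        # Best-effort: cleanup problems must not fail the ingest itself.
        return False


def run_with_cleanup(coro_factory, cleanup=close_litellm_async_clients_best_effort):
    """Run one async entry point in a fresh loop, then always clean up."""
    try:
        return asyncio.run(coro_factory())
    finally:
        # Runs whether the coroutine returned or raised.
        cleanup()
```

Passing the cleanup as a parameter is only a convenience for testing the try/finally behaviour without litellm installed; a caller would use it as run_with_cleanup(lambda: compile_short_doc(...)).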
Verification
Added a doc end-to-end after the change: CLOSE-WAIT returns to ~0 after each doc instead of accumulating.

Updated test_add_short_doc_runs_compiler: the compile path now drives asyncio.run for both the compile and the cleanup, so the test asserts that the compile_short_doc coroutine was run rather than that asyncio.run was called exactly once.

Relation to #44
#44 carried this same intent but is now ~23 commits behind main and conflicts with the current indexer.py / cli.py. This is a minimal reimplementation on current main, so it supersedes #44.
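For anyone wanting to repeat the CLOSE-WAIT check from the Verification section, one possible way on Linux is sketched below. The description does not say how the sockets were counted, so this is an assumption: it parses the /proc/net/tcp table format, where state code 08 denotes CLOSE_WAIT, and it counts sockets system-wide rather than for a single process. The function names are hypothetical.

```python
from pathlib import Path

CLOSE_WAIT = "08"  # hex TCP state code for CLOSE_WAIT in /proc/net/tcp


def count_close_wait(proc_net_tcp_text: str) -> int:
    """Count CLOSE_WAIT rows in text formatted like /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and fields[3] == CLOSE_WAIT:
            count += 1
    return count


def count_close_wait_system() -> int:
    """Sum CLOSE_WAIT sockets over IPv4 and IPv6 tables (Linux only)."""
    total = 0
    for name in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(name)
        if path.exists():
            total += count_close_wait(path.read_text())
    return total
```

Calling count_close_wait_system() before and after each document gives a rough view of whether the number is climbing; a per-process count would additionally need to match socket inodes against the process's open file descriptors.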