refactor(backend): split LLM provider routing seams by Git-on-my-level · Pull Request #7969 · BasedHardware/omi

Git-on-my-level · 2026-06-15T15:40:26Z

Summary

Refactors the backend LLM provider/model routing internals while preserving the existing get_llm(feature) production API.

Moves feature/profile model routing into utils/llm/model_config.py.
Moves provider-specific chat model construction into utils/llm/providers.py.
Adds a generic OpenAI-compatible provider constructor (base_url + api_key_env + model) so new compatible providers can be plugged in without touching feature code.
Keeps clients.py as a compatibility facade for get_llm, get_model, get_provider, get_qos_info, and the legacy private symbols that existing tests still reference.
Extracts conv_folder schema/context/validation/assignment into utils/llm/conversation_folder.py so route-specific safety logic is reusable and easier to reason about.
Adds unit coverage for provider construction, route options, and folder-assignment validation.

Why

DD-009 concluded that the maintainable production seam should be:

feature callsite
  → get_llm(feature)
  → model/profile config
  → provider factory
  → BaseChatModel

This makes plugging in a new OpenAI-compatible provider/model mostly a config/factory change, while keeping product callsites clean and avoiding a premature TaskContract / ModelStrategy framework.
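The layering above can be sketched in a few lines. This is a minimal illustration, not the code in this PR: FakeChatModel stands in for a LangChain BaseChatModel, the names _FEATURE_ROUTES and _PROVIDER_FACTORIES are hypothetical, and the model names are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class FakeChatModel:
    """Stand-in for a BaseChatModel; it only records how it was built."""

    provider: str
    model: str


# Model/profile config layer: feature -> (model, provider).
# Hypothetical table; the model names are placeholders, not real routes.
_FEATURE_ROUTES: Dict[str, Tuple[str, str]] = {
    'conv_action_items': ('example-small-model', 'openai'),
    'wrapped_analysis': ('example-large-model', 'openrouter'),
}

# Provider factory layer: provider -> callable that builds a chat model.
_PROVIDER_FACTORIES: Dict[str, Callable[[str], FakeChatModel]] = {
    'openai': lambda model: FakeChatModel('openai', model),
    'openrouter': lambda model: FakeChatModel('openrouter', model),
}


def get_llm(feature: str) -> FakeChatModel:
    """Callsite-facing entry point: look up the route, then delegate to the factory."""
    model, provider = _FEATURE_ROUTES[feature]
    return _PROVIDER_FACTORIES[provider](model)
```

In this shape, a callsite such as get_llm('conv_action_items') never names a provider, so rerouting a feature is an edit to the route table rather than to product code.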

Cost impact

No immediate traffic or model changes — $0/mo direct savings. This is infrastructure for DD-009: it lowers the engineering cost of routing selected OpenAI/Anthropic workloads to cheaper providers after external benchmarking validates quality and reliability.

User / production impact

No intended behavior change.
Existing production callsites continue using get_llm(feature).
Existing BYOK routing semantics are preserved, including Gemini BYOK routing through the Gemini OpenAI-compatible endpoint and OpenRouter Gemini BYOK rerouting.

Validation

python3 -m compileall passes for all new/modified files
96 passed (new + existing BYOK tests), 4 pre-existing warnings
git diff --check clean
Independent subagent review: caught compatibility regression on _GEMINI_OPENAI_BASE_URL export; fixed with alias. Final: APPROVED

Rollback

Revert this PR. Internal refactor with no data migration or traffic-routing change.

greptile-apps · 2026-06-15T15:46:46Z

Greptile Summary

This PR extracts the LLM provider/model routing internals from the monolithic clients.py into three focused modules — model_config.py (feature→model/provider mapping), providers.py (ChatOpenAI/Gemini factory + cache), and conversation_folder.py (folder-assignment route with validation) — while keeping the existing get_llm(feature) callsite API unchanged.

model_config.py becomes the single source of truth for QoS profiles, pinned features, route options (thinking_budget, temperature, extra_body), and the predicate helpers (is_anthropic_only_feature, supports_prompt_cache, etc.) previously scattered in clients.py.
providers.py introduces OpenAICompatibleProviderConfig and a registry-based factory (get_or_create_openai_compatible_llm) so new OpenAI-compatible backends can be plugged in by adding a config entry rather than touching routing code.
clients.py is slimmed down to a facade that re-exports all legacy symbols for backward compatibility, with BYOK resolution and startup logging intact.
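As a rough sketch of what "adding a config entry" could look like, assuming a config with the api_key_env, base_url, and default_headers fields that appear in the reviewed construction block. The class name ProviderConfig, the registry name, the example_provider entry, and its URL are all hypothetical, and the function returns plain kwargs instead of constructing ChatOpenAI so the example has no dependencies.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ProviderConfig:
    """Hypothetical stand-in for OpenAICompatibleProviderConfig."""

    api_key_env: str
    base_url: Optional[str] = None
    default_headers: Dict[str, str] = field(default_factory=dict)


# Hypothetical registry: plugging in a provider means adding one entry here.
PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
    'openai': ProviderConfig(api_key_env='OPENAI_API_KEY'),
    'example_provider': ProviderConfig(
        api_key_env='EXAMPLE_API_KEY',
        base_url='https://llm.example.com/v1',  # placeholder URL
    ),
}


def build_client_kwargs(provider: str, model: str) -> Dict[str, Any]:
    """Assemble constructor kwargs for an OpenAI-compatible chat client."""
    config = PROVIDER_REGISTRY[provider]
    kwargs: Dict[str, Any] = {'model': model}
    api_key = os.environ.get(config.api_key_env)
    if api_key:
        kwargs['api_key'] = api_key
    if config.base_url:
        kwargs['base_url'] = config.base_url
    if config.default_headers:
        kwargs['default_headers'] = dict(config.default_headers)
    return kwargs
```

Reading the key from the environment at build time, rather than storing it on the config, keeps the registry safe to log or print.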

Confidence Score: 4/5

Safe to merge — no production routing semantics changed and the full BYOK path is preserved end-to-end.

The refactor is a clean structural split with no behavior change to get_llm, BYOK resolution, Gemini native routing, or the OpenRouter Gemini reroute logic. The test suite validates facade compatibility and the new seams. The only things worth a follow-up are a dead local variable in get_qos_info, two unreachable entries in _OPENROUTER_TEMPERATURES, and a latent gap where unknown keys in the options dict feed the cache key but are silently dropped from the actual LLM construction — none of these affect production behavior today.

providers.py (get_or_create_openai_compatible_llm) deserves a second look if new option keys are added to get_route_options in a follow-on PR, to make sure each new key is also handled in the construction block.

Important Files Changed

Filename	Overview
backend/utils/llm/model_config.py	New module consolidating all feature→(model, provider) routing, profile management, and route-option helpers; two entries in `_OPENROUTER_TEMPERATURES` are dead because `persona_chat`/`persona_chat_premium` are always `openai`-routed.
backend/utils/llm/providers.py	New provider factory module with `get_or_create_openai_compatible_llm`, `get_or_create_gemini_llm`, and `get_default_client`; options dict is only partially consumed during LLM construction while all keys enter the cache key, creating a silent-drop footgun for future option additions.
backend/utils/llm/clients.py	Reduced to a compatibility facade; `get_llm`, BYOK routing, and startup logging are preserved; `active_profile` in `get_qos_info` is an unused local variable introduced during the refactor.
backend/utils/llm/conversation_folder.py	New module extracting folder-assignment LLM route, validation, and schema from `conversation_processing.py`; logic is identical to what was removed, no functional changes.
backend/utils/llm/conversation_processing.py	Folder-assignment code removed and replaced with imports from `conversation_folder.py`; no logic changes.
backend/tests/unit/test_llm_provider_plugin_structure.py	New unit tests covering cache isolation, provider construction kwargs, route options, and folder-assignment validation; coverage is appropriate for the new seams.

Sequence Diagram

sequenceDiagram
    participant CS as Feature Callsite
    participant CL as clients.get_llm(feature)
    participant MC as model_config<br/>_get_model_config()
    participant RO as model_config<br/>get_route_options()
    participant BYOK as BYOK resolver
    participant PR as providers<br/>get_default_client()
    participant OAI as get_or_create_openai_compatible_llm()
    participant GEM as get_or_create_gemini_llm()

    CS->>CL: get_llm("conv_action_items")
    CL->>MC: _get_model_config(feature)
    MC-->>CL: (model, provider)
    CL->>BYOK: get_byok_key(provider)
    BYOK-->>CL: byok_key or None
    CL->>RO: get_route_options(feature, model, provider)
    RO-->>CL: "{temperature, extra_body, thinking_budget}"
    alt BYOK key present
        CL->>CL: _create_byok_client(model, provider, byok_key)
    end
    CL->>PR: get_default_client(model, provider, streaming, options)
    alt "provider == gemini"
        PR->>GEM: get_or_create_gemini_llm(model, thinking_budget)
        GEM-->>PR: ChatGoogleGenerativeAI
    else openai / openrouter
        PR->>OAI: get_or_create_openai_compatible_llm(provider, model, options)
        OAI-->>PR: ChatOpenAI (cached)
    end
    PR-->>CL: BaseChatModel
    CL-->>CS: BaseChatModel

Reviews (1): Last reviewed commit: "test(backend): cover LLM provider routin..."

greptile-apps · 2026-06-15T15:46:50Z

    """Return full feature→(model, provider) mapping for the active profile (debugging/monitoring)."""
    info: Dict[str, Dict[str, str]] = {}
-    all_features = set(_active_profile.keys()) | set(_PINNED_FEATURES.keys())
+    active_profile = get_active_profile()


Unused local variable in get_qos_info

active_profile is assigned on line 325 but never referenced again in the function body — the loop iterates over get_all_configured_features() and calls _get_model_config directly. The assignment is dead code introduced during the refactor and could mislead a reader into thinking the local profile snapshot is driving the iteration.
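A possible shape for the function without the unused local, as a sketch only. The two helpers are stubbed here with placeholder data, and the {'model', 'provider'} layout of each entry is an assumption inferred from the return type annotation, not confirmed by the diff.

```python
from typing import Dict, List, Tuple

# Placeholder routes for the stubs below; not the real configuration.
_EXAMPLE_ROUTES: Dict[str, Tuple[str, str]] = {
    'conv_action_items': ('example-small-model', 'openai'),
    'wrapped_analysis': ('example-large-model', 'openrouter'),
}


def get_all_configured_features() -> List[str]:
    """Stub for the real helper in model_config.py."""
    return list(_EXAMPLE_ROUTES)


def _get_model_config(feature: str) -> Tuple[str, str]:
    """Stub for the real helper; returns (model, provider)."""
    return _EXAMPLE_ROUTES[feature]


def get_qos_info() -> Dict[str, Dict[str, str]]:
    """Return the feature -> model/provider mapping without a dead local."""
    info: Dict[str, Dict[str, str]] = {}
    for feature in get_all_configured_features():
        model, provider = _get_model_config(feature)
        info[feature] = {'model': model, 'provider': provider}
    return info
```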

greptile-apps · 2026-06-15T15:46:51Z

+# Feature-specific client config (temperature, headers — orthogonal to model choice).
+# Only applied when a feature resolves to an OpenRouter model.
+_OPENROUTER_TEMPERATURES: Dict[str, float] = {
+    'persona_chat': 0.8,
+    'persona_chat_premium': 0.8,
+    'wrapped_analysis': 0.7,
+}


Dead config entries in _OPENROUTER_TEMPERATURES

persona_chat and persona_chat_premium are mapped to the openai provider in every QoS profile (premium, max, byok). get_route_options only applies temperatures when provider == 'openrouter', so these two entries are never reached. The adjacent comment even warns "Only applied when a feature resolves to an OpenRouter model." Leaving them in creates a misleading hint that these features once used OpenRouter and may do so again, which could confuse future engineers adding a new provider for them.

Suggested change

-# Feature-specific client config (temperature, headers — orthogonal to model choice).
-# Only applied when a feature resolves to an OpenRouter model.
-_OPENROUTER_TEMPERATURES: Dict[str, float] = {
-    'persona_chat': 0.8,
-    'persona_chat_premium': 0.8,
-    'wrapped_analysis': 0.7,
-}
+# Feature-specific client config (temperature, headers — orthogonal to model choice).
+# Only applied when a feature resolves to an OpenRouter model.
+_OPENROUTER_TEMPERATURES: Dict[str, float] = {
+    'wrapped_analysis': 0.7,
+}

greptile-apps · 2026-06-15T15:46:52Z

+    if key not in _llm_cache:
+        kwargs: Dict[str, Any] = {
+            'callbacks': [_usage_callback],
+            'request_timeout': options.get('request_timeout', 120),
+            'max_retries': options.get('max_retries', 1),
+        }
+        api_key = os.environ.get(provider_config.api_key_env)
+        if api_key:
+            kwargs['api_key'] = api_key
+        if provider_config.base_url:
+            kwargs['base_url'] = provider_config.base_url
+        if provider_config.default_headers:
+            kwargs['default_headers'] = provider_config.default_headers
+        if options.get('extra_body'):
+            kwargs['extra_body'] = options['extra_body']
+        if 'temperature' in options:
+            kwargs['temperature'] = options['temperature']
+        if streaming:
+            kwargs['streaming'] = True
+            kwargs['stream_options'] = {"include_usage": True}
+
+        _llm_cache[key] = ChatOpenAI(model=_api_model_name(provider_config, model_name), **kwargs)


options keys beyond the handled set affect the cache key but are silently dropped from kwargs

_cache_key includes every key in options (via repr of all items), but the construction block only transfers request_timeout, max_retries, extra_body, and temperature to kwargs. Any future key added to an options dict (for example a new sampling parameter like top_p or seed) would produce distinct cache entries holding separate but identically configured LLM instances, since the new key is ignored during construction. Today this is benign because get_route_options only emits those four known keys for OpenAI-compatible providers, but the gap widens the blast radius each time a new option is added to get_route_options without a matching handler here.
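One way to close the gap is to reject unhandled keys before they reach the cache key. The sketch below is an assumption about how that could look, not the PR's code: the names _HANDLED_OPTION_KEYS, validate_options, and cache_key are hypothetical, and the key layout only approximates the "repr of all items" behavior described above.

```python
from typing import Any, Dict, Tuple

# The four option keys the construction block is described as consuming.
_HANDLED_OPTION_KEYS = frozenset({'request_timeout', 'max_retries', 'extra_body', 'temperature'})


def validate_options(options: Dict[str, Any]) -> None:
    """Fail loudly if an option would be cached on but never applied."""
    unknown = set(options) - _HANDLED_OPTION_KEYS
    if unknown:
        raise ValueError(f'Unhandled LLM option keys: {sorted(unknown)}')


def cache_key(provider: str, model: str, streaming: bool, options: Dict[str, Any]) -> Tuple[str, str, bool, str]:
    """Build a cache key from validated options only."""
    validate_options(options)
    # Keys are unique, so sorting the items never compares the values.
    return (provider, model, streaming, repr(sorted(options.items())))
```

With this guard, adding top_p to the route options without a matching handler raises at lookup time instead of silently creating a duplicate client.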

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0389b46320

chatgpt-codex-connector · 2026-06-15T16:01:14Z

@@ -0,0 +1,197 @@
+"""Unit tests for maintainable LLM provider/model plug-in seams."""


Add this new test file to the backend test runner

This new unit test is never exercised by the backend CI because backend/test.sh enumerates individual pytest files and does not include tests/unit/test_llm_provider_plugin_structure.py (checked the current script). backend/AGENTS.md also explicitly requires new test files to be added to test.sh; without that entry, the provider/folder routing regressions covered here can silently rot.
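A guard like the following could catch this class of omission. It is a hypothetical helper, not something in the repo, and it assumes the runner script names each test file literally; the substring match is deliberately crude and would miss files referenced through globs or variables.

```python
from pathlib import Path
from typing import List


def missing_from_runner(test_dir: Path, runner_script: Path) -> List[str]:
    """List test files in test_dir whose file name never appears in the runner script."""
    runner_text = runner_script.read_text(encoding='utf-8')
    return sorted(
        path.name
        for path in test_dir.glob('test_*.py')
        if path.name not in runner_text
    )
```

Called as missing_from_runner(Path('backend/tests/unit'), Path('backend/test.sh')), a non-empty result would indicate test files that the runner script does not mention.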

kodjima33

Backend refactor (split LLM provider routing seams) — approve only per policy.

Codex-Agent added 2 commits June 15, 2026 22:39

refactor(backend): split LLM provider routing seams

11f4636

test(backend): cover LLM provider routing compatibility

0389b46

greptile-apps Bot reviewed Jun 15, 2026

chatgpt-codex-connector Bot reviewed Jun 15, 2026

kodjima33 approved these changes Jun 16, 2026

Git-on-my-level wants to merge 2 commits into BasedHardware:main from Git-on-my-level:refactor/llm-provider-plugin-structure

3 participants