Skip to content

refactor(backend): split LLM provider routing seams#7969

Open
Git-on-my-level wants to merge 2 commits into
BasedHardware:mainfrom
Git-on-my-level:refactor/llm-provider-plugin-structure
Open

refactor(backend): split LLM provider routing seams#7969
Git-on-my-level wants to merge 2 commits into
BasedHardware:mainfrom
Git-on-my-level:refactor/llm-provider-plugin-structure

Conversation

@Git-on-my-level

Copy link
Copy Markdown
Collaborator

Summary

Refactors the backend LLM provider/model routing internals while preserving the existing get_llm(feature) production API.

  • Moves feature/profile model routing into utils/llm/model_config.py.
  • Moves provider-specific chat model construction into utils/llm/providers.py.
  • Adds a generic OpenAI-compatible provider constructor (base_url + api_key_env + model) so new compatible providers can be plugged in without touching feature code.
  • Keeps clients.py as a compatibility facade for get_llm, get_model, get_provider, get_qos_info, and legacy tested private symbols.
  • Extracts conv_folder schema/context/validation/assignment into utils/llm/conversation_folder.py so route-specific safety logic is reusable and easier to reason about.
  • Adds unit coverage for provider construction, route options, and folder-assignment validation.

Why

DD-009 concluded that the maintainable production seam should be:

feature callsite
  → get_llm(feature)
  → model/profile config
  → provider factory
  → BaseChatModel

This makes plugging in a new OpenAI-compatible provider/model mostly a config/factory change, while keeping product callsites clean and avoiding a premature TaskContract / ModelStrategy framework.

Cost impact

No immediate traffic or model changes — $0/mo direct savings. This is infrastructure for DD-009: it lowers the engineering cost of routing selected OpenAI/Anthropic workloads to cheaper providers after external benchmarking validates quality and reliability.

User / production impact

  • No intended behavior change.
  • Existing production callsites continue using get_llm(feature).
  • Existing BYOK routing semantics are preserved, including Gemini BYOK routing through the Gemini OpenAI-compatible endpoint and OpenRouter Gemini BYOK rerouting.

Validation

  • python3 -m compileall passes for all new/modified files
  • 96 passed (new + existing BYOK tests), 4 pre-existing warnings
  • git diff --check clean
  • Independent subagent review: caught compatibility regression on _GEMINI_OPENAI_BASE_URL export; fixed with alias. Final: APPROVED

Rollback

Revert this PR. Internal refactor with no data migration or traffic-routing change.

@greptile-apps

greptile-apps Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR extracts the LLM provider/model routing internals from the monolithic clients.py into three focused modules — model_config.py (feature→model/provider mapping), providers.py (ChatOpenAI/Gemini factory + cache), and conversation_folder.py (folder-assignment route with validation) — while keeping the existing get_llm(feature) callsite API unchanged.

  • model_config.py becomes the single source of truth for QoS profiles, pinned features, route options (thinking_budget, temperature, extra_body), and the predicate helpers (is_anthropic_only_feature, supports_prompt_cache, etc.) previously scattered in clients.py.
  • providers.py introduces OpenAICompatibleProviderConfig and a registry-based factory (get_or_create_openai_compatible_llm) so new OpenAI-compatible backends can be plugged in by adding a config entry rather than touching routing code.
  • clients.py is slimmed down to a facade that re-exports all legacy symbols for backward compatibility, with BYOK resolution and startup logging intact.

Confidence Score: 4/5

Safe to merge — no production routing semantics changed and the full BYOK path is preserved end-to-end.

The refactor is a clean structural split with no behavior change to get_llm, BYOK resolution, Gemini native routing, or the OpenRouter Gemini reroute logic. The test suite validates facade compatibility and the new seams. The only things worth a follow-up are a dead local variable in get_qos_info, two unreachable entries in _OPENROUTER_TEMPERATURES, and a latent gap where unknown keys in the options dict feed the cache key but are silently dropped from the actual LLM construction — none of these affect production behavior today.

providers.py (get_or_create_openai_compatible_llm) deserves a second look if new option keys are added to get_route_options in a follow-on PR, to make sure each new key is also handled in the construction block.

Important Files Changed

Filename Overview
backend/utils/llm/model_config.py New module consolidating all feature→(model, provider) routing, profile management, and route-option helpers; two entries in _OPENROUTER_TEMPERATURES are dead because persona_chat/persona_chat_premium are always openai-routed.
backend/utils/llm/providers.py New provider factory module with get_or_create_openai_compatible_llm, get_or_create_gemini_llm, and get_default_client; options dict is only partially consumed during LLM construction while all keys enter the cache key, creating a silent-drop footgun for future option additions.
backend/utils/llm/clients.py Reduced to a compatibility facade; get_llm, BYOK routing, and startup logging are preserved; active_profile in get_qos_info is an unused local variable introduced during the refactor.
backend/utils/llm/conversation_folder.py New module extracting folder-assignment LLM route, validation, and schema from conversation_processing.py; logic is identical to what was removed, no functional changes.
backend/utils/llm/conversation_processing.py Folder-assignment code removed and replaced with imports from conversation_folder.py; no logic changes.
backend/tests/unit/test_llm_provider_plugin_structure.py New unit tests covering cache isolation, provider construction kwargs, route options, and folder-assignment validation; coverage is appropriate for the new seams.

Sequence Diagram

sequenceDiagram
    participant CS as Feature Callsite
    participant CL as clients.get_llm(feature)
    participant MC as model_config<br/>_get_model_config()
    participant RO as model_config<br/>get_route_options()
    participant BYOK as BYOK resolver
    participant PR as providers<br/>get_default_client()
    participant OAI as get_or_create_openai_compatible_llm()
    participant GEM as get_or_create_gemini_llm()

    CS->>CL: get_llm("conv_action_items")
    CL->>MC: _get_model_config(feature)
    MC-->>CL: (model, provider)
    CL->>BYOK: get_byok_key(provider)
    BYOK-->>CL: byok_key or None
    CL->>RO: get_route_options(feature, model, provider)
    RO-->>CL: "{temperature, extra_body, thinking_budget}"
    alt BYOK key present
        CL->>CL: _create_byok_client(model, provider, byok_key)
    end
    CL->>PR: get_default_client(model, provider, streaming, options)
    alt "provider == gemini"
        PR->>GEM: get_or_create_gemini_llm(model, thinking_budget)
        GEM-->>PR: ChatGoogleGenerativeAI
    else openai / openrouter
        PR->>OAI: get_or_create_openai_compatible_llm(provider, model, options)
        OAI-->>PR: ChatOpenAI (cached)
    end
    PR-->>CL: BaseChatModel
    CL-->>CS: BaseChatModel
Loading

Reviews (1): Last reviewed commit: "test(backend): cover LLM provider routin..." | Re-trigger Greptile

"""Return full feature→(model, provider) mapping for the active profile (debugging/monitoring)."""
info: Dict[str, Dict[str, str]] = {}
all_features = set(_active_profile.keys()) | set(_PINNED_FEATURES.keys())
active_profile = get_active_profile()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unused local variable in get_qos_info

active_profile is assigned on line 325 but never referenced again in the function body — the loop iterates over get_all_configured_features() and calls _get_model_config directly. The assignment is dead code introduced during the refactor and could mislead a reader into thinking the local profile snapshot is driving the iteration.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +208 to +214
# Feature-specific client config (temperature, headers — orthogonal to model choice).
# Only applied when a feature resolves to an OpenRouter model.
_OPENROUTER_TEMPERATURES: Dict[str, float] = {
'persona_chat': 0.8,
'persona_chat_premium': 0.8,
'wrapped_analysis': 0.7,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Dead config entries in _OPENROUTER_TEMPERATURES

persona_chat and persona_chat_premium are mapped to the openai provider in every QoS profile (premium, max, byok). get_route_options only applies temperatures when provider == 'openrouter', so these two entries are never reached. The adjacent comment even warns "Only applied when a feature resolves to an OpenRouter model." Leaving them in creates a misleading hint that these features once used OpenRouter and may do so again, which could confuse future engineers adding a new provider for them.

Suggested change
# Feature-specific client config (temperature, headers — orthogonal to model choice).
# Only applied when a feature resolves to an OpenRouter model.
_OPENROUTER_TEMPERATURES: Dict[str, float] = {
'persona_chat': 0.8,
'persona_chat_premium': 0.8,
'wrapped_analysis': 0.7,
}
# Feature-specific client config (temperature, headers — orthogonal to model choice).
# Only applied when a feature resolves to an OpenRouter model.
_OPENROUTER_TEMPERATURES: Dict[str, float] = {
'wrapped_analysis': 0.7,
}

Comment on lines +80 to +101
if key not in _llm_cache:
kwargs: Dict[str, Any] = {
'callbacks': [_usage_callback],
'request_timeout': options.get('request_timeout', 120),
'max_retries': options.get('max_retries', 1),
}
api_key = os.environ.get(provider_config.api_key_env)
if api_key:
kwargs['api_key'] = api_key
if provider_config.base_url:
kwargs['base_url'] = provider_config.base_url
if provider_config.default_headers:
kwargs['default_headers'] = provider_config.default_headers
if options.get('extra_body'):
kwargs['extra_body'] = options['extra_body']
if 'temperature' in options:
kwargs['temperature'] = options['temperature']
if streaming:
kwargs['streaming'] = True
kwargs['stream_options'] = {"include_usage": True}

_llm_cache[key] = ChatOpenAI(model=_api_model_name(provider_config, model_name), **kwargs)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 options keys beyond the handled set affect the cache key but are silently dropped from kwargs

_cache_key includes every key in options (via repr of all items), but the construction block only transfers request_timeout, max_retries, extra_body, and temperature to kwargs. Any future key added to an options dict — for example a new sampling parameter like top_p or seed — would produce distinct cache entries that all resolve to the same underlying LLM instance (since the new key is ignored during construction). Today this is benign because get_route_options only emits those four known keys for OpenAI-compatible providers, but the gap widens the blast radius each time a new option is added to get_route_options without a matching handler here.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0389b46320

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@@ -0,0 +1,197 @@
"""Unit tests for maintainable LLM provider/model plug-in seams."""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Add this new test file to the backend test runner

This new unit test is never exercised by the backend CI because backend/test.sh enumerates individual pytest files and does not include tests/unit/test_llm_provider_plugin_structure.py (checked the current script). backend/AGENTS.md also explicitly requires new test files to be added to test.sh; without that entry, the provider/folder routing regressions covered here can silently rot.

Useful? React with 👍 / 👎.

@kodjima33 kodjima33 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backend refactor (split LLM provider routing seams) — approve only per policy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants