refactor(backend): split LLM provider routing seams#7969
refactor(backend): split LLM provider routing seams#7969Git-on-my-level wants to merge 2 commits into
Conversation
Greptile SummaryThis PR extracts the LLM provider/model routing internals from the monolithic
Confidence Score: 4/5Safe to merge — no production routing semantics changed and the full BYOK path is preserved end-to-end. The refactor is a clean structural split with no behavior change to
Important Files Changed
Sequence DiagramsequenceDiagram
participant CS as Feature Callsite
participant CL as clients.get_llm(feature)
participant MC as model_config<br/>_get_model_config()
participant RO as model_config<br/>get_route_options()
participant BYOK as BYOK resolver
participant PR as providers<br/>get_default_client()
participant OAI as get_or_create_openai_compatible_llm()
participant GEM as get_or_create_gemini_llm()
CS->>CL: get_llm("conv_action_items")
CL->>MC: _get_model_config(feature)
MC-->>CL: (model, provider)
CL->>BYOK: get_byok_key(provider)
BYOK-->>CL: byok_key or None
CL->>RO: get_route_options(feature, model, provider)
RO-->>CL: "{temperature, extra_body, thinking_budget}"
alt BYOK key present
CL->>CL: _create_byok_client(model, provider, byok_key)
end
CL->>PR: get_default_client(model, provider, streaming, options)
alt "provider == gemini"
PR->>GEM: get_or_create_gemini_llm(model, thinking_budget)
GEM-->>PR: ChatGoogleGenerativeAI
else openai / openrouter
PR->>OAI: get_or_create_openai_compatible_llm(provider, model, options)
OAI-->>PR: ChatOpenAI (cached)
end
PR-->>CL: BaseChatModel
CL-->>CS: BaseChatModel
Reviews (1): Last reviewed commit: "test(backend): cover LLM provider routin..." | Re-trigger Greptile |
| """Return full feature→(model, provider) mapping for the active profile (debugging/monitoring).""" | ||
| info: Dict[str, Dict[str, str]] = {} | ||
| all_features = set(_active_profile.keys()) | set(_PINNED_FEATURES.keys()) | ||
| active_profile = get_active_profile() |
There was a problem hiding this comment.
Unused local variable in
get_qos_info
active_profile is assigned on line 325 but never referenced again in the function body — the loop iterates over get_all_configured_features() and calls _get_model_config directly. The assignment is dead code introduced during the refactor and could mislead a reader into thinking the local profile snapshot is driving the iteration.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| # Feature-specific client config (temperature, headers — orthogonal to model choice). | ||
| # Only applied when a feature resolves to an OpenRouter model. | ||
| _OPENROUTER_TEMPERATURES: Dict[str, float] = { | ||
| 'persona_chat': 0.8, | ||
| 'persona_chat_premium': 0.8, | ||
| 'wrapped_analysis': 0.7, | ||
| } |
There was a problem hiding this comment.
Dead config entries in
_OPENROUTER_TEMPERATURES
persona_chat and persona_chat_premium are mapped to the openai provider in every QoS profile (premium, max, byok). get_route_options only applies temperatures when provider == 'openrouter', so these two entries are never reached. The adjacent comment even warns "Only applied when a feature resolves to an OpenRouter model." Leaving them in creates a misleading hint that these features once used OpenRouter and may do so again, which could confuse future engineers adding a new provider for them.
| # Feature-specific client config (temperature, headers — orthogonal to model choice). | |
| # Only applied when a feature resolves to an OpenRouter model. | |
| _OPENROUTER_TEMPERATURES: Dict[str, float] = { | |
| 'persona_chat': 0.8, | |
| 'persona_chat_premium': 0.8, | |
| 'wrapped_analysis': 0.7, | |
| } | |
| # Feature-specific client config (temperature, headers — orthogonal to model choice). | |
| # Only applied when a feature resolves to an OpenRouter model. | |
| _OPENROUTER_TEMPERATURES: Dict[str, float] = { | |
| 'wrapped_analysis': 0.7, | |
| } |
| if key not in _llm_cache: | ||
| kwargs: Dict[str, Any] = { | ||
| 'callbacks': [_usage_callback], | ||
| 'request_timeout': options.get('request_timeout', 120), | ||
| 'max_retries': options.get('max_retries', 1), | ||
| } | ||
| api_key = os.environ.get(provider_config.api_key_env) | ||
| if api_key: | ||
| kwargs['api_key'] = api_key | ||
| if provider_config.base_url: | ||
| kwargs['base_url'] = provider_config.base_url | ||
| if provider_config.default_headers: | ||
| kwargs['default_headers'] = provider_config.default_headers | ||
| if options.get('extra_body'): | ||
| kwargs['extra_body'] = options['extra_body'] | ||
| if 'temperature' in options: | ||
| kwargs['temperature'] = options['temperature'] | ||
| if streaming: | ||
| kwargs['streaming'] = True | ||
| kwargs['stream_options'] = {"include_usage": True} | ||
|
|
||
| _llm_cache[key] = ChatOpenAI(model=_api_model_name(provider_config, model_name), **kwargs) |
There was a problem hiding this comment.
options keys beyond the handled set affect the cache key but are silently dropped from kwargs
_cache_key includes every key in options (via repr of all items), but the construction block only transfers request_timeout, max_retries, extra_body, and temperature to kwargs. Any future key added to an options dict — for example a new sampling parameter like top_p or seed — would produce distinct cache entries that all resolve to the same underlying LLM instance (since the new key is ignored during construction). Today this is benign because get_route_options only emits those four known keys for OpenAI-compatible providers, but the gap widens the blast radius each time a new option is added to get_route_options without a matching handler here.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0389b46320
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| @@ -0,0 +1,197 @@ | |||
| """Unit tests for maintainable LLM provider/model plug-in seams.""" | |||
There was a problem hiding this comment.
Add this new test file to the backend test runner
This new unit test is never exercised by the backend CI because backend/test.sh enumerates individual pytest files and does not include tests/unit/test_llm_provider_plugin_structure.py (checked the current script). backend/AGENTS.md also explicitly requires new test files to be added to test.sh; without that entry, the provider/folder routing regressions covered here can silently rot.
Useful? React with 👍 / 👎.
kodjima33
left a comment
There was a problem hiding this comment.
Backend refactor (split LLM provider routing seams) — approve only per policy.
Summary
Refactors the backend LLM provider/model routing internals while preserving the existing
get_llm(feature)production API.utils/llm/model_config.py.utils/llm/providers.py.base_url + api_key_env + model) so new compatible providers can be plugged in without touching feature code.clients.pyas a compatibility facade forget_llm,get_model,get_provider,get_qos_info, and legacy tested private symbols.conv_folderschema/context/validation/assignment intoutils/llm/conversation_folder.pyso route-specific safety logic is reusable and easier to reason about.Why
DD-009 concluded that the maintainable production seam should be:
This makes plugging in a new OpenAI-compatible provider/model mostly a config/factory change, while keeping product callsites clean and avoiding a premature
TaskContract/ModelStrategyframework.Cost impact
No immediate traffic or model changes — $0/mo direct savings. This is infrastructure for DD-009: it lowers the engineering cost of routing selected OpenAI/Anthropic workloads to cheaper providers after external benchmarking validates quality and reliability.
User / production impact
get_llm(feature).Validation
python3 -m compileallpasses for all new/modified filesgit diff --checkclean_GEMINI_OPENAI_BASE_URLexport; fixed with alias. Final: APPROVEDRollback
Revert this PR. Internal refactor with no data migration or traffic-routing change.