fix(wren-ai-service): handle json_schema response_format for non-OpenAI models and guard empty SQL diagnosis response #2167
Conversation
Walkthrough

Two robustness changes: SQL diagnosis `post_process` now returns a dict and defensively handles missing or empty LLM replies, and the Litellm provider conditionally strips the `json_schema` response_format when a custom `api_base` is configured.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
wren-ai-service/src/pipelines/generation/sql_diagnosis.py (1)
91-98: ⚠️ Potential issue | 🟡 Minor — Return type annotation is incorrect.

The function returns a `dict` (either the fallback `{"reasoning": ...}` or the parsed JSON), but the type hint declares `-> str`.

Proposed fix:

```diff
 @observe(capture_input=False)
 async def post_process(
     generate_sql_diagnosis: dict,
-) -> str:
+) -> dict:
     reply = generate_sql_diagnosis.get("replies", [""])[0]
     if not reply or not reply.strip():
         return {"reasoning": "LLM did not return any response."}
     return orjson.loads(reply)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren-ai-service/src/pipelines/generation/sql_diagnosis.py` around lines 91 - 98, The return type annotation on post_process is wrong: the function returns a dict (either the fallback {"reasoning": ...} or the JSON parsed via orjson.loads) but is annotated as -> str; update the signature of post_process to return a dict type (e.g., -> Dict[str, Any] or -> dict) and import typing as needed so the annotation matches the actual returns from the post_process function.
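To make the fix concrete, here is a runnable sketch of the corrected `post_process`, with the stdlib `json` module standing in for `orjson` (which parses identically for this case) and the `@observe` decorator omitted since it belongs to the project's tracing setup:

```python
import asyncio
import json  # stdlib stand-in for orjson used in the actual pipeline
from typing import Any, Dict


async def post_process(generate_sql_diagnosis: dict) -> Dict[str, Any]:
    # Defensively take the first reply, defaulting to "" when the key is missing.
    reply = generate_sql_diagnosis.get("replies", [""])[0]
    if not reply or not reply.strip():
        # An empty reply would otherwise crash the JSON parse below.
        return {"reasoning": "LLM did not return any response."}
    return json.loads(reply)


# Missing key, blank reply, and a well-formed reply all resolve to a dict.
print(asyncio.run(post_process({})))
print(asyncio.run(post_process({"replies": ['{"reasoning": "ok"}']})))
```

With the annotation changed to `-> dict`, every return path now matches the declared type.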
🧹 Nitpick comments (2)
wren-ai-service/src/providers/llm/litellm.py (2)
102-106: Trailing whitespace on line 105. There appears to be unintentional whitespace on line 105, after the merged dict's closing brace.

Proposed cleanup:

```diff
 generation_kwargs = {
     **combined_generation_kwargs,
     **(generation_kwargs or {}),
-}    
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren-ai-service/src/providers/llm/litellm.py` around lines 102 - 106, The assignment to generation_kwargs in litellm.py contains unintended trailing whitespace after the opening brace in the merged dict expression; edit the generation_kwargs = { **combined_generation_kwargs, **(generation_kwargs or {}), } statement (the dict merge that constructs generation_kwargs) to remove the stray whitespace so the line is clean (ensure no extra spaces before the closing brace or after the comma).
107-114: Heuristic for stripping `json_schema` may be overly broad.

Using the presence of `self._api_base` as the sole indicator to strip `json_schema` may inadvertently affect:

- Azure OpenAI endpoints (which support json_schema on newer models)
- OpenAI-compatible proxies that do support structured outputs
- Users accessing native OpenAI through corporate proxies

This fix was added to address real issues where Ollama and similar backends return empty responses with `json_schema` set. However, LiteLLM itself supports `json_schema` for many non-OpenAI models, making the heuristic potentially too broad. Consider a more targeted approach such as:

- Checking the model name prefix (e.g., `gpt-`, `o1-`, `o3-`)
- Using a configuration flag to opt in or out of json_schema stripping
- Checking for specific model providers known not to support it

The TODO comment in the constructor suggests `_api_base` may be removed in the future, which aligns with reconsidering this approach.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren-ai-service/src/providers/llm/litellm.py` around lines 107 - 114, The current heuristic in litellm.py strips generation_kwargs["response_format"] whenever self._api_base is set, which is too broad; update the condition in the block that checks response_format/type == "json_schema" to be more targeted by either (a) honoring a new instance flag such as self.strip_json_schema_for_legacy_backends (settable in the constructor), or (b) inspecting the model identifier (e.g., generation_kwargs.get("model") or self._model_name) and only popping response_format for known incompatible backends (match specific provider/model prefixes like "ollama", or a maintained list of incompatible backends) while allowing OpenAI/Azure-style prefixes ("gpt-", "o1-", "o3-") to keep json_schema; ensure the symbol references are self._api_base, generation_kwargs, and response_format so the change is localized to the existing conditional and preserve current behavior for backends that support structured outputs.
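A minimal sketch of the model-prefix variant suggested above. The function names, the prefix tuple, and the standalone shape are illustrative assumptions for this review, not the provider's actual attributes or the project's policy:

```python
from typing import Optional

# Assumed prefix list: model names that look like native OpenAI models
# are presumed to support json_schema structured outputs.
_OPENAI_PREFIXES = ("gpt-", "o1-", "o3-")


def should_strip_json_schema(model: str, api_base: Optional[str]) -> bool:
    # Only consider stripping when a custom api_base is configured,
    # and keep json_schema for OpenAI-style model names even then.
    return api_base is not None and not model.startswith(_OPENAI_PREFIXES)


def prepare_kwargs(
    generation_kwargs: dict, model: str, api_base: Optional[str]
) -> dict:
    response_format = generation_kwargs.get("response_format")
    if (
        isinstance(response_format, dict)
        and response_format.get("type") == "json_schema"
        and should_strip_json_schema(model, api_base)
    ):
        # Return a copy without response_format rather than mutating the input.
        return {k: v for k, v in generation_kwargs.items() if k != "response_format"}
    return generation_kwargs
```

This keeps the behavior for Ollama-style backends while leaving `json_schema` intact for users reaching native OpenAI models through a proxy; a configuration flag could still override the prefix check for backends the list gets wrong.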
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: aca0aa23-3251-4feb-9692-c0f36e80adb7
📒 Files selected for processing (2)
- wren-ai-service/src/pipelines/generation/sql_diagnosis.py
- wren-ai-service/src/providers/llm/litellm.py
Summary
This PR fixes issues when using non-OpenAI models (e.g., Ollama or LiteLLM proxy backends).
Changes
- Skip `response_format={"type": "json_schema"}` for custom `api_base` models, since structured outputs are only supported by native OpenAI models and can return empty responses on other backends.
- Guard against empty SQL diagnosis replies so an empty reply no longer causes an `orjson.loads("")` crash.

Files Changed

- wren-ai-service/src/providers/llm/litellm.py
- wren-ai-service/src/pipelines/generation/sql_diagnosis.py
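The guarded crash is easy to reproduce. This sketch uses the stdlib `json` module as a stand-in (`orjson` likewise raises its own `JSONDecodeError` on empty input) and mirrors the fallback the PR adds:

```python
import json  # stand-in for orjson; both reject an empty string


def safe_parse(reply: str) -> dict:
    # Mirror the guard in post_process: fall back instead of crashing.
    if not reply or not reply.strip():
        return {"reasoning": "LLM did not return any response."}
    return json.loads(reply)


# Without the guard, parsing an empty reply raises immediately.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    print(f"unguarded parse fails: {exc}")

print(safe_parse(""))
```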