
fix(wren-ai-service): handle json_schema response_format for non-OpenAI models and guard empty SQL diagnosis response#2167

Open
Thangoun wants to merge 3 commits into Canner:main from Thangoun:fix/json-schema-empty-response

Conversation


@Thangoun Thangoun commented Mar 24, 2026

Summary

This PR fixes issues when using non-OpenAI models (e.g., Ollama or LiteLLM proxy backends).

Changes

  1. Strip response_format={"type": "json_schema"} for models served through a custom api_base, since structured outputs are only supported by native OpenAI models and can yield empty responses on other backends.
  2. Add guard for empty SQL diagnosis response to prevent orjson.loads("") crash.

Files Changed

  • wren-ai-service/src/providers/llm/litellm.py
  • wren-ai-service/src/pipelines/generation/sql_diagnosis.py

Result

  • Prevents empty responses from LiteLLM non-OpenAI backends
  • Prevents crash when SQL diagnosis reply is empty

Summary by CodeRabbit

  • Bug Fixes
    • SQL diagnosis now handles missing or empty model replies gracefully, returning a clear standardized message instead of failing.
    • Improved compatibility with custom LLM endpoints by no longer forwarding incompatible response-format instructions, reducing errors on non-standard backends.

Contributor

coderabbitai bot commented Mar 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: aadea215-2de9-4e85-a82f-9b24ebb96b8f

📥 Commits

Reviewing files that changed from the base of the PR and between 5dfdafc and 8c5e239.

📒 Files selected for processing (2)
  • wren-ai-service/src/pipelines/generation/sql_diagnosis.py
  • wren-ai-service/src/providers/llm/litellm.py
✅ Files skipped from review due to trivial changes (1)
  • wren-ai-service/src/providers/llm/litellm.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • wren-ai-service/src/pipelines/generation/sql_diagnosis.py

Walkthrough

Two robustness changes: SQL diagnosis post_process now returns a dict and defensively handles missing/empty LLM replies, and the Litellm provider conditionally strips json_schema response_format when api_base is set to avoid forwarding OpenAI-specific formatting to custom backends.

Changes

Cohort / File(s) Summary
SQL Diagnosis
wren-ai-service/src/pipelines/generation/sql_diagnosis.py
post_process return type changed from str to dict; extracts first LLM reply defensively via get("replies", [""])[0]; returns a fallback {"reasoning": "LLM did not return any response."} for missing/empty replies, otherwise parses reply with orjson.loads.
Litellm LLM Provider
wren-ai-service/src/providers/llm/litellm.py
Before building completion request, conditionally remove response_format when self._api_base is set and response_format is a dict with type == "json_schema", preventing forwarding OpenAI json_schema formatting to non-OpenAI/custom API backends.
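The described post_process guard can be sketched as follows (the real function is async and parses with orjson; this synchronous sketch substitutes the stdlib json module so it stands alone):

```python
import json  # stand-in for orjson, which the service actually uses


def post_process(generate_sql_diagnosis: dict) -> dict:
    # Defensively take the first reply; some backends return no replies at all.
    reply = generate_sql_diagnosis.get("replies", [""])[0]
    if not reply or not reply.strip():
        # Without this guard, parsing an empty string raises a decode error
        # (orjson.loads("") crashes the pipeline).
        return {"reasoning": "LLM did not return any response."}
    return json.loads(reply)
```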

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

module/ai-service, wren-ai-service

Suggested reviewers

  • cyyeh
  • yichieh-lu

Poem

🐰 I nibble on bugs in code so bright,
I patch the silence of a missing byte,
JSON and replies now neatly aligned,
Hops of reliability — coded and signed! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and specifically summarizes both main changes: handling json_schema response_format for non-OpenAI models and guarding empty SQL diagnosis responses.


Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
wren-ai-service/src/pipelines/generation/sql_diagnosis.py (1)

91-98: ⚠️ Potential issue | 🟡 Minor

Return type annotation is incorrect.

The function returns a dict (either the fallback {"reasoning": ...} or the parsed JSON), but the type hint declares -> str.

Proposed fix
 @observe(capture_input=False)
 async def post_process(
     generate_sql_diagnosis: dict,
-) -> str:
+) -> dict:
     reply = generate_sql_diagnosis.get("replies", [""])[0]
     if not reply or not reply.strip():
         return {"reasoning": "LLM did not return any response."}
     return orjson.loads(reply)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `wren-ai-service/src/pipelines/generation/sql_diagnosis.py` around lines 91 -
98, The return type annotation on post_process is wrong: the function returns a
dict (either the fallback {"reasoning": ...} or the JSON parsed via
orjson.loads) but is annotated as -> str; update the signature of post_process
to return a dict type (e.g., -> Dict[str, Any] or -> dict) and import typing as
needed so the annotation matches the actual returns from the post_process
function.
🧹 Nitpick comments (2)
wren-ai-service/src/providers/llm/litellm.py (2)

102-106: Trailing whitespace on line 105.

There appears to be unintentional whitespace added on line 105 after the opening brace merge.

Proposed cleanup
             generation_kwargs = {
                 **combined_generation_kwargs,
                 **(generation_kwargs or {}),
-                
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `wren-ai-service/src/providers/llm/litellm.py` around lines 102 - 106, The
assignment to generation_kwargs in litellm.py contains unintended trailing
whitespace after the opening brace in the merged dict expression; edit the
generation_kwargs = { **combined_generation_kwargs, **(generation_kwargs or {}),
} statement (the dict merge that constructs generation_kwargs) to remove the
stray whitespace so the line is clean (ensure no extra spaces before the closing
brace or after the comma).

107-114: Heuristic for stripping json_schema may be overly broad.

Using self._api_base presence as the sole indicator to strip json_schema may inadvertently affect:

  • Azure OpenAI endpoints (which support json_schema on newer models)
  • OpenAI-compatible proxies that do support structured outputs
  • Users accessing native OpenAI through corporate proxies

This fix was added to address real issues where Ollama and similar backends return empty responses with json_schema set. However, LiteLLM itself supports json_schema for many non-OpenAI models, making the heuristic potentially too broad. Consider a more targeted approach such as:

  • Checking the model name prefix (e.g., gpt-, o1-, o3-)
  • Using a configuration flag to opt-in/out of json_schema stripping
  • Checking for specific model providers known not to support it

The TODO comment in the constructor suggests _api_base may be removed in the future, which aligns with reconsidering this approach.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `wren-ai-service/src/providers/llm/litellm.py` around lines 107 - 114, The
current heuristic in litellm.py strips generation_kwargs["response_format"]
whenever self._api_base is set, which is too broad; update the condition in the
block that checks response_format/type == "json_schema" to be more targeted by
either (a) honoring a new instance flag such as
self.strip_json_schema_for_legacy_backends (settable in the constructor), or (b)
inspecting the model identifier (e.g., generation_kwargs.get("model") or
self._model_name) and only popping response_format for known incompatible
backends (match specific provider/model prefixes like "ollama", or a maintained
list of incompatible backends) while allowing OpenAI/Azure-style prefixes
("gpt-", "o1-", "o3-") to keep json_schema; ensure the symbol references are
self._api_base, generation_kwargs, and response_format so the change is
localized to the existing conditional and preserve current behavior for backends
that support structured outputs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: aca0aa23-3251-4feb-9692-c0f36e80adb7

📥 Commits

Reviewing files that changed from the base of the PR and between eea642f and 5dfdafc.

📒 Files selected for processing (2)
  • wren-ai-service/src/pipelines/generation/sql_diagnosis.py
  • wren-ai-service/src/providers/llm/litellm.py

Author

@Thangoun Thangoun left a comment

