fix: guard against empty choices and null message in LLM responses #1695

Open
qizwiz wants to merge 2 commits into allenai:main from qizwiz:fix/llm-response-unguarded-crash

qizwiz commented May 18, 2026

Summary

Six locations across five files access response.choices[0].message.content without first verifying that choices is non-empty and message is not None.

Two crash vectors that bypass existing try/except blocks:

  • IndexError: the API returns an empty choices list (quota exhaustion, network failure, provider-side filtering)
  • AttributeError: choices[0].message is None (documented Gemini PROHIBITED_CONTENT behaviour: HTTP 200 response with a null message object rather than an error status)
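
The two failure modes above can be reproduced without calling any provider. The following is a minimal sketch using `SimpleNamespace` objects as hypothetical stand-ins for a response; real SDK response objects are richer, and the helper names `unguarded_content` and `failure_type` are illustrative only.

```python
from types import SimpleNamespace


def unguarded_content(response):
    """Mirror the unguarded access pattern: no checks before indexing."""
    return response.choices[0].message.content


# Hypothetical stand-ins for provider responses (not real SDK objects).
empty_choices = SimpleNamespace(choices=[])
null_message = SimpleNamespace(choices=[SimpleNamespace(message=None)])


def failure_type(response):
    """Return the name of the exception raised by the unguarded access, or None."""
    try:
        unguarded_content(response)
    except (IndexError, AttributeError) as exc:
        return type(exc).__name__
    return None


print(failure_type(empty_choices))  # IndexError
print(failure_type(null_message))   # AttributeError
```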

Files changed

| File | Locations | Guard action |
|---|---|---|
| open_instruct/rubrics/run_utils.py | 2 | Return "" (matches existing except-path behaviour) |
| open_instruct/ground_truth_utils.py | 1 | Raise ValueError → caught by except Exception → returns zero-values |
| open_instruct/rejection_sampling/synthetic_preference_dataset.py | 1 | Raise ValueError → caught by retry-loop except Exception |
| scripts/does_prompt_make_sense.py | 1 | Raise ValueError → caught by retry-loop except Exception |
| scripts/data/rlvr_code/the_algorithms.py | 1 | Raise ValueError → caught by outer except → returns error dict |
| scripts/data/rlvr_code/sft_to_rlvr_azure.py | 1 | Guard before access; also simplifies redundant double-access |
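
For the rows that raise ValueError inside a retry loop, the interaction might look roughly like the sketch below. It is not the code from any of the listed files: `query_with_retries`, `call_llm`, `max_retries`, and `make_flaky_client` are hypothetical names, and the fake client only imitates the shape of a response.

```python
from types import SimpleNamespace


def query_with_retries(call_llm, max_retries=3):
    """Call `call_llm` until it yields usable text or the retries run out.

    `call_llm` is a hypothetical zero-argument callable returning a response.
    """
    for attempt in range(max_retries):
        try:
            response = call_llm()
            # Guard from the PR: turn an empty or filtered response into an
            # ordinary exception that the surrounding handler already covers.
            if not response.choices or response.choices[0].message is None:
                raise ValueError("LLM returned empty or filtered response")
            return response.choices[0].message.content
        except Exception as exc:
            print(f"attempt {attempt + 1} failed: {exc}")
    return None


def make_flaky_client():
    """Build a fake client: first call returns no choices, second succeeds."""
    responses = iter([
        SimpleNamespace(choices=[]),
        SimpleNamespace(
            choices=[SimpleNamespace(message=SimpleNamespace(content="ok"))]
        ),
    ])
    return lambda: next(responses)


print(query_with_retries(make_flaky_client()))
```

Raising inside the `try` keeps the empty-response case on the same path as any other transient failure, so the caller does not need a second error-handling branch.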

Pattern

# Before
r = response.choices[0].message.content

# After
if not response.choices or response.choices[0].message is None:
    raise ValueError("LLM returned empty or filtered response")
r = response.choices[0].message.content

References

Accessing response.choices[0].message.content without first checking
that choices is non-empty and message is not None can crash with:

- IndexError: empty choices list (network error, quota exhaustion)
- AttributeError: None message (Gemini PROHIBITED_CONTENT returns
  HTTP 200 with null message instead of an error status)

Six locations fixed across five files:

- open_instruct/rubrics/run_utils.py (2 locations)
  Return empty string when response is empty, matching existing
  error-path behaviour.

- open_instruct/ground_truth_utils.py (1 location)
  Raise ValueError inside existing try/except; error is logged
  and reasoning/score return their zero-values.

- open_instruct/rejection_sampling/synthetic_preference_dataset.py (1)
  Raise ValueError inside retry loop's try/except.

- scripts/does_prompt_make_sense.py (1 location)
  Raise ValueError inside retry loop's try/except.

- scripts/data/rlvr_code/the_algorithms.py (1 location)
  Raise ValueError; caught by outer except → returns error dict.

- scripts/data/rlvr_code/sft_to_rlvr_azure.py (1 location)
  Guard before content access; also simplifies redundant double
  access on the same line.
Contributor

gemini-code-assist (bot) left a comment

Code Review

This pull request implements safety checks across several files to handle empty or filtered LLM responses, preventing potential errors when accessing response content. The reviewer suggests further hardening these guards by also checking if the message content is null, which can occur with certain LLM providers and would lead to runtime errors or type hint violations.

Comment thread open_instruct/ground_truth_utils.py Outdated
Comment on lines +726 to +727
if not completion.choices or completion.choices[0].message is None:
    raise ValueError("LLM returned empty or filtered response")

high

The guard should also check if completion.choices[0].message.content is None. Some LLM providers (like Gemini) may return a valid message object with null content when a response is filtered, which would cause a TypeError in the subsequent re.sub call.

Suggested change
-if not completion.choices or completion.choices[0].message is None:
-    raise ValueError("LLM returned empty or filtered response")
+if not completion.choices or completion.choices[0].message is None or completion.choices[0].message.content is None:
+    raise ValueError("LLM returned empty or filtered response")
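
To illustrate why a null content matters even when the message object exists, here is a small sketch of what typical string post-processing does with `None`. The function name `downstream_failures` and the two sample steps are assumptions for illustration, not the repository's actual post-processing.

```python
import re


def downstream_failures(content):
    """Return the exception names raised by two typical post-processing steps."""
    steps = (
        lambda c: re.sub(r"\s+", " ", c),  # regex clean-up expects str or bytes
        lambda c: c.split("\n"),           # str method, absent on None
    )
    failures = []
    for step in steps:
        try:
            step(content)
        except (TypeError, AttributeError) as exc:
            failures.append(type(exc).__name__)
    return failures


print(downstream_failures(None))    # ['TypeError', 'AttributeError']
print(downstream_failures("a  b"))  # []
```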

Comment on lines +147 to +148
if not response.choices or response.choices[0].message is None:
    raise ValueError("LLM returned empty or filtered response")

high

The guard should also check if response.choices[0].message.content is None. If content is None, the subsequent r.split call will raise an AttributeError.

Suggested change
-if not response.choices or response.choices[0].message is None:
-    raise ValueError("LLM returned empty or filtered response")
+if not response.choices or response.choices[0].message is None or response.choices[0].message.content is None:
+    raise ValueError("LLM returned empty or filtered response")

Comment on lines +149 to +150
if not response.choices or response.choices[0].message is None:
    raise ValueError("LLM returned empty or filtered response")

high

The guard should also check if response.choices[0].message.content is None. If content is None, the subsequent response_content.startswith call (line 156) will raise an AttributeError.

Suggested change
-if not response.choices or response.choices[0].message is None:
-    raise ValueError("LLM returned empty or filtered response")
+if not response.choices or response.choices[0].message is None or response.choices[0].message.content is None:
+    raise ValueError("LLM returned empty or filtered response")

Comment on lines +97 to +98
if not response.choices or response.choices[0].message is None:
    raise ValueError("LLM returned empty or filtered response")

high

The guard should also check if response.choices[0].message.content is None. If content is None, the returned dictionary will contain None for openai_response, which might cause issues for consumers expecting a string.

Suggested change
-if not response.choices or response.choices[0].message is None:
-    raise ValueError("LLM returned empty or filtered response")
+if not response.choices or response.choices[0].message is None or response.choices[0].message.content is None:
+    raise ValueError("LLM returned empty or filtered response")

Comment thread scripts/does_prompt_make_sense.py Outdated
Comment on lines +70 to +71
if not response.choices or response.choices[0].message is None:
    raise ValueError("LLM returned empty or filtered response")

high

The guard should also check if response.choices[0].message.content is None. If content is None, the subsequent r.split call (line 79) will raise an AttributeError.

Suggested change
-if not response.choices or response.choices[0].message is None:
-    raise ValueError("LLM returned empty or filtered response")
+if not response.choices or response.choices[0].message is None or response.choices[0].message.content is None:
+    raise ValueError("LLM returned empty or filtered response")

Comment thread open_instruct/rubrics/run_utils.py Outdated
Comment on lines +119 to +120
if not response.choices or response.choices[0].message is None:
    return ""

medium

The guard should also check if response.choices[0].message.content is None. If content is None, this function will return None, which violates the str return type hint and may cause crashes in downstream code expecting a string.

Suggested change
-if not response.choices or response.choices[0].message is None:
-    return ""
+if not response.choices or response.choices[0].message is None or response.choices[0].message.content is None:
+    return ""

Comment thread open_instruct/rubrics/run_utils.py Outdated
Comment on lines +209 to +210
if not response.choices or response.choices[0].message is None:
    return ""

medium

The guard should also check if response.choices[0].message.content is None. If content is None, this function will return None, which violates the str return type hint and may cause crashes in downstream code expecting a string.

Suggested change
-if not response.choices or response.choices[0].message is None:
-    return ""
+if not response.choices or response.choices[0].message is None or response.choices[0].message.content is None:
+    return ""
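
For the variant that returns "" rather than raising, a function annotated as returning `str` could apply the extended guard as in the following sketch. `response_text` is a hypothetical name and the `SimpleNamespace` objects are stand-ins; this is not the body of the functions in run_utils.py.

```python
from types import SimpleNamespace


def response_text(response) -> str:
    """Return the first choice's text, or "" when the response is empty or filtered."""
    if (
        not response.choices
        or response.choices[0].message is None
        or response.choices[0].message.content is None
    ):
        return ""
    return response.choices[0].message.content


# A message object that exists but carries no content (stand-in object).
null_content = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content=None))]
)
print(repr(response_text(null_content)))  # ''
```

With the third condition in place, every return path yields a `str`, so the annotation holds for callers.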

Extend all six guards to additionally check that message.content is
not None, covering providers that return a non-None message object
but set content to None (e.g. on max-token truncation).
Author

qizwiz commented May 18, 2026

Good catch @gemini-code-assist — extended all six guards to additionally check message.content is None (commit 9f3f166), covering providers that return a non-None message object but set content to None on max-token truncation or certain filter responses.

@gemini-code-assist
Contributor

Thank you for the update, @qizwiz. That looks correct and ensures robust handling for providers that return a valid message object but omit the content field. This change significantly improves the reliability of the LLM response processing across all identified paths.

Author

qizwiz commented May 18, 2026

Thanks for the review @gemini-code-assist. Pushed 9f3f166 extending all six guards to also check message.content is None — this covers providers that return a non-None message object with content=None (e.g. on max-token truncation or tool-call-only responses).

@gemini-code-assist
Contributor

Thank you for the update, @qizwiz. The addition of the message.content is None check is a great improvement and ensures the code is resilient against various provider-specific response behaviors. This looks solid.
