Skip to content

gsm8k math example#294

Merged
mayinghan merged 2 commits intomainfrom
math_gsm8k_example
Oct 28, 2025
Merged

gsm8k math example#294
mayinghan merged 2 commits intomainfrom
math_gsm8k_example

Conversation

@benjibc
Copy link
Copy Markdown
Contributor

@benjibc benjibc commented Oct 28, 2025

Note

SingleTurnRolloutProcessor now optionally trims trailing assistant messages (on by default), with new unit tests and simplified pytest examples removing dataset adapters.

  • Pytest rollout processor:
    • SingleTurnRolloutProcessor: add __init__(drop_trailing_assistant_messages=True) and strip trailing assistant messages before LLM call; use trimmed messages when building final conversation.
  • Tests:
    • Add tests/pytest/test_single_turn_rollout_processor.py covering default drop and opt-out behaviors.
    • Replace GSM8K math example with tests/pytest/gsm8k/test_pytest_math_example.py using direct dataset and simple digit check.
    • Remove dataset_adapter usage from test_pytest_math_format_length.py and test_pytest_word_count_example.py.
    • Add tests/pytest/gsm8k/requirements.txt with eval-protocol.

Written by Cursor Bugbot for commit 5613767. This will update automatically on new commits. Configure here.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@benjibc benjibc force-pushed the math_gsm8k_example branch from 1489b63 to 52027f1 Compare October 28, 2025 06:56
cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

@benjibc benjibc force-pushed the math_gsm8k_example branch 2 times, most recently from ca2793f to 207c76a Compare October 28, 2025 20:12
@benjibc benjibc force-pushed the math_gsm8k_example branch from 207c76a to 5613767 Compare October 28, 2025 20:35
while messages_for_request and messages_for_request[-1].role == "assistant":
messages_for_request.pop()

messages_payload = [message.model_dump() for message in messages_for_request]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Empty Payload After Message Filtering

Missing validation after filtering trailing assistant messages. If all messages in row.messages are assistant messages and drop_trailing_assistant_messages=True, the messages_for_request list becomes empty, resulting in an empty messages_payload being sent to the LLM API. This will fail with an API error rather than being caught by the existing validation on line 42-43. A check should be added after the filtering loop (lines 47-49) to ensure messages_for_request is not empty before proceeding.

Fix in Cursor Fix in Web

@mayinghan mayinghan merged commit 9f352ed into main Oct 28, 2025
8 of 9 checks passed
@mayinghan mayinghan deleted the math_gsm8k_example branch October 28, 2025 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants