Skip to content

Conversation

@pankajarm
Copy link
Contributor

Key improvements:

  1. FN_REGEX_PATTERN: Make newline optional after function name
  2. Handle missing/None content in messages gracefully
  3. Strip whitespace from parameter values (models add extra newlines)
  4. Add _preprocess_model_output() to clean wrapper tags SAFELY
    • Only strips , <tool_call>, </tool_call> at boundaries
    • Preserves these strings if they appear inside parameter values
  5. Apply tool name aliases only as fallback (try exact match first)
  6. Skip security_risk from required param check in fn_call_converter (validation happens later in _extract_security_risk with proper context)
  7. Skip ICL examples for nemotron (limited context window)

Best practices applied:

  • All changes are backwards-compatible
  • Tool aliases only applied if exact name not found (no regression)
  • Preprocessing only at boundaries (preserves param content)
  • security_risk schema unchanged (still required for GPT-4/Claude)

Key improvements:
1. FN_REGEX_PATTERN: Make newline optional after function name
2. Handle missing/None content in messages gracefully
3. Strip whitespace from parameter values (models add extra newlines)
4. Add _preprocess_model_output() to clean wrapper tags SAFELY
   - Only strips </think>, <tool_call>, </tool_call> at boundaries
   - Preserves these strings if they appear inside parameter values
5. Apply tool name aliases only as fallback (try exact match first)
6. Skip security_risk from required param check in fn_call_converter
   (validation happens later in _extract_security_risk with proper context)
7. Skip ICL examples for nemotron (limited context window)

Best practices applied:
- All changes are backwards-compatible
- Tool aliases only applied if exact name not found (no regression)
- Preprocessing only at boundaries (preserves param content)
- security_risk schema unchanged (still required for GPT-4/Claude)
@pankajarm
Copy link
Contributor Author

Tested all 7 checks locally including tools-tests which was changed before pushing this.

"str_replace_editor": "file_editor",
"bash": "terminal",
"execute_bash": "terminal",
"str_replace": "file_editor",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This... is this really Nemotron doing it? Or could it have been due to an outdated prompt or anything in the context? No, it probably isn't prompting... 🤔

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah i think so, nemotron is really overfitted to execute_bash in my experience 😓

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it's definitely Nemotron being I guess Nemotron 😅

We ran into this while testing - even when the prompt clearly says the tools are terminal and file_editor, it keeps outputting execute_bash and str_replace_editor. can't help but think it may have been trained on older OpenHands data with those names baked in pretty hard.

The alias fix is just a fallback though - we try exact match first, so it won't mess with models that already get the names right.

Also hit a few other quirks: it skips the newline after <function=...>, wraps stuff in <tool_call> tags, and sometimes throws in extra whitespace. Small stuff but enough to break the parser.

Oh and we skip ICL examples for it too - similar to what's already done for devstral. Context window issues.

LMK if all looks ok.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't help but think it may have been trained on older OpenHands data with those names baked in pretty hard

🫠

My thought too 🤷‍♂️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't think Nemotron-3-Nano-30B-A3B has enough chops, Here are all the trajectories and metrics:
https://huggingface.co/datasets/pankajmathur/nemotron-nano-swebench-verified-traj

Metric Value
Total Instances 500
Submitted 500 (100%)
Evaluated 355 (71%)
Resolved 82 (23.1%)
Empty Patches 145 (29%)

Copy link
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@enyst enyst merged commit 5383fd0 into OpenHands:main Dec 23, 2025
14 checks passed
@pankajarm
Copy link
Contributor Author

thanks, just want to add that I don't think Nemotron-3-Nano-30B-A3B has enough chops, Here are all the trajectories and metrics:
https://huggingface.co/datasets/pankajmathur/nemotron-nano-swebench-verified-traj

Metric Value
Total Instances 500
Submitted 500 (100%)
Evaluated 355 (71%)
Resolved 82 (23.1%)
Empty Patches 145 (29%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants