Add hermetic E2E coverage for strict chat-completions tool-call validation #4537

## Summary

Follow-up from #4532 and the review at https://github.com/NVIDIA/NemoClaw/pull/4532#pullrequestreview-4391515742.

#4532 bounded the strict Chat Completions tool-call onboarding probe with `max_tokens: 256` and `stream: false`. The unit coverage asserts the payload shape, but we do not yet have a PR-safe hermetic E2E that validates the behavior through the onboarding flow.

## Why this matters

The strict tool-call probe is currently reached from the Local Ollama onboarding path. A payload-shape regression, retry regression, or mock/server compatibility issue could make onboarding fail before the sandbox is created. The existing `gpu-e2e` path is the closest real-flow coverage, but it requires GPU/Ollama infrastructure and is not a cheap hermetic guard.

## Proposed follow-up

Add a hermetic E2E under `test/e2e/`, wired into `regression-e2e.yaml`, that:

- onboards against an OpenAI-compatible mock endpoint that requires structured Chat Completions tool calls,
- captures the strict validation request body,
- asserts `tool_choice=required`, `max_tokens=256`, and `stream=false`,
- verifies success when the mock returns structured `tool_calls`,
- verifies bounded retry/failure behavior when the first strict probe times out or returns a transient error.
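
As a rough illustration of the mock side of such a test, the request handling could look something like the sketch below. It is transport-agnostic (no HTTP server) and only models the decision logic: capture the body, optionally fail the first few calls, reject bodies that miss the three asserted fields, and otherwise return structured `tool_calls`. All names here (`ProbeBody`, `strictProbeViolations`, `createMockHandler`, `mock_tool`) are hypothetical and not taken from the NemoClaw codebase; the 503 status for the transient error is likewise an assumption.

```typescript
import assert from "node:assert/strict";

// Hypothetical request-body shape: only the fields this issue asks the E2E to assert.
interface ProbeBody {
  tool_choice?: unknown;
  max_tokens?: unknown;
  stream?: unknown;
  [key: string]: unknown;
}

interface MockReply {
  status: number;
  json: Record<string, unknown>;
}

// Returns human-readable violations; an empty list means the body matches expectations.
function strictProbeViolations(body: ProbeBody): string[] {
  const violations: string[] = [];
  if (body.tool_choice !== "required") violations.push('tool_choice must be "required"');
  if (body.max_tokens !== 256) violations.push("max_tokens must be 256");
  if (body.stream !== false) violations.push("stream must be false");
  return violations;
}

// Builds a mock handler that records every request body and fails the first
// `transientFailures` calls with a 503 (assumed status) before answering normally.
function createMockHandler(transientFailures = 0) {
  const captured: ProbeBody[] = [];
  let remainingFailures = transientFailures;

  const handle = (body: ProbeBody): MockReply => {
    captured.push(body);
    if (remainingFailures > 0) {
      remainingFailures -= 1;
      return { status: 503, json: { error: { message: "transient mock failure" } } };
    }
    const violations = strictProbeViolations(body);
    if (violations.length > 0) {
      return { status: 400, json: { error: { message: violations.join("; ") } } };
    }
    // Structured tool call in the Chat Completions response format.
    return {
      status: 200,
      json: {
        choices: [
          {
            index: 0,
            finish_reason: "tool_calls",
            message: {
              role: "assistant",
              content: null,
              tool_calls: [
                {
                  id: "call_mock_1",
                  type: "function",
                  function: { name: "mock_tool", arguments: "{}" },
                },
              ],
            },
          },
        ],
      },
    };
  };

  return { handle, captured };
}

// Usage: one transient failure, then success; both request bodies are captured.
const mock = createMockHandler(1);
const goodBody: ProbeBody = {
  model: "mock-model",
  tool_choice: "required",
  max_tokens: 256,
  stream: false,
};
assert.equal(mock.handle(goodBody).status, 503);
assert.equal(mock.handle(goodBody).status, 200);
assert.equal(mock.captured.length, 2);
```

In a real E2E the handler would sit behind a local HTTP listener that the onboarding flow is pointed at, and the test would assert on `captured` after onboarding finishes. Returning a 400 for a malformed body makes a payload-shape regression surface as an onboarding failure rather than a silent pass.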

## Thinking-model carve-out to evaluate

The review also called out that `probeChatCompletionsToolCalling` now applies the same `max_tokens: 256` cap to every model in the strict path. Today that path is Local Ollama only, but if `requireChatCompletionsToolCalling` is extended to reasoning/thinking models, such a model might spend the entire 256-token budget on internal reasoning before emitting a tool call, and the probe would report a false negative.

As part of this follow-up, evaluate whether strict tool-call validation needs a thinking-model carve-out similar to the existing `getChatCompletionsProbePayload` special cases (`chat_template_kwargs: { thinking: false }` for models that support it), or whether it is enough to document the assumption next to the strict probe payload.
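
One possible shape for such a carve-out is sketched below, purely for discussion. It is not the existing `getChatCompletionsProbePayload` logic: `buildStrictProbePayload` and the `supportsThinkingToggle` predicate are hypothetical names, the model names in the usage lines are made up, and the real payload would also carry `messages` and `tools`, which are omitted here.

```typescript
import assert from "node:assert/strict";

// Hypothetical, trimmed-down strict probe payload (messages and tools omitted).
interface StrictProbePayload {
  model: string;
  tool_choice: "required";
  max_tokens: number;
  stream: false;
  chat_template_kwargs?: { thinking: boolean };
}

// Builds the bounded strict payload and disables thinking only for models that the
// caller-supplied predicate reports as supporting the toggle (an assumption).
function buildStrictProbePayload(
  model: string,
  supportsThinkingToggle: (model: string) => boolean,
): StrictProbePayload {
  const payload: StrictProbePayload = {
    model,
    tool_choice: "required",
    max_tokens: 256,
    stream: false,
  };
  if (supportsThinkingToggle(model)) {
    // Keeps the 256-token budget available for the tool call itself.
    payload.chat_template_kwargs = { thinking: false };
  }
  return payload;
}

// Usage with a made-up predicate and made-up model names.
const isThinkingModel = (model: string): boolean => model.startsWith("example-thinking");
const thinkingPayload = buildStrictProbePayload("example-thinking-8b", isThinkingModel);
const plainPayload = buildStrictProbePayload("example-plain-8b", isThinkingModel);
assert.deepEqual(thinkingPayload.chat_template_kwargs, { thinking: false });
assert.equal(plainPayload.chat_template_kwargs, undefined);
```

Passing the predicate in as a parameter keeps the sketch independent of however the codebase actually detects thinking-capable models.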

## References

- #4532
- #4501
- Review: https://github.com/NVIDIA/NemoClaw/pull/4532#pullrequestreview-4391515742
