This repository was archived by the owner on Feb 18, 2026. It is now read-only.

fix(openai): properly transform streaming tool_calls and handle incomplete streams#11

Closed
nichm wants to merge 1 commit into 9j:main from nichm:fix/openai-streaming-tool-calls

Conversation


@nichm nichm commented Nov 25, 2025

Problem

OpenAI-compatible providers (especially Cerebras) have two critical streaming issues:

  1. Tool calls not transformed: When the model generates tool_calls in streaming mode, they arrive in OpenAI format but need to be transformed to Anthropic's tool_use format for Claude Code compatibility.

  2. Incomplete streams: Some providers (Cerebras) close the HTTP stream without sending a finish_reason chunk when tool_calls are generated, causing the stream to end abruptly with hyper::Error(IncompleteMessage).

Solution

1. Tool Calls Transformation

  • Detect tool_calls in OpenAI streaming chunks
  • Transform to Anthropic tool_use format with proper SSE events:
    • content_block_start with tool metadata
    • content_block_delta with tool arguments
    • content_block_stop to close the tool block
  • Close any open text content blocks before tool blocks
  • Validate and parse JSON arguments
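The three SSE events for a single streamed tool call could be sketched as follows. This is a minimal illustration based on Anthropic's documented `tool_use` streaming event shapes, not the PR's actual code; `emit_tool_use_events` and `json_escape` are hypothetical helper names.

```rust
// Sketch: build the three Anthropic SSE events for one streamed tool call.
// `emit_tool_use_events` is a hypothetical helper, not the PR's actual API.

fn json_escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

fn emit_tool_use_events(index: usize, id: &str, name: &str, args_json: &str) -> Vec<String> {
    vec![
        // content_block_start carries the tool metadata (id + name)
        format!(
            "event: content_block_start\ndata: {{\"type\":\"content_block_start\",\"index\":{},\"content_block\":{{\"type\":\"tool_use\",\"id\":\"{}\",\"name\":\"{}\",\"input\":{{}}}}}}\n\n",
            index, id, name
        ),
        // content_block_delta streams the tool arguments as a partial_json string
        format!(
            "event: content_block_delta\ndata: {{\"type\":\"content_block_delta\",\"index\":{},\"delta\":{{\"type\":\"input_json_delta\",\"partial_json\":\"{}\"}}}}\n\n",
            index, json_escape(args_json)
        ),
        // content_block_stop closes the tool block
        format!(
            "event: content_block_stop\ndata: {{\"type\":\"content_block_stop\",\"index\":{}}}\n\n",
            index
        ),
    ]
}

fn main() {
    for ev in emit_tool_use_events(1, "toolu_01", "TodoWrite", "{\"todos\":[]}") {
        print!("{ev}");
    }
}
```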

2. Stream Finalization Workaround

  • Added stream_ended_properly flag to track if finish_reason was received
  • Stream finalization closure runs after stream ends
  • Only sends end events if:
    • Stream has open content block
    • AND no finish_reason was received
  • Prevents duplicate messages when stream ends normally
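That decision rule can be sketched with a plain struct standing in for the PR's `Arc<Mutex<bool>>`-wrapped flags. The struct shape is an assumption; `stream_ended_properly` matches the flag named in this PR.

```rust
// Sketch of the finalization decision described above; the struct is
// illustrative, but `stream_ended_properly` is the flag this PR adds.

struct StreamState {
    has_open_block: bool,        // a text or tool_use content block is still open
    stream_ended_properly: bool, // set to true when a finish_reason chunk arrives
}

/// After the HTTP stream ends, decide whether synthetic end events
/// (content_block_stop, message_delta, message_stop) must be emitted.
fn needs_synthetic_end_events(state: &StreamState) -> bool {
    state.has_open_block && !state.stream_ended_properly
}

fn main() {
    // Cerebras-style incomplete stream: open block, finish_reason never sent
    let truncated = StreamState { has_open_block: true, stream_ended_properly: false };
    println!("{}", needs_synthetic_end_events(&truncated)); // prints "true"
}
```

Because both conditions must hold, a stream that ends normally (finish_reason received) never gets a second set of end events, which is what prevents the duplicate messages.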

Testing

Tested extensively with:

  • Provider: Cerebras (zai-glm-4.6)
  • Scenario: Messages that trigger tool calls (TodoWrite, file operations)
  • Results:
    • ✅ Streaming completes successfully
    • ✅ Tool calls execute properly in Claude Code
    • ✅ No duplicate messages
    • ✅ No hyper::Error(IncompleteMessage)

Changes

  • Cargo.toml: Added uuid dependency for generating unique message IDs
  • src/providers/openai.rs:
    • New structs for OpenAI streaming format (OpenAIStreamChunk, OpenAIStreamChoice, OpenAIStreamDelta)
    • transform_openai_chunk_to_anthropic_sse: Full transformation logic including tool_use
    • send_message_stream: Stream finalization workaround
    • Proper state management with Arc<Mutex<bool>> for thread-safe flags
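The streaming structs might look roughly like this. Field sets are assumptions based on the OpenAI chat-completions streaming format; in the PR they would carry serde `Deserialize` derives, omitted here for brevity.

```rust
// Rough shape of the new streaming structs named above. Fields are
// assumptions for illustration, not the PR's exact definitions.

struct OpenAIStreamChunk {
    choices: Vec<OpenAIStreamChoice>,
}

struct OpenAIStreamChoice {
    delta: OpenAIStreamDelta,
    finish_reason: Option<String>, // None until the provider ends the turn
}

struct OpenAIStreamDelta {
    content: Option<String>,                // incremental text
    tool_calls: Option<Vec<ToolCallDelta>>, // incremental tool-call fragments
}

struct ToolCallDelta {
    index: usize,
    id: Option<String>,        // present on the first fragment of a call
    name: Option<String>,      // function name, first fragment only
    arguments: Option<String>, // partial JSON argument string
}

fn main() {
    // A Cerebras-style final chunk: stream is over, but finish_reason was never sent
    let chunk = OpenAIStreamChunk {
        choices: vec![OpenAIStreamChoice {
            delta: OpenAIStreamDelta { content: None, tool_calls: None },
            finish_reason: None,
        }],
    };
    println!("finish_reason missing: {}", chunk.choices[0].finish_reason.is_none());
}
```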

Before/After

Before:

  • Stream stops abruptly when tool calls are generated
  • No tool execution in Claude Code
  • hyper::Error(IncompleteMessage) in logs

After:

  • Stream completes successfully with proper end events
  • Tool calls execute correctly in Claude Code
  • No errors, no duplicate messages

Impact

This fix enables Claude Code to work seamlessly with OpenAI-compatible providers that:

  1. Use tool calling in streaming mode
  2. Have incomplete stream implementations (missing finish_reason)

Especially critical for Cerebras, which is known for fast, cost-effective inference but has this streaming quirk.

fix(openai): properly transform streaming tool_calls and handle incomplete streams

This fix addresses two critical issues with OpenAI-compatible provider streaming:

1. **Tool Calls Transformation**: OpenAI streaming sends tool_calls in a different
   format than Anthropic. This commit implements full transformation:
   - Detects tool_calls in OpenAI streaming chunks
   - Transforms to Anthropic tool_use format with proper event structure
   - Closes text content blocks before tool_use blocks
   - Sends content_block_start/delta/stop for each tool

2. **Incomplete Stream Handling**: Some OpenAI-compatible providers (notably Cerebras)
   close the stream without sending a finish_reason chunk. This commit adds:
   - Stream finalization that detects incomplete streams
   - Automatic end event generation (content_block_stop, message_delta, message_stop)
   - Prevents duplicate end events when finish_reason IS sent

The fix ensures:
- ✅ Streaming works with tool calls (TodoWrite, etc.)
- ✅ No duplicate messages
- ✅ Graceful handling of provider-specific streaming bugs
- ✅ Full Anthropic API compatibility

Tested with: Cerebras (zai-glm-4.6), streaming tool calls work perfectly.
@elidickinson

I’m having the same problems, but tool calling still isn’t working for me from Claude Code on this branch. There isn’t some other secret step I’m missing, right? I’ll check it out with debug logging later.

@elidickinson

For whatever reason, my Claude Code only seems to make non-streaming requests for tool calls, so #12 seems to handle it in quick testing.

@nichm

nichm commented Nov 28, 2025

Hmm, yeah, I'm having some issues today since the new Claude update. I'm getting lots of "No response requested.", though it does work for 10-20 calls. I'll close this and try out your branch, or throw some more tokens at it to try to fix it.

@nichm nichm closed this Nov 28, 2025
@elidickinson

Actually let me reroll that - I think we need both versions

@elidickinson

This is, I think, unrelated, but you might want it too: elidickinson@fdf5bab. I should probably open a new PR for it.

@elidickinson

@nichm OK, I think I've got it now. That was tricky. Take a look at #14. I took out the Cerebras workaround because it shouldn't be needed (the underlying issue was an error in SSE parsing when two SSE chunks were received at once).

