**Describe the bug**

When using the OpenVINO Model Server OpenAI-compatible `/v3` endpoint with `stream=true` and `tool_choice="required"` (Hermes 3 tool parser), the streamed `arguments` field sometimes contains JSON fragments that are split or positioned incorrectly. As a result, the final concatenated arguments are not valid JSON.

**To Reproduce**

export_model.py:

Client script:

```python
import openai

client = openai.OpenAI(
    api_key="",
    base_url="http://127.0.0.1:8000/v3",
)

output = ""
for chunk in client.chat.completions.create(
    messages=[{"role": "user", "content": "1+1"}],
    model="Qwen/Qwen3-4B-Instruct-2507",
    stream=True,
    temperature=0.0,
    tool_choice="required",
    tools=[
        {
            "type": "function",
            "function": {
                "name": "search",
                "description": "",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }
    ],
):
    arguments = chunk.choices[0].delta.tool_calls[0].function.arguments
    if arguments is not None:
        output += arguments
print(output)
```
{"query": "1+1"}}\n</tool_call>
**Logs**

```
[2025-12-05 07:30:30.243][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"id":"BZS5lI66X","type":"function","index":0,"function":{"name":"search"}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:30.243][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:30.244][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator [Node: LLMExecutor] Process end
[2025-12-05 07:30:30.252][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator [Node: LLMExecutor] Process start
[2025-12-05 07:30:30.355][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:30.358][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator [Node: LLMExecutor] Process end
[2025-12-05 07:30:30.358][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator [Node: LLMExecutor] Process start
[2025-12-05 07:30:30.485][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:30.487][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\""}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:30.487][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:30.487][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator [Node: LLMExecutor] Process end
[2025-12-05 07:30:30.488][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator [Node: LLMExecutor] Process start
[2025-12-05 07:30:30.609][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:30.611][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"query"}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:30.611][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:30.611][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator [Node: LLMExecutor] Process end
[2025-12-05 07:30:30.611][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator [Node: LLMExecutor] Process start
[2025-12-05 07:30:30.732][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:30.733][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":"}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:30.733][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:30.733][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator [Node: LLMExecutor] Process end
[2025-12-05 07:30:30.733][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator [Node: LLMExecutor] Process start
[2025-12-05 07:30:30.849][149][llm_executor][info][llm_executor.hpp:66] All requests: 1; Scheduled requests: 1; Cache usage 55.6%;
[2025-12-05 07:30:30.849][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:30.855][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":" \""}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:30.855][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:30.855][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator [Node: LLMExecutor] Process end
[2025-12-05 07:30:30.855][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator [Node: LLMExecutor] Process start
[2025-12-05 07:30:31.017][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:31.022][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"1"}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:31.022][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:31.022][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator [Node: LLMExecutor] Process end
[2025-12-05 07:30:31.022][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator [Node: LLMExecutor] Process start
[2025-12-05 07:30:31.141][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:31.142][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator [Node: LLMExecutor] Process end
[2025-12-05 07:30:31.142][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator [Node: LLMExecutor] Process start
[2025-12-05 07:30:31.261][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:31.262][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator [Node: LLMExecutor] Process end
[2025-12-05 07:30:31.262][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator [Node: LLMExecutor] Process start
[2025-12-05 07:30:31.383][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:31.385][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"+1\"}}\n"}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:31.385][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:31.385][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator [Node: LLMExecutor] Process end
[2025-12-05 07:30:31.385][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator [Node: LLMExecutor] Process start
[2025-12-05 07:30:31.507][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:31.508][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"</"}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:31.508][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:31.508][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator [Node: LLMExecutor] Process end
[2025-12-05 07:30:31.508][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator [Node: LLMExecutor] Process start
[2025-12-05 07:30:31.621][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:31.622][152][llm_calculator][debug][servable.cpp:226] Generated complete streaming response: data: {"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"tool_call>"}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
data: [DONE]
```
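For readability, here are just the `arguments` fragments from the streamed deltas above, transcribed by hand, and what they concatenate to:

```python
# "arguments" fragments, in the order they appear in the deltas logged above:
fragments = ['{"', "query", '":', ' "', "1", '+1"}}\n', "</", "tool_call>"]

joined = "".join(fragments)
print(repr(joined))  # '{"query": "1+1"}}\n</tool_call>'

# The second '}' and the closing </tool_call> tag look like Hermes 3 wrapper
# syntax that should have been stripped by the parser, but leak into the
# arguments instead.
```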
**Configuration**

**OVMS version**

2025.4.0.15ce0188a (2025.4.0.0rc2)

**OVMS config.json file**

CPU only

**Model repository directory structure**

**Model or publicly available similar model that reproduces the issue**

Qwen/Qwen3-4B-Instruct-2507 (see the reproduction above)

**Additional context**
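One way to narrow this down may be to send the same request without streaming (assuming the unary path goes through the same Hermes 3 tool parser); a minimal sketch:

```python
import json

import openai

client = openai.OpenAI(api_key="", base_url="http://127.0.0.1:8000/v3")

# Same request as the reproduction above, but unary instead of streamed.
completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "1+1"}],
    model="Qwen/Qwen3-4B-Instruct-2507",
    stream=False,
    temperature=0.0,
    tool_choice="required",
    tools=[
        {
            "type": "function",
            "function": {
                "name": "search",
                "description": "",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }
    ],
)

arguments = completion.choices[0].message.tool_calls[0].function.arguments
print(arguments)
json.loads(arguments)  # Raises if the unary path produces malformed JSON too.
```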