
Invalid JSON produced in streamed tool_call arguments when using Hermes 3 parser #3838

@fonfonya

Description


Describe the bug
When using the OpenVINO Model Server OpenAI-compatible /v3 endpoint with stream=true and tool_choice="required" (Hermes 3 tool parser), the streamed arguments deltas sometimes include fragments that do not belong to the arguments object (note the stray closing brace and </tool_call> tag in the observed output below), so the concatenated arguments are not valid JSON.
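
For context, the Hermes-style format wraps each tool call in <tool_call> ... </tool_call> tags around a single JSON object, and the parser is expected to strip the wrapper and stream only the inner arguments object. An illustrative raw completion for the request below (not captured from the server, shown only to clarify what the parser has to split) would look like:

<tool_call>
{"name": "search", "arguments": {"query": "1+1"}}
</tool_call>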


To Reproduce

  1. Prepare the model repository using export_model.py:
python export_model.py text_generation \
  --source_model Qwen/Qwen3-4B-Instruct-2507 \
  --weight-format int8 \
  --config_file_path models/config.json \
  --model_repository_path models \
  --tool_parser hermes3
  2. Launch OVMS:
docker run -d --user $(id -u):$(id -g) --rm \
  -p 8000:8000 \
  -v $(pwd)/models:/models \
  openvino/model_server:weekly \
  --rest_port 8000 \
  --model_repository_path models \
  --source_model Qwen/Qwen3-4B-Instruct-2507 \
  --tool_parser hermes3 \
  --task text_generation \
  --enable_prefix_caching true
  3. Client code:
import openai

client = openai.OpenAI(
    api_key="",
    base_url="http://127.0.0.1:8000/v3",
)

output = ""
for chunk in client.chat.completions.create(
    messages=[{"role": "user", "content": "1+1"}],
    model="Qwen/Qwen3-4B-Instruct-2507",
    stream=True,
    temperature=0.0,
    tool_choice="required",
    tools=[
        {
            "type": "function",
            "function": {
                "name": "search",
                "description": "",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }
    ],
):
    arguments = chunk.choices[0].delta.tool_calls[0].function.arguments
    if arguments is not None:
        output += arguments

print(output)
  4. Observed output:
{"query": "1+1"}}\n</tool_call>

Expected behavior
The streamed fragments for the arguments field should concatenate into valid JSON.
For example, the expected final output should be:

{"query": "1+1"}

Logs

[2025-12-05 07:30:30.243][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"id":"BZS5lI66X","type":"function","index":0,"function":{"name":"search"}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:30.243][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator  [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:30.244][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator  [Node: LLMExecutor] Process end
[2025-12-05 07:30:30.252][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator  [Node: LLMExecutor] Process start
[2025-12-05 07:30:30.355][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator  [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:30.358][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator  [Node: LLMExecutor] Process end
[2025-12-05 07:30:30.358][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator  [Node: LLMExecutor] Process start
[2025-12-05 07:30:30.485][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator  [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:30.487][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\""}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:30.487][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator  [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:30.487][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator  [Node: LLMExecutor] Process end
[2025-12-05 07:30:30.488][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator  [Node: LLMExecutor] Process start
[2025-12-05 07:30:30.609][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator  [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:30.611][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"query"}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:30.611][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator  [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:30.611][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator  [Node: LLMExecutor] Process end
[2025-12-05 07:30:30.611][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator  [Node: LLMExecutor] Process start
[2025-12-05 07:30:30.732][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator  [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:30.733][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":"}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:30.733][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator  [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:30.733][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator  [Node: LLMExecutor] Process end
[2025-12-05 07:30:30.733][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator  [Node: LLMExecutor] Process start
[2025-12-05 07:30:30.849][149][llm_executor][info][llm_executor.hpp:66] All requests: 1; Scheduled requests: 1; Cache usage 55.6%;
[2025-12-05 07:30:30.849][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator  [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:30.855][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":" \""}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:30.855][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator  [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:30.855][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator  [Node: LLMExecutor] Process end
[2025-12-05 07:30:30.855][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator  [Node: LLMExecutor] Process start
[2025-12-05 07:30:31.017][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator  [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:31.022][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"1"}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:31.022][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator  [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:31.022][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator  [Node: LLMExecutor] Process end
[2025-12-05 07:30:31.022][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator  [Node: LLMExecutor] Process start
[2025-12-05 07:30:31.141][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator  [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:31.142][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator  [Node: LLMExecutor] Process end
[2025-12-05 07:30:31.142][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator  [Node: LLMExecutor] Process start
[2025-12-05 07:30:31.261][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator  [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:31.262][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator  [Node: LLMExecutor] Process end
[2025-12-05 07:30:31.262][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator  [Node: LLMExecutor] Process start
[2025-12-05 07:30:31.383][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator  [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:31.385][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"+1\"}}\n"}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:31.385][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator  [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:31.385][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator  [Node: LLMExecutor] Process end
[2025-12-05 07:30:31.385][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator  [Node: LLMExecutor] Process start
[2025-12-05 07:30:31.507][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator  [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:31.508][152][llm_calculator][debug][servable.cpp:206] Generated subsequent streaming response: data: {"choices":[{"finish_reason":null,"index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"</"}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}
[2025-12-05 07:30:31.508][152][llm_calculator][debug][http_llm_calculator.cc:143] LLMCalculator  [Node: LLMExecutor] Response prepared, sending it down the graph
[2025-12-05 07:30:31.508][152][llm_calculator][debug][http_llm_calculator.cc:156] LLMCalculator  [Node: LLMExecutor] Process end
[2025-12-05 07:30:31.508][152][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator  [Node: LLMExecutor] Process start
[2025-12-05 07:30:31.621][152][llm_calculator][debug][http_llm_calculator.cc:136] LLMCalculator  [Node: LLMExecutor] Received partial execution results
[2025-12-05 07:30:31.622][152][llm_calculator][debug][servable.cpp:226] Generated complete streaming response: data: {"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"tool_call>"}}]}}],"created":1764919794,"model":"Qwen/Qwen3-4B-Instruct-2507","object":"chat.completion.chunk"}

data: [DONE]
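
Reassembling the arguments fragments that appear in the debug entries above (copied from the log, assuming no chunks are missing from the capture) shows where the extra characters come from:

fragments = ['{"', 'query', '":', ' "', '1', '+1"}}\n', '</', 'tool_call>']
print("".join(fragments))
# {"query": "1+1"}}
# </tool_call>

This suggests the Hermes 3 streaming parser emits the closing brace of the outer tool-call object and the </tool_call> tag as part of the arguments deltas instead of stripping them.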

Configuration

  1. OVMS version

    • OpenVINO Model Server: 2025.4.0.15ce0188a
    • OpenVINO backend: 2025.4.0.0rc2
  2. OVMS config.json file

  3. CPU only

  4. Model repository directory structure

  5. Model or publicly available similar model that reproduces the issue


Additional context
