Describe the bug
When a request whose user-role message carries an array of content parts is sent to the GenAI endpoint (/v3/chat/completions), the OVMS TRACE log shows the original JSON exactly as it was sent, but the pipeline input text generated for the LLM is not as expected: only the last item of the content array is used.
To Reproduce
Steps to reproduce the behavior:
(using PowerShell)
- Start the OVMS GenAI endpoint (the model was previously exported with optimum-cli to OpenVINO IR; the standard Granite 4.1 chat template is saved in openvino_tokenizer.xml):
$ovms_config = '{"PERFORMANCE_HINT":"LATENCY","KV_CACHE_PRECISION":"u8","CACHE_DIR":"E:\\ovms_models\\cache"}'
& "D:\programs\ovms\ovms.exe" `
--rest_port=8000 `
--model_name="granite-4.1-instruct-int4" `
--model_path="E:\ovms_models\granite-4.1-instruct-int4" `
--task=text_generation `
--target_device="GPU.0" `
--max_num_batched_tokens=8192 `
--enable_prefix_caching=true `
--plugin_config=$ovms_config `
--tool_parser=hermes3 `
--enable_tool_guided_generation true `
--max_num_seqs=1 `
--log_level=TRACE
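- (Optional) Confirm the model reports as available before sending requests. This uses the standard OVMS /v1/config status endpoint:
# Optional sanity check: the served model should be listed as AVAILABLE
# before reproducing (OVMS configuration status REST endpoint).
Invoke-RestMethod -Uri "http://localhost:8000/v1/config" -Method Get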
- Send the following request:
$uri = "http://localhost:8000/v3/chat/completions"
$body = @{
    messages = @(
        @{
            role = "system"
            content = "You are a highly skilled software engineer."
        },
        @{
            role = "user"
            content = @(
                @{
                    type = "text"
                    text = "`ninitialize memory bank`n"
                },
                @{
                    type = "text"
                    text = "`n# task_progress RECOMMENDED`n`nWhen starting a new task, it is recommended to include a todo list.`n"
                },
                @{
                    type = "text"
                    text = "<environment_details>`nSome environment details`n</environment_details>"
                }
            )
        }
    )
    model = "granite-4.1-instruct-int4"
}
$json = $body | ConvertTo-Json -Depth 20
Invoke-RestMethod `
    -Uri $uri `
    -Method Post `
    -Body $json `
    -ContentType "application/json"
- Check the OVMS logs for the request body and the pipeline input text:
[llm_calculator][debug][servable.cpp:314] Request body: {"model":"granite-4.1-instruct-int4","messages":[{"role":"system","content":"You are a highly skilled software engineer."},{"role":"user","content":[{"type":"text","text":"\ninitialize memory bank\n"},{"type":"text","text":"\n# task_progress RECOMMENDED\n\nWhen starting a new task, it is recommended to include a todo list.\n"},{"type":"text","text":"<environment_details>\nSome environment details\n</environment_details>"}]}]}
[llm_calculator][debug][servable.cpp:315] Request uri: /v3/chat/completions
[llm_calculator][trace][servable.cpp:232] Pipeline input text: <|start_of_role|>system<|end_of_role|>You are a highly skilled software engineer.<|end_of_text|>
<|start_of_role|>user<|end_of_role|><environment_details>
Some environment details
</environment_details><|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
- Note that only the last item of the content array has been used.
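As a client-side workaround, the content array can be flattened into a single string before serialization. The sketch below assumes every content part is of type "text" (mixed part types would need separate handling):
# Workaround sketch (assumes all content parts are type "text"):
# join the part texts in order and send a plain string content instead.
foreach ($msg in $body.messages) {
    if ($msg.content -is [System.Array]) {
        $msg.content = ($msg.content | ForEach-Object { $_.text }) -join ""
    }
}
$json = $body | ConvertTo-Json -Depth 20
With this flattening applied, the pipeline input text should contain all three parts, matching the expected output below.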
Expected behavior
The logs should show the following pipeline input text:
<|start_of_role|>system<|end_of_role|>You are a highly skilled software engineer.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>
initialize memory bank
# task_progress RECOMMENDED
When starting a new task, it is recommended to include a todo list.
<environment_details>
Some environment details
</environment_details><|end_of_text|>
Logs
[2026-05-05 11:40:13.780][33388][serving][debug][drogon_http_server.cpp:97] Request URI /v3/chat/completions dispatched to streaming thread pool
[2026-05-05 11:40:13.780][33388][serving][debug][http_server.cpp:173] REST request /v3/chat/completions
[2026-05-05 11:40:13.780][33388][serving][debug][http_server.cpp:184] Processing HTTP request: POST /v3/chat/completions body: 664 bytes
[2026-05-05 11:40:13.780][33388][serving][debug][http_rest_api_handler.cpp:549] Model name from deduced from JSON: granite-4.1-instruct-int4
[2026-05-05 11:40:13.781][33388][serving][debug][mediapipegraphdefinition.cpp:369] Successfully waited for mediapipe definition: granite-4.1-instruct-int4
[2026-05-05 11:40:13.781][33388][serving][debug][mediapipegraphdefinition.cpp:263] Creating Mediapipe graph executor: granite-4.1-instruct-int4
[2026-05-05 11:40:13.781][33388][serving][debug][mediapipegraphexecutor.hpp:123] Start unary KServe request mediapipe graph: granite-4.1-instruct-int4 execution
[2026-05-05 11:40:13.781][44520][llm_calculator][debug][http_llm_calculator.cc:70] LLMCalculator [Node: LLMExecutor] Open start
[2026-05-05 11:40:13.781][44520][llm_calculator][debug][http_llm_calculator.cc:76] LLMCalculator [Node: LLMExecutor] Open end
[2026-05-05 11:40:13.781][44520][llm_calculator][debug][http_llm_calculator.cc:80] LLMCalculator [Node: LLMExecutor] Process start
[2026-05-05 11:40:13.781][44520][llm_calculator][debug][servable.cpp:314] Request body: {"model":"granite-4.1-instruct-int4","messages":[{"role":"system","content":"You are a highly skilled software engineer."},{"role":"user","content":[{"type":"text","text":"\ninitialize memory bank\n"},{"type":"text","text":"\n# task_progress RECOMMENDED\n\nWhen starting a new task, it is recommended to include a todo list.\n"},{"type":"text","text":"<environment_details>\nSome environment details\n</environment_details>"}]}]}
[2026-05-05 11:40:13.781][44520][llm_calculator][debug][servable.cpp:315] Request uri: /v3/chat/completions
[2026-05-05 11:40:13.781][44520][llm_calculator][debug][http_llm_calculator.cc:96] LLMCalculator [Node: LLMExecutor] Request loaded successfully
[2026-05-05 11:40:13.781][44520][llm_calculator][debug][openai_completions.cpp:419] Parsed messages successfully
[2026-05-05 11:40:13.781][44520][llm_calculator][debug][http_llm_calculator.cc:115] LLMCalculator [Node: LLMExecutor] Request parsed successfully
[2026-05-05 11:40:13.813][44520][llm_calculator][trace][servable.cpp:232] Pipeline input text: <|start_of_role|>system<|end_of_role|>You are a highly skilled software engineer.<|end_of_text|>
<|start_of_role|>user<|end_of_role|><environment_details>
Some environment details
</environment_details><|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
[2026-05-05 11:40:13.813][44520][llm_calculator][trace][servable.cpp:233] prompt_token_ids: [100264, 9125, 100265, 2675, 527, 264, 7701, 26611, 3241, 24490, 13, 100257, 198, 100264, 882, 100265, 27, 24175, 13563, 397, 8538, 4676, 3649, 198, 524, 24175, 13563, 29, 100257, 198, 100264, 78191, 100265]
[2026-05-05 11:40:13.813][44520][llm_calculator][debug][http_llm_calculator.cc:122] LLMCalculator [Node: LLMExecutor] Input for the pipeline prepared successfully
[2026-05-05 11:40:13.813][44520][llm_calculator][trace][servable.cpp:46] Notifying executor thread
Configuration
- OVMS version
OpenVINO Model Server 2026.1.0.72cc0624
OpenVINO backend 2026.1.0-21367-63e31528c62-releases/2026/1
OpenVINO GenAI backend 2026.1.0.0-2957-1dabb8c2255
Bazel build flags: --config=win_mp_on_py_off
- OVMS config.json file
{
  "architectures": [
    "GraniteForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attention_multiplier": 0.0078125,
  "bos_token_id": 100257,
  "dtype": "bfloat16",
  "embedding_multiplier": 12.0,
  "eos_token_id": 100257,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.1,
  "intermediate_size": 12800,
  "logits_scaling": 16.0,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "granite",
  "num_attention_heads": 32,
  "num_hidden_layers": 40,
  "num_key_value_heads": 8,
  "pad_token_id": 100256,
  "residual_multiplier": 0.22,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000000,
  "tie_word_embeddings": true,
  "transformers_version": "4.57.6",
  "use_cache": true,
  "vocab_size": 100352
}
- CPU, accelerator's versions if applicable
Using an Intel Arc B580.
- Model repository directory structure
Directory: E:\ovms_models\granite-4.1-instruct-int4
Mode   LastWriteTime             Length  Name
----   -------------             ------  ----
-a---  2026-05-04  7:53 PM          829  config.json
-a---  2026-05-04  8:53 PM          196  generation_config.json
-a---  2026-05-05 11:39 AM         1322  graph.pbtxt
-a---  2026-05-04  7:53 PM       916646  merges.txt
-a---  2026-05-04  7:58 PM          458  openvino_config.json
-a---  2026-05-04  7:53 PM      1448197  openvino_detokenizer.bin
-a---  2026-05-05 10:38 AM        13348  openvino_detokenizer.xml
-a---  2026-05-04  7:58 PM   5314158177  openvino_model.bin
-a---  2026-05-04  7:58 PM      3540092  openvino_model.xml
-a---  2026-05-04  7:53 PM      3695636  openvino_tokenizer.bin
-a---  2026-05-05 11:00 AM        30314  openvino_tokenizer.xml
-a---  2026-05-04  7:53 PM          609  special_tokens_map.json
-a---  2026-05-04 10:22 PM        22694  tokenizer_config.json
-a---  2026-05-04  7:53 PM      7153421  tokenizer.json
-a---  2026-05-04  7:53 PM      1612704  vocab.json
- Model or publicly available similar model that reproduces the issue
ibm-granite/granite-4.1-8b or variants using the same chat template.
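For reference, an export along these lines should produce an equivalent model directory (a hypothetical command sketch; the exact optimum-cli options originally used may differ):
# Hypothetical export sketch; the actual flags used for this repro may differ.
optimum-cli export openvino `
    --model ibm-granite/granite-4.1-8b `
    --weight-format int4 `
    --task text-generation-with-past `
    "E:\ovms_models\granite-4.1-instruct-int4"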