When sending a streaming request directly to the LiteLLM endpoint, the stream may include a chunk with an empty reasoning_content:
$ curl -H "Content-Type: application/json" -H "Authorization: Bearer $API_KEY" http://localhost:4000/v1/chat/completions -d '{"model":"openai/gpt-oss-20b","stream":true,"messages":[{"role":"user","content":"Please explain Euler-Lagrange equation."}]}'
...
data: {"id":"chatcmpl-a5b6f10276aae84e","created":1767852482,"model":"my-openai/gpt-oss-20b","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"reasoning_content":"","reasoning":"","role":"assistant"}}]}
data: {"id":"chatcmpl-a5b6f10276aae84e","created":1767852482,"model":"my-openai/gpt-oss-20b","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"reasoning_content":"We","reasoning":"We"}}]}
data: {"id":"chatcmpl-a5b6f10276aae84e","created":1767852482,"model":"my-openai/gpt-oss-20b","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"reasoning_content":" need","reasoning":" need"}}]}
...
data: {"id":"chatcmpl-a5b6f10276aae84e","created":1767852482,"model":"my-openai/gpt-oss-20b","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"reasoning_content":"t","reasoning":"t"}}]}
data: {"id":"chatcmpl-a5b6f10276aae84e","created":1767852482,"model":"my-openai/gpt-oss-20b","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"reasoning_content":"(","reasoning":"("}}]}
data: {"id":"chatcmpl-a5b6f10276aae84e","created":1767852482,"model":"my-openai/gpt-oss-20b","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"reasoning_content":"","reasoning":""}}]}
data: {"id":"chatcmpl-a5b6f10276aae84e","created":1767852482,"model":"my-openai/gpt-oss-20b","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"reasoning_content":"∂","reasoning":"∂"}}]}
...
data: {"id":"chatcmpl-a5b6f10276aae84e","created":1767852482,"model":"my-openai/gpt-oss-20b","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"##"}}]}
data: {"id":"chatcmpl-a5b6f10276aae84e","created":1767852482,"model":"my-openai/gpt-oss-20b","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" The"}}]}
...
data: {"id":"chatcmpl-a5b6f10276aae84e","created":1767852482,"model":"my-openai/gpt-oss-20b","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"reasoning_content":"","reasoning":""}}]}
Description

When an LLM response stream includes a chunk with an empty reasoning_content, the Dify chat UI ends up rendering multiple <think> blocks. As a result, multiple "Thoughts (x.x s)" entries are displayed.
Environment

openai/gpt-oss-20b (served via vLLM)

Details
As the stream transcript above shows, a response may include chunks with an empty reasoning_content in the middle of the reasoning tokens.

It looks like the Dify Plugin SDK closes the current <think> block when it encounters a chunk like the following:

data: {"id":"chatcmpl-a5b6f10276aae84e","created":1767852482,"model":"my-openai/gpt-oss-20b","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"reasoning_content":"","reasoning":""}}]}

However, an empty reasoning_content chunk does not indicate the end of reasoning. Treating it as a boundary causes the SDK/UI to start a new <think> block for the subsequent reasoning tokens, which results in multiple "Thoughts (x.x s)" entries.

Expected behavior

An empty reasoning_content (e.g. "delta":{"reasoning_content":""}) should not close the current <think> block. The <think> block should be closed only when reasoning_content is absent or null.
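The expected behavior can be sketched with a minimal replay of the stream's delta objects (this is an illustrative sketch with a hypothetical `count_think_blocks` helper, not the actual Dify Plugin SDK code). It compares the current rule, where an empty string closes the block, against the proposed rule, where only an absent or null reasoning_content closes it:

```python
def count_think_blocks(deltas, empty_closes_block):
    """Count how many <think> blocks are opened while replaying deltas.

    empty_closes_block=True mimics the reported behavior (an empty
    reasoning_content string closes the current block); False applies
    the proposed rule (close only when reasoning_content is absent
    or null).
    """
    blocks = 0
    in_think = False
    for delta in deltas:
        rc = delta.get("reasoning_content")
        # A chunk counts as reasoning if reasoning_content is present;
        # under the buggy rule, an empty string is treated as a boundary.
        is_reasoning = rc is not None and (rc != "" or not empty_closes_block)
        if is_reasoning:
            if not in_think:
                blocks += 1   # open a new <think> block
                in_think = True
        else:
            in_think = False  # boundary: close the current block
    return blocks


# Delta objects abbreviated from the stream above: an empty chunk
# appears in the middle of the reasoning tokens.
deltas = [
    {"reasoning_content": "We", "reasoning": "We"},
    {"reasoning_content": " need", "reasoning": " need"},
    {"reasoning_content": "", "reasoning": ""},   # empty, mid-reasoning
    {"reasoning_content": "∂", "reasoning": "∂"},
    {"content": "##"},                            # reasoning has ended
]

print(count_think_blocks(deltas, empty_closes_block=True))   # prints 2
print(count_think_blocks(deltas, empty_closes_block=False))  # prints 1
```

Replaying the same chunks, the current rule splits the reasoning into two <think> blocks at the empty chunk, while the proposed rule keeps a single block and closes it only when reasoning_content disappears from the delta.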