fix(opencode): filter empty assistant content for @ai-sdk/openai-compatible#27914
Open
feanor5555 wants to merge 1 commit into
Conversation
llama.cpp's /v1/chat/completions returns HTTP 400 "Assistant response
prefill is incompatible with enable_thinking" whenever the last message
is role:assistant while the chat template has enable_thinking set.
opencode's message-v2.toModelMessagesEffect sometimes emits an assistant
UIMessage whose only parts are [step-start, reasoning("")];
convertToModelMessages collapses that to content:"", which triggers the
400.
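Concretely, the collapsed request body that reaches llama.cpp looks roughly like this (a hedged illustration; the model id and user text are assumptions, not a capture from the actual session):

```typescript
// Illustrative request body after convertToModelMessages has collapsed
// the [step-start, reasoning("")] parts: the trailing assistant message
// ends up with empty string content, which llama.cpp rejects with
// HTTP 400 when enable_thinking is set in the chat template.
const collapsedRequest = {
  model: "qwen3", // assumed model id
  messages: [
    { role: "user", content: "hello" },
    { role: "assistant", content: "" }, // was [step-start, reasoning("")]
  ],
}
```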
This is a class-wide problem, not a Qwen-specific quirk:
- ggml-org/llama.cpp#20861 (exact symptom, opencode + Qwen3.5)
- ggml-org/llama.cpp#21889
- mastra-ai/mastra#15234 (same issue in a different framework)
Every 2025-2026 open-weight thinking family (Qwen3 hybrid/3.5/3.6/
Thinking-2507, QwQ, DeepSeek-R1, GLM-4.6/4.7 thinking, Kimi-K2-Thinking,
MiniMax-M2) has the same incompatibility.
The existing empty-content filter already covers @ai-sdk/anthropic and
@ai-sdk/amazon-bedrock. The two near-identical map().filter() branches
are unified into filterEmptyContent(msgs, signatureKey?), driven by two
small lookup tables (EMPTY_CONTENT_FILTER_NPM set, REASONING_SIGNATURE_KEY
map). Single-pass for/of replaces map().filter(), saving two array
allocations per call. @ai-sdk/openai-compatible is added to the filter
set.
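A minimal sketch of the shape described above (the helper and table names match the PR text, but the part-type checks and message types are assumptions about opencode's internal message shape, not its actual API):

```typescript
// Hypothetical message part shapes; opencode's real types differ.
type Part =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string; signature?: string; redactedData?: string }
  | { type: "tool-call"; toolName: string }
  | { type: "step-start" }

type Msg = { role: string; content: string | Part[] }

// Providers whose empty assistant messages must be dropped.
const EMPTY_CONTENT_FILTER_NPM = new Set([
  "@ai-sdk/anthropic",
  "@ai-sdk/amazon-bedrock",
  "@ai-sdk/openai-compatible",
])

// Opaque-token key that makes an empty reasoning part worth keeping.
const REASONING_SIGNATURE_KEY: Record<string, "signature" | "redactedData"> = {
  "@ai-sdk/anthropic": "signature",
  "@ai-sdk/amazon-bedrock": "redactedData",
}

// Single pass over the messages: keep anything non-assistant, any
// non-empty string content, and any assistant message with at least
// one part carrying a real payload.
function filterEmptyContent(
  msgs: Msg[],
  signatureKey?: "signature" | "redactedData",
): Msg[] {
  const out: Msg[] = []
  for (const msg of msgs) {
    if (msg.role !== "assistant") {
      out.push(msg)
      continue
    }
    if (typeof msg.content === "string") {
      if (msg.content !== "") out.push(msg)
      continue
    }
    const keep = msg.content.some((p) => {
      if (p.type === "tool-call") return true
      if (p.type === "text") return p.text !== ""
      if (p.type === "reasoning")
        return p.text !== "" || (signatureKey !== undefined && p[signatureKey] !== undefined)
      return false // step-start carries no payload
    })
    if (keep) out.push(msg)
  }
  return out
}
```

In the PR, a caller presumably checks `EMPTY_CONTENT_FILTER_NPM.has(npm)` before invoking the helper and passes `REASONING_SIGNATURE_KEY[npm]` as `signatureKey`.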
Signature/redactedData round-trip preserved for anthropic/bedrock
(reasoning parts with empty text but opaque token are kept).
Tests:
- 5 new openai-compatible cases (string-empty, parts-empty,
reasoning-only, keep tool-call, keep non-empty reasoning)
- 2 regression guards for anthropic-signature + bedrock-redactedData
round-trip
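The round-trip invariant those two regression guards check reduces to a single keep-condition on reasoning parts; a sketch under stated assumptions (`keepReasoningPart` is a hypothetical stand-in, not the PR's actual code):

```typescript
// A reasoning part with empty text is normally droppable, but when it
// carries the provider's opaque token (anthropic `signature`, bedrock
// `redactedData`) it must survive filtering so the token round-trips.
type ReasoningPart = { text: string; signature?: string; redactedData?: string }

function keepReasoningPart(
  part: ReasoningPart,
  signatureKey?: "signature" | "redactedData",
): boolean {
  if (part.text !== "") return true // real reasoning text: always keep
  return signatureKey !== undefined && part[signatureKey] !== undefined
}
```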
Contributor
Hey! Your PR title doesn't follow our naming convention. Please update it to start with one of the required prefixes. See CONTRIBUTING.md for details.
Contributor
Thanks for updating your PR! It now meets our contributing guidelines. 👍
Contributor
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue; see CONTRIBUTING.md for details.
Issue for this PR
Closes #27920
Related upstream: ggml-org/llama.cpp#20861, ggml-org/llama.cpp#21889, mastra-ai/mastra#15234 (same symptom in a different framework).
Type of change
What does this PR do?
llama.cpp returns HTTP 400 `"Assistant response prefill is incompatible with enable_thinking"` whenever the last message is `role:"assistant"` while the chat template has `enable_thinking` on. `message-v2.toModelMessagesEffect` sometimes emits an assistant `UIMessage` whose only parts are `[step-start, reasoning("")]`; `convertToModelMessages` collapses that to `content:""` and triggers the 400.

opencode already drops empty assistant content for `@ai-sdk/anthropic` and `@ai-sdk/amazon-bedrock` (two near-identical map+filter chains in `transform.ts`). This PR:
- unifies those branches into a `filterEmptyContent(msgs, signatureKey?)` helper driven by two lookup tables (`EMPTY_CONTENT_FILTER_NPM` set, `REASONING_SIGNATURE_KEY` map); a single-pass `for/of` replaces `map().filter()`, saving two array allocations and one iteration per call.
- adds `@ai-sdk/openai-compatible` to the filter set, fixing the 400.

This is one class of trailing-assistant 400. The non-empty case (e.g. the MAX_STEPS wrap-up prefill in `session/prompt.ts`) still hits the same template and is covered by a separate PR introducing a `Model.prefill` capability.

How did you verify your code works?
- `bun test test/provider/transform.test.ts`: 231 pass, 0 fail.
- `llama-server` with 8 unique 400 bodies extracted from a real opencode session: raw replay → 8/8 still 400 (control); with the filter applied → 5/8 return 200 (all the empty-trailing cases). The remaining 3 are non-empty trailing-assistant cases, out of scope here.

Screenshots / recordings
N/A — backend change.
Checklist