
feat(core): add Model.prefill capability for trailing-assistant support #27915

Open
feanor5555 wants to merge 1 commit into anomalyco:dev from feanor5555:pr2-prefill-capability-schema

Conversation


feanor5555 commented on May 16, 2026

Issue for this PR

Closes #27920

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Adds an optional `prefill: boolean` on the Model schema (and on the user-facing provider config schema) without any consumer-side wiring. A first-class capability lets every host consult one canonical source instead of guessing the capability from the npm package name plus the reasoning flag.

Anthropic-style providers accept (and rely on) an assistant message as the last turn in a conversation ("response continuation" / "prefill"). Most thinking-on-by-default templates reject it outright — llama.cpp returns HTTP 400 "Assistant response prefill is incompatible with enable_thinking" on Qwen3-family templates, and vLLM/TGI have equivalent behaviour for DeepSeek-R1, GLM-4.6 thinking, Kimi-K2-Thinking, etc.
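
For concreteness, a minimal repro sketch of the request shape that triggers the 400 (the server URL, model id, and messages are illustrative placeholders, not taken from this PR):

```ts
// Minimal repro sketch; server URL and model id are placeholders.
const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen3-32b", // hypothetical model id
    messages: [
      { role: "user", content: "List three prime numbers." },
      // Trailing assistant turn = prefill: Anthropic-style providers
      // continue this message; thinking-on-by-default templates reject it.
      { role: "assistant", content: "2, 3," },
    ],
  }),
})
// On Qwen3-family templates llama.cpp responds HTTP 400:
// "Assistant response prefill is incompatible with enable_thinking"
if (!res.ok) console.error(res.status, await res.text())
```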

Default (undefined) is treated as true, so all existing models are unaffected by this PR alone.

Models intended to carry prefill: false in a follow-up sst/models.dev data PR:

  • Qwen: Qwen3 hybrid (0.6B/1.7B/4B/8B/14B/32B/235B), Qwen3-Thinking-2507, Qwen3-VL, Qwen3.5, Qwen3.6, QwQ-32B
  • DeepSeek: R1, R1-0528, V4 (when thinking on)
  • Z.AI / GLM: GLM-4.6 thinking, GLM-4.7 thinking
  • Moonshot: Kimi-K2-Thinking
  • MiniMax: M2

Unchanged (still allow prefill): Qwen2.5, Qwen3-Coder, Qwen3-Instruct-2507, all Anthropic / OpenAI / Azure / Google / Bedrock-Anthropic.

Files:

  • packages/core/src/models.ts — add prefill: Schema.optional(Schema.Boolean) with a per-family list in the schema comment (a sketch of the addition follows this list).
  • packages/opencode/src/config/provider.ts — mirror the field on the user-facing config schema with an annotation describing when to set it.
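
Roughly what the models.ts addition looks like, assuming Effect's Schema module (which matches the Schema.optional(Schema.Boolean) call above); every field other than prefill is reconstructed for illustration, and provider.ts mirrors the same optional boolean on the config schema:

```ts
import { Schema } from "effect"

// Sketch only: the real Model struct carries many more fields; everything
// except the prefill line is reconstructed for illustration.
export const Model = Schema.Struct({
  id: Schema.String,
  name: Schema.String,
  // Whether the chat template accepts a trailing assistant message
  // ("response continuation" / prefill). undefined is treated as true.
  // Known to reject it: Qwen3 hybrid, Qwen3-Thinking-2507, Qwen3-VL, QwQ-32B,
  // DeepSeek-R1/R1-0528, GLM-4.6/4.7 thinking, Kimi-K2-Thinking, MiniMax-M2.
  prefill: Schema.optional(Schema.Boolean),
})
export type Model = typeof Model.Type
```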

Consumer-side wiring is split into a separate PR (ProviderTransform.canAcceptTrailingAssistant, MAX_STEPS role routing in session/prompt.ts, and a runtime probe of llama.cpp's /props endpoint).
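
To make the undefined-means-true default concrete, here is a sketch of how the follow-up consumer could read the field; canAcceptTrailingAssistant is the planned helper named above, but its body and the prepare example around it are assumptions, not code from either PR:

```ts
// Hypothetical sketch: the helper name comes from the PR description;
// the body is an assumed implementation, not code from either PR.
export function canAcceptTrailingAssistant(model: { prefill?: boolean }): boolean {
  // undefined defaults to true, so models without the field keep today's behaviour
  return model.prefill ?? true
}

type Msg = { role: "user" | "assistant"; content: string }

// Example host-side use: decide what to do with a conversation whose
// last message is an assistant turn before dispatching it.
function prepare(messages: Msg[], model: { prefill?: boolean }): Msg[] {
  const last = messages[messages.length - 1]
  if (last?.role === "assistant" && !canAcceptTrailingAssistant(model)) {
    // One possible fallback (assumption): re-route the trailing assistant
    // text as user context instead of sending a prefill the template 400s on.
    return [
      ...messages.slice(0, -1),
      { role: "user", content: `Continue from: ${last.content}` },
    ]
  }
  return messages
}
```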

How did you verify your code works?

`bun run typecheck` runs clean. No runtime behaviour change in this PR — the field is only read by code that arrives in the follow-up.

Screenshots / recordings

N/A — schema-only change.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Anthropic-style providers accept (and rely on) an assistant message as
the last turn in a conversation ("response continuation" / "prefill"
for tool-use continuation). Most other thinking-on-by-default templates
reject it outright — llama.cpp returns HTTP 400 "Assistant response
prefill is incompatible with enable_thinking" on Qwen3-family templates,
and vLLM/TGI have equivalent behaviour for DeepSeek-R1, GLM-4.6 thinking,
Kimi-K2-Thinking, etc.

A first-class `prefill: boolean` on Model lets every host (opencode,
mastra, others) consult one canonical source of truth instead of
guessing from npm package + reasoning flag.

- packages/core/src/models.ts: add optional prefill field on Model
  with a per-family list of templates known to reject prefill
  (Qwen3 hybrid/3.5/3.6/Thinking-2507/VL, QwQ, DeepSeek-R1/R1-0528/V4,
  GLM-4.6/4.7-thinking, Kimi-K2-Thinking, MiniMax-M2).

- packages/opencode/src/config/provider.ts: mirror the field on the
  user-facing config schema with an annotation describing when to set
  it (and what the auto-default is for openai-compatible+reasoning).

Default (undefined) is treated as `true` to keep all existing models
unaffected. Consumer-side logic lives in a follow-up PR.

Sister-PR to a sst/models.dev data PR that will populate prefill: false
on the affected per-model entries.
@github-actions
Contributor

Hey! Your PR title "core: add Model.prefill capability for trailing-assistant support" doesn't follow conventional commit format.

Please update it to start with one of:

  • feat: or feat(scope): new feature
  • fix: or fix(scope): bug fix
  • docs: or docs(scope): documentation changes
  • chore: or chore(scope): maintenance tasks
  • refactor: or refactor(scope): code refactoring
  • test: or test(scope): adding or updating tests

Where scope is the package name (e.g., app, desktop, opencode).

See CONTRIBUTING.md for details.

github-actions bot added the needs:compliance label ("This means the issue will auto-close after 2 hours.") on May 16, 2026
@github-actions
Contributor

The following comment was made by an LLM; it may be inaccurate:

No duplicate PRs found. The search shows PR #27916 is the intended follow-up PR mentioned in the description (consumer-side logic), not a duplicate. PR #14772 is an older, unrelated fix about Claude models that predates this feature.

feanor5555 changed the title from "core: add Model.prefill capability for trailing-assistant support" to "feat(core): add Model.prefill capability for trailing-assistant support" on May 16, 2026
github-actions bot removed the needs:compliance and needs:title labels on May 16, 2026
@github-actions
Contributor

Thanks for updating your PR! It now meets our contributing guidelines. 👍



Development

Successfully merging this pull request may close these issues.

Trailing-assistant 400 on llama.cpp/vLLM with thinking-on templates (Qwen3, DeepSeek-R1, GLM-thinking, Kimi-K2-Thinking, MiniMax-M2)
