
feat(vllm): add grammar and structured output support #8806

Open

eureka928 wants to merge 7 commits into mudler:master from eureka928:feat/vllm-structured-output

Conversation

Contributor

@eureka928 eureka928 commented Mar 6, 2026

Description

This PR fixes #6857

Adds grammar and structured output support to the vLLM backend, enabling users to enforce structured outputs via JSON schema, JSON object, and BNF/GBNF grammar constraints.

Problem

The vLLM backend ignored all structured output parameters:

  • The Grammar field from the proto was never read
  • A phantom GuidedDecoding mapping referenced a non-existent proto field
  • response_format with json_schema or json_object had no effect on vLLM
  • A missing import time statement caused a runtime crash on video input

Solution

Proto (backend.proto):

  • Added JSONSchema (field 52) and ResponseFormat (field 53) to PredictOptions, allowing backends to receive the raw JSON schema and format type natively

Go endpoints (chat.go, completion.go):

  • Extract the raw JSON schema string from response_format: {type: "json_schema", ...} and store it on config.JSONSchema
  • Set config.ResponseFormat to the format type (json_object / json_schema)
  • GBNF grammar is still generated in parallel for llama.cpp compatibility
  • Added json_schema grammar support to the completion endpoint (was missing)
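To make the extraction step concrete, here is a minimal sketch of a request body using the OpenAI-style response_format convention, and the raw-schema extraction the endpoints perform. The model name, schema, and message content are made up for illustration; only the response_format shape comes from the PR.

```python
import json

# Hypothetical chat request body (OpenAI-compatible /v1/chat/completions).
payload = {
    "model": "vllm-model",  # placeholder model name
    "messages": [{"role": "user", "content": "List three fruits as JSON."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "fruits",
            "schema": {
                "type": "object",
                "properties": {
                    "fruits": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["fruits"],
            },
        },
    },
}

# chat.go/completion.go extract the inner schema and store it as a raw
# string on config.JSONSchema, while also generating a GBNF grammar
# for llama.cpp compatibility.
raw_schema = json.dumps(payload["response_format"]["json_schema"]["schema"])
```

A vLLM backend can then consume raw_schema directly, while llama.cpp continues to use the generated GBNF grammar.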

Go backend (options.go, model_config.go):

  • Pass JSONSchema and ResponseFormat through gRPCPredictOpts to backends

vLLM backend (backend.py):

  • Support both StructuredOutputsParams (vLLM latest) and GuidedDecodingParams (vLLM <=0.8.x) with graceful import fallback
  • Handle three structured output modes with priority: JSONSchema > json_object > Grammar
  • Fix missing import time and import json
  • Remove phantom GuidedDecoding mapping
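The version fallback and priority order described above can be sketched as follows. build_constraint is a hypothetical helper, not code from the PR; the class names, kwargs (json, json_object, grammar), and SamplingParams field names are taken from the PR description, and the sketch degrades gracefully when vLLM is not installed.

```python
# Prefer the newer API (vLLM latest), fall back to the older one (<=0.8.x).
try:
    from vllm.sampling_params import StructuredOutputsParams as ConstraintParams
    SAMPLING_FIELD = "structured_outputs"   # vLLM latest
except ImportError:
    try:
        from vllm.sampling_params import GuidedDecodingParams as ConstraintParams
        SAMPLING_FIELD = "guided_decoding"  # vLLM <=0.8.x
    except ImportError:
        ConstraintParams = None             # vLLM not installed
        SAMPLING_FIELD = None


def build_constraint(json_schema: str, response_format: str, grammar: str):
    """Apply the PR's priority order: JSONSchema > json_object > Grammar."""
    if ConstraintParams is None:
        return None
    if json_schema:
        # Pass the schema as a raw string; the 'json' kwarg matches both APIs.
        return {SAMPLING_FIELD: ConstraintParams(json=json_schema)}
    if response_format == "json_object":
        return {SAMPLING_FIELD: ConstraintParams(json_object=True)}
    if grammar:
        # GBNF passes through unchanged: xgrammar follows llama.cpp's GBNF spec.
        return {SAMPLING_FIELD: ConstraintParams(grammar=grammar)}
    return None
```

The returned dict would be splatted into SamplingParams(**constraint), which is why the field name differs by vLLM version.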

Docs (constrained_grammars.md):

  • Updated compatibility notice to include vLLM
  • Added vLLM section with examples for JSON schema, JSON object, and grammar

How It Works

User request (response_format or grammar parameter)
  → chat.go/completion.go: extracts raw JSON schema → config.JSONSchema
  → also generates GBNF grammar → config.Grammar (for llama.cpp compat)
  → options.go: passes both via gRPC PredictOptions
  → vLLM backend: uses StructuredOutputsParams/GuidedDecodingParams
  → llama.cpp backend: uses Grammar (GBNF) as before — no regression

Verification

  • xgrammar (used by vLLM) explicitly follows the GBNF spec from llama.cpp, so grammar passthrough works
  • GuidedDecodingParams.json and StructuredOutputsParams.json both accept JSON strings
  • grammar_is_likely_lark() in vLLM correctly identifies GBNF as non-Lark (via ::= detection)
  • No conflict with existing config.ResponseFormat usage (image endpoint b64_json goes through a different code path)
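The ::= detection mentioned above can be illustrated with a toy heuristic. This mirrors the idea behind vLLM's grammar_is_likely_lark() check, not its actual implementation:

```python
def looks_like_gbnf(grammar: str) -> bool:
    # GBNF (llama.cpp) defines rules with "::=", while Lark uses ":".
    # Simplified heuristic for illustration only.
    return "::=" in grammar


gbnf_grammar = 'root ::= "yes" | "no"'
lark_grammar = 'start: "yes" | "no"'
```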

Notes for Reviewers

  • The proto generated Go files are gitignored and built at compile time — only backend.proto is committed
  • Compatible with both vLLM <=0.8.x (GuidedDecodingParams / guided_decoding) and latest (StructuredOutputsParams / structured_outputs)
  • No regression for llama.cpp or other backends

Signed commits

  • Yes, I signed my commits.


netlify bot commented Mar 6, 2026

Deploy Preview for localai ready!

  • Latest commit: bb08454
  • Latest deploy log: https://app.netlify.com/projects/localai/deploys/69aa440d185937000840e1f9
  • Deploy Preview: https://deploy-preview-8806--localai.netlify.app

Commits

Add two new fields to PredictOptions in the proto:
- JSONSchema (field 52): raw JSON schema string for backends that
  support native structured output (e.g. vLLM guided decoding)
- ResponseFormat (field 53): response format type string

These fields allow backends like vLLM to receive structured output
constraints natively instead of only through GBNF grammar conversion.

Ref: mudler#6857
Signed-off-by: eureka928 <meobius123@gmail.com>

Add JSONSchema field to ModelConfig to carry the raw JSON schema string
alongside the GBNF Grammar. Pass both JSONSchema and ResponseFormat
through gRPCPredictOpts to backends via the new proto fields.

This allows backends like vLLM to receive the original JSON schema
for native structured output support.

Ref: mudler#6857
Signed-off-by: eureka928 <meobius123@gmail.com>

In chat and completion endpoints, when response_format is json_schema,
extract the raw JSON schema and store it on config.JSONSchema alongside
the GBNF grammar. Also set config.ResponseFormat to the format type.

This allows backends that support native structured output (like vLLM)
to use the JSON schema directly instead of the GBNF grammar.

Ref: mudler#6857
Signed-off-by: eureka928 <meobius123@gmail.com>

Update the vLLM backend to support structured output:
- Import GuidedDecodingParams from vllm.sampling_params
- Handle JSONSchema: parse and pass as GuidedDecodingParams(json_schema=...)
- Handle json_object response format: GuidedDecodingParams(json_object=True)
- Fall back to Grammar (GBNF) via GuidedDecodingParams(grammar=...)
- Remove phantom GuidedDecoding mapping (field doesn't exist in proto)
- Fix missing 'import time' and 'import json' for load_video and schema parsing

Priority: JSONSchema > json_object > Grammar (GBNF fallback)

Ref: mudler#6857
Signed-off-by: eureka928 <meobius123@gmail.com>

- Make GuidedDecodingParams import conditional (try/except) for
  backwards compatibility with older vLLM versions
- Remove GBNF grammar fallback — vLLM expects EBNF, not GBNF, so
  passing LocalAI's GBNF grammar would produce confusing errors
- Pass JSONSchema as string directly instead of parsing to dict
  (safer across vLLM versions)
- Add GBNF grammar generation for json_schema in completion endpoint
  so non-vLLM backends (llama.cpp) also get grammar enforcement

Ref: mudler#6857
Signed-off-by: eureka928 <meobius123@gmail.com>

- Handle both StructuredOutputsParams (vLLM latest) and
  GuidedDecodingParams (vLLM <=0.8.x) with graceful fallback
- Use the correct SamplingParams field name for each version
  (structured_outputs vs guided_decoding)
- Use 'json' parameter (not 'json_schema') matching both APIs
- Re-add grammar (GBNF/BNF) passthrough — both vLLM APIs accept
  a 'grammar' parameter handled by xgrammar which supports GBNF
- Priority: JSONSchema > json_object > Grammar

Ref: mudler#6857
Signed-off-by: eureka928 <meobius123@gmail.com>

Update the compatibility notice to include vLLM alongside llama.cpp.
Add a vLLM-specific section with examples for all three supported
methods: json_schema, json_object, and grammar (via xgrammar).

Ref: mudler#6857
Signed-off-by: eureka928 <meobius123@gmail.com>
@eureka928 eureka928 force-pushed the feat/vllm-structured-output branch from dabd63c to bb08454 on March 6, 2026 03:03
@eureka928
Contributor Author

GM @mudler, would you review my PR?
Thank you for your time.



Development

Successfully merging this pull request may close these issues.

Feature Request: Add grammar support (BNF/xgrammar) to vLLM backend
