feat(vllm): add grammar and structured output support #8806
Open
eureka928 wants to merge 7 commits into mudler:master
Conversation
Add two new fields to PredictOptions in the proto:

- JSONSchema (field 52): raw JSON schema string for backends that support native structured output (e.g. vLLM guided decoding)
- ResponseFormat (field 53): response format type string

These fields allow backends like vLLM to receive structured output constraints natively instead of only through GBNF grammar conversion.

Ref: mudler#6857
Signed-off-by: eureka928 <meobius123@gmail.com>
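The additions might look like this in the proto (a sketch; only the new fields are shown, with field numbers taken from the commit message, and the existing `PredictOptions` contents elided):

```proto
message PredictOptions {
  // ... existing fields ...

  // Raw JSON schema string, for backends with native structured output
  // support (e.g. vLLM guided decoding).
  string JSONSchema = 52;

  // Response format type, e.g. "json_object" or "json_schema".
  string ResponseFormat = 53;
}
```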
Add JSONSchema field to ModelConfig to carry the raw JSON schema string alongside the GBNF Grammar. Pass both JSONSchema and ResponseFormat through gRPCPredictOpts to backends via the new proto fields. This allows backends like vLLM to receive the original JSON schema for native structured output support.

Ref: mudler#6857
Signed-off-by: eureka928 <meobius123@gmail.com>
In chat and completion endpoints, when response_format is json_schema, extract the raw JSON schema and store it on config.JSONSchema alongside the GBNF grammar. Also set config.ResponseFormat to the format type. This allows backends that support native structured output (like vLLM) to use the JSON schema directly instead of the GBNF grammar.

Ref: mudler#6857
Signed-off-by: eureka928 <meobius123@gmail.com>
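A pure-Python sketch of the endpoint logic described above (the real implementation is Go, in `chat.go`/`completion.go`; the function name and config keys here are illustrative):

```python
import json


def apply_response_format(config: dict, response_format: dict) -> dict:
    """Sketch: store the format type and, for json_schema, the raw schema.

    Mirrors the described behavior: config.ResponseFormat always gets the
    format type, and config.JSONSchema gets the raw schema string so
    backends with native structured output (vLLM) can use it directly.
    A GBNF grammar is also derived for llama.cpp (not shown here).
    """
    fmt = response_format.get("type")
    config["ResponseFormat"] = fmt
    if fmt == "json_schema":
        schema = response_format["json_schema"]["schema"]
        config["JSONSchema"] = json.dumps(schema)
    return config
```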
Update the vLLM backend to support structured output:

- Import GuidedDecodingParams from vllm.sampling_params
- Handle JSONSchema: parse and pass as GuidedDecodingParams(json_schema=...)
- Handle json_object response format: GuidedDecodingParams(json_object=True)
- Fall back to Grammar (GBNF) via GuidedDecodingParams(grammar=...)
- Remove phantom GuidedDecoding mapping (field doesn't exist in proto)
- Fix missing 'import time' and 'import json' for load_video and schema parsing

Priority: JSONSchema > json_object > Grammar (GBNF fallback)

Ref: mudler#6857
Signed-off-by: eureka928 <meobius123@gmail.com>
- Make GuidedDecodingParams import conditional (try/except) for backwards compatibility with older vLLM versions
- Remove GBNF grammar fallback: vLLM expects EBNF, not GBNF, so passing LocalAI's GBNF grammar would produce confusing errors
- Pass JSONSchema as a string directly instead of parsing it to a dict (safer across vLLM versions)
- Add GBNF grammar generation for json_schema in the completion endpoint so non-vLLM backends (llama.cpp) also get grammar enforcement

Ref: mudler#6857
Signed-off-by: eureka928 <meobius123@gmail.com>
- Handle both StructuredOutputsParams (vLLM latest) and GuidedDecodingParams (vLLM <=0.8.x) with graceful fallback
- Use the correct SamplingParams field name for each version (structured_outputs vs guided_decoding)
- Use the 'json' parameter (not 'json_schema'), matching both APIs
- Re-add grammar (GBNF/BNF) passthrough: both vLLM APIs accept a 'grammar' parameter handled by xgrammar, which supports GBNF
- Priority: JSONSchema > json_object > Grammar

Ref: mudler#6857
Signed-off-by: eureka928 <meobius123@gmail.com>
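The dual-API handling described in this commit might be sketched like this (an assumed shape, not the actual `backend.py`; it tries the newer structured-outputs API first and falls back to the pre-0.9 guided-decoding API, returning an empty dict when neither is available):

```python
# Version-compat shim: pick whichever structured-output API this vLLM has.
try:
    from vllm.sampling_params import StructuredOutputsParams as _Params
    _FIELD = "structured_outputs"   # SamplingParams kwarg on recent vLLM
except ImportError:
    try:
        from vllm.sampling_params import GuidedDecodingParams as _Params
        _FIELD = "guided_decoding"  # SamplingParams kwarg on vLLM <= 0.8.x
    except ImportError:
        _Params = None              # very old vLLM: no structured output
        _FIELD = None


def structured_output_kwargs(json_schema=None, json_object=False, grammar=None):
    """Build extra SamplingParams kwargs with the priority from the PR:
    JSONSchema > json_object > Grammar."""
    if _Params is None:
        return {}
    if json_schema:
        # 'json' accepts a JSON string on both APIs (per the commit notes).
        return {_FIELD: _Params(json=json_schema)}
    if json_object:
        return {_FIELD: _Params(json_object=True)}
    if grammar:
        # Handled by xgrammar, which supports GBNF as well as EBNF.
        return {_FIELD: _Params(grammar=grammar)}
    return {}
```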
Update the compatibility notice to include vLLM alongside llama.cpp. Add a vLLM-specific section with examples for all three supported methods: json_schema, json_object, and grammar (via xgrammar).

Ref: mudler#6857
Signed-off-by: eureka928 <meobius123@gmail.com>
GM @mudler
Description
This PR fixes #6857
Adds grammar and structured output support to the vLLM backend, enabling users to enforce structured outputs via JSON schema, JSON object, and BNF/GBNF grammar constraints.
Problem
The vLLM backend ignored all structured output parameters:
- The `Grammar` field from the proto was never read
- The `GuidedDecoding` mapping referenced a non-existent proto field
- `response_format` with `json_schema` or `json_object` had no effect on vLLM
- A missing `import time` caused a runtime crash on video input

Solution
Proto (`backend.proto`):

- Add `JSONSchema` (field 52) and `ResponseFormat` (field 53) to `PredictOptions`, allowing backends to receive the raw JSON schema and format type natively

Go endpoints (`chat.go`, `completion.go`):

- Extract the raw JSON schema from `response_format: {type: "json_schema", ...}` and store it on `config.JSONSchema`
- Set `config.ResponseFormat` to the format type (`json_object` / `json_schema`)
- Add `json_schema` grammar support to the completion endpoint (was missing)

Go backend (`options.go`, `model_config.go`):

- Pass `JSONSchema` and `ResponseFormat` through `gRPCPredictOpts` to backends

vLLM backend (`backend.py`):

- Handle both `StructuredOutputsParams` (vLLM latest) and `GuidedDecodingParams` (vLLM <=0.8.x) with graceful import fallback
- Apply constraints with priority `JSONSchema` > `json_object` > `Grammar`
- Fix missing `import time` and `import json`
- Remove the phantom `GuidedDecoding` mapping

Docs (`constrained_grammars.md`):

- Update the compatibility notice to include vLLM and add examples for `json_schema`, `json_object`, and `grammar` (via xgrammar)

How It Works
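As a hypothetical end-to-end illustration (the server URL and model name are assumptions), a client sends an OpenAI-style `response_format`, the endpoint stores the raw schema on `config.JSONSchema`, and it reaches vLLM via the new proto field:

```python
import json
import urllib.request

# Assumed: a LocalAI server on localhost:8080 serving a vLLM-backed model
# named "my-vllm-model".
payload = {
    "model": "my-vllm-model",
    "messages": [{"role": "user", "content": "Return a user object"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "user",
            "schema": {
                "type": "object",
                "properties": {"name": {"type": "string"}},
                "required": ["name"],
            },
        },
    },
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)
# The reply's message content is constrained to be valid against the schema.
```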
Verification
- Verified that `GuidedDecodingParams.json` and `StructuredOutputsParams.json` both accept JSON strings
- Verified that `grammar_is_likely_lark()` in vLLM correctly identifies GBNF as non-Lark (via `::=` detection)
- Checked existing `config.ResponseFormat` usage (the image endpoint's `b64_json` goes through a different code path)

Notes for Reviewers
- Regenerated code from `backend.proto` is committed
- Works with both vLLM <=0.8.x (`GuidedDecodingParams` / `guided_decoding`) and latest (`StructuredOutputsParams` / `structured_outputs`)
- Signed commits
@mudler