feat(recipes): add Qwen3.5-0.8B vLLM aggregated recipe#9285
feat(recipes): add Qwen3.5-0.8B vLLM aggregated recipe#9285MatejKosec wants to merge 4 commits intomainfrom
Conversation
Aggregated single-GPU vLLM recipe for Qwen/Qwen3.5-0.8B (multimodal, hybrid Mamba+Attention) targeting vllm-runtime:1.1.0. Includes the qwen3 reasoning parser and qwen3_coder tool-call parser flags on the worker. Smoke-tested end-to-end on 1x H100 with both chat completion and tool calling. Signed-off-by: Matej Kosec <mkosec@nvidia.com>
WalkthroughThis PR adds a complete Qwen3.5-0.8B aggregated model deployment recipe. It includes a DynamoGraphDeployment manifest (vLLM with Frontend/VllmWorker services), Kubernetes PVC definitions for model and compilation caches, a model-download Job for HuggingFace artifact preparation, and comprehensive README documentation with quick-start instructions and operational guidance. ChangesQwen3.5-0.8B Aggregated Deployment Recipe
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@recipes/qwen3.5-0.8b/model-cache/model-download.yaml`:
- Around line 17-44: The container "model-download" is missing a securityContext
and runs as root; add a securityContext block under the container spec for
model-download to harden defaults: set runAsNonRoot: true and a non-root
runAsUser (e.g. 1000), set allowPrivilegeEscalation: false, and consider adding
readOnlyRootFilesystem: true and seccompProfile/runtimeClass if desired; update
the container spec (name: model-download) to include this securityContext so the
pod no longer runs as root and privilege escalation is disabled.
In `@recipes/qwen3.5-0.8b/README.md`:
- Around line 73-78: The fenced log snippet in recipes/qwen3.5-0.8b/README.md
lacks a language identifier (triggering markdownlint MD040); update the fence
for that block (the triple backticks surrounding the log lines) to include a
language token such as "text" so it becomes ```text, ensuring the block around
the lines like "2026-05-07T21:16:07 INFO model.__post_init__: Resolved
architecture: Qwen3_5ForConditionalGeneration" is closed with matching
backticks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 95d0a12e-9c04-458b-94d3-36f62406f148
📒 Files selected for processing (4)
recipes/qwen3.5-0.8b/README.mdrecipes/qwen3.5-0.8b/model-cache/model-cache.yamlrecipes/qwen3.5-0.8b/model-cache/model-download.yamlrecipes/qwen3.5-0.8b/vllm/agg/deploy.yaml
The model-cache PVC manifest and the HF download Job were template-only (storageClassName: "your-storage-class-name") and don't belong in the recipe -- the deploy.yaml already declares pvcs with create: false, leaving cache provisioning to the operator. Keep only the deploy.yaml + README, and update the Quick Start to point at the deploy directly. Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Recipe is small enough to read directly from deploy.yaml; the parser flags and model ID are inline, no separate documentation surface needed. Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Summary
Qwen/Qwen3.5-0.8Bunderrecipes/qwen3.5-0.8b/.nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.0. The recipe will not work against1.0.0: that image's vLLM and Transformers pre-date theqwen3_5model type.--dyn-reasoning-parser qwen3and--dyn-tool-call-parser qwen3_coderon the worker.qwen3_coderis Dynamo's registered Qwen3-family XML tool-call parser.Closes #8988.
Notes
--no-enable-log-requests, the renamed flag introduced in fix(recipes): rename --disable-log-requests to --no-enable-log-requests for vLLM #8693. Older recipes still using--disable-log-requestswill fail withunrecognized argumentson1.1.0.qwen3.5-0.8b/(with the dot) to track the official model name -- matches the precedent set bykimi-k2.5/.