MAF-19543: feat(preset): set --max-model-len default to -1 #102

Merged

hhk7734 merged 1 commit into main from pick-up-maf-19543-from-jira on Apr 3, 2026

Conversation

@hhk7734 (Member) commented on Apr 3, 2026

Summary

  • Set --max-model-len to -1 across all quickstart, vLLM v0.15.1, and GLM5 presets (77 files)
  • Lets vLLM auto-determine the context length from the model config instead of hardcoding conservative values (see the sketch below)
  • Excludes deepseek-r1/ PD disaggregation presets, which retain 16384 due to memory constraints
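
A minimal sketch of what the change looks like in a preset, assuming a simplified layout; the field nesting and the model/TP arguments are illustrative, not the chart's actual schema, and only the --max-model-len value reflects this PR:

```yaml
# Hypothetical preset fragment (illustrative layout, not the chart's
# actual schema). The change replaces a hardcoded context length with
# -1, which lets vLLM derive the limit from the model's config.
spec:
  containers:
    - name: vllm
      args:
        - --model
        - Qwen/Qwen2.5-1.5B-Instruct   # illustrative model
        - --tensor-parallel-size
        - "2"
        - --max-model-len
        - "-1"   # previously a conservative constant such as "16384"
```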

Test plan

  • helm template renders without errors (expected output shape sketched below)
  • A representative preset deploys successfully with the auto-determined context length
  • Inference works on a sample model (e.g., quickstart Qwen 1.5B on MI250)
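
For the first two checks, a sketch of the expected helm template output for a representative quickstart preset; the resource kind, name, and nesting are assumptions, and the point is only that --max-model-len now renders as -1 rather than a preset-specific constant:

```yaml
# Illustrative excerpt of rendered output for one quickstart preset
# after this change; structure and resource name are assumed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: quickstart-vllm-qwen-qwen2.5-1.5b-instruct-amd-mi250-tp2
spec:
  template:
    spec:
      containers:
        - name: vllm
          args:
            - --max-model-len
            - "-1"
```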

🤖 Generated with Claude Code

Let vLLM auto-determine the maximum context length from the model
config instead of hardcoding conservative values. This avoids
unnecessarily limiting the usable context window when there are no
memory constraints.

Excludes the deepseek-r1 PD disaggregation presets, which retain their
current values due to memory constraints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
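
For the excluded presets, a hypothetical fragment of what is retained; 16384 is the value cited in the summary above, and the surrounding structure is assumed:

```yaml
# Excluded from this change: deepseek-r1 PD disaggregation presets
# keep an explicit cap rather than -1, since the auto-derived full
# context length would not fit in memory on these setups (per the
# PR description).
args:
  - --max-model-len
  - "16384"
```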
@hhk7734 requested a review from a team as a code owner on April 3, 2026 03:23
Copilot AI review requested due to automatic review settings on April 3, 2026 03:23
Copilot AI (Contributor) left a comment

Pull request overview

This PR updates vLLM-based Helm preset templates to set --max-model-len to -1, allowing vLLM to auto-determine context length from the model config rather than using preset-specific hardcoded values.

Changes:

  • Switched multiple vLLM v0.15.1 presets (including GPT-OSS 120B, Kimi K2.5, DeepSeek-R1 variants) to --max-model-len -1.
  • Switched GLM5 presets to --max-model-len -1.
  • Switched many “quickstart” vLLM presets (AMD MI250/MI300X) to --max-model-len -1, including DeepSeek-R1 PD prefill/decode presets.

Reviewed changes

Copilot reviewed 77 out of 77 changed files in this pull request and generated 4 comments.

Summary per file

| File | Description |
| --- | --- |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.15.1/openai-gpt-oss-120b-nvidia-h200-sxm-tp2-moe-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.15.1/openai-gpt-oss-120b-nvidia-h200-sxm-1.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.15.1/openai-gpt-oss-120b-nvidia-h100-sxm-tp8-moe-tp8.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.15.1/openai-gpt-oss-120b-nvidia-h100-sxm-tp4-moe-tp4.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.15.1/openai-gpt-oss-120b-nvidia-h100-sxm-tp2-moe-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.15.1/moonshotai-kimi-k2.5-nvidia-h200-sxm-tp8-moe-tp8.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.15.1/moonshotai-kimi-k2.5-nvidia-h200-sxm-tp8-moe-ep8.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.15.1/moonshotai-kimi-k2.5-nvidia-h100-sxm-tp8-moe-ep8.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.15.1/deepseek-ai-deepseek-r1-nvidia-h200-sxm-tp8-moe-tp8.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.15.1/deepseek-ai-deepseek-r1-nvidia-h200-sxm-tp8-moe-ep8.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.15.1/deepseek-ai-deepseek-r1-nvidia-h200-sxm-dp8-moe-ep8.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.15.1/deepseek-ai-deepseek-r1-nvidia-h200-sxm-dp16-moe-ep16.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.15.1/deepseek-ai-deepseek-r1-nvidia-h100-sxm-dp16-moe-ep16.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/glm5/zai-org-glm-4.7-flash-nvidia-h200-sxm-tp2-moe-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/glm5/zai-org-glm-4.7-flash-nvidia-h200-sxm-1.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/glm5/zai-org-glm-4.7-flash-nvidia-h100-sxm-tp4-moe-tp4.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/glm5/zai-org-glm-4.7-flash-nvidia-h100-sxm-tp2-moe-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/vllm/glm5/zai-org-glm-4.7-flash-nvidia-h100-sxm-1.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-prefill-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-prefill-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-decode-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-decode-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-prefill-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-prefill-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-decode-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-decode-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2.5-1.5b-instruct-prefill-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2.5-1.5b-instruct-prefill-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2.5-1.5b-instruct-decode-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2.5-1.5b-instruct-decode-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2.5-1.5b-instruct-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2.5-1.5b-instruct-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-prefill-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-prefill-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-decode-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-decode-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-prefill-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-prefill-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-decode-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-decode-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-mistral-7b-instruct-v0.3-prefill-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-mistral-7b-instruct-v0.3-prefill-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-mistral-7b-instruct-v0.3-decode-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-mistral-7b-instruct-v0.3-decode-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-mistral-7b-instruct-v0.3-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-mistral-7b-instruct-v0.3-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-microsoft-phi-mini-moe-instruct-prefill-amd-mi250-dp2-moe-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-microsoft-phi-mini-moe-instruct-decode-amd-mi250-dp2-moe-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-microsoft-phi-mini-moe-instruct-amd-mi250-dp2-moe-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-decode-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-decode-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-prefill-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-prefill-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-decode-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-decode-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-prefill-amd-mi300x-dp8-moe-ep8.helm.yaml | Set --max-model-len to -1 (PD prefill) |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-decode-amd-mi300x-dp8-moe-ep8.helm.yaml | Set --max-model-len to -1 (PD decode) |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-prefill-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-prefill-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-decode-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-decode-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-amd-mi300x-tp2.helm.yaml | Set --max-model-len to -1 |
| deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-amd-mi250-tp2.helm.yaml | Set --max-model-len to -1 |

@hhk7734 merged commit 078610f into main on Apr 3, 2026
10 checks passed
@hhk7734 deleted the pick-up-maf-19543-from-jira branch on April 3, 2026 05:50