Update xllm integration by mylibrar · Pull Request #4 · LLM360/vllm

mylibrar · 2026-04-25T05:30:37Z

Purpose

Enable vLLM to load and serve XLLM models, specifically those with a Mixture-of-Experts (MoE) architecture.

This updates the XLLM integration to support the model components needed for MoE inference, including grouped RMSNorm, query/key norm, sparse MoE execution with FusedMoE, expert weight mapping, shared experts, EP/EPLB metadata, sequence parallel handling, split RoPE head dimensions, and EAGLE3 auxiliary hidden state support.
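For readers unfamiliar with the sparse MoE and shared-expert terms above, the following is a minimal, generic sketch of top-k routing for a single token. It is not the code in this PR and does not use FusedMoE. The function names, the softmax router, and the renormalization of the selected weights are illustrative assumptions, and XLLM may differ on each of them.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_moe_token(x, router_logits, experts, shared_expert=None, top_k=2):
    """Combine the top_k routed experts (plus an optional shared expert).

    x:             token activation as a list of floats
    router_logits: one logit per routed expert
    experts:       callables mapping list[float] -> list[float]
    Returns (output, indices_of_selected_experts).
    """
    probs = softmax(router_logits)
    # Select the top_k experts by router probability.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Assumption: renormalize the selected weights so they sum to 1.
    norm = sum(probs[i] for i in top)
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)
        w = probs[i] / norm
        out = [o + w * yi for o, yi in zip(out, y)]
    if shared_expert is not None:
        # A shared expert processes every token regardless of routing.
        out = [o + s for o, s in zip(out, shared_expert(x))]
    return out, top
```

Only the selected experts are evaluated for a given token, which is what makes the layer sparse. A fused kernel batches this work across tokens instead of looping in Python.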

This is needed for the LLM360 K2-V3 workflow documented in docs/features/k2_v3.md, where an internal K2-V3 instruct checkpoint is served through vLLM with the k2_v3 reasoning parser and multi_format tool-call parser.

Test Plan

Validate touched Python files parse successfully.
Validate the diff has no whitespace errors.
Build a vLLM image with this patch applied.
Start a vLLM endpoint with an XLLM MoE / K2-V3 model using the documented K2-V3 settings:
- --tensor-parallel-size 8
- --reasoning-parser k2_v3
- --enable-auto-tool-choice
- --tool-call-parser multi_format
Run the OpenAI-compatible client example from docs/features/k2_v3.md against the endpoint.
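The launch step above can be sketched as a small helper that assembles the `vllm serve` command line from the listed flags. The model path is a hypothetical placeholder, since the PR does not name the checkpoint location, and the `k2_v3` and `multi_format` parser names are taken from the test plan as written.

```python
import shlex

# Hypothetical placeholder: the PR does not state the checkpoint path.
MODEL_PATH = "/path/to/xllm-moe-checkpoint"


def build_serve_command(model, tensor_parallel_size=8):
    """Assemble the `vllm serve` argv using the flags from the test plan."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--reasoning-parser", "k2_v3",
        "--enable-auto-tool-choice",
        "--tool-call-parser", "multi_format",
    ]


# Render a copy-pasteable shell command with safe quoting.
command_line = shlex.join(build_serve_command(MODEL_PATH))
```

Passing the list from `build_serve_command` to `subprocess.run` would start the server. Building the argv as a list avoids shell-quoting mistakes when the model path contains spaces.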

Test Result

git diff HEAD^ HEAD --check: passed.
AST parse validation passed for:
- vllm/model_executor/models/xllm.py
- vllm/model_executor/models/registry.py
Built a vLLM image with this patch applied.
Successfully loaded an XLLM MoE model for inference through the patched vLLM endpoint.
Successfully ran the K2-V3 reasoning and tool-calling example from docs/features/k2_v3.md against that endpoint.
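The AST parse validation reported above could be reproduced with a short standard-library script along these lines. This is a sketch of one way to run such a check, not necessarily the command that was used. The helper name `find_syntax_errors` is illustrative, and the file paths are assumed to be relative to the repository root.

```python
import ast
from pathlib import Path

# Files named in the test result, relative to the repository root.
TOUCHED_FILES = [
    "vllm/model_executor/models/xllm.py",
    "vllm/model_executor/models/registry.py",
]


def find_syntax_errors(paths):
    """Return {path: message} for every file that cannot be read or parsed."""
    failures = {}
    for path in paths:
        try:
            source = Path(path).read_text(encoding="utf-8")
            # ast.parse compiles to an AST only; the module is never executed.
            ast.parse(source, filename=str(path))
        except (OSError, SyntaxError, ValueError) as exc:
            failures[str(path)] = f"{type(exc).__name__}: {exc}"
    return failures


def main(paths=None):
    """Print failures and return a process exit code (0 means all files parsed)."""
    failures = find_syntax_errors(paths or TOUCHED_FILES)
    for path, message in failures.items():
        print(f"{path}: {message}")
    return 1 if failures else 0
```

Calling `sys.exit(main())` from the repository root turns this into a pass/fail check. It only confirms that the files are syntactically valid Python; it does not import them or catch runtime errors.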

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

Update xllm integration

cdcd4d8

mylibrar requested review from hanseungwook and shauryr April 26, 2026 05:16

shauryr reviewed Apr 26, 2026

Comment thread vllm/model_executor/models/xllm.py

Update Copyright message

477759a

mylibrar wants to merge 2 commits into v0.12.0-ifm from v0.12.0-ifm_xllm-fix

2 participants
