Update xllm integration #4

Open

mylibrar wants to merge 2 commits into v0.12.0-ifm from v0.12.0-ifm_xllm-fix

Conversation

@mylibrar
Collaborator

Purpose

Enable vLLM to load and serve XLLM models, specifically XLLM models with MoE architecture.

This updates the XLLM integration to support the model components needed for MoE inference, including grouped RMSNorm, query/key norm, sparse MoE execution with FusedMoE, expert weight mapping, shared experts, EP/EPLB metadata, sequence parallel handling, split RoPE head dimensions, and EAGLE3 auxiliary hidden state support.
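Of these components, the sparse MoE execution path is the central change. As a rough illustration only (plain NumPy with dense loops; vLLM's actual FusedMoE is a fused GPU kernel, and this sketch shares nothing with its implementation), top-k expert routing works roughly like this:

```python
import numpy as np

def topk_moe_forward(x, gate_w, expert_ws, top_k=2):
    """Illustrative top-k sparse MoE forward pass (NOT vLLM's FusedMoE).

    x:         (tokens, hidden) activations
    gate_w:    (hidden, n_experts) router weights
    expert_ws: list of (hidden, hidden) per-expert weight matrices
    """
    logits = x @ gate_w
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)          # softmax over experts
    topk_idx = np.argsort(-probs, axis=-1)[:, :top_k]   # chosen experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        w = probs[t, topk_idx[t]]
        w = w / w.sum()                                 # renormalize over selected experts
        for j, e in enumerate(topk_idx[t]):
            out[t] += w[j] * (x[t] @ expert_ws[e])      # weighted sum of expert outputs
    return out
```

Expert weight mapping in the patch is what lets checkpoint tensors named per expert be loaded into such a routed layer; shared experts are applied to every token in addition to the routed ones.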

This is needed for the LLM360 K2-V3 workflow documented in docs/features/k2_v3.md, where an internal K2-V3 instruct checkpoint is served through vLLM with the k2_v3 reasoning parser and multi_format tool-call parser.
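The k2_v3 reasoning parser and multi_format tool-call parser operate on OpenAI-compatible chat completion requests. As a hedged illustration of what such a request body looks like (the helper name is hypothetical; the field names follow the standard OpenAI chat schema, not anything specific to this patch):

```python
def build_chat_request(model, user_msg, tools=None):
    """Build a minimal OpenAI-compatible /v1/chat/completions payload (sketch)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
    }
    if tools:
        payload["tools"] = tools
        # "auto" matches serving with --enable-auto-tool-choice
        payload["tool_choice"] = "auto"
    return payload
```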

Test Plan

  • Validate that the touched Python files parse successfully.
  • Validate that the diff introduces no whitespace errors.
  • Build a vLLM image with this patch applied.
  • Start a vLLM endpoint with an XLLM MoE / K2-V3 model using the documented K2-V3 settings:
    • --tensor-parallel-size 8
    • --reasoning-parser k2_v3
    • --enable-auto-tool-choice
    • --tool-call-parser multi_format
  • Run the OpenAI-compatible client example from docs/features/k2_v3.md against the endpoint.
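Putting the documented settings together, the endpoint launch might look like the following (the model path is a placeholder; the flags are taken verbatim from the list above):

```shell
vllm serve /models/k2-v3-instruct \
  --tensor-parallel-size 8 \
  --reasoning-parser k2_v3 \
  --enable-auto-tool-choice \
  --tool-call-parser multi_format
```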

Test Result

  • git diff HEAD^ HEAD --check: passed.
  • AST parse validation passed for:
    • vllm/model_executor/models/xllm.py
    • vllm/model_executor/models/registry.py
  • Built a vLLM image with this patch applied.
  • Successfully loaded an XLLM MoE model for inference through the patched vLLM endpoint.
  • Successfully ran the K2-V3 reasoning and tool-calling example from docs/features/k2_v3.md against that endpoint.
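The AST parse validation step can be reproduced with the standard library alone; `check_parses` below is a hypothetical helper name, not part of the patch:

```python
import ast
from pathlib import Path

def check_parses(paths):
    """Return (path, lineno) pairs for files whose source fails to parse."""
    failures = []
    for p in paths:
        try:
            ast.parse(Path(p).read_text(), filename=str(p))
        except SyntaxError as e:
            failures.append((str(p), e.lineno))
    return failures
```

Running it over vllm/model_executor/models/xllm.py and vllm/model_executor/models/registry.py should return an empty list for this patch.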

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

