
bugfix: fix mbox model(qwen2.5) multi round core in xattention #988

Merged
DragonFive merged 2 commits into jd-opensource:main from DragonFive:feat/migrate-pr37-main
Mar 7, 2026
Conversation


@DragonFive DragonFive commented Mar 4, 2026

Summary

This PR fixes a crash (core dump) in the REC multi-round path of xattention for the mbox (Qwen2.5-related) model flow by migrating the missing KV-cache attachment logic.

Problem

In REC multi-round mode, the model forward path did not attach all of the multi-round cache tensors (full_k/v and unshared_k/v) to the attention metadata for the affected model implementations.
This could break or degrade multi-round behavior in xattention.

Changes

  • Extend REC model-type detection to include qwen3_moe.
  • In LlmModelImplBase forward:
    • Read LlmRecMultiRoundParams only when REC multi-round mode is enabled and params are present.
    • Add per-layer size checks for full_k_caches, full_v_caches, unshared_k_caches, and unshared_v_caches.
    • Attach the corresponding cache tensors into attn_metadata before each layer forward.
  • Apply the same REC multi-round cache attachment logic in Qwen3MoeModelImpl forward.

Impact

  • Enables correct multi-round KV-cache wiring for xattention in the affected REC path.
  • No behavior change for non-REC-multi-round execution paths.

Files Changed

  • xllm/core/common/rec_model_utils.h
  • xllm/models/llm/llm_model_base.h
  • xllm/models/llm/qwen3_moe.h

Notes

  • This PR focuses on wiring/attachment correctness for multi-round REC cache metadata.
  • Follow-up cleanups (for example, redundant null checks) can be done separately to keep this bugfix focused.


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

The pull request introduces support for qwen3_moe models in the is_llmrec_model_type function and integrates multi-round caching parameters (LlmRecMultiRoundParams) into the LlmModelImplBase and Qwen3MoeModelImpl forward passes. This change is crucial for enabling multi-round core functionality in xattention for these models. The added includes and logic for handling llmrec_params are appropriate for the stated bugfix.

Review comment threads:

  • xllm/models/llm/llm_model_base.h (outdated)
  • xllm/models/llm/llm_model_base.h
  • xllm/models/llm/qwen3_moe.h (outdated)
  • xllm/models/llm/qwen3_moe.h
  • xllm/models/llm/llm_model_base.h
@DragonFive DragonFive merged commit 9c805b8 into jd-opensource:main Mar 7, 2026
74 of 99 checks passed
3 participants