
bugfix: fix mbox model(qwen2.5) multi round core in xattention #988

Merged
DragonFive merged 2 commits into jd-opensource:main from DragonFive:feat/migrate-pr37-main
Mar 7, 2026
Conversation


@DragonFive DragonFive commented Mar 4, 2026

Summary

This PR fixes a crash (core dump) in the REC multi-round path of xattention for the mbox (Qwen2.5-related) model flow by migrating the missing KV-cache attachment logic.

Problem

In REC multi-round mode, the model forward path did not attach all of the multi-round cache tensors (full_k/v and unshared_k/v) to the attention metadata for the affected model implementations.
This could break or degrade multi-round behavior in xattention.

Changes

  • Extend REC model-type detection to include qwen3_moe.
  • In LlmModelImplBase forward:
    • Read LlmRecMultiRoundParams only when REC multi-round mode is enabled and params are present.
    • Add per-layer size checks for full_k_caches, full_v_caches, unshared_k_caches, and unshared_v_caches.
    • Attach the corresponding cache tensors into attn_metadata before each layer forward.
  • Apply the same REC multi-round cache attachment logic in Qwen3MoeModelImpl forward.

Impact

  • Enables correct multi-round KV-cache wiring for xattention in the affected REC path.
  • No behavior change for non-REC-multi-round execution paths.

Files Changed

  • xllm/core/common/rec_model_utils.h
  • xllm/models/llm/llm_model_base.h
  • xllm/models/llm/qwen3_moe.h

Notes

  • This PR focuses on wiring/attachment correctness for multi-round REC cache metadata.
  • Follow-up cleanups (for example, redundant null checks) can be done separately to keep this bugfix focused.


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

The pull request introduces support for qwen3_moe models in the is_llmrec_model_type function and integrates multi-round caching parameters (LlmRecMultiRoundParams) into the LlmModelImplBase and Qwen3MoeModelImpl forward passes. This change is crucial for enabling multi-round core functionality in xattention for these models. The added includes and logic for handling llmrec_params are appropriate for the stated bugfix.

Review comment threads:

  • xllm/models/llm/llm_model_base.h (outdated)
  • xllm/models/llm/llm_model_base.h
  • xllm/models/llm/qwen3_moe.h (outdated)
  • xllm/models/llm/qwen3_moe.h
  • xllm/models/llm/llm_model_base.h
@DragonFive DragonFive merged commit 9c805b8 into jd-opensource:main Mar 7, 2026
74 of 99 checks passed
3 participants