[feat] Support UCM hybrid KV cache connector for mixed Mamba/Attention block layouts by wangwenxin0312 · Pull Request #957 · ModelEngine-Group/unified-cache-management

wangwenxin0312 · 2026-05-13T13:36:24Z

Purpose

Add UCM support for vLLM hybrid/HMA KV cache layouts, including group-aware block hashing, external cache lookup, and load/dump metadata generation for mixed full-attention and sliding-window cache groups.

Modifications

Add per-group request metadata for hybrid KV cache blocks.
Introduce KVCacheGroupManager to parse kv_cache_groups, derive per-group hash chains, and validate group block/window alignment.
Implement two-stage external cache lookup:
- full-attention prefix lookup via lookup_on_prefix
- sliding-window tail verification via lookup
Add HMAKVCacheLayout to build cache memory layout from kv_cache_tensors and per-layer KV cache specs.
Add UCMHMAConnector for scheduler-side hit detection and worker-side KV cache registration.
Generate HMA dispatch metadata by flattening per-group load/dump blocks into the existing UCM store API format.
Route hybrid KV cache manager configurations through the HMA connector.
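
As an illustration of the two-stage lookup listed above, here is a minimal Python sketch. It is not the PR's implementation. ToyStore is a hypothetical in-memory stand-in for the external store, the return conventions of lookup_on_prefix (count of leading hits) and lookup (per-block presence flags) are assumptions, and two_stage_hit_tokens, its parameters, and the back-off policy are invented for the example. It also assumes the sliding-window block size divides the full-attention block size and that the window is a whole number of sliding-window blocks.

```python
from typing import List, Sequence, Set


class ToyStore:
    """Hypothetical in-memory stand-in for an external KV store."""

    def __init__(self, present: Set[str]) -> None:
        self._present = set(present)

    def lookup_on_prefix(self, block_ids: Sequence[str]) -> int:
        # Assumed semantics: number of leading blocks that are all present.
        hits = 0
        for block_id in block_ids:
            if block_id not in self._present:
                break
            hits += 1
        return hits

    def lookup(self, block_ids: Sequence[str]) -> List[bool]:
        # Assumed semantics: one presence flag per requested block.
        return [block_id in self._present for block_id in block_ids]


def two_stage_hit_tokens(
    store: ToyStore,
    full_ids: Sequence[str],
    swa_ids: Sequence[str],
    full_block: int,
    swa_block: int,
    window: int,
) -> int:
    """Return the number of prefix tokens that can be loaded for both groups."""
    if full_block % swa_block or window % swa_block:
        raise ValueError("block sizes and window must be aligned")

    # Stage 1: longest full-attention prefix present in the store.
    hit_blocks = store.lookup_on_prefix(full_ids)
    tail_blocks = window // swa_block

    # Stage 2: the sliding-window blocks just before the hit boundary must
    # also be present; otherwise back off by one full-attention block.
    while hit_blocks > 0:
        hit_tokens = hit_blocks * full_block
        end = hit_tokens // swa_block
        start = max(0, end - tail_blocks)
        if all(store.lookup(swa_ids[start:end])):
            return hit_tokens
        hit_blocks -= 1
    return 0


# 16 tokens: four full-attention blocks of 4, eight sliding-window blocks of 2.
full_ids = ["f0", "f1", "f2", "f3"]
swa_ids = ["s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7"]
store = ToyStore({"f0", "f1", "f2", "s2", "s3", "s5"})
hit = two_stage_hit_tokens(store, full_ids, swa_ids, 4, 2, 4)
print(hit)  # 8: the 12-token boundary fails because s4 is missing
```

In this sketch only the sliding-window blocks covering the last window tokens before the boundary are checked, since earlier sliding-window blocks would not be needed to resume decoding from that point.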

Test

export ENABLE_UCM_PATCH=1
export MODEL_PATH="/home/models/Qwen3-Next-80B-A3B-Instruct/"
python unified-cache-management/examples/offline_inference.py

ygwpz · 2026-05-14T06:02:39Z

+      ``request.all_token_ids`` with group ``gid``'s own block size and
+      chain seed. ``group_ucm_block_ids[full_attn_group_id]`` equals the
+      inherited ``ucm_block_ids``.
+    - ``group_vllm_block_ids[gid]``: per-group VLLM physical block ids; this


💡 Note: TODO/FIXME marker found. Consider addressing before merge or creating a tracked issue.
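
For readers following the docstring fragment quoted in this review, the per-group hashing it describes can be sketched as below. This is a hedged illustration, not the PR's code: SHA-256, the 8-byte token encoding, the byte-string seeds, and the names hash_chain and group_hash_chains are all assumptions. The sketch hashes only full blocks and chains each block hash to its parent, so a group's chain depends on that group's block size and seed.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple


def hash_chain(token_ids: Sequence[int], block_size: int, seed: bytes) -> List[str]:
    """Chained hashes over full blocks; each hash covers its block and its parent."""
    parent = hashlib.sha256(seed).digest()
    chain: List[str] = []
    # Trailing tokens that do not fill a whole block are skipped.
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        block = token_ids[start:start + block_size]
        payload = parent + b"".join(t.to_bytes(8, "little") for t in block)
        parent = hashlib.sha256(payload).digest()
        chain.append(parent.hex())
    return chain


def group_hash_chains(
    token_ids: Sequence[int],
    groups: Dict[int, Tuple[int, bytes]],
) -> Dict[int, List[str]]:
    """Map each group id to the chain built with its (block_size, seed)."""
    return {
        gid: hash_chain(token_ids, block_size, seed)
        for gid, (block_size, seed) in groups.items()
    }


tokens = list(range(10))
# Hypothetical layout: group 0 is full attention, group 1 is sliding window.
groups = {0: (4, b"full"), 1: (2, b"swa")}
chains = group_hash_chains(tokens, groups)
print(len(chains[0]), len(chains[1]))  # 2 5
```

Under these assumptions, the full-attention group's chain matches a base chain computed with the same block size and seed, which mirrors the docstring's statement that the full-attention entry equals the inherited ucm_block_ids.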

qyh111 and others added 4 commits May 13, 2026 21:24

Adapt deepseek v4

0ed1922

[Feat]Adapt Deepseek-V4-Flash on ascend and cuda

0cf6f50

hybrid adapt

7b22117

patch update

05fe35f

wangwenxin0312 requested review from Infinite666, harrisonyhq, mag1c-h, qyh111 and ygwpz as code owners May 13, 2026 13:36

clean fix

bebfa45

ygwpz reviewed May 14, 2026

wangwenxin0312 wants to merge 5 commits into ModelEngine-Group:develop from wangwenxin0312:dev_hybrid_pr_new

3 participants
