
[feat] Support UCM hybrid KV cache connector for mixed Mamba/Attention block layouts#957

Open
wangwenxin0312 wants to merge 5 commits into ModelEngine-Group:develop from wangwenxin0312:dev_hybrid_pr_new


@wangwenxin0312
Contributor

Purpose

Add UCM support for vLLM hybrid/HMA KV cache layouts, including group-aware block hashing, external cache lookup, and load/dump metadata generation for mixed full-attention and sliding-window cache groups.
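To illustrate group-aware block hashing, the sketch below builds one hash chain per KV cache group, so that each block id commits to its own tokens and to every block before it. It is a minimal sketch under stated assumptions: SHA-256 over comma-joined token ids, a per-group `(block_size, seed)` pair, and the hypothetical helper names `hash_block_chain` and `per_group_block_ids`. It is not UCM's actual hashing code.

```python
from __future__ import annotations

import hashlib
from typing import Sequence


def hash_block_chain(token_ids: Sequence[int], block_size: int, seed: bytes) -> list[str]:
    """Chain-hash the full blocks of token_ids (hypothetical helper).

    Each id covers its own tokens plus the previous block's id, so two
    requests share an id only if they share the whole prefix up to that
    block. A trailing partial block is not hashed.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    parent = hashlib.sha256(seed).digest()  # assumed per-group chain seed
    block_ids: list[str] = []
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        block = token_ids[start:start + block_size]
        h = hashlib.sha256(parent)  # fold in the previous block's digest
        h.update(",".join(map(str, block)).encode())
        parent = h.digest()
        block_ids.append(parent.hex())
    return block_ids


def per_group_block_ids(
    token_ids: Sequence[int], groups: dict[int, tuple[int, bytes]]
) -> dict[int, list[str]]:
    """One independent chain per group; groups maps gid -> (block_size, seed)."""
    return {gid: hash_block_chain(token_ids, bs, seed) for gid, (bs, seed) in groups.items()}
```

For example, over 10 tokens a group with block size 4 yields two ids (the last two tokens form a partial block) and a group with block size 2 yields five.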

Modifications

  • Add per-group request metadata for hybrid KV cache blocks.
  • Introduce KVCacheGroupManager to parse kv_cache_groups, derive per-group hash chains, and validate group block/window alignment.
  • Implement two-stage external cache lookup:
    • full-attention prefix lookup via lookup_on_prefix
    • sliding-window tail verification via lookup
  • Add HMAKVCacheLayout to build cache memory layout from kv_cache_tensors and per-layer KV cache specs.
  • Add UCMHMAConnector for scheduler-side hit detection and worker-side KV cache registration.
  • Generate HMA dispatch metadata by flattening per-group load/dump blocks into the existing UCM store API format.
  • Route hybrid KV cache manager configurations through the HMA connector.
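The two-stage lookup can be sketched as follows. `FakeStore` is a hypothetical in-memory stand-in: its `lookup_on_prefix` and `lookup` methods borrow their names from the list above, but the signatures and return types used here are assumptions. The alignment rule (full-attention block size must be a multiple of the sliding-window block size) and the helper `external_hit_tokens` are also hypothetical, not the connector's actual code.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """Hypothetical store; method names follow the PR text, signatures are assumed."""
    stored: set[str] = field(default_factory=set)

    def lookup_on_prefix(self, block_ids: list[str]) -> int:
        # Assumed semantics: count of leading block ids present in the store.
        n = 0
        for bid in block_ids:
            if bid not in self.stored:
                break
            n += 1
        return n

    def lookup(self, block_ids: list[str]) -> list[bool]:
        # Assumed semantics: one presence flag per block id.
        return [bid in self.stored for bid in block_ids]


def external_hit_tokens(store, full_ids, full_bs, swa_ids, swa_bs, window) -> int:
    """Return the number of prompt tokens loadable from the external store."""
    if full_bs % swa_bs != 0:  # assumed alignment rule between groups
        raise ValueError("full-attention block size must be a multiple of the SWA block size")
    # Stage 1: longest stored prefix of full-attention blocks.
    n_full = store.lookup_on_prefix(full_ids)
    swa_tail = math.ceil(window / swa_bs)  # SWA blocks needed to cover the window
    # Stage 2: shrink the hit until the SWA blocks ending at the hit are all stored.
    while n_full > 0:
        end = n_full * full_bs // swa_bs  # SWA blocks covered by the current hit
        start = max(0, end - swa_tail)
        if all(store.lookup(swa_ids[start:end])):
            return n_full * full_bs
        n_full -= 1
    return 0
```

Shrinking the hit one full-attention block at a time keeps the returned length aligned to both groups' block boundaries; a real connector may batch or order the tail checks differently.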

Test

export ENABLE_UCM_PATCH=1
export MODEL_PATH="/home/models/Qwen3-Next-80B-A3B-Instruct/"
python unified-cache-management/examples/offline_inference.py

``request.all_token_ids`` with group ``gid``'s own block size and
chain seed. ``group_ucm_block_ids[full_attn_group_id]`` equals the
inherited ``ucm_block_ids``.
- ``group_vllm_block_ids[gid]``: per-group VLLM physical block ids; this

💡 Note: TODO/FIXME marker found. Consider addressing before merge or creating a tracked issue.


3 participants