[Feat]Adapt Deepseek-v4-Flash on cuda and ascend #950

Open
qyh111 wants to merge 2 commits into develop from dev-deepseek-v4

Conversation

qyh111 (Contributor) commented May 9, 2026

Purpose

Modifications

Test

qyh111 force-pushed the dev-deepseek-v4 branch from af24a14 to d80e601 on May 9, 2026 06:34
qyh111 changed the title from "Dev deepseek v4" to "[Feat]Adapt Deepseek-v4-Flash on cuda and ascend" on May 9, 2026
"""

group_ucm_block_ids: list[list[bytes]] = field(default_factory=list)
group_vllm_block_ids: list[list[int]] = field(default_factory=list)
Contributor

This _build_layout method duplicates significant logic from HMAKVCacheLayout._build_layout. Consider extracting common ptr/tensor_size extraction into a helper method to reduce code duplication.

Comment thread ucm/store/cache/cc/load_queue.cc Outdated
for (size_t i = 0, offset = 0; i < number; i++) {
    auto pHost = (void*)(((int8_t*)host) + offset);
    auto pDevice = device[i];
    if (!pDevice) { continue; }
Contributor

The if (!pDevice) check is good, but should pHost also be checked for null? The symmetry with dump_queue.cc would improve safety.

load_tok_end = total_hit_tokens
start_blk = load_tok_start // group.block_size
end_blk = load_tok_end // group.block_size
if start_blk >= end_blk:
Contributor

For SW groups, load_tok_start = total_hit_tokens - group.sliding_window. Consider adding a comment explaining why this calculation ensures the SW window tail is loaded correctly on resume.
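
A minimal sketch of the block-range arithmetic under discussion, for illustration only. The `Group` class and the `sw_load_block_range` function are hypothetical names, and clamping the start at zero for hits shorter than the window is an assumption that the diff does not show. Flooring the start token to a block index keeps the block that contains the first token of the window, which is why the window tail is covered on resume.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Group:
    # Hypothetical stand-in for the PR's per-group configuration.
    block_size: int
    sliding_window: Optional[int] = None  # None means a full-attention group


def sw_load_block_range(group: Group, total_hit_tokens: int) -> Optional[tuple[int, int]]:
    """Return the half-open block range [start_blk, end_blk) to load, or None if empty."""
    if group.sliding_window is None:
        load_tok_start = 0
    else:
        # Only the last `sliding_window` hit tokens are needed on resume.
        # Clamping at 0 is an assumption for hits shorter than the window.
        load_tok_start = max(0, total_hit_tokens - group.sliding_window)
    load_tok_end = total_hit_tokens
    start_blk = load_tok_start // group.block_size
    end_blk = load_tok_end // group.block_size
    if start_blk >= end_blk:
        return None
    return start_blk, end_blk
```

With `block_size=16`, `sliding_window=64` and 200 hit tokens, the window starts at token 136, so the function returns `(8, 12)`: block 8 holds tokens 128 to 143 and therefore contains the first token of the window.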



class AscendDSV4Layout(HMAKVCacheLayout):
    def __init__(
Contributor

AscendDSV4Layout duplicates significant logic from HMAKVCacheLayout._build_layout. The ptr/tensor_size extraction loop is nearly identical. Consider extracting into a shared helper method.
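
One possible shape for such a helper, as a sketch only. The extraction loop itself is not visible in this thread, so the function name `extract_ptrs_and_sizes` and its signature are assumptions, and `FakeTensor` is a hypothetical stand-in that mimics the `data_ptr()`, `numel()` and `element_size()` accessors of `torch.Tensor` so the example runs without torch.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    # Hypothetical stand-in exposing the accessors used by the helper.
    ptr: int
    n: int
    itemsize: int

    def data_ptr(self) -> int:
        return self.ptr

    def numel(self) -> int:
        return self.n

    def element_size(self) -> int:
        return self.itemsize


def extract_ptrs_and_sizes(tensors):
    """Collect the base pointer and size in bytes of each tensor, in order."""
    ptrs, sizes = [], []
    for t in tensors:
        ptrs.append(t.data_ptr())
        sizes.append(t.numel() * t.element_size())
    return ptrs, sizes
```

Both layout classes could then call the shared function from their own `_build_layout` and keep only the layout-specific grouping logic locally.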

inherited ``ucm_block_ids``.
- ``group_vllm_block_ids[gid]``: per-group VLLM physical block ids; this
is initialised as an empty list per group here and populated later by
the dispatch path (still a TODO for HMA dump/load).
Contributor

This TODO indicates incomplete implementation for HMA dump/load. Should this be tracked as a separate issue or addressed before merge?
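
For context, a small sketch of the "empty list per group" initialisation that the docstring describes. Only the two field declarations come from the diff; the `GroupedBlockIds` container and the `init_vllm_groups` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GroupedBlockIds:
    # Hypothetical container; the two fields mirror the declarations in the diff.
    group_ucm_block_ids: list[list[bytes]] = field(default_factory=list)
    group_vllm_block_ids: list[list[int]] = field(default_factory=list)


def init_vllm_groups(meta: GroupedBlockIds, num_groups: int) -> None:
    """Give every group its own empty list, to be populated later."""
    # A comprehension creates a fresh list per group; `[[]] * n` would
    # alias one shared list across all groups.
    meta.group_vllm_block_ids = [[] for _ in range(num_groups)]
```

Until the dispatch path fills these lists, any consumer that indexes `group_vllm_block_ids[gid]` sees an empty list, which is the gap the TODO refers to.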

2 participants