Commit 7bfc3d0

[None][bugfix] Restore _get_window_size_to_layers method on KVCacheManager
The method was removed during the VSWA extraction refactor but is still called by disaggregated serving code (kv_extractor, test_mamba_transfer). Re-add it as a thin wrapper around the extracted standalone function.

Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com>

1 file changed: 8 additions & 1 deletion

tensorrt_llm/_torch/pyexecutor/resource_manager/kv_cache_manager.py
```diff
@@ -41,7 +41,11 @@
 from ..scheduler import ScheduledRequests
 from .base import BaseResourceManager, request_context
 from .kv_cache_spec_ops import _update_kv_cache_draft_token_location, get_pp_layers
-from .vswa import calculate_max_num_blocks_for_vswa, validate_and_adjust_attention_windows
+from .vswa import (
+    calculate_max_num_blocks_for_vswa,
+    get_window_size_to_layers,
+    validate_and_adjust_attention_windows,
+)
 
 if TYPE_CHECKING:
     from tensorrt_llm._torch.attention_backend.interface import AttentionMetadata
@@ -662,6 +666,9 @@ def get_cache_bytes_per_token(self):
         )
         return cache_size_bytes_per_token
 
+    def _get_window_size_to_layers(self) -> Dict[int, List[int]]:
+        return get_window_size_to_layers(self.max_attention_window_vec, self.num_local_layers)
+
     def calculate_max_num_blocks(
         self,
         kv_cache_config: KvCacheConfig,
```
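For reference, here is a minimal sketch of what the extracted standalone function plausibly computes: it groups local layer indices by their attention window size. The body below is hypothetical (the real implementation lives in vswa.py and is not shown in this commit), and it assumes the window vector is applied cyclically when it is shorter than the layer count, the usual VSWA convention:

```python
from collections import defaultdict
from typing import Dict, List


def get_window_size_to_layers(max_attention_window_vec: List[int],
                              num_local_layers: int) -> Dict[int, List[int]]:
    """Map each attention window size to the local layers that use it.

    Hypothetical sketch: the real function in vswa.py is not shown in
    this commit. Assumes the window vector repeats cyclically across
    layers when it is shorter than the number of local layers.
    """
    window_size_to_layers: Dict[int, List[int]] = defaultdict(list)
    for layer_idx in range(num_local_layers):
        window = max_attention_window_vec[layer_idx % len(max_attention_window_vec)]
        window_size_to_layers[window].append(layer_idx)
    return dict(window_size_to_layers)


# Example: four layers alternating between a 512-token sliding window
# and a full 4096-token window.
print(get_window_size_to_layers([512, 4096], num_local_layers=4))
# -> {512: [0, 2], 4096: [1, 3]}
```

The restored `_get_window_size_to_layers` method simply forwards the manager's own `max_attention_window_vec` and `num_local_layers` to this function, so callers such as the disaggregated-serving extractor can keep using the old method-based interface.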
