
[None][refactor] Modularize resource_manager.py into a package#12883

Draft
eopXD wants to merge 7 commits into NVIDIA:main from eopXD:yuehtingc/modularize-resource-manager

Conversation

@eopXD
Collaborator

@eopXD eopXD commented Apr 9, 2026

Summary

  • Split the 2,956-line monolithic resource_manager.py into a resource_manager/ package with 7 focused submodules
  • All 36+ existing importers work unchanged via __init__.py re-exports — zero external changes required
  • Separates v1 (KVCacheManager) and v2 (KVCacheManagerV2) into independent files, reducing merge conflict surface
  • Extracts VSWA calculation utilities, spec-dec KV relocation ops, PeftCacheManager, and simple managers into dedicated modules
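
The re-export approach described above can be sketched with a throwaway package. This is a minimal illustration, not the PR's actual `__init__.py`: the package name `demo_resource_manager` and the one-argument `SlotManager` constructor are hypothetical stand-ins.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical miniature of the layout: one submodule plus an __init__.py
# that re-exports its public class. Names are illustrative only.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_resource_manager"
pkg.mkdir()
(pkg / "simple_managers.py").write_text(textwrap.dedent("""
    class SlotManager:
        def __init__(self, max_slots):
            self.max_slots = max_slots
"""))
(pkg / "__init__.py").write_text(textwrap.dedent("""
    from .simple_managers import SlotManager

    __all__ = ["SlotManager"]
"""))

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# The pre-split import path still resolves through the package...
from demo_resource_manager import SlotManager
# ...and it is the same object as the one defined in the new submodule.
from demo_resource_manager.simple_managers import SlotManager as Direct

same_object = SlotManager is Direct
```

Because the package `__init__.py` binds the same class object, callers written against the old single-file module should not be able to tell the two import paths apart.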

Module breakdown

| Module | Contents | Lines |
| --- | --- | --- |
| base.py | BaseResourceManager ABC, ResourceManager coordinator, enums | ~150 |
| kv_cache_manager.py | KVCacheManager (v1, C++ binding-backed) | ~910 |
| kv_cache_manager_v2.py | KVCacheManagerV2 (Python runtime-backed) | ~1000 |
| vswa.py | Variable Sliding Window Attention calculation utilities | ~350 |
| kv_cache_spec_ops.py | Spec-dec × KV-cache cross-cutting operations | ~140 |
| peft_cache_manager.py | PeftCacheManager for LoRA adapters | ~170 |
| simple_managers.py | SlotManager, BlockManager | ~140 |
| __init__.py | Full re-exports (backward compat) | ~60 |

Motivation

See docs/rfcs/resource-manager-modularization.md in this repo for the full RFC with problem analysis, design rationale, and migration strategy.

Test plan

  • All 8 new files pass Python syntax validation (py_compile)
  • All relative import depths verified correct (4-dot for tensorrt_llm, 3-dot for _torch, 2-dot for pyexecutor)
  • Every symbol imported by the 36+ external consumers is re-exported in __init__.py
  • CacheTypeCpp and DataType binding aliases re-exported for mamba_cache_manager.py compatibility
  • Full CI validation needed
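
The relative-import depths in the test plan can be checked mechanically. The sketch below uses the standard-library `importlib.util.resolve_name`; the package path is an assumption inferred from the dot counts listed above, not something stated elsewhere in this PR.

```python
from importlib.util import resolve_name

# Assumed package path, inferred from the dot counts in the test plan.
PACKAGE = "tensorrt_llm._torch.pyexecutor.resource_manager"

# One dot is the package itself; each further dot climbs one parent.
depths = {dots: resolve_name("." * dots, PACKAGE) for dots in range(1, 5)}
for dots, target in depths.items():
    print(dots, target)
```

Under that assumption, moving a file one directory deeper (from a module inside `pyexecutor/` to a module inside `pyexecutor/resource_manager/`) adds one dot to every relative import it contains.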

Split the 2,956-line monolithic resource_manager.py into a
resource_manager/ package with focused submodules:

- base.py: BaseResourceManager ABC, ResourceManager coordinator, enums
- kv_cache_manager.py: KVCacheManager (v1, C++ binding-backed)
- kv_cache_manager_v2.py: KVCacheManagerV2 (Python runtime-backed)
- vswa.py: Variable Sliding Window Attention calculation utilities
- kv_cache_spec_ops.py: Spec-dec x KV-cache cross-cutting operations
- peft_cache_manager.py: PeftCacheManager for LoRA adapters
- simple_managers.py: SlotManager and BlockManager utilities
- __init__.py: Full re-exports for backward compatibility

All 36+ existing importers work unchanged. No runtime behavior changes.

Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com>
eopXD added 5 commits April 9, 2026 15:10
…ation

- Fix P0: Restore `mpi_rank() == 0` check (was incorrectly changed to
  `mpi_disabled()`) for KV cache event manager creation on rank 0
- Fix P0: Remove stale `model_config=` kwarg in vswa.py call to
  `adjust_window_sizes_for_vswa` (would cause TypeError at runtime)
- Fix P1: Update copyright year to 2026 on all new files
- Fix P1: Remove `_locate_accepted_draft_tokens` from __init__.py
  re-exports (private helper, no external consumers)

Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com>
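
For context on the stale-kwarg fix above: a keyword argument that is no longer in a function's signature fails at call time, not at import time, which is why a syntax check alone would not catch it. A minimal sketch, where `adjust_windows` is a hypothetical stand-in and does not reflect the real signature of `adjust_window_sizes_for_vswa`:

```python
def adjust_windows(window_sizes, max_tokens):
    """Hypothetical stand-in; the real function's signature is not shown in this PR text."""
    return [min(w, max_tokens) for w in window_sizes]


try:
    # The keyword is not in the signature, so the call fails before the body runs.
    adjust_windows([1024, 4096], max_tokens=2048, model_config=None)
except TypeError as exc:
    stale_kwarg_error = str(exc)
else:
    stale_kwarg_error = None
```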
Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com>
Add DecodingBaseConfig and AttentionMetadata under TYPE_CHECKING to
fix F821 (undefined name) errors in kv_cache_manager.py and
kv_cache_manager_v2.py.

Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com>
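
The `TYPE_CHECKING` pattern referred to above can be sketched as follows. `fractions.Fraction` stands in for a type such as `AttentionMetadata`; the function is hypothetical and only illustrates how an annotation-only import satisfies the linter without a runtime import.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static analysis tools; skipped at runtime, which
    # avoids import cycles and import-time cost.
    from fractions import Fraction


def describe(value: "Fraction") -> str:
    # The quoted annotation is never evaluated at runtime, so the name only
    # has to be visible to the type checker and linter.
    return f"{value.numerator}/{value.denominator}"
```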
Break long log/error message strings to comply with 120-char line
limit enforced by ruff in CI.

Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com>
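
One common way to make this kind of change without altering the message is implicit concatenation of adjacent string literals. The message text below is invented for illustration and is not taken from the PR:

```python
def format_capacity_error(requested: int, available: int) -> str:
    # Adjacent string literals inside parentheses are joined at compile time,
    # so the message is unchanged while each source line stays short.
    return (
        f"Requested {requested} KV cache blocks but only {available} are available; "
        "reduce the batch size or increase the cache memory fraction."
    )
```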
These files are not in the legacy-files list, so CI runs ruff-format
instead of yapf. Apply ruff-format as the authoritative formatter.

Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com>
@eopXD
Collaborator Author

eopXD commented Apr 9, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42499 [ run ] triggered by Bot. Commit: 348cff3 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42499 [ run ] completed with state SUCCESS. Commit: 348cff3
/LLM/main/L0_MergeRequest_PR pipeline #33246 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

…nager

The method was removed during the VSWA extraction refactor but is still
called by disaggregated serving code (kv_extractor, test_mamba_transfer).
Re-add it as a thin wrapper around the extracted standalone function.

Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com>
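
The "thin wrapper" approach described above can be sketched like this. All names are hypothetical (the commit title naming the real method is truncated), and the block calculation is a placeholder, not the VSWA logic:

```python
from typing import Dict, List


def compute_window_blocks(window_sizes: List[int], tokens_per_block: int) -> Dict[int, int]:
    """Hypothetical standalone helper standing in for an extracted function."""
    # Ceiling division: number of blocks needed to hold each window.
    return {w: -(-w // tokens_per_block) for w in window_sizes}


class DemoCacheManager:
    """Hypothetical manager; not the real KVCacheManager."""

    def __init__(self, tokens_per_block: int) -> None:
        self.tokens_per_block = tokens_per_block

    def compute_window_blocks(self, window_sizes: List[int]) -> Dict[int, int]:
        # Thin wrapper: supply instance state and delegate, so existing
        # method-style callers keep working after the extraction.
        return compute_window_blocks(window_sizes, self.tokens_per_block)
```

Keeping the method as a one-line delegate means the logic lives in one place while callers that still use the method form are not broken by the extraction.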
@eopXD eopXD force-pushed the yuehtingc/modularize-resource-manager branch from 3e6802d to 7bfc3d0 Compare April 10, 2026 03:25
@eopXD
Collaborator Author

eopXD commented Apr 10, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42646 [ run ] triggered by Bot. Commit: 7bfc3d0 Link to invocation

