[Feat] Add layerwise load-ahead window by dante159753 · Pull Request #952 · ModelEngine-Group/unified-cache-management

dante159753 · 2026-05-09T03:56:12Z

Summary

Add configurable layerwise_load_ahead for vLLM layerwise KV loading.
Keep the default at 1 to preserve the existing one-layer lookahead behavior.
Refill the load window by local layer order after each layer wait, avoiding duplicate submissions.
Document the tuning option in the example config and PipelineStore guide.
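
The refill behavior summarized above can be sketched as a toy model. This is not the connector code from the PR: the class `LoadAheadWindow`, its attributes, and the string "task handles" are hypothetical names chosen for illustration, and the real connector submits actual load tasks to the store.

```python
from typing import Dict, List, Set


class LoadAheadWindow:
    """Toy model of a layerwise load-ahead window (illustrative only)."""

    def __init__(self, layer_ids: List[str], load_ahead: int = 1) -> None:
        self.layer_ids = layer_ids
        self.load_ahead = load_ahead
        self.next_index = 0                   # next layer, in local order, to submit
        self.in_flight: Dict[str, str] = {}   # layer name -> fake task handle
        self.waited: Set[str] = set()         # layers that already triggered a refill
        self.submitted_log: List[str] = []    # submission order, for inspection

    def _submit_next(self, count: int) -> None:
        # The index only moves forward, so no layer is submitted twice.
        submitted = 0
        while submitted < count and self.next_index < len(self.layer_ids):
            layer = self.layer_ids[self.next_index]
            self.next_index += 1
            self.in_flight[layer] = f"task-{layer}"
            self.submitted_log.append(layer)
            submitted += 1

    def start(self) -> None:
        # Fill the initial window.
        self._submit_next(self.load_ahead)

    def wait(self, layer: str) -> None:
        # Consume the task for this layer, then top the window up by one.
        self.in_flight.pop(layer, None)
        if layer in self.waited:
            return
        self.waited.add(layer)
        self._submit_next(1)


window = LoadAheadWindow([f"layer.{i}" for i in range(4)], load_ahead=2)
window.start()
window.wait("layer.0")
print(window.submitted_log)  # layers 0 and 1 up front, layer 2 after the first wait
```

In this model a window of 1 degenerates to submitting exactly one layer ahead of the one being waited on, which is how the sketch interprets "one-layer lookahead".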

Validation

Manually ran the new scheduling tests because the local Python environment does not have pytest installed.
Ran python -m compileall ucm\integration\vllm\ucm_connector.py ucm\integration\vllm\tests\test_layerwise_load_ahead.py.
Commit hooks passed: codespell, black, isort.

Notes

layerwise_load_ahead=1 preserves the previous behavior. Values like 2 or 4 can improve overlap when layer load latency is higher than per-layer inference latency, but should be tuned with Cache Store queue, host buffer, and H2D stream pressure in mind.
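
To make the overlap claim concrete, here is a rough timing model. It rests on a strong assumption that the note itself warns against taking for granted: all in-flight loads proceed fully in parallel, each taking the same latency, with no Cache Store queueing, host buffer limits, or H2D stream contention. The function `total_time` and its parameters are hypothetical and are not part of the PR.

```python
from typing import List


def total_time(
    num_layers: int,
    load_latency: float,
    compute_latency: float,
    load_ahead: int,
) -> float:
    """Finish time of the last layer in an idealized load/compute pipeline."""
    if load_ahead < 1:
        raise ValueError("this model assumes a window of at least 1")

    load_done: List[float] = [0.0] * num_layers
    next_index = 0
    # Initial window: the first `load_ahead` layers are submitted at t = 0.
    while next_index < min(load_ahead, num_layers):
        load_done[next_index] = load_latency
        next_index += 1

    now = 0.0  # time at which the previous layer finished computing
    for layer in range(num_layers):
        # Block until this layer's KV load has completed.
        now = max(now, load_done[layer])
        # Refill the window with the next unsubmitted layer, if any.
        if next_index < num_layers:
            load_done[next_index] = now + load_latency
            next_index += 1
        # Run this layer.
        now += compute_latency
    return now


for window in (1, 2, 4):
    print(window, total_time(8, load_latency=3.0, compute_latency=1.0, load_ahead=window))
```

With 8 layers, a load latency of 3, and a compute latency of 1 (arbitrary units), this model yields 25, 14, and 11 for windows of 1, 2, and 4. The last value equals one load plus eight computes, so a larger window cannot help further in this idealized setting. Real gains depend on how much parallelism the store and transfer path actually provide.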

ygwpz · 2026-05-09T06:35:02Z

+
+
+def _make_metadata(*, include_failing_request: bool = False):
+    request_meta = {


Missing test coverage: no test for load_ahead > total_layer_count. What behavior is expected when the window exceeds available layers? Consider adding a test case for this edge condition.
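
One plausible expectation, consistent with the loop bound quoted in a later review comment, is that the window is simply capped at the number of layers. A minimal test sketch under that assumption, written with plain asserts so it runs without pytest; `initial_window` is a hypothetical stand-in for the connector's scheduling logic, not a function from the PR:

```python
from typing import List


def initial_window(layer_ids: List[str], load_ahead: int) -> List[str]:
    """Layers to submit up front; slicing caps the window at the layer count."""
    return layer_ids[:load_ahead]


def test_window_larger_than_layer_count() -> None:
    layers = ["layer.0", "layer.1", "layer.2"]
    # A window larger than the model submits every layer once and nothing more.
    assert initial_window(layers, load_ahead=8) == layers
    assert initial_window(layers, load_ahead=3) == layers
    assert initial_window(layers, load_ahead=1) == ["layer.0"]


if __name__ == "__main__":
    test_window_larger_than_layer_count()
```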

ygwpz · 2026-05-09T06:35:03Z

+    connector.kv_cache_layout = FakeKVCacheLayout(len(connector.layer_ids))
+    connector.tp_rank = 0
+    connector.tp_size = 1
+    connector.is_mla = False


Missing test for validation errors: _get_layerwise_load_ahead should raise ValueError for negative or non-integer values, but there's no test verifying this behavior.
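
A sketch of the kind of validation and test this comment asks for. The helper below is hypothetical and only mirrors the behavior described in the comment (ValueError for negative or non-integer values); the actual signature of `_get_layerwise_load_ahead` is not shown in this thread. Whether 0 is accepted is also not stated, so the sketch accepts it as an assumption.

```python
from typing import Any, Mapping


def parse_layerwise_load_ahead(config: Mapping[str, Any], default: int = 1) -> int:
    """Read layerwise_load_ahead from a config mapping and validate it."""
    value = config.get("layerwise_load_ahead", default)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"layerwise_load_ahead must be an integer, got {value!r}")
    if value < 0:
        raise ValueError(f"layerwise_load_ahead must be non-negative, got {value}")
    return value


def expect_value_error(bad_value: Any) -> None:
    """Plain-assert replacement for pytest.raises(ValueError)."""
    try:
        parse_layerwise_load_ahead({"layerwise_load_ahead": bad_value})
    except ValueError:
        return
    raise AssertionError(f"expected ValueError for {bad_value!r}")


if __name__ == "__main__":
    assert parse_layerwise_load_ahead({}) == 1
    assert parse_layerwise_load_ahead({"layerwise_load_ahead": 4}) == 4
    for bad in (-1, 1.5, "2", True, None):
        expect_value_error(bad)
```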

ygwpz · 2026-05-09T06:35:05Z

@@ -836,6 +882,8 @@ def wait_for_layer_load(self, layer_name: str) -> None:
            return


The should_refill_window check adds to _waited_load_layers before popping from load_tasks. If the same layer is revisited (e.g., rollback or MTP paths), this could cause issues. Should the check happen after the pop?
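
To illustrate what the ordering can change, here is a toy comparison. Neither function is the connector code; the names, the `tasks` dict, and the `waited` set are hypothetical stand-ins for `load_tasks` and `_waited_load_layers`. Each function returns whether a refill would be triggered.

```python
from typing import Dict, Set


def wait_mark_first(layer: str, tasks: Dict[str, str], waited: Set[str]) -> bool:
    """Record the layer as waited before looking up its task."""
    should_refill = layer not in waited
    waited.add(layer)
    tasks.pop(layer, None)
    return should_refill


def wait_pop_first(layer: str, tasks: Dict[str, str], waited: Set[str]) -> bool:
    """Look up the task first; only a real pending task can trigger a refill."""
    task = tasks.pop(layer, None)
    if task is None:
        return False
    should_refill = layer not in waited
    waited.add(layer)
    return should_refill


# A wait on a layer that has no pending task:
print(wait_mark_first("layer.0", {}, set()))  # refill requested, layer marked
print(wait_pop_first("layer.0", {}, set()))   # no refill, state untouched
```

In the mark-first variant, a wait that finds no task still consumes the layer's one refill, so a later wait on the same layer with a real task would not top the window up. Whether that matters depends on how rollback or MTP paths actually revisit layers, which this sketch does not model.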

ygwpz · 2026-05-09T06:35:06Z

+    def _submit_next_load_layers(
+        self, metadata: "UCMConnectorMetadata", count: int
+    ) -> None:
+        submitted_count = 0


The _submit_next_load_layers loop condition self._next_load_layer_index < len(self.layer_ids) silently stops the loop once the index reaches len(self.layer_ids). Should there be a debug log or assertion to make the expected behavior explicit?
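
A minimal sketch of what such a debug log could look like, using the standard `logging` module. The function below is a hypothetical, stateless stand-in for `_submit_next_load_layers`: it returns layer names instead of submitting load tasks, and the logger name is made up.

```python
import logging
from typing import List, Tuple

logger = logging.getLogger("load_ahead_sketch")


def submit_next_layers(
    layer_ids: List[str], next_index: int, count: int
) -> Tuple[List[str], int]:
    """Return up to `count` layers starting at `next_index`, plus the new index."""
    submitted: List[str] = []
    while len(submitted) < count and next_index < len(layer_ids):
        submitted.append(layer_ids[next_index])
        next_index += 1
    if len(submitted) < count:
        # Running out of layers is expected near the end of a forward pass,
        # so this is logged at debug level rather than asserted.
        logger.debug(
            "load window exhausted: requested %d, submitted %d (next index %d of %d)",
            count,
            len(submitted),
            next_index,
            len(layer_ids),
        )
    return submitted, next_index


print(submit_next_layers(["a", "b", "c"], next_index=2, count=4))
```

An assertion would be the stricter choice, but it would also fire on the normal tail of every pass once the window runs past the last layer, which is why the sketch prefers a log line.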

Add layerwise load-ahead window

91a5ae1

dante159753 requested review from FangRun2, Infinite666, Tarrei, flesher0813, harrisonyhq, mag1c-h, qyh111 and ygwpz as code owners May 9, 2026 03:56

ygwpz reviewed May 9, 2026


dante159753 changed the title ~~[codex] Add layerwise load-ahead window~~ [Feat] Add layerwise load-ahead window May 11, 2026

dante159753 wants to merge 1 commit into develop from layerwise-load-ahead

2 participants


