[Feat] Add layerwise load-ahead window #952

Open
dante159753 wants to merge 1 commit into develop from layerwise-load-ahead

@dante159753
Contributor

Summary

  • Add configurable layerwise_load_ahead for vLLM layerwise KV loading.
  • Keep the default at 1 to preserve the existing one-layer lookahead behavior.
  • Refill the load window by local layer order after each layer wait, avoiding duplicate submissions.
  • Document the tuning option in the example config and PipelineStore guide.
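
A minimal sketch of the refill idea described in the summary, not the connector's actual code. The class `LoadAheadWindow`, its attributes, and the `submit` callback are hypothetical stand-ins. It also assumes the window size equals the number of layer loads kept in flight, which this thread does not spell out.

```python
from typing import Callable, List, Set


class LoadAheadWindow:
    """Hypothetical stand-in for the connector's layerwise scheduling state."""

    def __init__(
        self,
        layer_ids: List[int],
        load_ahead: int,
        submit: Callable[[int], None],
    ) -> None:
        self.layer_ids = layer_ids
        self.load_ahead = load_ahead
        self.submit = submit  # stand-in for queuing one layer's KV load
        self.next_index = 0  # position in local layer order of the next load
        self.waited: Set[int] = set()  # layers whose wait already refilled

    def _submit_next(self, count: int) -> None:
        submitted = 0
        # Stops quietly once every local layer has been submitted.
        while submitted < count and self.next_index < len(self.layer_ids):
            self.submit(self.layer_ids[self.next_index])
            self.next_index += 1
            submitted += 1

    def start(self) -> None:
        # Assumption: the window size is the number of loads kept in flight.
        self._submit_next(self.load_ahead)

    def on_layer_waited(self, layer_id: int) -> None:
        # Refill one slot per distinct layer, so a repeated wait on the
        # same layer never submits a duplicate load.
        if layer_id in self.waited:
            return
        self.waited.add(layer_id)
        self._submit_next(1)
```

In this sketch a window larger than the layer count simply submits every layer up front, because the bounds check in `_submit_next` caps the loop.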

Validation

  • Manually ran the new scheduling tests because local Python does not have pytest installed.
  • Ran python -m compileall ucm\integration\vllm\ucm_connector.py ucm\integration\vllm\tests\test_layerwise_load_ahead.py.
  • Commit hooks passed: codespell, black, isort.

Notes

layerwise_load_ahead=1 preserves the previous behavior. Values like 2 or 4 can improve overlap when layer load latency is higher than per-layer inference latency, but should be tuned with Cache Store queue, host buffer, and H2D stream pressure in mind.
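
To make the overlap claim concrete, here is a rough timing model, offered as an illustration rather than a measurement. The function `total_time` is hypothetical. It assumes every load has a fixed latency, that outstanding loads proceed fully in parallel, and that a new load is submitted when the wait for an earlier layer returns. Real Cache Store queues, host buffers, and H2D streams will limit that parallelism.

```python
def total_time(
    num_layers: int,
    load_latency: float,
    compute_latency: float,
    load_ahead: int,
) -> float:
    """Finish time of the last layer under an idealized overlap model."""
    if load_ahead < 1:
        raise ValueError("this model needs load_ahead >= 1")
    compute_start = [0.0] * num_layers
    prev_end = 0.0
    for i in range(num_layers):
        # The first `load_ahead` loads are submitted at t=0; a later layer i
        # is submitted when the wait for layer i - load_ahead returns.
        submit_at = 0.0 if i < load_ahead else compute_start[i - load_ahead]
        load_done = submit_at + load_latency
        # A layer computes once its load is done and the previous layer ended.
        compute_start[i] = max(load_done, prev_end)
        prev_end = compute_start[i] + compute_latency
    return prev_end
```

With 8 layers, a load latency of 3, and a compute latency of 1, the model gives 25 time units at `load_ahead=1`, 14 at 2, and 11 at 4. When loads are faster than compute (latency 1 against 3), every window size gives 25, so a larger window buys nothing.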



def _make_metadata(*, include_failing_request: bool = False):
    request_meta = {

Missing test coverage: no test for load_ahead > total_layer_count. What behavior is expected when the window exceeds available layers? Consider adding a test case for this edge condition.

connector.kv_cache_layout = FakeKVCacheLayout(len(connector.layer_ids))
connector.tp_rank = 0
connector.tp_size = 1
connector.is_mla = False

Missing test for validation errors: _get_layerwise_load_ahead should raise ValueError for negative or non-integer values, but there's no test verifying this behavior.
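
A sketch of what such a check and its test might look like. The real `_get_layerwise_load_ahead` is not shown in this thread, so `parse_layerwise_load_ahead` below is a hypothetical stand-in with an assumed signature. It rejects only what the comment names (negative and non-integer values). It accepts 0 and rejects `bool`, both of which are assumptions. The checks use plain `assert` so they run without pytest.

```python
from typing import Any


def parse_layerwise_load_ahead(raw: Any, default: int = 1) -> int:
    """Hypothetical validator for the layerwise_load_ahead option."""
    if raw is None:
        return default  # option not set: keep the one-layer default
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, int):
        raise ValueError(f"layerwise_load_ahead must be an integer, got {raw!r}")
    if raw < 0:
        raise ValueError(f"layerwise_load_ahead must be non-negative, got {raw}")
    return raw


def rejects(value: Any) -> bool:
    """Return True if the validator raises ValueError for `value`."""
    try:
        parse_layerwise_load_ahead(value)
    except ValueError:
        return True
    return False


def run_validation_checks() -> None:
    assert parse_layerwise_load_ahead(None) == 1
    assert parse_layerwise_load_ahead(4) == 4
    for bad in (-1, 1.5, "2", True):
        assert rejects(bad), bad
```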

@@ -836,6 +882,8 @@ def wait_for_layer_load(self, layer_name: str) -> None:
return

The should_refill_window check adds to _waited_load_layers before popping from load_tasks. If the same layer is revisited (e.g., rollback or MTP paths), this could cause issues. Should the check happen after the pop?
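
One possible reading of this suggestion, sketched with hypothetical names (`LayerWaiter`, `refill`, and tasks modelled as plain callables), not the connector's code: pop the task first, return early when nothing is in flight, and only then record the layer and refill.

```python
from typing import Callable, Dict, Set


class LayerWaiter:
    """Hypothetical stand-in showing a pop-then-mark ordering."""

    def __init__(self, refill: Callable[[int], None]) -> None:
        self.load_tasks: Dict[str, Callable[[], None]] = {}
        self.waited_load_layers: Set[str] = set()
        self.refill = refill  # stand-in for submitting the next layer loads

    def wait_for_layer_load(self, layer_name: str) -> bool:
        task = self.load_tasks.pop(layer_name, None)
        if task is None:
            # Nothing in flight for this layer: leave the window untouched.
            return False
        task()  # stand-in for blocking until the load completes
        # Mark and refill only after a real wait, and only once per layer,
        # so a revisited layer does not trigger a second refill.
        if layer_name not in self.waited_load_layers:
            self.waited_load_layers.add(layer_name)
            self.refill(1)
        return True
```

With this ordering, a wait on a layer with no pending task changes no state, and a layer that is loaded again later is still waited on without growing the window.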

def _submit_next_load_layers(
    self, metadata: "UCMConnectorMetadata", count: int
) -> None:
    submitted_count = 0

The _submit_next_load_layers loop condition self._next_load_layer_index < len(self.layer_ids) stops silently once the index reaches the end of the layer list. Should there be a debug log or assertion to clarify the expected behavior?

@dante159753 changed the title from "[codex] Add layerwise load-ahead window" to "[Feat] Add layerwise load-ahead window" on May 11, 2026
2 participants