Collaborator

@noooop noooop commented Dec 9, 2025

Purpose

Allow users to modify part of the scheduler configuration online, which will greatly simplify the benchmark process.

For example, vllm bench sweep only needs to be started once.

cc @DarkLight1337 @lengrongfu

Test Plan

benchmark demo:
offline: https://github.com/noooop/snippet/blob/main/benchmarks/embed5/v1_offline.py
online: https://github.com/noooop/snippet/blob/main/benchmarks/embed5/v1_online.py

Test Result

nan

Known Issues

If cudagraph_capture_sizes differ between runs, the results will differ slightly. For example:

  • [1, 2]
  • [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256]

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces functionality to dynamically reconfigure scheduler parameters (max_num_seqs and max_num_batched_tokens) online. This is a valuable addition for benchmarking and dynamic resource management. The changes involve defining a SchedulerReconfigure data structure, adding reconfigure_scheduler methods across the engine components (LLM, EngineCoreClient, EngineCore), and implementing the reconfigure method in the Scheduler class. I've identified a critical bug in the fallback logic for max_num_batched_tokens and a high-severity design constraint regarding the modification limits.
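A minimal sketch of the shape described in this review, for readers who have not opened the diff. Only the names SchedulerReconfigure, reconfigure, max_num_seqs, and max_num_batched_tokens come from the review text. ToyScheduler, the field defaults, and the rule that new values may not exceed the startup values are assumptions made for illustration, not the PR's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulerReconfigure:
    # None means "leave this setting unchanged" (assumed convention).
    max_num_seqs: int | None = None
    max_num_batched_tokens: int | None = None


class ToyScheduler:
    """Hypothetical stand-in for a scheduler; not vLLM's Scheduler class."""

    def __init__(self, max_num_seqs: int, max_num_batched_tokens: int) -> None:
        # Assumption of this sketch: startup values act as upper bounds.
        self._limit_seqs = max_num_seqs
        self._limit_tokens = max_num_batched_tokens
        self.max_num_seqs = max_num_seqs
        self.max_num_batched_tokens = max_num_batched_tokens

    def reconfigure(self, update: SchedulerReconfigure) -> None:
        # Each field falls back to its own current value when not provided.
        # The check is "is None" rather than truthiness, so a field is only
        # skipped when it was genuinely omitted.
        new_seqs = (
            self.max_num_seqs
            if update.max_num_seqs is None
            else update.max_num_seqs
        )
        new_tokens = (
            self.max_num_batched_tokens
            if update.max_num_batched_tokens is None
            else update.max_num_batched_tokens
        )

        # Validate both values before mutating anything, so a rejected
        # update leaves the scheduler untouched.
        if not 1 <= new_seqs <= self._limit_seqs:
            raise ValueError(f"max_num_seqs must be in [1, {self._limit_seqs}]")
        if not 1 <= new_tokens <= self._limit_tokens:
            raise ValueError(
                f"max_num_batched_tokens must be in [1, {self._limit_tokens}]"
            )

        self.max_num_seqs = new_seqs
        self.max_num_batched_tokens = new_tokens


scheduler = ToyScheduler(max_num_seqs=256, max_num_batched_tokens=8192)
scheduler.reconfigure(SchedulerReconfigure(max_num_seqs=2))
```

After the last call, only max_num_seqs has changed. max_num_batched_tokens keeps its previous value because it was left as None.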

noooop and others added 2 commits December 9, 2025 16:25
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
@noooop noooop changed the title [Frontend] Allow users to modify part of the scheduler configuration online. [Frontend] Allow users to modify the scheduler configuration online. Dec 9, 2025
@DarkLight1337
Member

Supporting this will introduce some constraints on the scheduler since it cannot assume that these parameters are constant anymore. That being said, I definitely see the value of being able to adjust the scheduling parameters on the fly as we don't have to restart the server each time the parameters change during benchmarking.

@WoosukKwon @njhill WDYT about this?

@noooop noooop changed the title [Frontend] Allow users to modify the scheduler configuration online. [Frontend] Allow users to modify the scheduler configuration online in dev mode. Dec 9, 2025
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
mergify bot commented Dec 11, 2025

Hi @noooop, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
        raise NotImplementedError

    def reconfigure(
        self, max_num_seqs: int | None, max_num_batched_tokens: int | None
Contributor

Could this signature be made more general? If other parameters need to be modified in the future, we would not have to add new arguments. I suggest passing a structure or dict instead.

Collaborator Author

I agree that passing a structure or dict would be better. The first version did define a structure, but since only two parameters can be modified, adding a structure just for them made the code overly verbose, so it was changed to the current approach.
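To make the trade-off in this thread concrete, here is a rough sketch of the dict-based alternative the reviewer suggests. RECONFIGURABLE_FIELDS, apply_updates, and the SimpleNamespace config are hypothetical names invented for this illustration. Only max_num_seqs and max_num_batched_tokens come from the PR.

```python
from __future__ import annotations

from types import SimpleNamespace

# Hypothetical allow-list: supporting a new tunable parameter means adding
# its name here instead of changing the method signature.
RECONFIGURABLE_FIELDS = frozenset({"max_num_seqs", "max_num_batched_tokens"})


def apply_updates(config: SimpleNamespace, updates: dict[str, int | None]) -> None:
    """Apply a partial update; a value of None leaves the field unchanged."""
    unknown = set(updates) - RECONFIGURABLE_FIELDS
    if unknown:
        # Reject unknown keys up front so a typo does not silently do nothing.
        raise KeyError(f"not reconfigurable: {sorted(unknown)}")
    for name, value in updates.items():
        if value is not None:
            setattr(config, name, value)


# Stand-in for a scheduler config object (assumption of this sketch).
config = SimpleNamespace(max_num_seqs=256, max_num_batched_tokens=8192)
apply_updates(config, {"max_num_seqs": 2})
```

The dict form keeps the signature stable as parameters are added, at the cost of losing per-argument type checking and needing explicit key validation. With only two parameters, the explicit keyword arguments in the current diff are shorter.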
