-
-
Notifications
You must be signed in to change notification settings - Fork 12k
[Frontend] Allow users to modify the scheduler configuration online in dev mode. #30316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
The pull request introduces functionality to dynamically reconfigure scheduler parameters (max_num_seqs and max_num_batched_tokens) online. This is a valuable addition for benchmarking and dynamic resource management. The changes involve defining a SchedulerReconfigure data structure, adding reconfigure_scheduler methods across the engine components (LLM, EngineCoreClient, EngineCore), and implementing the reconfigure method in the Scheduler class. I've identified a critical bug in the fallback logic for max_num_batched_tokens and a high-severity design constraint regarding the modification limits.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: wang.yuqi <noooop@126.com>
|
Supporting this will introduce some constraints on the scheduler since it cannot assume that these parameters are constant anymore. That being said, I definitely see the value of being able to adjust the scheduling parameters on the fly as we don't have to restart the server each time the parameters change during benchmarking. @WoosukKwon @njhill WDYT about this? |
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
Hi @noooop, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
| raise NotImplementedError | ||
|
|
||
| def reconfigure( | ||
| self, max_num_seqs: int | None, max_num_batched_tokens: int | None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this parameter be designed to be more universal? If other parameters need to be modified in the future, there's no need to add new parameters. It's suggested to pass a structure or dic instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also agree that passing a structure or dict would be better. Initially, the first version created a structure, but since there are only two parameters that can be modified, adding a structure just for that would make the code overly verbose. That’s why it was changed to the current approach.
Purpose
Allow users to modify part of the scheduler configuration online, which will greatly simplify the benchmark process.
e.g.
vllm bench sweep only needs to be started once
cc @DarkLight1337 @lengrongfu
Test Plan
benchmark demo:
offline: https://github.com/noooop/snippet/blob/main/benchmarks/embed5/v1_offline.py
online: https://github.com/noooop/snippet/blob/main/benchmarks/embed5/v1_online.py
Test Result
nan
Known Issues
cudagraph_capture_sizes are different, the results will be slightly different.
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.