
[plugin][OOT Benchmark] Refine OOT benchmark (manual trigger) to cover key models #409

Open
zejunchen-zejun wants to merge 10 commits into main from zejun/add_benchmark_3.24
Conversation

@zejunchen-zejun (Contributor) commented Mar 25, 2026

Refine the OOT benchmark with the following changes:

  • Manual trigger only for the OOT benchmark: no schedule, no nightly run.
  • Benchmark kimi (TP8, TP4), gptoss (TP1), qwen3.5 (TP8), ds-fp8 (TP8), and ds-mxfp4 (TP8).
  • Benchmark concurrency levels: 4, 8, 16, 32, 64.
  • Benchmark input/output lengths: 1k/1k, 8k/1k, 1k/8k.
  • All model toggles default to false.
  • When the OOT benchmark is launched from the main branch, pull the latest OOT Docker image by default.
  • When launched from the main branch with a specific OOT Docker image given, pull that image directly.
  • When launched from a non-main branch, automatically build the OOT Docker image from that branch and benchmark with it; results are not uploaded to the dashboard.
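The dispatch-only trigger and the all-false model defaults described above could look roughly like the workflow fragment below. This is a hypothetical sketch: except for oot_image (named in the review), the input names are illustrative guesses, not the workflow's actual ones.

```yaml
# Hypothetical sketch; input names other than `oot_image` are illustrative.
on:
  workflow_dispatch:        # manual trigger only: no schedule, no nightly
    inputs:
      oot_image:
        description: "Prebuilt OOT Docker image to benchmark (empty = latest)"
        required: false
        type: string
      run_kimi:
        description: "Benchmark kimi (TP8/TP4)"
        type: boolean
        default: false      # all model toggles default to false
      run_gptoss:
        description: "Benchmark gptoss (TP1)"
        type: boolean
        default: false
      # ...and similar boolean toggles for qwen3.5, ds-fp8, ds-mxfp4
```

Per-model boolean inputs like these are what allow a single model to be refreshed without burning hardware on the others.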

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
change to manual trigger
align env and arguments
choice box default false

Copilot AI review requested due to automatic review settings March 25, 2026 09:24
Copilot AI left a comment
Pull request overview

This PR refines the manual OOT vLLM benchmark workflow to target a curated set of key models and benchmark parameter combinations, while switching from building a custom OOT image in-workflow to pulling a prebuilt “latest” image.

Changes:

  • Make the OOT benchmark workflow manual-only with model toggles defaulting to false, and add an oot_image input to pull a prebuilt benchmark image.
  • Change the benchmark execution from an in-job loop over param_lists to a full job matrix over (model × params) and generate per-config artifacts.
  • Update the OOT model config list to adjust env vars and add Qwen3.5-397B-A17B-FP8.
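The loop-to-matrix change the review describes can be sketched as follows. The model names and parameter values follow the PR description, but the matrix keys, labels, and script name are illustrative assumptions, not the workflow's real ones.

```yaml
# Illustrative sketch of a full (model x params) job matrix replacing an
# in-job loop over param_lists; keys and script names may differ from the PR.
jobs:
  benchmark:
    strategy:
      fail-fast: false
      matrix:
        model: [kimi, gptoss, qwen3.5, ds-fp8, ds-mxfp4]
        params:
          - { concurrency: 4, isl: 1k, osl: 1k }
          - { concurrency: 8, isl: 8k, osl: 1k }
          - { concurrency: 16, isl: 1k, osl: 8k }
    steps:
      - name: Run one benchmark configuration
        run: ./benchmark.sh "${{ matrix.model }}" "${{ matrix.params.concurrency }}"
      - name: Upload per-config artifact
        uses: actions/upload-artifact@v4
        with:
          name: ${{ matrix.model }}-c${{ matrix.params.concurrency }}
          path: results/
```

A full matrix gives one job (and one artifact) per configuration, so a single failing combination can be retried without re-running the rest.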

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
---- | -----------
.github/workflows/atom-vllm-oot-benchmark.yaml | Switch to pulling a prebuilt OOT image; add param matrix expansion; default-disable models for manual selection.
.github/benchmark/oot_benchmark_models.json | Update env vars for existing models and add the Qwen3.5 FP8 model entry.


will not be dispatched

@wuhuikx (Contributor) commented Mar 25, 2026

  1. For each concurrency level, we need to re-launch the container. Do you follow this instruction?
  2. warmup num = 2x concurrency
  3. prompt num = 10x concurrency

Ensure each model can be triggered separately. For example, when we have an optimization on DS, we only need to refresh the data for that model while keeping the others idle, to save hardware resources.

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.



@zejunchen-zejun (Author) commented

  1. For each concurrency level, we need to re-launch the container. Do you follow this instruction?
  2. warmup num = 2x concurrency
  3. prompt num = 10x concurrency

Ensure each model can be triggered separately. For example, when we have an optimization on DS, we only need to refresh the data for that model while keeping the others idle, to save hardware resources.

  1. Followed in this PR.
  2. Followed; the warmup count is set via --num-warmups="$(( 2 * CONC ))"
  3. Followed; the prompt count is set via --num-prompts="$(( CONC * 10 ))"
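Taken together, the per-concurrency loop might look like the sketch below. Only the concurrency list and the 2x/10x arithmetic come from this thread; the container relaunch and the benchmark invocation are placeholders, not the workflow's actual commands.

```shell
#!/usr/bin/env bash
# Sketch of the per-concurrency benchmark loop. Only the concurrency list and
# the warmup/prompt arithmetic come from the discussion above; the container
# and benchmark commands are placeholders, not the workflow's real commands.
set -euo pipefail

for CONC in 4 8 16 32 64; do
  NUM_WARMUPS=$(( 2 * CONC ))    # warmup num = 2x concurrency
  NUM_PROMPTS=$(( CONC * 10 ))   # prompt num = 10x concurrency

  # Re-launch the serving container for each concurrency level, e.g.:
  # docker rm -f oot-bench 2>/dev/null || true
  # docker run -d --name oot-bench "${OOT_IMAGE}" ...

  echo "concurrency=${CONC} num_warmups=${NUM_WARMUPS} num_prompts=${NUM_PROMPTS}"
  # benchmark_client --num-warmups="${NUM_WARMUPS}" --num-prompts="${NUM_PROMPTS}"
done
```

Re-launching the container per concurrency level keeps each measurement free of KV-cache and allocator state left over from the previous run.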

@wuhuikx (Contributor) commented Mar 26, 2026

One more thing: we need the benchmark to run on different atom branches for vLLM upgrades. Do we have the functionality for manually selecting the branch now?

@zejunchen-zejun (Author) commented

One more thing: we need the benchmark to run on different atom branches for vLLM upgrades. Do we have the functionality for manually selecting the branch now?

Makes sense; we also need it to run acceptance tests. Let me add it.

for acceptance test when upgrading vLLM

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.



Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.



3 participants