
Commit 86cc700

Update MiniMax MXFP4 benchmark to M2.5 with vLLM v0.17.1
- Model: amd/MiniMax-M2.1-MXFP4 → amd/MiniMax-M2.5-MXFP4
- Image: vllm/vllm-openai-rocm v0.16.0 → v0.17.1
- Rename config key and script from m2.1 to m2.5
- Update perf-changelog entry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 42bb501 commit 86cc700

File tree: 3 files changed (+14 lines, -16 lines)

.github/configs/amd-master.yaml (4 additions, 4 deletions)

@@ -384,10 +384,10 @@ minimaxm2.5-fp8-mi355x-vllm:
     - { tp: 2, conc-start: 4, conc-end: 64 }
     - { tp: 4, conc-start: 4, conc-end: 64 }
 
-minimaxm2.1-fp4-mi355x-vllm:
-  image: vllm/vllm-openai-rocm:v0.16.0
-  model: amd/MiniMax-M2.1-MXFP4
-  model-prefix: minimaxm2.1
+minimaxm2.5-fp4-mi355x-vllm:
+  image: vllm/vllm-openai-rocm:v0.17.1
+  model: amd/MiniMax-M2.5-MXFP4
+  model-prefix: minimaxm2.5
   runner: mi355x
   precision: fp4
   framework: vllm
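Resolved from the hunk above, the updated config entry reads as follows (indentation is reconstructed from the diff, so treat the exact nesting as a best guess):

```yaml
minimaxm2.5-fp4-mi355x-vllm:
  image: vllm/vllm-openai-rocm:v0.17.1
  model: amd/MiniMax-M2.5-MXFP4
  model-prefix: minimaxm2.5
  runner: mi355x
  precision: fp4
  framework: vllm
```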

benchmarks/single_node/minimaxm2.1_fp4_mi355x.sh renamed to benchmarks/single_node/minimaxm2.5_fp4_mi355x.sh (0 additions, 1 deletion)

@@ -42,7 +42,6 @@ vllm serve $MODEL --port $PORT \
   --gpu-memory-utilization 0.95 \
   --max-model-len $MAX_MODEL_LEN \
   --block-size=32 \
-  --disable-log-requests \
   --trust-remote-code > $SERVER_LOG 2>&1 &
 
 SERVER_PID=$!
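A minimal sketch of the server launch after this change. `$MODEL` matches the config above; `$PORT` and `$MAX_MODEL_LEN` are placeholder assumptions here (the real script sets them earlier). The command is assembled into a string rather than executed, to show that `--disable-log-requests` is gone from the invocation:

```shell
#!/bin/sh
# Sketch of the vLLM launch after this commit (--disable-log-requests removed).
# PORT and MAX_MODEL_LEN below are illustrative placeholders, not the
# benchmark's actual values.
MODEL=amd/MiniMax-M2.5-MXFP4
PORT=8000
MAX_MODEL_LEN=16384

SERVE_CMD="vllm serve $MODEL --port $PORT \
  --gpu-memory-utilization 0.95 \
  --max-model-len $MAX_MODEL_LEN \
  --block-size=32 \
  --trust-remote-code"

echo "$SERVE_CMD"
# In the real script this runs in the background with output redirected:
#   $SERVE_CMD > $SERVER_LOG 2>&1 &
#   SERVER_PID=$!
```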

perf-changelog.yaml (10 additions, 11 deletions)

@@ -856,7 +856,7 @@
     - "TP=8, concurrency 4-64 for 1k1k, 1k8k, and 8k1k sequence lengths"
     - "following https://docs.vllm.ai/projects/recipes/en/latest/moonshotai/Kimi-K2.5.html"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/839
-
+
 - config-keys:
     - dsr1-fp8-mi355x-sglang-disagg
     - dsr1-fp8-mi355x-sglang-disagg-mtp
@@ -888,7 +888,7 @@
     - "Enable SGLANG_ENABLE_FLASHINFER_GEMM=true, NCCL_NVLS_ENABLE=1"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/804
 
-- config-keys:
+- config-keys:
     - qwen3.5-fp8-h200-sglang
   description:
     - "Add Qwen 3.5 FP8 H200 SGLang configuration"
@@ -918,7 +918,7 @@
     - "Redo qwen eval"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/892
   evals-only: true
-
+
 - config-keys:
     - gptoss-fp4-mi300x-vllm
     - gptoss-fp4-mi325x-vllm
@@ -931,7 +931,7 @@
     - "Switch to --attention-backend ROCM_AITER_UNIFIED_ATTN and add fuse_rope_kvcache compilation pass"
    - "Remove deprecated VLLM_ROCM_USE_AITER_UNIFIED_ATTENTION/VLLM_ROCM_USE_AITER_MHA env vars and compilation-config cudagraph_mode"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/867
-
+
 - config-keys:
     - kimik2.5-fp4-b200-vllm
   description:
@@ -970,7 +970,7 @@
     - "Replace old per-file recipes with resolved variants from consolidated 8k1k.yaml"
     - "14 variants: STP/MTP x low-latency/max-throughput with updated concurrencies and scale points"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/907
-
+
 - config-keys:
     - glm5-fp8-h200-sglang
   description:
@@ -981,12 +981,11 @@
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/914
 
 - config-keys:
-    - minimaxm2.1-fp4-mi355x-vllm
+    - minimaxm2.5-fp4-mi355x-vllm
   description:
-    - "Add MiniMax M2.1 MXFP4 vLLM benchmark for MI355X"
-    - "Model: amd/MiniMax-M2.1-MXFP4 with --trust-remote-code and --block-size=32"
-    - "Image: vllm/vllm-openai-rocm:v0.16.0"
+    - "Add MiniMax M2.5 MXFP4 vLLM benchmark for MI355X"
+    - "Model: amd/MiniMax-M2.5-MXFP4 with --trust-remote-code and --block-size=32"
+    - "Image: vllm/vllm-openai-rocm:v0.17.1"
     - "Environment: VLLM_ROCM_USE_AITER=1"
-    - "TP=2 only (TP=4 disabled due to vLLM bug https://github.com/vllm-project/vllm/issues/35637)"
-    - "Concurrency 4-64 for 1k1k, 1k8k, and 8k1k sequence lengths"
+    - "TP=2 and TP=4, concurrency 4-64 for 1k1k, 1k8k, and 8k1k sequence lengths"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/827
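Resolved, the updated perf-changelog.yaml entry for this benchmark reads as follows (indentation reconstructed from the diff):

```yaml
- config-keys:
    - minimaxm2.5-fp4-mi355x-vllm
  description:
    - "Add MiniMax M2.5 MXFP4 vLLM benchmark for MI355X"
    - "Model: amd/MiniMax-M2.5-MXFP4 with --trust-remote-code and --block-size=32"
    - "Image: vllm/vllm-openai-rocm:v0.17.1"
    - "Environment: VLLM_ROCM_USE_AITER=1"
    - "TP=2 and TP=4, concurrency 4-64 for 1k1k, 1k8k, and 8k1k sequence lengths"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/827
```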
