[Performance] [WIP] Relaxed mtp #411

Draft
haoyangli0109 wants to merge 1 commit into ROCm:main from haoyangli0109:lhy/relaxed_mtp1

haoyangli0109 commented Mar 25, 2026

python -m atom.entrypoints.openai_server \
  --model /shareddata/amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 -tp 8 \
  --method mtp --num-speculative-tokens 3

MODEL=/shareddata/amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4
ISL=1024
OSL=1024
CONC=128
PORT=8000
RESULT_FILENAME=Deepseek-R1-result
python -m atom.benchmarks.benchmark_serving \
  --model=$MODEL --backend=vllm --base-url=http://localhost:$PORT \
  --dataset-name=random \
  --random-input-len=$ISL --random-output-len=$OSL \
  --random-range-ratio 0.8 \
  --num-prompts=$(( $CONC * 20 )) \
  --max-concurrency=$CONC \
  --request-rate=inf --ignore-eos \
  --save-result --percentile-metrics="ttft,tpot,itl,e2el"

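mtp=3, concurrency=1, inp=out=1000

| Metric | original strict | N=10, delta=0.5 |
| --- | --- | --- |
| TPOT (ms) | 3.89 | 3.31 |
| gsm8k | 0.94 | 0.945 |

mtp=3, concurrency=128, inp=out=1000

| Metric | original strict | N=10, delta=0.5 |
| --- | --- | --- |
| TPOT (ms) | 17.39 | 16.89 |
| gsm8k | 0.94 | 0.95 |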
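
To make the size of the change easier to read, the TPOT figures above can be turned into a relative reduction and a speedup factor. This is only a small arithmetic sketch: the numbers are copied from the results in this description, while the dictionary layout and the helper names `tpot_reduction_pct` and `speedup` are hypothetical and not part of ATOM.

```python
# TPOT in ms, copied from the results above.
# Keys are the concurrency levels; "strict" is the original strict acceptance,
# "relaxed" is the N=10, delta=0.5 setting.
tpot_ms = {
    1: {"strict": 3.89, "relaxed": 3.31},
    128: {"strict": 17.39, "relaxed": 16.89},
}


def tpot_reduction_pct(strict_ms: float, relaxed_ms: float) -> float:
    """Relative TPOT reduction of relaxed vs. strict, in percent."""
    return (strict_ms - relaxed_ms) / strict_ms * 100.0


def speedup(strict_ms: float, relaxed_ms: float) -> float:
    """Per-token speedup factor of relaxed vs. strict."""
    return strict_ms / relaxed_ms


for concurrency, row in tpot_ms.items():
    reduction = tpot_reduction_pct(row["strict"], row["relaxed"])
    factor = speedup(row["strict"], row["relaxed"])
    print(f"concurrency={concurrency}: TPOT -{reduction:.1f}%, speedup x{factor:.3f}")
```

By this calculation the relaxed setting lowers TPOT by roughly 15% at concurrency 1 and roughly 3% at concurrency 128, with the reported gsm8k score unchanged or slightly higher in both cases.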

Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
haoyangli0109 changed the title from "[WIP] relaxed mtp" to "[WIP] [don't merge] add relaxed mtp" on Mar 25, 2026
haoyangli0109 changed the title from "[WIP] [don't merge] add relaxed mtp" to "[Performance] [WIP] Relaxed mtp" on Mar 25, 2026