Draft
48 commits
980d610
chore: agentic benchmark infrastructure (v0.1)
cquil11 Apr 27, 2026
9b12096
cleanup
cquil11 Apr 28, 2026
2a420e3
agentic: rename USERS/users → CONC/conc throughout
cquil11 Apr 28, 2026
a1108f9
bump trace-replay: kimi tokenizer + reasoning support
cquil11 Apr 28, 2026
fab6d72
agentic: add gptoss + kimik2.5 single-node launchers
cquil11 Apr 28, 2026
3d42c64
agentic: add pareto-plot analysis tooling + extra Python deps
cquil11 Apr 28, 2026
63d01df
configs: add agentic-coding sections for kimik2.5 + gptoss
cquil11 Apr 28, 2026
6ec4af2
runners: thread SCENARIO_SUBDIR through B200/B300 dispatch
cquil11 Apr 29, 2026
f587b37
agentic: add launchers + master configs for 4 model families on B200/…
cquil11 Apr 29, 2026
45cf5a1
agentic: add mi355x launchers for minimaxm2.5/qwen3.5/glm5.1/kimik2.5
cquil11 Apr 29, 2026
c5969c5
agentic: add b200 launchers for gptoss-fp4, kimik2.5-int4, minimaxm2.…
cquil11 Apr 29, 2026
04a1ade
agentic: add qwen3.5-fp8-b200-sglang variant (bf16 image is buggy)
cquil11 Apr 29, 2026
8663191
docs: add agentic trace replayer test coverage map
cquil11 Apr 29, 2026
9b69e44
docs: add agentic trace replayer coverage test results
cquil11 Apr 29, 2026
8af1760
docs: finalize agentic trace replayer test results
cquil11 Apr 29, 2026
43f8da1
fix(agentic): collect_sweep_results regex matches actual offload values
cquil11 Apr 29, 2026
d6a5904
agentic: expand sweep configs for the 10 verified models
cquil11 Apr 29, 2026
ae222b4
runners(b200-dgxc): SLURM-exclude gpu-10/gpu-15 (stuck CUDA + full fs)
cquil11 Apr 29, 2026
b221c0d
agentic: --disable-hybrid-kv-cache-manager when OFFLOADING=cpu
cquil11 Apr 29, 2026
a3fad54
agentic-coding: bump vllm-openai images to v0.19.1 for cpu-offload co…
cquil11 Apr 30, 2026
869152b
agentic: minimax-fp8 sweep across all 6 SKUs
cquil11 Apr 30, 2026
5a15cae
agentic minimax-fp8: drop tp=8, follow fixed-seq-len TPs
cquil11 Apr 30, 2026
83fa3a7
agentic minimax-fp8: trim conc to creep up to per-SKU compute ceiling
cquil11 Apr 30, 2026
68439f7
agentic minimax-fp8: cliff-dense conc ladders (v4)
cquil11 Apr 30, 2026
9817524
agentic minimax: AMD native cpu offload + b300-p1 runner
cquil11 May 1, 2026
f9f0464
agentic: drop --no-enable-prefix-caching from all launchers
cquil11 May 1, 2026
8a56769
agentic minimax mi300x/mi355x: switch attention backend to UNIFIED_ATTN
cquil11 May 1, 2026
16d7c0c
agentic minimax b200/b300: extend none past KV cliff for fall-off demo
cquil11 May 1, 2026
689ef0e
agentic minimax-fp8-b300: revert to standard b300 runner tag
cquil11 May 1, 2026
e074201
agentic minimax-fp8-b300: bump cpu DRAM offload to 2.2 TB (B300 has p…
cquil11 May 1, 2026
041c3a3
agentic minimax-fp8-b300: dense conc 100-124 to resolve cpu offload d…
cquil11 May 1, 2026
373d5cc
agentic minimax-fp8-b200: bump cpu DRAM offload to 1.5 TB, target b20…
cquil11 May 1, 2026
d7f67d8
Merge remote-tracking branch 'origin/main' into chore/agentx-v0.1-tes…
cquil11 May 1, 2026
7235bc9
fix(matrix): drop duplicate agentic-coding loop from merge
cquil11 May 1, 2026
95fb189
agentic: dsv4-fp4 B200/B300 initial sweep + restore SCENARIO_SUBDIR o…
cquil11 May 1, 2026
77c069f
agentic dsv4-fp4: switch B200/B300 to official blog recipe layout (DP…
cquil11 May 1, 2026
66511c9
agentic dsv4-fp4: keep image at v0.20.0-cu130 (deepseekv4-cu130 not p…
cquil11 May 1, 2026
1a7c16c
agentic dsv4-fp4: drop cpu-offload sweep entries (HMA conflict at 1M)
cquil11 May 1, 2026
de08e9a
rm diable hma connector
cquil11 May 4, 2026
bcf8644
agentic dsv4-fp4: enable simple-offload + HMA, restore cpu-offload sweep
cquil11 May 4, 2026
8a3e851
runners(b200-dgxc): switch SLURM partition gpu -> gpu-2 (cluster re-p…
cquil11 May 4, 2026
dc16779
agentic dsv4-fp4: pre-divide kv_offloading_size by TP; cpu-only sweep
cquil11 May 4, 2026
5418020
Merge remote-tracking branch 'origin/main' into chore/agentx-v0.1-tes…
cquil11 May 4, 2026
7e0d5b2
agentic dsv4-fp4: align parallelism with fixed-seq-len; conditional o…
cquil11 May 4, 2026
4208910
agentic dsv4-fp4: enable lazy_offload to mitigate popleft_n assertion
cquil11 May 4, 2026
333a7c3
agentic dsv4-fp4: bump image to v0.20.1, revert to eager offload
cquil11 May 4, 2026
1f64bc3
agentic dsv4-fp4: revert to v0.20.0-cu130 + lazy_offload, scale max-n…
cquil11 May 5, 2026
3bac41a
Merge remote-tracking branch 'origin/main' into chore/agentx-v0.1-tes…
cquil11 May 5, 2026
112 changes: 50 additions & 62 deletions .github/configs/amd-master.yaml
@@ -239,6 +239,10 @@ qwen3.5-fp8-mi355x-sglang:
search-space:
- { tp: 2, ep: 2, conc-start: 4, conc-end: 32 }
- { tp: 4, ep: 1, conc-start: 32, conc-end: 256 }
+agentic-coding:
+- duration: 1800
+search-space:
+- { tp: 8, ep: 1, offloading: none, conc-list: [1, 2, 4, 8, 16, 32] }

qwen3.5-fp8-mi355x-sglang-mtp:
image: lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260414
@@ -327,27 +331,6 @@ qwen3.5-fp4-mi355x-sglang:
- { tp: 2, conc-start: 4, conc-end: 256 }
- { tp: 4, conc-start: 4, conc-end: 16 }

-qwen3.5-fp4-mi355x-atom:
-image: rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post
-model: amd/Qwen3.5-397B-A17B-MXFP4
-model-prefix: qwen3.5
-runner: mi355x
-precision: fp4
-framework: atom
-multinode: false
-scenarios:
-fixed-seq-len:
-- isl: 1024
-osl: 1024
-search-space:
-- { tp: 2, conc-start: 4, conc-end: 256 }
-- { tp: 4, conc-start: 4, conc-end: 16 }
-- isl: 8192
-osl: 1024
-search-space:
-- { tp: 2, conc-start: 4, conc-end: 256 }
-- { tp: 4, conc-start: 4, conc-end: 16 }
-
qwen3.5-fp8-mi300x-sglang:
image: lmsysorg/sglang:v0.5.10-rocm720-mi30x
model: Qwen/Qwen3.5-397B-A17B-FP8
@@ -399,13 +382,11 @@ glm5-fp8-mi355x-sglang-mtp:
- isl: 1024
osl: 1024
search-space:
-- { tp: 4, conc-start: 4, conc-end: 128, spec-decoding: mtp }
-- { tp: 8, conc-start: 4, conc-end: 8, spec-decoding: mtp }
+- { tp: 8, conc-start: 4, conc-end: 64, spec-decoding: mtp }
- isl: 8192
osl: 1024
search-space:
-- { tp: 4, conc-start: 4, conc-end: 128, spec-decoding: mtp }
-- { tp: 8, conc-start: 4, conc-end: 8, spec-decoding: mtp }
+- { tp: 8, conc-start: 4, conc-end: 64, spec-decoding: mtp }

glm5-fp8-mi355x-atom:
image: rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post
@@ -420,12 +401,10 @@ glm5-fp8-mi355x-atom:
- isl: 1024
osl: 1024
search-space:
-- { tp: 4, conc-start: 4, conc-end: 256 }
- { tp: 8, conc-start: 4, conc-end: 256 }
- isl: 8192
osl: 1024
search-space:
-- { tp: 4, conc-start: 4, conc-end: 256 }
- { tp: 8, conc-start: 4, conc-end: 256 }

glm5.1-fp4-mi355x-sglang:
@@ -448,6 +427,11 @@ glm5.1-fp4-mi355x-sglang:
search-space:
- { tp: 2, conc-start: 4, conc-end: 256 }
- { tp: 4, conc-start: 4, conc-end: 16 }
+agentic-coding:
+- duration: 1800
+search-space:
+# sglang manages KV eviction; mi355x glm5.1 caps at tp=4 conc=16 in fixed-seq, so cap conservatively
+- { tp: 4, offloading: none, conc-list: [1, 2, 4, 8, 16, 32] }

glm5.1-fp4-mi355x-atom:
image: rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post
@@ -526,7 +510,7 @@ kimik2.5-int4-mi300x-vllm:
- { tp: 8, conc-start: 4, conc-end: 64 }

kimik2.5-fp4-mi355x-vllm:
-image: vllm/vllm-openai-rocm:v0.18.0
+image: vllm/vllm-openai-rocm:v0.19.1
model: amd/Kimi-K2.5-MXFP4
model-prefix: kimik2.5
runner: mi355x
@@ -545,6 +529,13 @@ kimik2.5-fp4-mi355x-vllm:
search-space:
- { tp: 8, conc-start: 4, conc-end: 64 }
- { tp: 4, conc-start: 4, conc-end: 64 }
+agentic-coding:
+- duration: 1800
+search-space:
+- { tp: 4, offloading: none, conc-list: [1, 2, 4, 8, 16, 32, 64] }
+- { tp: 8, offloading: none, conc-list: [1, 2, 4, 8, 16, 32, 64] }
+- { tp: 4, offloading: cpu, conc-list: [64, 96, 128, 192, 256] }
+- { tp: 8, offloading: cpu, conc-list: [64, 96, 128, 192, 256] }

kimik2.5-fp4-mi355x-atom:
image: rocm/atom:rocm7.2.1-ubuntu24.04-pytorch2.9.1-atom0.1.2
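
Reviewer note on the kimik2.5-fp4-mi355x-vllm `agentic-coding` hunk above: a rough sketch of how the four `search-space` entries flatten into individual runs, and what the sweep costs in measured load time. It assumes `duration` is in seconds and applies to each concurrency point separately; neither is stated in the config, and the `expand` helper is hypothetical rather than part of the repo's tooling.

```python
# Entries copied from the kimik2.5-fp4-mi355x-vllm agentic-coding section.
DURATION_S = 1800  # ASSUMPTION: seconds of load per concurrency point

SEARCH_SPACE = [
    {"tp": 4, "offloading": "none", "conc-list": [1, 2, 4, 8, 16, 32, 64]},
    {"tp": 8, "offloading": "none", "conc-list": [1, 2, 4, 8, 16, 32, 64]},
    {"tp": 4, "offloading": "cpu", "conc-list": [64, 96, 128, 192, 256]},
    {"tp": 8, "offloading": "cpu", "conc-list": [64, 96, 128, 192, 256]},
]


def expand(search_space):
    """Flatten conc-list entries into one (tp, offloading, conc) tuple per run.

    Hypothetical helper for illustration; the real matrix generator may differ.
    """
    return [
        (entry["tp"], entry["offloading"], conc)
        for entry in search_space
        for conc in entry["conc-list"]
    ]


points = expand(SEARCH_SPACE)
total_hours = len(points) * DURATION_S / 3600
print(len(points), "runs,", total_hours, "hours of measured load")
```

Under those assumptions this section comes to 24 runs and 12 hours of load, before server start-up and warm-up time.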
@@ -568,7 +559,7 @@ kimik2.5-fp4-mi355x-atom:
- { tp: 4, conc-start: 4, conc-end: 128 }

minimaxm2.5-fp8-mi355x-vllm:
-image: vllm/vllm-openai-rocm:v0.19.0
+image: vllm/vllm-openai-rocm:v0.19.1
model: MiniMaxAI/MiniMax-M2.5
model-prefix: minimaxm2.5
runner: mi355x
@@ -589,6 +580,14 @@ minimaxm2.5-fp8-mi355x-vllm:
- { tp: 2, ep: 2, conc-start: 2, conc-end: 256 }
- { tp: 4, ep: 4, conc-start: 4, conc-end: 512 }
- { tp: 8, ep: 8, conc-start: 2, conc-end: 2 }
+agentic-coding:
+# MI355X tp=4 ep=4: compute ceiling ~60 (empirical), KV cliff ~91 (analytical).
+# Compute saturates first; cpu offload likely won't help, but worth confirming.
+# AMD uses native OffloadingConnector (NOT SimpleCPUOffloadConnector).
+- duration: 1800
+search-space:
+- { tp: 4, ep: 4, offloading: none, conc-list: [1, 2, 4, 8, 16, 32, 48, 56, 64, 72, 96] }
+- { tp: 4, ep: 4, offloading: cpu, conc-list: [48, 56, 64, 72, 96] }

minimaxm2.5-fp8-mi355x-atom:
image: rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post
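
Reviewer note on the minimaxm2.5-fp8-mi355x-vllm hunk above: the config comment gives two thresholds for MI355X tp=4 ep=4, a compute ceiling of about 60 and a KV cliff of about 91. The sketch below sorts the `offloading: none` ladder against those two numbers to show where the sweep samples each regime. The regime labels and the `regime` function are hypothetical, and treating the approximate thresholds as hard cut-offs is a simplification.

```python
COMPUTE_CEILING = 60  # "~60 (empirical)" per the config comment
KV_CLIFF = 91         # "~91 (analytical)" per the config comment

# conc-list of the offloading: none entry for MI355X tp=4 ep=4.
NONE_LIST = [1, 2, 4, 8, 16, 32, 48, 56, 64, 72, 96]


def regime(conc, ceiling=COMPUTE_CEILING, cliff=KV_CLIFF):
    """Label a concurrency point by which threshold it has crossed.

    The labels are illustrative names, not terms used by the benchmark.
    """
    if conc <= ceiling:
        return "below-ceiling"
    if conc <= cliff:
        return "compute-saturated"
    return "past-kv-cliff"


by_regime = {}
for conc in NONE_LIST:
    by_regime.setdefault(regime(conc), []).append(conc)

for name, concs in by_regime.items():
    print(f"{name}: {concs}")
```

With these thresholds only conc 96 lies past the KV cliff, which matches the comment's expectation that compute saturates before KV capacity does.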
@@ -611,31 +610,6 @@ minimaxm2.5-fp8-mi355x-atom:
- { tp: 2, conc-start: 4, conc-end: 256 }
- { tp: 4, conc-start: 4, conc-end: 256 }

-minimaxm2.5-fp4-mi355x-atom:
-image: rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post
-model: amd/MiniMax-M2.5-MXFP4
-model-prefix: minimaxm2.5
-runner: mi355x
-precision: fp4
-framework: atom
-multinode: false
-scenarios:
-fixed-seq-len:
-- isl: 1024
-osl: 1024
-search-space:
-- { tp: 1, conc-start: 4, conc-end: 1024 }
-- { tp: 2, conc-start: 4, conc-end: 1024 }
-- { tp: 4, conc-start: 4, conc-end: 128 }
-- { tp: 8, conc-start: 4, conc-end: 16 }
-- isl: 8192
-osl: 1024
-search-space:
-- { tp: 1, conc-start: 4, conc-end: 1024 }
-- { tp: 2, conc-start: 4, conc-end: 1024 }
-- { tp: 4, conc-start: 4, conc-end: 128 }
-- { tp: 8, conc-start: 4, conc-end: 16 }
-
minimaxm2.5-fp4-mi355x-vllm:
image: vllm/vllm-openai-rocm:v0.19.1
model: amd/MiniMax-M2.5-MXFP4
@@ -660,7 +634,7 @@ minimaxm2.5-fp4-mi355x-vllm:
- { tp: 4, conc-start: 4, conc-end: 64 }

minimaxm2.5-fp8-mi300x-vllm:
-image: vllm/vllm-openai-rocm:v0.16.0
+image: vllm/vllm-openai-rocm:v0.19.1
model: MiniMaxAI/MiniMax-M2.5
model-prefix: minimaxm2.5
runner: mi300x
@@ -679,6 +653,14 @@ minimaxm2.5-fp8-mi300x-vllm:
search-space:
- { tp: 2, conc-start: 4, conc-end: 64 }
- { tp: 4, conc-start: 4, conc-end: 64 }
+agentic-coding:
+# MI300X tp=4: compute ceiling ~25 (estimated, between H100 and H200);
+# KV cliff ~52. Compute saturates first.
+# AMD uses native OffloadingConnector (NOT SimpleCPUOffloadConnector).
+- duration: 1800
+search-space:
+- { tp: 4, offloading: none, conc-list: [1, 2, 4, 8, 16, 20, 24, 28, 32, 40, 48] }
+- { tp: 4, offloading: cpu, conc-list: [16, 20, 24, 28, 32] }

minimaxm2.5-fp8-mi325x-vllm:
image: vllm/vllm-openai-rocm:v0.18.0
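
Reviewer note on the `offloading: none` and `offloading: cpu` ladders in this file: a direct none-versus-cpu comparison is only possible at concurrency values that appear in both lists for the same parallelism. The sketch below computes that overlap for two of the entries above. The helper names are hypothetical, and pairing runs this way is an assumption about how the results would be read, not something the PR states.

```python
def paired_points(none_list, cpu_list):
    """Concurrency values measured both with and without cpu offload."""
    return sorted(set(none_list) & set(cpu_list))


def cpu_only_points(none_list, cpu_list):
    """Concurrency values measured only with cpu offload (no baseline run)."""
    return sorted(set(cpu_list) - set(none_list))


# minimaxm2.5-fp8-mi300x-vllm, tp=4
mi300x_none = [1, 2, 4, 8, 16, 20, 24, 28, 32, 40, 48]
mi300x_cpu = [16, 20, 24, 28, 32]

# kimik2.5-fp4-mi355x-vllm, tp=4 (the tp=8 lists are identical)
kimi_none = [1, 2, 4, 8, 16, 32, 64]
kimi_cpu = [64, 96, 128, 192, 256]

print("mi300x paired:", paired_points(mi300x_none, mi300x_cpu))
print("kimi paired:", paired_points(kimi_none, kimi_cpu))
print("kimi cpu-only:", cpu_only_points(kimi_none, kimi_cpu))
```

For the MI300X MiniMax entry every cpu point has a matching baseline, while the Kimi entries share only conc 64, so the higher cpu points there have no unoffloaded run to compare against.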
@@ -1635,13 +1617,13 @@ dsv4-fp8-mi355x-vllm:
search-space:
- { tp: 8, conc-start: 1, conc-end: 1 }

-# Day-0 single-sequence marker for DeepSeek-V4 on ATOM (ROCm/ATOM#650).
-# PR1 of the ATOM DSv4 series still uses torch sparse-attention fallbacks
-# that OOM once warmup/prefill batches multiple requests; keep CONC=1 until
-# the AITER sparse-attention kernel / multi-request path lands upstream.
-# --enforce-eager and ATOM_USE_TRITON_MOE=1 are required on gfx950. Image is
-# the standard atom0.1.2.post MI355X base (matching qwen3.5-fp8-mi355x-atom);
-# the DSv4 PR is overlaid at runtime by dsv4_fp4_mi355x_atom.sh at a pinned SHA.
+# Day-0 single-sequence marker for DeepSeek-V4 on ATOM (ROCm/ATOM#650).
+# PR1 of the ATOM DSv4 series — single-sequence only (kv_cache[:1,...]
+# hardcode), --enforce-eager required, ATOM_USE_TRITON_MOE=1 required on
+# gfx950. Image is the standard atom0.1.2.post MI355X base (matching
+# qwen3.5-fp8-mi355x-atom); the DSv4 PR is overlaid at runtime by
+# benchmarks/single_node/dsv4_fp4_mi355x_atom.sh at a pinned SHA. Sweep
+# will expand once ATOM PR3 (multi-request) and PR4 (CUDAGraph) land.
dsv4-fp4-mi355x-atom:
image: rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post
model: deepseek-ai/DeepSeek-V4-Pro
@@ -1656,7 +1638,13 @@ dsv4-fp4-mi355x-atom:
osl: 1024
search-space:
- { tp: 8, ep: 1, conc-start: 1, conc-end: 1 }
+- { tp: 8, ep: 1, conc-start: 4, conc-end: 4 }
+- { tp: 8, ep: 1, conc-start: 16, conc-end: 16 }
+- { tp: 8, ep: 1, conc-start: 32, conc-end: 32 }
- isl: 8192
osl: 1024
search-space:
- { tp: 8, ep: 1, conc-start: 1, conc-end: 1 }
+- { tp: 8, ep: 1, conc-start: 4, conc-end: 4 }
+- { tp: 8, ep: 1, conc-start: 16, conc-end: 16 }
+- { tp: 8, ep: 1, conc-start: 32, conc-end: 32 }