72 commits
e27718b
fix(atom): clean up dsv4 pr2998 overlay
Oseltamivir May 2, 2026
b239475
fix(atom): add dsv4 skeleton overlay
Oseltamivir May 2, 2026
d09a1d9
Merge branch 'main' into dsv4-atom-pr2998-clean
Oseltamivir May 2, 2026
6bc2e56
fix(atom): add dsv4 skeleton overlay
Oseltamivir May 2, 2026
2e4fffc
fix(evals): stop dsv4 gsm8k generations
Oseltamivir May 2, 2026
a4c4f2a
fix(atom): bump dsv4 skeleton pin
Oseltamivir May 2, 2026
196f960
test(atom): retest dsv4 higher concurrency
Oseltamivir May 2, 2026
651f5b8
pr2998
Oseltamivir May 2, 2026
49bf655
Merge branch 'main' into dsv4-atom-pr2998-clean
Oseltamivir May 2, 2026
d5059f1
order
Oseltamivir May 2, 2026
13d6d81
Bump ATOM PR overlay for eval fix
Oseltamivir May 3, 2026
5d40b4a
Merge branch 'main' into dsv4-atom-pr2998-clean
Oseltamivir May 3, 2026
6ca3ade
Merge remote-tracking branch 'origin/main' into dsv4-atom-pr2998-clean
Oseltamivir May 3, 2026
51a5d5e
fix: update dsv4 atom overlay pins
Oseltamivir May 3, 2026
6923eaf
fix: pin aiter topk width api
Oseltamivir May 3, 2026
ad76bbd
fix: avoid decorated aiter signature guard
Oseltamivir May 3, 2026
929d63e
remove log
Oseltamivir May 4, 2026
4ac957e
eval
Oseltamivir May 4, 2026
5ac8972
Merge branch 'main' into dsv4-atom-pr2998-clean
Oseltamivir May 4, 2026
207d203
profile fix, full sweep
Oseltamivir May 4, 2026
b252d1f
fix: pin dsv4 atom repeat sync fix
Oseltamivir May 4, 2026
fd6b449
DIAG
Oseltamivir May 4, 2026
fb9422a
fix: update dsv4 atom fork pins
Oseltamivir May 4, 2026
bc97651
Merge branch 'main' into dsv4-atom-pr2998-clean
Oseltamivir May 4, 2026
0c5dfff
diag: expand dsv4 atom eval probes
Oseltamivir May 4, 2026
4b37d96
fix: lower dsv4 atom batched token budget
Oseltamivir May 4, 2026
2fecabe
poor eval
Oseltamivir May 5, 2026
f4c2d58
Merge branch 'main' into dsv4-atom-pr2998-clean
Oseltamivir May 5, 2026
a858c13
Merge branch 'main' into dsv4-atom-pr2998-clean
Oseltamivir May 5, 2026
3b7d402
fix: update dsv4 atom fork pin
Oseltamivir May 5, 2026
9bf6fcf
diagnostics: deepen dsv4 atom attention checks
Oseltamivir May 5, 2026
834fd4c
diagnostics: update dsv4 atom l0 trace ref
Oseltamivir May 5, 2026
b4edb6a
Bump AITER DSv4 sparse indexer pin
Oseltamivir May 5, 2026
39b4b26
Bump AITER small-M GEMM fix
Oseltamivir May 5, 2026
4b33d27
Bump AITER wo_b GEMM fix
Oseltamivir May 5, 2026
20cbf4a
Bump AITER DSv4 wo_b fix
Oseltamivir May 5, 2026
2b62936
Merge branch 'main' into dsv4-atom-pr2998-clean
Oseltamivir May 5, 2026
f415118
Bump AITER DSv4 wo_b CK dispatch
Oseltamivir May 6, 2026
45e5f95
Merge branch 'main' into dsv4-atom-pr2998-clean
Oseltamivir May 6, 2026
56cd89e
Bump ATOM DSv4 wo_b diagnostics
Oseltamivir May 6, 2026
48064f9
Use torch TP reduce for DSv4 ATOM
Oseltamivir May 6, 2026
d718993
Bump cleaned DSv4 ATOM ref
Oseltamivir May 6, 2026
6d498be
Bump DSv4 ATOM warmup reduce fix
Oseltamivir May 6, 2026
c69b3f5
Enable DSv4 ATOM prefill diagnostics
Oseltamivir May 6, 2026
5b7480a
Bump DSv4 ATOM sparse diagnostics
Oseltamivir May 6, 2026
ffaf4b7
Bump DSv4 AITER and ATOM diagnostic refs
Oseltamivir May 6, 2026
209ef79
Fix DSv4 ATOM diagnostic gating
Oseltamivir May 6, 2026
e789daf
Merge branch 'main' into dsv4-atom-pr2998-clean
Oseltamivir May 6, 2026
241da68
Bump DSv4 ATOM prefill reduce sync
Oseltamivir May 6, 2026
9d907f3
Merge branch 'main' into dsv4-atom-pr2998-clean
Oseltamivir May 6, 2026
6753a22
Bump ATOM DSv4 diagnostics ref
Oseltamivir May 6, 2026
95dabd3
Bump ATOM DSv4 TP reduce fix
Oseltamivir May 6, 2026
ed40381
Bump ATOM DSv4 diagnostic reduce input fix
Oseltamivir May 7, 2026
0acd25e
Merge branch 'main' into dsv4-atom-pr2998-clean
Oseltamivir May 7, 2026
e5ba547
Bump ATOM DSv4 FP32 TP reduce
Oseltamivir May 7, 2026
dfa1af9
Merge branch 'main' into dsv4-atom-pr2998-clean
Oseltamivir May 7, 2026
f9bcc34
Bump DSv4 ATOM diagnostic SHA
Oseltamivir May 7, 2026
5edc023
patch
Oseltamivir May 7, 2026
84dc6db
Test DSv4 wkv CKTile GEMM route
Oseltamivir May 7, 2026
b93026a
Update AITER PR2998 test pin
Oseltamivir May 7, 2026
acd2773
Merge branch 'main' into dsv4-atom-pr2998-clean
Oseltamivir May 7, 2026
8f63008
Point DSv4 ATOM to AITER wkv fix
Oseltamivir May 7, 2026
257d2a5
Update DSv4 AITER projection dispatch pin
Oseltamivir May 7, 2026
c363937
Update DSv4 shared w2 AITER pin
Oseltamivir May 7, 2026
510cef2
Update DSv4 shared w2 AITER kernel pin
Oseltamivir May 7, 2026
3edd6bb
Expand DSv4 ATOM diagnostics
Oseltamivir May 8, 2026
dca9aa3
Bump DSv4 ATOM diagnostics
Oseltamivir May 8, 2026
335b7cc
Use narrow DSv4 mHC diagnostics
Oseltamivir May 8, 2026
65102dc
Bump DSv4 ATOM mHC normalization fix
Oseltamivir May 8, 2026
7699a6c
Bump DSv4 ATOM HC diagnostics
Oseltamivir May 8, 2026
d371c20
Merge branch 'main' into dsv4-atom-pr2998-clean
Oseltamivir May 8, 2026
b9629d8
Bump DSv4 ATOM HC replay fix
Oseltamivir May 8, 2026
14 changes: 5 additions & 9 deletions .github/configs/amd-master.yaml
@@ -1636,13 +1636,9 @@ dsv4-fp8-mi355x-vllm:
search-space:
- { tp: 8, conc-start: 1, conc-end: 1 }

# Day-0 single-sequence marker for DeepSeek-V4 on ATOM (ROCm/ATOM#650).
# PR1 of the ATOM DSv4 series still uses torch sparse-attention fallbacks
# that OOM once warmup/prefill batches multiple requests; keep CONC=1 until
# the AITER sparse-attention kernel / multi-request path lands upstream.
# --enforce-eager and ATOM_USE_TRITON_MOE=1 are required on gfx950. Image is
# the standard atom0.1.2.post MI355X base (matching qwen3.5-fp8-mi355x-atom);
# the DSv4 PR is overlaid at runtime by dsv4_fp4_mi355x_atom.sh at a pinned SHA.
# DeepSeek-V4 on ATOM using the updated atom0.1.2.post image. The launcher
# overlays ROCm/ATOM#650 only for DSv4 model registration/skeleton support,
# then overlays ROCm/aiter#2998 for sparse/indexer kernels.
dsv4-fp4-mi355x-atom:
image: rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post
model: deepseek-ai/DeepSeek-V4-Pro
@@ -1656,8 +1652,8 @@ dsv4-fp4-mi355x-atom:
- isl: 1024
osl: 1024
search-space:
- { tp: 8, ep: 1, conc-start: 1, conc-end: 1 }
- { tp: 8, ep: 1, conc-start: 4, conc-end: 128 }
- isl: 8192
osl: 1024
search-space:
- { tp: 8, ep: 1, conc-start: 1, conc-end: 1 }
- { tp: 8, ep: 1, conc-start: 4, conc-end: 64 }
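As a rough illustration of what the widened search space covers, the sketch below expands each new `conc-start`/`conc-end` pair into concrete concurrency values. The doubling step and the `conc_values` helper are assumptions made for illustration only; how `generate_sweep_configs.py` actually steps through the range is not shown in this diff.

```python
def conc_values(conc_start: int, conc_end: int) -> list[int]:
    """Hypothetical expansion: double from conc_start while staying <= conc_end."""
    if conc_start < 1 or conc_end < conc_start:
        raise ValueError("expected 1 <= conc_start <= conc_end")
    values = []
    conc = conc_start
    while conc <= conc_end:
        values.append(conc)
        conc *= 2  # assumed step; the real sweep generator may differ
    return values


# New search-space entries from this hunk, keyed by (isl, osl).
new_search_space = {
    (1024, 1024): {"tp": 8, "ep": 1, "conc-start": 4, "conc-end": 128},
    (8192, 1024): {"tp": 8, "ep": 1, "conc-start": 4, "conc-end": 64},
}

sweep = {
    seq: conc_values(entry["conc-start"], entry["conc-end"])
    for seq, entry in new_search_space.items()
}

for (isl, osl), values in sweep.items():
    print(f"isl={isl} osl={osl}: conc {values}")
```

Under that doubling assumption, the 8k input case stops one step earlier than the 1k case, matching the lower `conc-end` in the config.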
22 changes: 10 additions & 12 deletions .github/workflows/claude.yml
@@ -161,8 +161,8 @@ jobs:
- If jobs cannot be run, say exactly what you could not run and why
- **Important** Modify perf-changelog.yaml for any config changes affecting performance

## Profiling (SGLang only)
When asked to profile a config, dispatch the `profile.yml` workflow. **Only SGLang configs can be profiled** — the profiler uses SGLang's `/start_profile` and `/stop_profile` HTTP endpoints. Reject profiling requests for vLLM, TRT, or other frameworks.
## Profiling
When asked to profile a config, dispatch the `profile.yml` workflow. SGLang, vLLM, and ATOM single-node configs can be profiled through their `/start_profile` and `/stop_profile` HTTP endpoints when the server is launched with the corresponding torch profiler directory. Reject profiling requests for TRT, disaggregated/multi-node configs, or other frameworks.

**Syntax:**
```
@@ -172,9 +172,10 @@ jobs:
workflow_id="profile.yml",
ref="main",
inputs={
"config-key": "<config-key-ending-in-sglang>",
"config-key": "<config-key>",
"config-file": "<.github/configs/nvidia-master.yaml or amd-master.yaml>",
"conc": "<concurrency>"
"conc": "<concurrency>",
"seq-len": "<1k1k or 8k1k>"
}
)
```
@@ -184,19 +185,16 @@ jobs:
- Model: "deepseek" / "dsr1" → model-prefix `dsr1`; "gptoss" → `gptoss`; "qwen" → `qwen3.5`
- Precision: "fp4" / "fp8" / "bf16"
- Runner/hardware: "b200", "h200", "h100", "mi300x", "mi325x", "mi355x", etc.
- Framework: must be "sglang" (reject if not)
- Framework: must be "sglang", "vllm", or "atom" (reject TRT and disaggregated/multi-node)
- Concurrency: "conc=N" → `"conc": "N"`. Default to `"64"` if not specified.
- Sequence length: default to `"1k1k"` unless the user asks for `"8k1k"`.

Construct the config-key as: `{model-prefix}-{precision}-{runner}-sglang`
Construct the config-key as: `{model-prefix}-{precision}-{runner}-{framework}`
Choose config-file: NVIDIA runners (b200, h200, h100, gb200, gb300) → `nvidia-master.yaml`; AMD runners (mi300x, mi325x, mi355x) → `amd-master.yaml`

**Available SGLang config keys:**
NVIDIA: `dsr1-fp4-b200-sglang`, `dsr1-fp8-b200-sglang`, `dsr1-fp8-h200-sglang`, `qwen3.5-bf16-b200-sglang`
AMD: `dsr1-fp4-mi355x-sglang`, `dsr1-fp8-mi300x-sglang`, `dsr1-fp8-mi325x-sglang`, `dsr1-fp8-mi355x-sglang`, `qwen3.5-bf16-mi355x-sglang`, `qwen3.5-fp8-mi355x-sglang`

**Examples:**
- "profile sglang b200 deepseek fp4 conc=4" → `config-key: dsr1-fp4-b200-sglang`, `config-file: .github/configs/nvidia-master.yaml`, `conc: 4`
- "profile sglang mi355x dsr1 fp8" → `config-key: dsr1-fp8-mi355x-sglang`, `config-file: .github/configs/amd-master.yaml`, `conc: 64`
- "profile sglang b200 deepseek fp4 conc=4" → `config-key: dsr1-fp4-b200-sglang`, `config-file: .github/configs/nvidia-master.yaml`, `conc: 4`, `seq-len: 1k1k`
- "profile atom mi355x dsv4 fp4 conc=4 8k1k" → `config-key: dsv4-fp4-mi355x-atom`, `config-file: .github/configs/amd-master.yaml`, `conc: 4`, `seq-len: 8k1k`
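The parsing rules and examples above can be sketched as a small function that builds the `profile.yml` dispatch inputs. This is an illustrative sketch, not code from the repository: `build_profile_inputs` and the lookup tables are hypothetical names, and the `dsv4` prefix is taken from the ATOM example rather than from the model list. The sketch cannot detect disaggregated or multi-node configs, which the prompt also rejects.

```python
# Model aliases from the prompt; "dsv4" is inferred from the ATOM example.
MODEL_PREFIXES = {
    "deepseek": "dsr1",
    "dsr1": "dsr1",
    "gptoss": "gptoss",
    "qwen": "qwen3.5",
    "dsv4": "dsv4",
}
PROFILABLE_FRAMEWORKS = {"sglang", "vllm", "atom"}
NVIDIA_RUNNERS = {"b200", "h200", "h100", "gb200", "gb300"}
AMD_RUNNERS = {"mi300x", "mi325x", "mi355x"}
SEQ_LENS = {"1k1k", "8k1k"}


def build_profile_inputs(model, precision, runner, framework,
                         conc="64", seq_len="1k1k"):
    """Return the workflow inputs dict, or raise ValueError to reject."""
    if framework not in PROFILABLE_FRAMEWORKS:
        raise ValueError(f"cannot profile framework: {framework}")
    if model not in MODEL_PREFIXES:
        raise ValueError(f"unknown model: {model}")
    if seq_len not in SEQ_LENS:
        raise ValueError(f"unsupported seq-len: {seq_len}")
    if runner in NVIDIA_RUNNERS:
        config_file = ".github/configs/nvidia-master.yaml"
    elif runner in AMD_RUNNERS:
        config_file = ".github/configs/amd-master.yaml"
    else:
        raise ValueError(f"unknown runner: {runner}")
    return {
        "config-key": f"{MODEL_PREFIXES[model]}-{precision}-{runner}-{framework}",
        "config-file": config_file,
        "conc": str(conc),  # workflow inputs are passed as strings
        "seq-len": seq_len,
    }


# "profile atom mi355x dsv4 fp4 conc=4 8k1k"
atom_inputs = build_profile_inputs("dsv4", "fp4", "mi355x", "atom",
                                   conc=4, seq_len="8k1k")
print(atom_inputs)
```

Keeping the defaults (`"64"`, `"1k1k"`) in the signature mirrors the prompt's fallback behavior when the user omits concurrency or sequence length.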

**After dispatch:**
Monitor with `mcp__github__get_workflow_run`. The profile workflow takes ~15-30 minutes. When complete, the **Perfetto relay link** is in the workflow run's step summary. Retrieve it with:
41 changes: 36 additions & 5 deletions .github/workflows/profile.yml
@@ -17,6 +17,14 @@ on:
required: false
type: string
default: '64'
seq-len:
description: "Sequence length config to profile"
required: false
type: choice
options:
- 1k1k
- 8k1k
default: 1k1k
moe-debug:
description: "Enable MoE debug patch and log (MOE_DEBUG_LOG)"
required: false
@@ -54,7 +62,7 @@ jobs:
name: Generate matrix via script
run: |
pip install pydantic
CLI_ARGS="test-config --config-files ${{ inputs.config-file }} --config-keys ${{ inputs.config-key }} --conc ${{ inputs.conc }}"
CLI_ARGS="test-config --config-files ${{ inputs.config-file }} --config-keys ${{ inputs.config-key }} --conc ${{ inputs.conc }} --seq-lens ${{ inputs.seq-len }}"
CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix_logic/generate_sweep_configs.py $CLI_ARGS)
echo "raw=$CONFIG_JSON" >> $GITHUB_OUTPUT

@@ -148,13 +156,16 @@ jobs:
ref: ${{ inputs.ref || github.sha }}
clean: false

- name: Launch + Profile (single-node sglang/vllm)
- name: Launch + Profile (single-node)
id: run
env:
RUNNER_NAME: ${{ runner.name }}
PROFILE: '1'
SGLANG_TORCH_PROFILER_DIR: /workspace/
VLLM_TORCH_PROFILER_DIR: /workspace/
ATOM_TORCH_PROFILER_DIR: /workspace/atom_profiles
PROFILE_NUM_STEPS: '1'
PROFILE_OUTPUT_LEN: '1'
VLLM_RPC_TIMEOUT: '1800000'
shell: bash
run: |
@@ -173,6 +184,11 @@ jobs:

trace_path="profile_${res_name}.trace.json.gz"
if [ -f "$trace_path" ]; then
if [ ! -s "$trace_path" ]; then
echo "Profile trace is empty: $trace_path" >&2
exit 1
fi
gzip -t "$trace_path"
echo "trace=$trace_path" >> "$GITHUB_OUTPUT"
if [ "${FRAMEWORK}" = "sglang" ]; then
# Try to locate corresponding TP-0 traces produced by SGLang profiler
@@ -193,32 +209,47 @@ jobs:
fi
else
echo "Profile trace not found: $trace_path" >&2
exit 1
fi

- name: Process result (json -> agg)
continue-on-error: true
env:
RUNNER_TYPE: ${{ matrix.config.runner }}
run: |
python3 utils/process_result.py

- name: Upload profile diagnostics
if: ${{ always() && env.RESULT_FILENAME != '' }}
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: profile_diagnostics_${{ env.RESULT_FILENAME }}
path: |
${{ env.RESULT_FILENAME }}.json
agg_${{ env.RESULT_FILENAME }}.json
server.log
gpu_metrics.csv
atom_profiles/**/*.trace.json.gz
if-no-files-found: ignore

- name: Upload profile as artifact
if: ${{ steps.run.outputs.trace != '' }}
if: ${{ always() && steps.run.outputs.trace != '' }}
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: profile_${{ env.RESULT_FILENAME }}
path: profile_${{ env.RESULT_FILENAME }}.trace.json.gz
if-no-files-found: ignore

- name: Upload TP-0-DECODE trace as artifact
if: ${{ steps.run.outputs.tp0_decode != '' }}
if: ${{ always() && steps.run.outputs.tp0_decode != '' }}
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: profile_${{ env.RESULT_FILENAME }}_TP0_DECODE
path: ${{ steps.run.outputs.tp0_decode }}
if-no-files-found: ignore

- name: Upload TP-0-EXTEND trace as artifact
if: ${{ steps.run.outputs.tp0_extend != '' }}
if: ${{ always() && steps.run.outputs.tp0_extend != '' }}
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: profile_${{ env.RESULT_FILENAME }}_TP0_EXTEND
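The trace checks added to the launch step (the file exists, is non-empty, and passes `gzip -t`) can be restated as the sketch below. This is a Python re-expression for illustration, not the workflow's own code, which runs these checks in bash; `validate_trace` is a hypothetical helper name.

```python
import gzip
import os
import zlib


def validate_trace(trace_path: str) -> None:
    """Raise if the profile trace is missing, empty, or not a valid gzip stream."""
    if not os.path.isfile(trace_path):
        raise FileNotFoundError(f"Profile trace not found: {trace_path}")
    if os.path.getsize(trace_path) == 0:
        raise ValueError(f"Profile trace is empty: {trace_path}")
    try:
        # Decompress the whole stream in chunks, discarding the output;
        # this approximates the integrity check that `gzip -t` performs.
        with gzip.open(trace_path, "rb") as stream:
            while stream.read(1 << 20):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        raise ValueError(f"Profile trace is not valid gzip: {trace_path}") from exc
```

Failing fast here matters because the upload steps now run under `always()`: without the checks, an empty or truncated trace could still be uploaded as if profiling had succeeded.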