[Perf Regression] 17 config(s) regressed @ 117f7d2c #403

@github-actions

Performance Regression Detected

Commit: 117f7d2c
Run: https://github.com/ROCm/ATOM/actions/runs/23503063060
Date: 2026-03-25T02:06:23.707234+00:00

Regressed Configurations

| Model | ISL/OSL | Conc | Tput cur (tok/s) | Tput base (tok/s) | Tput Δ% | TPOT cur (ms) | TPOT base (ms) | TPOT Δ% |
|-------|---------|------|------------------|-------------------|---------|---------------|----------------|---------|
| DeepSeek-R1-0528 | 1024/1024 | 1 | 85.6 | 96.2 | -11.0% | 11.58 | 10.32 | 12.1% |
| DeepSeek-R1-0528 | 1024/1024 | 32 | 1657.5 | 1744.9 | -5.0% | 18.72 | 17.74 | 5.5% |
| DeepSeek-R1-0528 | 1024/1024 | 64 | 2645.3 | 2832.8 | -6.6% | 23.25 | 21.73 | 7.0% |
| DeepSeek-R1-0528 | 8192/1024 | 1 | 76.6 | 85.5 | -10.4% | 12.75 | 11.42 | 11.7% |
| DeepSeek-R1-0528 | 8192/1024 | 4 | 282.2 | 316.4 | -10.8% | 13.40 | 12.02 | 11.4% |
| DeepSeek-R1-0528 | 8192/1024 | 8 | 523.1 | 556.5 | -6.0% | 14.68 | 13.67 | 7.4% |
| DeepSeek-R1-0528 | 8192/1024 | 64 | 1668.5 | 1764.4 | -5.4% | 36.48 | 34.46 | 5.9% |
| DeepSeek-R1-0528-mtp3 | 1024/1024 | 8 | 756.7 | 878.9 | -13.9% | 10.21 | 8.74 | 16.9% |
| DeepSeek-R1-0528-mtp3 | 8192/1024 | 4 | 438.6 | 551.2 | -20.4% | 8.03 | 6.54 | 22.9% |
| DeepSeek-R1-0528-mtp3 | 8192/1024 | 8 | 773.2 | 735.2 | 5.2% | 9.63 | 10.11 | -4.7% |
| GLM-5-FP8 | 1024/1024 | 4 | 162.6 | 176.7 | -8.0% | 23.57 | 21.77 | 8.3% |
| GLM-5-FP8 | 8192/1024 | 2 | 83.4 | 83.4 | -0.1% | 23.29 | 23.40 | -0.5% |
| gpt-oss-120b | 1024/1024 | 16 | 2394.8 | 2420.9 | -1.1% | 6.46 | 6.41 | 0.8% |
| gpt-oss-120b | 1024/1024 | 128 | 8684.9 | 8716.7 | -0.4% | 14.10 | 14.14 | -0.3% |
| gpt-oss-120b | 1024/8192 | 4 | 950.7 | 936.9 | 1.5% | 4.14 | 4.20 | -1.4% |
| gpt-oss-120b | 1024/8192 | 128 | 8998.8 | 9073.1 | -0.8% | 13.84 | 13.75 | 0.6% |
| gpt-oss-120b | 8192/1024 | 8 | 1336.1 | 1342.8 | -0.5% | 5.69 | 5.68 | 0.3% |
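The Δ% columns can be reproduced from the current and baseline values. A minimal Python sketch, assuming Δ% = (cur - base) / base × 100; this agrees with the table to within rounding of the displayed values (the first row's TPOT comes out at +12.2% against the listed 12.1%). The 5% threshold in `is_regression` is a hypothetical illustration only; this issue does not state the rule the CI uses to flag a configuration.

```python
# Sketch: recompute Δ% from current/baseline values.
# Assumption: Δ% = (cur - base) / base * 100.

def pct_change(cur: float, base: float) -> float:
    """Relative change of `cur` versus `base`, in percent."""
    return (cur - base) / base * 100.0


def is_regression(tput_delta: float, tpot_delta: float, threshold: float = 5.0) -> bool:
    """Hypothetical rule: flag if throughput drops, or TPOT rises,
    by more than `threshold` percent. Not the CI's documented rule."""
    return tput_delta < -threshold or tpot_delta > threshold


rows = [
    # (model, isl/osl, conc, tput_cur, tput_base, tpot_cur, tpot_base)
    ("DeepSeek-R1-0528", "1024/1024", 1, 85.6, 96.2, 11.58, 10.32),
    ("DeepSeek-R1-0528-mtp3", "8192/1024", 4, 438.6, 551.2, 8.03, 6.54),
    ("gpt-oss-120b", "1024/1024", 128, 8684.9, 8716.7, 14.10, 14.14),
]

for model, seq, conc, t_cur, t_base, p_cur, p_base in rows:
    d_tput = pct_change(t_cur, t_base)
    d_tpot = pct_change(p_cur, p_base)
    flag = "REGRESSED" if is_regression(d_tput, d_tpot) else "ok"
    print(f"{model:24} {seq:10} conc={conc:<4} tput {d_tput:+.1f}%  tpot {d_tpot:+.1f}%  {flag}")
```

Under this illustrative 5% rule the gpt-oss-120b rows would not be flagged, so the CI's criterion is evidently something other than a plain 5% cut-off.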

Performance Summary

# Trace Performance Summary

**File:** `DeepSeek-R1-0528_ts_20260325_021655_879.pt.trace.json.gz`

## Prefill

| # | Label | Duration |
|---|-------|----------|
| 0 | `prefill[bs=1 tok=991 ctx=991]` | 78.10 ms |
| 1 | `prefill[bs=1 tok=866 ctx=866]` | 77.14 ms |

**Total prefill:** 155.24 ms
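The prefill labels encode per-call parameters. A small Python sketch that parses them and derives time per prompt token; it assumes the `phase[key=int ...]` layout seen above and reads `bs`, `tok` and `ctx` as batch size, token count and context length, which is an interpretation rather than something the trace documents.

```python
import re

# Assumed label layout, inferred from the summary: name[key=int key=int ...]
LABEL_RE = re.compile(r"^(?P<phase>\w+)\[(?P<fields>[^\]]*)\]$")


def parse_label(label: str) -> tuple[str, dict[str, int]]:
    """Split e.g. 'prefill[bs=1 tok=991 ctx=991]' into ('prefill', {...})."""
    m = LABEL_RE.match(label)
    if m is None:
        raise ValueError(f"unrecognised label: {label!r}")
    pairs = (kv.split("=", 1) for kv in m.group("fields").split())
    return m.group("phase"), {k: int(v) for k, v in pairs}


# Values copied from the Prefill table above.
prefills = [
    ("prefill[bs=1 tok=991 ctx=991]", 78.10),
    ("prefill[bs=1 tok=866 ctx=866]", 77.14),
]

total_ms = 0.0
for label, dur_ms in prefills:
    phase, fields = parse_label(label)
    total_ms += dur_ms
    us_per_tok = dur_ms * 1000.0 / fields["tok"]  # ms -> us, per token
    print(f"{phase}: {fields['tok']} tokens in {dur_ms:.2f} ms -> {us_per_tok:.1f} us/token")
print(f"total prefill: {total_ms:.2f} ms")
```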

## Decode

- **Iterations:** 1947
- **Mean:** 827.4 µs
- **Min:** 672.2 µs
- **Max:** 1.74 ms
- **Total:** 1610.92 ms
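As a sanity check, 1947 iterations × 827.4 µs ≈ 1610.9 ms, consistent with the reported total. The sketch below shows how such a summary could be computed from the trace file, assuming the Chrome trace-event layout written by the PyTorch profiler (complete events with `ph == "X"` and `dur` in microseconds). It also assumes decode steps are annotated with names beginning with `decode`; that prefix is hypothetical, since only prefill labels appear in this report.

```python
import gzip
import json
import statistics


def load_trace_events(path: str) -> list[dict]:
    """Load events from a gzipped Chrome-trace JSON file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        data = json.load(fh)
    # Files may hold {"traceEvents": [...]} or a bare list of events.
    return data["traceEvents"] if isinstance(data, dict) else data


def phase_stats(events: list[dict], prefix: str) -> dict[str, float]:
    """Summarise complete ('X') events whose name starts with `prefix`.
    Chrome-trace `dur` values are in microseconds."""
    durs = [
        e["dur"]
        for e in events
        if e.get("ph") == "X" and e.get("name", "").startswith(prefix)
    ]
    if not durs:
        raise ValueError(f"no events with prefix {prefix!r}")
    return {
        "iterations": len(durs),
        "mean_us": statistics.fmean(durs),
        "min_us": min(durs),
        "max_us": max(durs),
        "total_ms": sum(durs) / 1000.0,
    }


# Synthetic events for illustration. With a real file one would call
# load_trace_events("DeepSeek-R1-0528_ts_20260325_021655_879.pt.trace.json.gz").
events = [
    {"ph": "X", "name": "decode[bs=1]", "ts": 0, "dur": 672.2},  # hypothetical label
    {"ph": "X", "name": "decode[bs=1]", "ts": 700, "dur": 1740.0},
    {"ph": "X", "name": "prefill[bs=1 tok=991 ctx=991]", "ts": 2500, "dur": 78100.0},
]
print(phase_stats(events, "decode"))
```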

Profiler Traces

Download the traces from the workflow run's artifacts.
Open them in the Perfetto UI or in Chrome at chrome://tracing for analysis.

Next Steps

  1. Download the `profiler-analysis-23503063060` artifact.
  2. Open the trace files in the Perfetto UI.
  3. Compare kernel durations against the previous run's traces.
  4. Identify which kernels or phases changed enough to account for the regression.
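Step 3 can be scripted. A Python sketch that sums per-kernel GPU time in two traces and ranks kernels by absolute change; it assumes GPU kernel events carry `cat == "kernel"`, as in PyTorch profiler (Kineto) traces, which should be checked against the actual files. The file paths in the comment are hypothetical.

```python
import gzip
import json


def load_trace_events(path: str) -> list[dict]:
    """Load events from a gzipped Chrome-trace JSON file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        data = json.load(fh)
    return data["traceEvents"] if isinstance(data, dict) else data


def kernel_totals(events: list[dict]) -> dict[str, float]:
    """Sum `dur` (microseconds) per kernel name over complete GPU-kernel events."""
    totals: dict[str, float] = {}
    for e in events:
        if e.get("ph") == "X" and e.get("cat") == "kernel":
            totals[e["name"]] = totals.get(e["name"], 0.0) + e["dur"]
    return totals


def diff_kernels(base: list[dict], cur: list[dict], top: int = 10) -> list[tuple[str, float, float, float]]:
    """Return (name, base_us, cur_us, delta_us), largest |delta| first."""
    b, c = kernel_totals(base), kernel_totals(cur)
    rows = [(n, b.get(n, 0.0), c.get(n, 0.0), c.get(n, 0.0) - b.get(n, 0.0)) for n in b.keys() | c.keys()]
    rows.sort(key=lambda r: abs(r[3]), reverse=True)
    return rows[:top]


# Synthetic events; in practice (hypothetical paths):
#   base = load_trace_events("baseline.pt.trace.json.gz")
#   cur = load_trace_events("current.pt.trace.json.gz")
base = [
    {"ph": "X", "cat": "kernel", "name": "gemm", "dur": 100.0},
    {"ph": "X", "cat": "kernel", "name": "attn", "dur": 50.0},
]
cur = [
    {"ph": "X", "cat": "kernel", "name": "gemm", "dur": 100.0},
    {"ph": "X", "cat": "kernel", "name": "attn", "dur": 80.0},
    {"ph": "X", "cat": "kernel", "name": "new_kernel", "dur": 10.0},
]
for name, t_base, t_cur, delta in diff_kernels(base, cur):
    print(f"{name:12} base={t_base:8.1f}us cur={t_cur:8.1f}us delta={delta:+8.1f}us")
```

Raw totals scale with the number of iterations each trace captured, so if the two traces cover different numbers of decode steps, divide by the iteration count before comparing.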
