
【Hackathon 9th No.29】add test_cutlass_fp8_block_gemm [cf]#7706

Open
ghost wants to merge 1 commit into PaddlePaddle:develop from CloudForge-Solutions:task/029-cutlass-fp8-block-gemm-unit-test-v3

Conversation

@ghost

@ghost ghost commented May 2, 2026

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

  • I have submitted the CLA (only first PR)
  • My PR title follows the convention
  • My changes pass all tests

@ghost ghost temporarily deployed to Metax_ci May 2, 2026 17:15 — with GitHub Actions Inactive
@CLAassistant

CLAassistant commented May 2, 2026

CLA assistant check
All committers have signed the CLA.

@paddle-bot

paddle-bot Bot commented May 2, 2026

Thanks for your contribution!

@paddle-bot paddle-bot Bot added the contributor External developers label May 2, 2026
PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot

PaddlePaddle-bot commented May 2, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-03 22:03:18

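This CI report was generated from the following code (updated every 30 minutes):


1 Task overview

All CI tasks executed so far have passed ✅, but 7 workflows are in the action_required state and will run only after manual approval.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
|---|---|---|---|---|---|---|
| 2 (0) | 2 | 2 | 0 | 0 | 0 | 0 |

⚠️ Note: the following 7 workflows are in the action_required state (they run only after approval): CI_HPU, ILUVATAR-CI, CI_XPU, PR Build and Test, Codestyle-Check, Check PR Template, Approval. These workflows must be triggered by manual approval.

Note: action_required workflows are not counted in the task statistics in the table above.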

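2 Task status summary

2.1 Required tasks: 0/0 passed

No required tasks were detected (branch protection rules may not be configured, or the main CI tasks may still be in the action_required state waiting to be triggered).

2.2 Optional tasks: 2/2 passed

Optional tasks do not block merging; their failures are for reference only.

| Status | Task | Duration | Log | Rerun |
|---|---|---|---|---|
| The remaining 2 optional tasks passed | - | - | - | |

3 Failure details (required tasks only)

No required tasks failed.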


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-05-03 22:11:58

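📋 Review summary

PR overview: Adds unit tests for cutlass_fp8_fp8_half_block_gemm_fused (an FP8 block-scaled GEMM operator), covering BF16/FP16 output correctness and the non-aligned-dimension edge case
Scope of changes: tests/operators/
Impact tags: [OP] [Quantization]

📝 PR convention check

The title uses Chinese brackets 【】 and the unofficial tag [cf], which does not follow the convention. Every section of the PR description is a TODO placeholder, so the description fails the structure check.

Suggested title (can be copied as-is):

  • [OP] Add unit test for cutlass_fp8_fp8_half_block_gemm_fused

Suggested PR description (can be copied as-is; it must reproduce the full structure of the checklist §D2 template):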

## Motivation
Add unit tests for `cutlass_fp8_fp8_half_block_gemm_fused` (FP8 block-scaled GEMM) to improve operator test coverage and to verify the correctness of BF16/FP16 output and the handling of non-aligned dimensions.

## Modifications
- Add `tests/operators/test_cutlass_fp8_fp8_half_block_gemm_fused.py`
  - `test_bfloat16_correctness`: verifies BF16 output for three shapes: (32,2048,2048), (64,4096,4096), (128,5120,5120)
  - `test_float16_output`: verifies FP16 output for shape (64,2048,2048)
  - `test_non_aligned_dimensions`: verifies the K=5504 case (described as not aligned to 128)
## Usage or Command
```bash
python -m pytest tests/operators/test_cutlass_fp8_fp8_half_block_gemm_fused.py -v
```

## Accuracy Tests
N/A (this PR only adds test code; accuracy is verified inline via `assert_allclose(rtol=5e-2, atol=5e-2)`)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

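Issues

| Level | File | Summary |
|---|---|---|
| 📝 PR convention | | The title uses a non-standard format, and every section of the description is a TODO placeholder |
| 🟡 Suggestion | test_cutlass_fp8_fp8_half_block_gemm_fused.py:26 | The random seed is set at module level, so test cases depend on each other's random state |
| ❓ Question | test_cutlass_fp8_fp8_half_block_gemm_fused.py:103 | The method docstring does not match the actual test parameters (N=2048 is in fact aligned) |

Overall assessment

The tests are clearly structured and cover the main scenarios, including BF16/FP16 output and a non-aligned K dimension, and the reference implementation logic is correct. The PR needs a corrected title format and a completed description. Moving the random seed into setUp is recommended to keep the tests isolated.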
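The review refers to a reference implementation without showing it. As a rough illustration of what such a reference could look like, here is a minimal NumPy sketch of a block-scaled GEMM. Everything in it is an assumption made for illustration and is not taken from the PR: the function name `block_scaled_gemm_reference`, the scale layouts (one scale per row and per 128-wide K block for the activation, one scale per 128x128 block for the weight), and the use of float32 arrays in place of real FP8 tensors.

```python
import numpy as np

BLOCK_SIZE = 128  # same block size constant as in the test file


def ceil_div(a, b):
    """Integer ceiling division."""
    return (a + b - 1) // b


def block_scaled_gemm_reference(a_q, a_scale, b_q, b_scale, block=BLOCK_SIZE):
    """Hypothetical reference: dequantize blockwise, then multiply in float32.

    Assumed layouts (not confirmed by the PR):
      a_q:     [M, K] quantized activation values
      a_scale: [M, ceil(K / block)] one scale per row and K block
      b_q:     [N, K] quantized weight values
      b_scale: [ceil(N / block), ceil(K / block)] one scale per block x block tile
    Returns a float32 array of shape [M, N].
    """
    m, k = a_q.shape
    n = b_q.shape[0]
    # Expand each scale over the elements of its block, trimming the ragged tail
    # so that dimensions that are not multiples of `block` still work.
    a_s = np.repeat(a_scale, block, axis=1)[:, :k]
    b_s = np.repeat(np.repeat(b_scale, block, axis=0), block, axis=1)[:n, :k]
    a = a_q.astype(np.float32) * a_s
    b = b_q.astype(np.float32) * b_s
    return a @ b.T


# Small example with N and K that are not multiples of 128.
rng = np.random.default_rng(2025)
M, N, K = 4, 130, 200
a_q = rng.standard_normal((M, K)).astype(np.float32)
b_q = rng.standard_normal((N, K)).astype(np.float32)
a_scale = np.full((M, ceil_div(K, BLOCK_SIZE)), 2.0, dtype=np.float32)
b_scale = np.full((ceil_div(N, BLOCK_SIZE), ceil_div(K, BLOCK_SIZE)), 3.0, dtype=np.float32)

out = block_scaled_gemm_reference(a_q, a_scale, b_q, b_scale)
# With uniform scales the result is just the plain product times 2 * 3.
np.testing.assert_allclose(out, 6.0 * (a_q @ b_q.T), rtol=5e-2, atol=5e-2)
```

In a real test, the output of the fused operator would take the place of `out` in the final comparison. The tolerances mirror the `rtol=5e-2, atol=5e-2` values quoted in the suggested PR description.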

BLOCK_SIZE = 128

paddle.seed(2025)
np.random.seed(2025)

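🟡 Suggestion: The random seed is set at module level, so the test cases depend on each other's random state.

When test cases run in a different order or are run individually, the random tensors they receive may differ, which reduces reproducibility. Consider moving the seed into the setUp method: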

def setUp(self):
    paddle.seed(2025)
    np.random.seed(2025)
    paddle.set_device("gpu")
    ...
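The effect described above can be reproduced without a GPU. The following sketch uses NumPy only (`paddle.seed` is left out so the snippet runs anywhere) and compares seeding once, as a module-level seed does, with reseeding before every test, as `setUp` would. The helper names `draw_with_module_seed` and `draw_with_per_test_seed` are hypothetical and exist only for this illustration.

```python
import numpy as np

SEED = 2025


def draw_with_module_seed(order):
    """Seed once, then let each 'test' draw its data in the given order."""
    np.random.seed(SEED)
    return {name: np.random.rand(3) for name in order}


def draw_with_per_test_seed(order):
    """Reseed before each 'test', which is what seeding in setUp achieves."""
    data = {}
    for name in order:
        np.random.seed(SEED)
        data[name] = np.random.rand(3)
    return data


# Module-level seed: the data seen by test "b" depends on whether "a" ran first.
first = draw_with_module_seed(["a", "b"])
second = draw_with_module_seed(["b", "a"])
print("module-level seed, same data for b:", np.array_equal(first["b"], second["b"]))

# Per-test seed: the data seen by test "b" is independent of execution order.
first = draw_with_per_test_seed(["a", "b"])
second = draw_with_per_test_seed(["b", "a"])
print("per-test seed, same data for b:", np.array_equal(first["b"], second["b"]))
```

One consequence of reseeding in `setUp` is that every test starts from the same random stream, so tests with identical shapes receive identical inputs. That is usually acceptable for correctness checks like these.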


def test_non_aligned_dimensions(self):
    """N and K not aligned to block size 128."""
    self._skip_if_not_sm90()

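❓ Question: The docstring says "N and K not aligned to block size 128", but the n=2048 actually passed in is a multiple of 128 (2048 / 128 = 16); only K=5504 is non-aligned.

Consider correcting the docstring so that it accurately describes the test:

"""K not aligned to block size 128 (K=5504 is not a multiple of 128)."""

Alternatively, also test a non-aligned N, for example _check_output(32, 2050, 5504).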
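The alignment claims in this comment are easy to check with plain integer arithmetic. The sketch below does so for the dimensions mentioned here, plus one hypothetical value, K=5500, which does not appear in the PR and is included only as an example of a non-multiple. The helper `is_block_aligned` is likewise an illustrative name. By this arithmetic, 2050 leaves a remainder of 2, while 5504 equals 43 × 128 and so appears to be a multiple of 128 as well, which suggests that both the docstring and the K=5504 claim above may need another look.

```python
BLOCK_SIZE = 128


def is_block_aligned(dim, block=BLOCK_SIZE):
    """Return True when dim is an exact multiple of the block size."""
    return dim % block == 0


# 5500 is a hypothetical value added for contrast; the others come from the review.
for name, dim in [("N", 2048), ("N", 2050), ("K", 5504), ("K", 5500)]:
    quotient, remainder = divmod(dim, BLOCK_SIZE)
    print(f"{name}={dim}: {quotient} * {BLOCK_SIZE} + {remainder}, "
          f"aligned={is_block_aligned(dim)}")
```

If the goal is to exercise the ragged-block path, one option is to pick dimensions with a nonzero remainder for both N and K.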


Labels

contributor External developers

Projects

None yet


2 participants