[CI]【Hackathon 10th Spring No.44】fused_moe_deepgemm_backend unit test [cf]#7696

Open
ghost wants to merge 1 commit into PaddlePaddle:develop from CloudForge-Solutions:task/h10-044-moe-deepgemm-backend-test-v3

Conversation

@ghost

@ghost ghost commented May 2, 2026

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

  • I have submitted the CLA (only first PR)
  • My PR title follows the convention
  • My changes pass all tests

@ghost ghost temporarily deployed to Metax_ci May 2, 2026 17:14 — with GitHub Actions Inactive
@paddle-bot

paddle-bot Bot commented May 2, 2026

Thanks for your contribution!

@CLAassistant

CLAassistant commented May 2, 2026

CLA assistant check
All committers have signed the CLA.

@paddle-bot paddle-bot Bot added the contributor External developers label May 2, 2026
PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot

PaddlePaddle-bot commented May 2, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-03 21:56:21

This CI report was generated from the following code (refreshed every 30 minutes):


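1 Task overview

All executed Required tasks passed (there are currently no Required tasks), and both optional tasks passed. ⚠️ 7 workflows are in the action_required state and will run only after manual approval.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 2 (0) | 2 | 2 | 0 | 0 | 0 | 0 |

⚠️ Note: the following 7 workflows are in the action_required state (they will run only after approval): ILUVATAR-CI, PR Build and Test, CI_XPU, Approval, Codestyle-Check, Check PR Template, CI_HPU. These workflows must be triggered by manual approval.

Note: action_required workflows are not counted in the task statistics in the table above.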


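2 Task status summary

2.1 Required tasks: 0/0 passed

No Required tasks are configured for this PR (the Branch Protection Rules list no mandatory tasks), so no task blocks merging.

2.2 Optional tasks: 2/2 passed

Optional tasks do not block merging; failures are informational only.

Status Task Duration Log Rerun
The remaining 2 optional tasks passed - - -

3 Failure details (required only)

No required tasks failed.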


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-05-03 21:40:00

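📋 Review summary

PR overview: Adds unit tests for fused_moe_deepgemm_backend.py, covering the additional code paths required by Hackathon 10th Spring No.44 (weight management, the apply_tp non-Phi path, masked GEMM prefill, and the shared_experts branch)
Scope of changes: tests/layers/test_fused_moe_deepgemm_backend.py
Impact tag: [CI]

📝 PR convention check

The title includes the official [CI] tag, but it ends with a non-standard suffix, [cf], whose meaning is unclear. The four body sections of the description (Motivation / Modifications / Usage or Command / Accuracy Tests) are all TODO placeholders with no actual content.

Suggested title (can be copied directly):

  • [CI] Add unit tests for fused_moe_deepgemm_backend (Hackathon 10th Spring No.44)

Suggested PR description (can be copied directly; it must reproduce the full structure of the checklist §D2 template):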

## Motivation
Add unit tests for `fused_moe_deepgemm_backend.py` that cover code paths not reached by the existing `test_deepgemm_fused_moe.py` (PR #6840), including: standalone helper functions (infermeta, call_prefill_permute_to_masked_gemm, call_depermute_prefill_combine); weight management (create_weights, process_weights_after_loading, process_loaded_weights, process_prequanted_weights); the per_token_quant input path of apply_tp (FD_USE_PHI_FP8_QUANT=False); the masked GEMM path of apply_ep_prefill (num_worst_tokens > 0); and the shared_experts branches of apply_ep_prefill and apply_ep_decode.

## Modifications
- Add `tests/layers/test_fused_moe_deepgemm_backend.py` (555 lines)
  - Implement a stub framework for GPU-only modules (`_GpuOpsStub`, `_install_stubs` module-scope fixture)
  - `TestHelperFunctions`: tests the infermeta output shapes and the permute/depermute helper functions
  - `TestWeightManagement`: tests weight creation, post-load processing, and pre-quantized weight loading (parametrized over ue8m0)
  - `TestApplyTpNonPhiInput`: tests the per_token_quant input path
  - `TestApplyEpPrefillAdditive`: tests the masked GEMM path and the shared_experts call
  - `TestApplyEpDecodeAdditive`: tests the shared_experts call in the decode phase

## Usage or Command
pytest tests/layers/test_fused_moe_deepgemm_backend.py -v

## Accuracy Tests
N/A (unit tests only; no accuracy comparison is involved)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

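Issues

| Severity | File | Summary |
| --- | --- | --- |
| 🟡 Suggestion | tests/layers/test_fused_moe_deepgemm_backend.py:433 | assert is used for runtime validation; it has no effect under Python -O |
| ❓ Question | tests/layers/test_fused_moe_deepgemm_backend.py:108 | Isolation safety of the scope="module" fixture combined with function-scope monkeypatch |
| ❓ Question | tests/layers/test_fused_moe_deepgemm_backend.py:222 | Whether the paddle.static.MetaTensor construction is a stable public API |

Overall assessment

The tests are well structured and the stub framework is fairly elaborate; together they effectively cover blind spots in the upstream tests. Recommended follow-ups: fix the assert usage and fill in the PR description with real content (it currently contains only TODO placeholders).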

topk_weights, # recv_topk_weights
[n], # recv_num_tokens_per_expert_list (simplified)
None, # handle
_BufferStub.capture(), # event

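🟡 Suggestion: assert is used for runtime validation

assert len(ffn.shape) == 3 is silently skipped under Python's -O optimized mode (no AssertionError is raised), which violates the §C convention: runtime validation of inputs or behavior should use an explicit raise.

Suggested change: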

if len(ffn.shape) != 3:
    raise AssertionError(f"Expected 3D ffn tensor, got shape {ffn.shape}")
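
The difference between the two forms can be reproduced without restarting the interpreter. The following is a minimal sketch, assuming that the `optimize=1` argument of the built-in `compile` is an acceptable stand-in for running under `python -O`; `run_check` and `check_ffn_shape` are hypothetical helper names used only for illustration, not code from the PR.

```python
# Source of a check written with a bare assert, similar in spirit to the
# reviewed line. The variable name `shape` is illustrative.
ASSERT_SRC = "assert len(shape) == 3, 'expected a 3D shape'"


def run_check(shape, optimize):
    """Compile and run the assert; optimize=1 strips asserts like `python -O`."""
    code = compile(ASSERT_SRC, "<assert-demo>", "exec", optimize=optimize)
    try:
        exec(code, {"shape": shape})
    except AssertionError:
        return "raised"
    return "passed silently"


def check_ffn_shape(shape):
    """Explicit raise: this check is not removed by the optimization level."""
    if len(shape) != 3:
        raise AssertionError(f"Expected 3D ffn tensor, got shape {shape}")


bad_shape = (4, 8)  # 2D, so a working check should reject it
print(run_check(bad_shape, optimize=0))  # the assert is kept
print(run_check(bad_shape, optimize=1))  # the assert is compiled out
```

With the assert compiled out, the 2D shape goes unnoticed, whereas `check_ffn_shape(bad_shape)` raises at any optimization level.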


Uses ``scope="module"`` so stubs are installed once. The global ``dgb``
reference is safe because individual tests mutate it only via
``monkeypatch`` (function-scoped by default), which auto-reverts.

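❓ Question: isolation safety of the scope="module" fixture combined with function-scope monkeypatch

_install_stubs installs the stubs with scope="module", while individual tests apply their patches through function-scope monkeypatch. dgb is a global variable and is initialized only once, at module level.

Please confirm: when multiple tests run concurrently (pytest-xdist) or the test order changes, could stubs that are not restored cause cross-test contamination? Consider stating this assumption (serial execution) explicitly in the comment, or switch to scope="function" and re-import the module for full isolation.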
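
One way to reason about this concern is to look at what each layer restores and when. The sketch below is a standard-library analogue, not the PR's fixture: `mock.patch.dict` stands in for the module-scoped stub installation and `mock.patch.object` stands in for a function-scoped `monkeypatch`. The module name `hypothetical_gpu_ops` and the `gemm` attribute are assumptions made for illustration.

```python
import sys
import types
from contextlib import contextmanager
from unittest import mock

STUB_NAME = "hypothetical_gpu_ops"  # hypothetical module name, not from the PR


@contextmanager
def installed_stub(name, **attrs):
    """Install a stub module in sys.modules and undo the change on exit."""
    stub = types.ModuleType(name)
    for key, value in attrs.items():
        setattr(stub, key, value)
    # patch.dict restores sys.modules to its previous contents when it exits.
    with mock.patch.dict(sys.modules, {name: stub}):
        yield stub


def run_demo():
    observed = []
    # Outer layer: plays the role of the module-scoped fixture.
    with installed_stub(STUB_NAME, gemm=lambda: "stub") as stub:
        # "Test 1": a patch that lives only for the duration of one test.
        with mock.patch.object(stub, "gemm", new=lambda: "patched"):
            observed.append(sys.modules[STUB_NAME].gemm())
        # "Test 2": the per-test patch has been reverted.
        observed.append(sys.modules[STUB_NAME].gemm())
    # After teardown the stub module is gone again.
    observed.append(STUB_NAME in sys.modules)
    return observed


print(run_demo())
```

In this analogue the second "test" sees the original stub only because every mutation went through a patcher that reverts itself; a direct assignment such as `stub.gemm = ...` inside a test would leak to later tests sharing the same stub. Since pytest-xdist workers are separate processes, each has its own `sys.modules`, so the remaining risk is ordering within a single process.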

meta_in,
meta_in,
meta_in,
meta_in,

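❓ Question: the paddle.static.MetaTensor construction needs confirmation

Is paddle.static.MetaTensor(shape=[2, 3], dtype=paddle.float16) a valid public API? In some Paddle versions this class may be internal, or its constructor signature may differ. Consider confirming in the CI matrix that the Paddle version in use supports this usage, or construct the mock inputs with a stable API such as paddle.ones([2, 3], dtype='float16').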


Labels

contributor External developers

2 participants