[Feature]【Hackathon 10th Spring No.47】Add MiniMax-M1 model support [cf] · Pull Request #7703 · PaddlePaddle/FastDeploy

ghost · 2026-05-02T17:14:56Z

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

I have submitted the CLA (only first PR)
My PR title follows the convention
My changes pass all tests

CLAassistant · 2026-05-02T17:15:02Z

All committers have signed the CLA.

paddle-bot · 2026-05-02T17:17:39Z

Thanks for your contribution!

PaddlePaddle-bot · 2026-05-02T18:20:10Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-03 22:43:44

CI report generated from the following code (updated every 30 minutes):

PR commit: 74e5f61
Merge base: d70f33d (branch: develop)
View full diff
CI details

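1 Task overview

All executed Required tasks have passed (currently required_total = 0; 7 core workflows are in the action_required state and will run only after manual approval).

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
2(0)	2	2	0	0	0	0

⚠️ Note: the following 7 workflows are in the action_required state and will run only after approval: Codestyle-Check, Approval, Check PR Template, CI_HPU, CI_XPU, PR Build and Test, ILUVATAR-CI. These workflows must be triggered by manual approval.

Note: action_required workflows are not counted in the task statistics in the table above.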

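2 Task status summary

2.1 Required tasks: 0/0 passed

There are currently no Required tasks (the core CI workflows are all in the action_required state and have not run yet).

2.2 Optional tasks: 2/2 passed

Optional tasks do not block merging; their failures are for reference only.

Status	Task	Duration	Log	Rerun
✅	The remaining 2 optional tasks passed	-	-	-

3 Failure details (required only)

No required tasks failed.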

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-03 22:33:56

📋 Review summary

PR overview: adds support for the MiniMax-M1 hybrid decoder model to FastDeploy (70 Lightning linear-attention layers + 10 full-attention layers + MoE routing)

Scope of changes: fastdeploy/model_executor/models/, ops/triton_ops/lightning_attn.py, layers/rotary_embedding.py, docs/

Impact tags: Models OP Docs
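To make the "MoE routing" in the overview concrete, here is a minimal NumPy sketch of per-token top-2 routing (the suggested description below gives 32 experts with top-2 per token). It assumes softmax gating with the two selected weights renormalized to sum to 1, as in Mixtral-style routers; MiniMax-M1's actual gating and normalization, and the helper names `route_top_k` and `moe_forward`, are hypothetical and not taken from the PR.

```python
import numpy as np


def route_top_k(router_logits, top_k=2):
    """Return (expert_ids, weights), each of shape [num_tokens, top_k]."""
    # Numerically stable softmax over the expert axis.
    shifted = router_logits - router_logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Indices of the top_k most probable experts per token, highest first.
    expert_ids = np.argsort(-probs, axis=-1)[:, :top_k]
    weights = np.take_along_axis(probs, expert_ids, axis=-1)
    # Assumption: renormalize the selected weights so they sum to 1 per token.
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return expert_ids, weights


def moe_forward(x, router_logits, experts, top_k=2):
    """y_t = sum over the selected experts e of w_{t,e} * experts[e](x_t)."""
    expert_ids, weights = route_top_k(router_logits, top_k)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e, w in zip(expert_ids[t], weights[t]):
            out[t] += w * experts[e](x[t])
    return out


# Example: 4 tokens, 32 experts, hidden size 8; toy expert i just scales its input by i.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
logits = rng.standard_normal((4, 32))
experts = [lambda h, s=i: s * h for i in range(32)]
y = moe_forward(x, logits, experts)
```

The per-token Python loop is for readability only; a serving implementation groups tokens by expert so that each expert runs a single batched matmul.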

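📝 PR convention check

The title contains the non-standard noise markers 【Hackathon 10th Spring No.47】 and [cf], and every section of the PR description is still a TODO placeholder that needs to be filled in.

Suggested title (ready to copy):

[Feature][Models][OP] Add MiniMax-M1 hybrid attention model support

Suggested PR description (ready to copy; it must reproduce the complete structure of the checklist §D2 template):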

## Motivation

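Add MiniMax-M1 model support to FastDeploy. MiniMax-M1 (MiniMaxText01ForCausalLM) uses a hybrid decoder structure (70 Lightning linear-attention layers + 10 standard full-attention layers) with MoE routing (32 experts, top-2 per token). This version targets BF16 inference and completes the model network definition and backend wiring.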

## Modifications

- Add `fastdeploy/model_executor/models/minimax_m1.py`: the complete MiniMax-M1 model network, registering two architecture names, `MiniMaxM1ForCausalLM` and `MiniMaxText01ForCausalLM`
- Add `fastdeploy/model_executor/ops/triton_ops/lightning_attn.py`: Lightning Attention Triton kernels (prefill diagonal blocks, parallel KV accumulation, cross-block forward propagation)
- Modify `fastdeploy/model_executor/layers/rotary_embedding.py`: add a RoPE dispatch branch for the MiniMax-M1 full-attention layers
- Add Chinese and English best-practice docs (`docs/best_practices/MiniMax-M1.md`) and update the supported-model list
- Add test files: `tests/model_executor/test_minimax_m1*.py`, `tests/operators/test_lightning_attn_triton.py`, `tests/e2e/validate_minimax_m1_e2e.py`

## Usage or Command

```shell
MODEL_PATH=/models/MiniMax-Text-01

python -m fastdeploy.entrypoints.openai.api_server \
    --model "$MODEL_PATH" \
    --port 8180 \
    --metrics-port 8181 \
    --engine-worker-queue-port 8182 \
    --max-model-len 32768 \
    --max-num-seqs 32
```

## Accuracy Tests

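This is the initial integration and focuses on the model network definition and backend wiring; no accuracy comparison data is provided yet. The Lightning Attention prefill/decode paths still need end-to-end validation on GPU, and results will be added in a later version.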

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
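The Lightning Attention kernel listed under Modifications works block by block: a causally masked "diagonal" product inside each block, plus a term from a KV state carried across blocks. The NumPy sketch below is a single-head, un-normalized float64 reference, not the Triton kernel from this PR. It assumes a per-head exponential decay with slope `s` and checks that the blocked form reproduces the token-by-token recurrence S_t = exp(-s) * S_{t-1} + k_t^T v_t, o_t = q_t S_t. Function names and block sizes are illustrative.

```python
import numpy as np


def recurrent_decay_attention(q, k, v, slope):
    """Token-by-token reference: S_t = exp(-slope) * S_{t-1} + k_t^T v_t, o_t = q_t @ S_t."""
    decay = np.exp(-slope)
    state = np.zeros((q.shape[1], v.shape[1]))
    out = np.zeros((q.shape[0], v.shape[1]))
    for t in range(q.shape[0]):
        state = decay * state + np.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out


def blocked_decay_attention(q, k, v, slope, block_size=4):
    """Same result computed per block: intra-block (diagonal) term + carried KV state."""
    n, dv = q.shape[0], v.shape[1]
    state = np.zeros((q.shape[1], dv))  # KV state summarizing all previous blocks
    out = np.zeros((n, dv))
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        qb, kb, vb = q[start:end], k[start:end], v[start:end]
        b = end - start
        idx = np.arange(b)
        # Diagonal block: causal mask weighted by exp(-slope * (i - j)) for j <= i.
        dist = idx[:, None] - idx[None, :]
        mask = np.where(dist >= 0, np.exp(-slope * np.maximum(dist, 0)), 0.0)
        intra = ((qb @ kb.T) * mask) @ vb
        # Cross-block term: by local position i the carried state has decayed (i + 1) steps.
        inter = (qb * np.exp(-slope * (idx + 1))[:, None]) @ state
        out[start:end] = intra + inter
        # Fold this block into the state: old state decays b steps, key j decays (b - 1 - j).
        k_decayed = kb * np.exp(-slope * (b - 1 - idx))[:, None]
        state = np.exp(-slope * b) * state + k_decayed.T @ vb
    return out


rng = np.random.default_rng(0)
q, k = rng.standard_normal((10, 4)), rng.standard_normal((10, 4))
v = rng.standard_normal((10, 3))
ref = recurrent_decay_attention(q, k, v, slope=0.3)
for bs in (1, 3, 4, 16):
    assert np.allclose(blocked_decay_attention(q, k, v, slope=0.3, block_size=bs), ref)
```

A reference like this could serve as a float64 oracle for the prefill path in the operator tests, provided the kernel's decay convention matches the one assumed here.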

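Issues

Severity	File	Summary
🔴 Bug	`fastdeploy/model_executor/layers/rotary_embedding.py:344`	The architecture name `MiniMaxText01ForCausalLM` does not match `startswith("MiniMaxM1")`, so the full-attention layers select the wrong RoPE
🟡 Suggestion	`fastdeploy/model_executor/models/minimax_m1.py:358`	The `_kv_history` instance variable lets KV state leak between concurrent requests, making the model unusable under concurrency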

Overall assessment

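The overall architecture is sound: the MoE quantization adaptation and the ALiBi slope construction in LinearAttention are complete, and the test files provide fairly comprehensive coverage. However, the dispatch in rotary_embedding.py misses the MiniMaxText01ForCausalLM architecture name, which makes the full-attention layers silently use the wrong RoPE implementation; this must be fixed before merging. The production request-isolation problem with _kv_history is already marked with a TODO; completing the migration to a slot-based cache should be a priority in a follow-up version.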
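For reference on the ALiBi slope construction mentioned above, this is the standard per-head slope schedule from the ALiBi reference implementation: a geometric sequence starting at 2^(-8/n) for n heads, with interleaved extra slopes when n is not a power of two. Whether MiniMax-M1's LinearAttention uses exactly this schedule, and any per-layer scaling applied on top of it, is an assumption to check against minimax_m1.py; `alibi_slopes` is a hypothetical name.

```python
import math


def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head decay slopes following the ALiBi reference schedule."""

    def power_of_2_slopes(n: int) -> list[float]:
        # start == ratio == 2 ** (-8 / n); slopes are start, start**2, ..., start**n.
        start = 2 ** (-(2 ** -(math.log2(n) - 3)))
        return [start * start**i for i in range(n)]

    if math.log2(num_heads).is_integer():
        return power_of_2_slopes(num_heads)
    # Non-power-of-two head count: take the closest smaller power of two, then fill
    # the remaining heads with every other slope from the schedule for twice that count.
    closest = 2 ** math.floor(math.log2(num_heads))
    extra = alibi_slopes(2 * closest)[0::2][: num_heads - closest]
    return power_of_2_slopes(closest) + extra
```

With 8 heads this yields 0.5, 0.25, ..., 0.5**8; heads with larger slopes forget older tokens faster.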

PaddlePaddle-bot · 2026-05-03T14:39:48Z


    architecture = model_config.architectures[0]
-    if architecture.startswith("Qwen"):
+    if architecture.startswith("Qwen") or architecture.startswith("MiniMaxM1"):


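🔴 Bug: the architecture name MiniMaxText01ForCausalLM does not match startswith("MiniMaxM1")

The code registers two architecture names: MiniMaxM1ForCausalLM and MiniMaxText01ForCausalLM (the original HuggingFace weights usually use the latter). However, "MiniMaxText01ForCausalLM".startswith("MiniMaxM1") is False, so when standard HF weights are loaded the RoPE dispatch for the full-attention layers falls through to the default branch and applies the wrong positional encoding, silently producing incorrect output.

Suggested fix: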

if architecture.startswith("Qwen") or architecture.startswith("MiniMaxM1") or architecture.startswith("MiniMaxText01"):
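The suggested condition can also be written with a single `str.startswith` call, since it accepts a tuple of prefixes. The sketch below is hypothetical (the helper and constant names are not from the PR) and only shows that the original prefix set misses `MiniMaxText01ForCausalLM`, while the fixed set matches both registered names and leaves other architectures on the default branch.

```python
# Hypothetical stand-ins for the condition in rotary_embedding.py.
ROPE_PREFIXES_ORIGINAL = ("Qwen", "MiniMaxM1")
ROPE_PREFIXES_FIXED = ("Qwen", "MiniMaxM1", "MiniMaxText01")


def takes_qwen_rope_branch(architecture: str, prefixes: tuple[str, ...]) -> bool:
    # str.startswith with a tuple returns True if any of the prefixes matches.
    return architecture.startswith(prefixes)


for arch in ("MiniMaxM1ForCausalLM", "MiniMaxText01ForCausalLM", "LlamaForCausalLM"):
    print(arch, takes_qwen_rope_branch(arch, ROPE_PREFIXES_ORIGINAL),
          takes_qwen_rope_branch(arch, ROPE_PREFIXES_FIXED))
```

Prefix matching also catches any future architecture whose name merely begins with one of these strings; matching against the exact set of registered names would be stricter, at the cost of updating the set whenever a name is added.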

PaddlePaddle-bot · 2026-05-03T14:39:48Z

+        # Retrieve or initialize KV history for recurrent state persistence.
+        # TODO: Migrate to ForwardMeta.caches / slot-based cache management for
+        #       proper multi-request isolation in production serving scenarios.
+        if not hasattr(self, "_kv_history") or self._kv_history is None or self._kv_history.shape[0] != batch_size:


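🟡 Suggestion: the _kv_history instance variable lets KV state leak between requests under concurrency

FastDeploy's production serving packs the tokens of multiple requests into a batched ids_remove_padding tensor. _kv_history is stored on the layer instance and is reset only when batch_size changes. If two requests happen to arrive with the same batch_size, the later request inherits the earlier request's KV history and produces wrong output, so the model is unusable under concurrency.

A TODO already marks the migration path (ForwardMeta.caches / slot-based cache); completing it before merging, or in an immediate follow-up PR, is recommended.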
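A toy illustration of the leak described above and of the slot-based alternative the TODO points to. Both classes are hypothetical: a running float sum stands in for the recurrent KV matrix, and `SlotStateLayer` only assumes that the scheduler gives each request a stable slot index and clears that slot when the request finishes, which is the kind of guarantee a ForwardMeta.caches / slot-based scheme is expected to provide.

```python
class SharedStateLayer:
    """Anti-pattern: recurrent state lives on the layer, reset only when batch size changes."""

    def __init__(self):
        self._kv_history = None

    def forward(self, batch):
        # Mirrors the `shape[0] != batch_size` check: same size means old state is reused.
        if self._kv_history is None or len(self._kv_history) != len(batch):
            self._kv_history = [0.0] * len(batch)
        self._kv_history = [h + x for h, x in zip(self._kv_history, batch)]
        return list(self._kv_history)


class SlotStateLayer:
    """Sketch: one state entry per cache slot, owned and cleared by the scheduler."""

    def __init__(self, num_slots):
        self.state = [0.0] * num_slots

    def reset_slot(self, slot):
        self.state[slot] = 0.0

    def forward(self, slots, batch):
        # Each token updates only the state of the request (slot) it belongs to.
        for slot, x in zip(slots, batch):
            self.state[slot] += x
        return [self.state[slot] for slot in slots]


# Request A runs alone and finishes; request B then arrives with the same batch size.
shared = SharedStateLayer()
print(shared.forward([1.0]), shared.forward([5.0]))  # B sees A's history: [1.0] [6.0]

slotted = SlotStateLayer(num_slots=4)
first = slotted.forward([0], [1.0])
slotted.reset_slot(0)  # scheduler frees slot 0 when A finishes
print(first, slotted.forward([0], [5.0]))  # B starts clean: [1.0] [5.0]
```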

ghost temporarily deployed to Metax_ci on May 2, 2026 17:15 with GitHub Actions (inactive)

paddle-bot added the contributor (External developers) label on May 2, 2026


[Feature]【Hackathon 10th Spring No.47】MiniMax-M1 model support

74e5f61


PaddlePaddle-bot suggested changes May 3, 2026

ghost wants to merge 1 commit into PaddlePaddle:develop from CloudForge-Solutions:task/h10-47-minimax-m1-port-v3
