
[Feature]【Hackathon 10th Spring No.47】Add MiniMax-M1 model support [cf]#7703

Open
ghost wants to merge 1 commit into PaddlePaddle:develop from CloudForge-Solutions:task/h10-47-minimax-m1-port-v3

Conversation


@ghost ghost commented May 2, 2026

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

  • I have submitted the CLA (only first PR)
  • My PR title follows the convention
  • My changes pass all tests

@ghost ghost temporarily deployed to Metax_ci May 2, 2026 17:15 — with GitHub Actions Inactive

CLAassistant commented May 2, 2026

CLA assistant check
All committers have signed the CLA.


paddle-bot Bot commented May 2, 2026

Thanks for your contribution!

@paddle-bot paddle-bot Bot added the contributor External developers label May 2, 2026

PaddlePaddle-bot commented May 2, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-03 22:43:44

CI report generated from the code below (refreshed every 30 minutes):


1 Task overview

All executed Required tasks have passed (currently required_total = 0; 7 core workflows are in the action_required state and will run only after manual approval).

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
|---|---|---|---|---|---|---|
| 2 (0) | 2 | 2 | 0 | 0 | 0 | 0 |

⚠️ Note: the following 7 workflows are in the action_required state (they will run only after approval): Codestyle-Check, Approval, Check PR Template, CI_HPU, CI_XPU, PR Build and Test, ILUVATAR-CI. These workflows require manual approval to trigger.

Note: action_required workflows are not counted in the task statistics above.


2 Task status summary

2.1 Required tasks: 0/0 passed

There are currently no Required tasks (the core CI workflows are all in the action_required state and have not yet run).

2.2 Optional tasks — 2/2 passed

Optional tasks do not block merging; failures are informational only.

| Status | Task | Duration | Log | Rerun |
|---|---|---|---|---|
| — | The remaining 2 optional tasks passed | - | - | - |

3 Failure details (required only)

No required tasks failed.


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-05-03 22:33:56

📋 Review summary

PR overview: adds MiniMax-M1 hybrid-decoder model support to FastDeploy (70 Lightning linear-attention layers + 10 full-attention layers + MoE routing).

Scope of changes: fastdeploy/model_executor/models/, fastdeploy/model_executor/ops/triton_ops/lightning_attn.py, fastdeploy/model_executor/layers/rotary_embedding.py, docs/

Impact tags: Models, OP, Docs

📝 PR convention check

The title contains the non-standard noise markers 【Hackathon 10th Spring No.47】 and [cf], and every section of the PR description is a TODO placeholder; both need to be completed.

Suggested title (copy-paste ready):

  • [Feature][Models][OP] Add MiniMax-M1 hybrid attention model support

Suggested PR description (copy-paste ready; it must reproduce the full structure of the checklist §D2 template):

## Motivation

Add MiniMax-M1 model support to FastDeploy. MiniMax-M1 (MiniMaxText01ForCausalLM) uses a hybrid decoder architecture (70 Lightning linear-attention layers + 10 standard full-attention layers) and MoE routing (32 experts, top-2 per token). This version focuses on BF16 inference, covering the model graph and backend wiring.

## Modifications

- Add `fastdeploy/model_executor/models/minimax_m1.py`: complete MiniMax-M1 model graph, registering both the `MiniMaxM1ForCausalLM` and `MiniMaxText01ForCausalLM` architecture names
- Add `fastdeploy/model_executor/ops/triton_ops/lightning_attn.py`: Lightning Attention Triton kernels (prefill diagonal blocks, parallel KV accumulation, cross-block forward propagation)
- Modify `fastdeploy/model_executor/layers/rotary_embedding.py`: add a RoPE dispatch branch for MiniMax-M1 full-attention layers
- Add Chinese and English best-practice docs (`docs/best_practices/MiniMax-M1.md`) and update the supported-model list
- Add test files: `tests/model_executor/test_minimax_m1*.py`, `tests/operators/test_lightning_attn_triton.py`, `tests/e2e/validate_minimax_m1_e2e.py`

## Usage or Command

```shell
MODEL_PATH=/models/MiniMax-Text-01

python -m fastdeploy.entrypoints.openai.api_server \
    --model "$MODEL_PATH" \
    --port 8180 \
    --metrics-port 8181 \
    --engine-worker-queue-port 8182 \
    --max-model-len 32768 \
    --max-num-seqs 32
```

## Accuracy Tests

This is the first integration pass, focused on the model graph and backend wiring; no accuracy comparison data is provided yet. The Lightning Attention prefill/decode paths still need end-to-end validation on GPU and will be covered in a follow-up version.

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
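
For orientation on the Lightning Attention kernel listed in Modifications: block-form linear attention implements, per head, a decay-gated recurrence over a d_k × d_v state matrix. The pure-Python step below is an illustrative sketch only, not the Triton kernel; `decay` stands in for the per-head ALiBi-slope-derived decay mentioned in the review, and the helper name is hypothetical.

```python
# Illustrative sketch of the per-token decode recurrence behind decay-gated
# linear attention (the scheme Lightning Attention computes in block form).
def linear_attn_decode_step(kv_state, q, k, v, decay):
    """One decode step: kv_state is a d_k x d_v list-of-lists carried across tokens."""
    d_k, d_v = len(k), len(v)
    # Decay the running KV summary, then accumulate the new outer product k v^T.
    new_state = [[decay * kv_state[i][j] + k[i] * v[j] for j in range(d_v)]
                 for i in range(d_k)]
    # Output projects q through the updated state: o_j = sum_i q_i * S[i][j].
    out = [sum(q[i] * new_state[i][j] for i in range(d_k)) for j in range(d_v)]
    return new_state, out
```

Two successive calls with the same `q`, `k`, `v` show the decayed accumulation: the second output blends the decayed old state with the new contribution.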

Issues

| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | fastdeploy/model_executor/layers/rotary_embedding.py:344 | `MiniMaxText01ForCausalLM` does not match `startswith("MiniMaxM1")`; full-attention layers select the wrong RoPE |
| 🟡 Suggestion | fastdeploy/model_executor/models/minimax_m1.py:358 | `_kv_history` instance variable pollutes KV state across concurrent requests, making the model unusable under concurrency |

Overall assessment

The overall architecture is sound: the MoE quantization adaptation and the ALiBi-slope construction for LinearAttention are complete, and the test files give fairly broad coverage. However, the dispatch in `rotary_embedding.py` misses the `MiniMaxText01ForCausalLM` architecture name, which would make full-attention layers silently use the wrong RoPE implementation; this must be fixed before merging. The `_kv_history` request-isolation problem for production serving is already marked with a TODO; migrating to slot-based cache management should be prioritized in a follow-up version.


```diff
 architecture = model_config.architectures[0]
-if architecture.startswith("Qwen"):
+if architecture.startswith("Qwen") or architecture.startswith("MiniMaxM1"):
```

🔴 Bug: the `MiniMaxText01ForCausalLM` architecture name does not match `startswith("MiniMaxM1")`

The code registers two architecture names, `MiniMaxM1ForCausalLM` and `MiniMaxText01ForCausalLM` (the original HuggingFace weights usually use the latter). But `"MiniMaxText01ForCausalLM".startswith("MiniMaxM1")` is `False`, so when standard HF weights are loaded, the RoPE dispatch for the full-attention layers falls through to the default branch and applies the wrong position encoding, producing silently incorrect output.

Suggested fix:

```python
if architecture.startswith("Qwen") or architecture.startswith("MiniMaxM1") or architecture.startswith("MiniMaxText01"):
```
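
A hedged sketch of the same dispatch logic: Python's `str.startswith` also accepts a tuple of prefixes, which covers both registered architecture names without chained `or`s. The constant and helper below are hypothetical names for illustration, not FastDeploy code.

```python
# Both architecture names registered by this PR; `str.startswith` accepts a
# tuple, so one call matches either prefix.
MINIMAX_PREFIXES = ("MiniMaxM1", "MiniMaxText01")

def uses_minimax_rope(architecture: str) -> bool:
    """Return True when the architecture should take the MiniMax RoPE branch."""
    return architecture.startswith(MINIMAX_PREFIXES)
```

This keeps the dispatch condition short if further aliases are ever registered.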

```python
# Retrieve or initialize KV history for recurrent state persistence.
# TODO: Migrate to ForwardMeta.caches / slot-based cache management for
# proper multi-request isolation in production serving scenarios.
if not hasattr(self, "_kv_history") or self._kv_history is None or self._kv_history.shape[0] != batch_size:
```

🟡 Suggestion: the `_kv_history` instance variable pollutes KV state across concurrent requests

FastDeploy's production server packs tokens from multiple requests into an `ids_remove_padding` batch. `_kv_history` is stored on the layer instance and reset only when `batch_size` changes. If two concurrent requests happen to have the same batch size, the later request inherits the earlier request's KV history and produces incorrect output, so the model is unusable under concurrency.

The TODO already notes the migration target (ForwardMeta.caches / slot-based cache); it should be completed before merging or in an immediately following follow-up PR.
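
To make the suggested direction concrete, here is a hedged sketch of slot-based isolation, not FastDeploy's actual API; `SlotKVCache`, `request_id`, and `init_fn` are hypothetical names. Recurrent state lives in a per-request slot map instead of a single instance attribute, so two concurrent requests can never alias each other's KV history even with equal batch sizes.

```python
class SlotKVCache:
    """Per-request KV-history slots: a sketch of slot-based isolation."""

    def __init__(self):
        self._slots = {}  # request_id -> per-request KV history object

    def get(self, request_id, init_fn):
        # Lazily create this request's KV history on first access.
        if request_id not in self._slots:
            self._slots[request_id] = init_fn()
        return self._slots[request_id]

    def release(self, request_id):
        # Free the slot when the request finishes decoding.
        self._slots.pop(request_id, None)
```

In the layer's forward pass, accesses to `self._kv_history` would become `cache.get(request_id, make_initial_state)`, keyed by the scheduler's request id, with `release` called when the request completes.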


Labels

contributor External developers

Projects

None yet

Development


2 participants