[Feature]【Hackathon 10th Spring No.46】Python Windows runtime compatibility [cf]#7702
Conversation
Thanks for your contribution!
CI report generated from the code below (updated every 30 minutes):
1 Task overview: all executed tasks have passed, but 7 workflows are in …
2 Task status summary
2.1 Required tasks: 0/0 passed
2.2 Optional tasks: 2/2 passed
3 Failure details (required only): none
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review
2026-05-03 22:05:16
📋 Review Summary
PR overview: adds Windows runtime compatibility to the Python code, replacing the hard-coded /dev/shm path and POSIX-only APIs such as os.setsid/os.killpg
Scope of changes: fastdeploy/engine/, fastdeploy/cache_manager/, fastdeploy/inter_communicator/, fastdeploy/eplb/, fastdeploy/worker/
Impact tags: [Engine] [KVCache] [Feature]
📝 PR Convention Check
The title contains the unofficial suffix [cf], and the Motivation/Modifications/Usage/Accuracy Tests sections are all empty placeholders with no actual content.
Suggested title (can be copied directly):
[Feature] Python Windows runtime compatibility
Suggested PR description (can be copied directly; it must reproduce the full structure of the checklist §D2 template):
## Motivation
Fix Python runtime compatibility on Windows. The code hard-codes the `/dev/shm` path and POSIX-only APIs such as `os.setsid` and `os.killpg`, which prevents it from running on Windows.
## Modifications
- `cache_manager/cache_messager.py`: replace the `/dev/shm` path with `tempfile.gettempdir()` selected via a `sys.platform` check
- `cache_manager/prefix_cache_manager.py`: change `preexec_fn=os.setsid` in `subprocess.Popen` to platform-conditional `**_popen_kwargs`
- `engine/common_engine.py`: replace the `/dev/shm` path, and use `proc.terminate()` instead of `os.killpg` on Windows
- `engine/engine.py`: same as above, and switch `multiprocessing.get_context("fork")` to `"spawn"` on Windows
- `engine/expert_service.py`: make the process-termination logic Windows-compatible
- `eplb/async_expert_loader.py`: replace the `/dev/shm` path
- `inter_communicator/fmq.py`: platform-adapt the default value of `Config.ipc_root`
- `inter_communicator/zmq_client.py`, `zmq_server.py`: platform-adapt the socket paths
- `worker/worker_process.py`: platform-adapt the task-queue path
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Issues
| Level | File | Summary |
|---|---|---|
| 🔴 Bug | fastdeploy/engine/expert_service.py:113 | The `init_cache_info` method does not exist on `FDConfig` (only `init_pd_info` does); startup in dp/splitwise scheduling scenarios will raise `AttributeError` |
| 🔴 Bug | fastdeploy/engine/expert_service.py:126 | Same as above: the second call, on the splitwise path |
| 🔴 Bug | fastdeploy/engine/engine.py:872 | dp worker initialization changed from parallel to serial; the design of batch `start()` first, then wait for all, is broken, so startup time in multi-dp scenarios grows linearly |
| 🟡 Suggestion | fastdeploy/engine/engine.py:680 | `enable_flashinfer_allreduce_fusion` was removed from `worker_store_true_flag` and the worker command-line arguments, but the model_executor layer still reads this field, so enabling the flag silently has no effect |
Overall assessment
The approach of the Windows path and process-API compatibility changes is sound, but the PR introduces two clear runtime bugs: the nonexistent `init_cache_info` method will make startup fail in dp/splitwise scenarios, and dp worker initialization was accidentally changed from parallel to serial. Both must be fixed before merging.
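The platform-conditional substitutions described in the Modifications list (a tempdir fallback for `/dev/shm`, and `Popen` keyword selection in place of `preexec_fn=os.setsid`) can be sketched as below. This is a minimal illustration, not the PR's actual code; the helper names are hypothetical:

```python
import os
import subprocess
import sys
import tempfile

def shared_mem_dir():
    # /dev/shm is a Linux tmpfs mount; on Windows (and any platform
    # without it) fall back to the generic temporary directory.
    if sys.platform.startswith("linux") and os.path.isdir("/dev/shm"):
        return "/dev/shm"
    return tempfile.gettempdir()

def popen_group_kwargs():
    # os.setsid exists only on POSIX; Windows expresses "run in its own
    # process group" through a creation flag instead of a pre-exec hook.
    if os.name == "posix":
        return {"preexec_fn": os.setsid}
    return {"creationflags": subprocess.CREATE_NEW_PROCESS_GROUP}
```

A caller would then write `subprocess.Popen(cmd, **popen_group_kwargs())`, and on teardown use `os.killpg` on POSIX but `proc.terminate()` on Windows, mirroring the `common_engine.py` change described above.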
```python
self.engine.create_data_processor()
if self.cfg.scheduler_config.name == "dp":
    self.cfg.init_pd_info()
    self.cfg.init_cache_info()
```
🔴 Bug: the `init_cache_info` method does not exist on `FDConfig` (only `init_pd_info` does); this call will raise `AttributeError` at runtime in the dp scheduling scenario.
Suggested fix:
```python
self.cfg.init_pd_info()
```
```python
if self.cfg.scheduler_config.name == "splitwise":
    self.cfg.init_pd_info()
    self.cfg.init_cache_info()
```
🔴 Bug: same as above; the splitwise scheduling path also calls the nonexistent `init_cache_info` method and raises `AttributeError` at runtime.
Suggested fix:
```python
self.cfg.init_pd_info()
```
```python
while self.launched_expert_service_signal.value[i] == 0:
    time.sleep(0.1)
time.sleep(1)
```
🔴 Bug: dp worker initialization was accidentally changed from "start in parallel, then wait for all" to "start serially, waiting after each one".
Original design: `start()` all dp subprocesses first (allowing weights to load concurrently), then loop separately to wait for all of them to become ready; total time is roughly that of the slowest process.
After the change: each process is awaited immediately after it is started, so in multi-dp scenarios total startup time grows linearly to the sum of all processes' initialization times.
Suggested fix, restoring the original two-loop structure:

```python
for i in range(1, self.cfg.parallel_config.data_parallel_size // self.cfg.nnode):
    # ... create and start() the process
    self.dp_processed[-1].start()
for i in range(1, self.cfg.parallel_config.data_parallel_size // self.cfg.nnode):
    while self.launched_expert_service_signal.value[i] == 0:
        time.sleep(0.1)
```

```diff
@@ -700,7 +678,6 @@ def _start_worker_service(self):
```
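Why the two-loop structure matters can be demonstrated with a small self-contained sketch, in which threads stand in for the dp worker processes (all names here are illustrative): starting everything first lets the waits overlap, so total time tracks the slowest worker rather than the sum.

```python
import threading
import time

def start_all_then_wait(init_durations):
    """Phase 1: launch every worker; phase 2: wait for readiness."""
    ready = [threading.Event() for _ in init_durations]

    def worker(idx, duration):
        time.sleep(duration)   # stand-in for loading model weights
        ready[idx].set()       # analogous to launched_expert_service_signal

    workers = [
        threading.Thread(target=worker, args=(i, d))
        for i, d in enumerate(init_durations)
    ]
    for t in workers:          # first loop: start everything
        t.start()
    for ev in ready:           # second loop: only now block on readiness
        ev.wait()

start = time.monotonic()
start_all_then_wait([0.2, 0.2, 0.2])
elapsed = time.monotonic() - start
# With overlapped waits, elapsed is near 0.2 s; a serial
# start-then-wait-per-worker loop would take about 0.6 s.
```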
```python
"enable_entropy": self.cfg.model_config.enable_entropy,
"ep_prefill_use_worst_num_tokens": self.cfg.parallel_config.ep_prefill_use_worst_num_tokens,
"enable_overlap_schedule": self.cfg.scheduler_config.enable_overlap_schedule,
```
🟡 Suggestion: `enable_flashinfer_allreduce_fusion` has been removed from `worker_store_true_flag` here and from the argparse arguments in `worker_process.py`, but `model_executor/layers/normalization.py` and `linear.py` still read this field from the config to control fusion behavior.
As a result, the flag can no longer be passed from the engine side to the worker subprocess command line, and the model_executor layer will always run with the config default (False); if a user enables the flag via `EngineArgs`, it silently has no effect.
Suggestion: if the removal is intentional, also delete the related code in `config.py`, `args_utils.py`, `normalization.py`, and `linear.py`; if it is not intentional, restore the three deleted places.
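This silent-failure mechanism is generic to `store_true` flags forwarded across a process boundary, and can be reproduced with a small sketch (the flag plumbing here is simplified and hypothetical; only the flag name comes from the PR):

```python
import argparse

def build_worker_argv(store_true_flags):
    # Engine side: a flag reaches the worker argv only if it is still
    # listed here; removing it from this dict drops it silently.
    return [f"--{name}" for name, enabled in store_true_flags.items() if enabled]

def parse_worker_args(argv):
    # Worker side: argparse falls back to the default (False) for any
    # flag absent from argv -- no error, no warning.
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable_flashinfer_allreduce_fusion", action="store_true")
    return parser.parse_args(argv)

# The user enabled the flag, but it was removed from the forwarding dict,
# so the worker never sees it and runs with the default:
args = parse_worker_args(build_worker_argv({}))
```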
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist