The backend crashes when loading the model with mmap. #13220

@304523475

Custom Node Testing

Expected Behavior

Actual Behavior

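My local GPU is a 3090, with 64 GB of RAM. When I run LTX 2.3, the model occasionally loads successfully and the workflow runs to completion, but the backend frequently crashes on its own while loading the model. For example, after an LTX 2.3 workflow finishes, switching to a different workflow always crashes the backend. Below is the reply I got when I asked Claude about the error: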

This bug is in the comfy_aimdo module. Let me check whether it is Python code or a compiled binary:

Ran command: Get-ChildItem -Path "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\site-packages\comfy_aimdo" -Recurse | Select-Object FullName, Extension, Length | Format-Table -AutoSize
Ran command: Get-ChildItem -Path "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\site-packages\comfy_aimdo" -Recurse | ForEach-Object { $_.FullName + " | " + $_.Extension + " | " + $_.Length }
Listed directory comfy_aimdo
Viewed model_mmap.py:1-48

Here is the answer:

It cannot be fixed, because the bug is in a compiled DLL

comfy_aimdo/
├── aimdo.dll          ← 223 KB compiled binary (written in C/C++)
├── model_mmap.py      ← Python wrapper layer that only calls the DLL
└── ...

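model_mmap.py is only a thin wrapper; the real mmap logic lives in aimdo.dll:

self.state = lib.model_mmap_allocate(filepath_bytes)  # → calls the DLL
return lib.model_mmap_get(self.state)                  # → calls the DLL
  • aimdo.dll is a closed-source binary compiled from C/C++ by the official ComfyUI team
  • We cannot see the source code, so we cannot change it
  • The access violation occurs when PyTorch accesses the memory address returned by the DLL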
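
The quoted reply blames the crash on memory handed out by a native mapping and accessed later. This report does not confirm that cause. As a general illustration of why a stale mapping is dangerous, the sketch below uses only Python's standard `mmap` module, not `aimdo.dll` or PyTorch; the function name `stale_mapping_demo` is hypothetical. Python's own `mmap` object refuses to close while a view is still exported and raises an error on access after closing. A raw pointer returned by a native library has no such guard, so touching it after the mapping is gone can end in an access violation instead of a Python exception.

```python
import mmap
import os
import tempfile


def stale_mapping_demo():
    """Show the guards Python's mmap applies around closing a mapping."""
    events = []
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"weights")
        with open(path, "rb") as f:
            # The mapping stays valid after the file object is closed.
            mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        view = memoryview(mapped)  # a live view into the mapped bytes
        try:
            mapped.close()  # refused: a view still points into the mapping
        except BufferError:
            events.append("close refused while a view exists")
        first = bytes(view[:7])  # copy the data out while it is still mapped
        view.release()
        mapped.close()  # now allowed
        try:
            mapped[0]  # use after close is caught by Python
        except ValueError:
            events.append("access after close raises ValueError")
    finally:
        os.remove(path)
    return first, events
```

Calling `stale_mapping_demo()` returns the copied bytes together with the two guard events that were triggered.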

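So there are only two options

| Option | Feasibility |
| --- | --- |
| Fix aimdo.dll | ❌ Closed-source binary; it cannot be modified, so the only choice is to wait for an official update |
| Bypass it (--disable-mmap) | ✅ Already done for you, at minimal cost |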

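With --disable-mmap, model files are loaded through the standard safetensors.safe_open() path instead. Loading takes a few seconds longer, but inference speed is completely unaffected. This is currently the only workable option.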
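
To make the trade-off concrete, here is a minimal sketch of reading a safetensors file with ordinary file reads, so each tensor's bytes are copied into process memory rather than mapped. It relies on the published safetensors layout: an 8-byte little-endian header length, a JSON header, then the raw tensor bytes. It is not ComfyUI's loader or the safetensors library's implementation. The helper names `write_tiny_safetensors`, `read_safetensors_plain`, and `demo_roundtrip`, and the tensor name `demo.weight`, are hypothetical.

```python
import json
import os
import struct
import tempfile


def write_tiny_safetensors(path):
    """Write a one-tensor file in the safetensors layout (demo data only)."""
    payload = struct.pack("<2f", 1.0, 2.0)  # two little-endian float32 values
    header = json.dumps(
        {"demo.weight": {"dtype": "F32", "shape": [2],
                         "data_offsets": [0, len(payload)]}}
    ).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))  # 8-byte header length
        f.write(header)
        f.write(payload)


def read_safetensors_plain(path):
    """Copy every tensor's raw bytes into memory with plain reads (no mmap)."""
    tensors = {}
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        data_start = 8 + header_len  # offsets are relative to this position
        for name, info in header.items():
            if name == "__metadata__":  # optional free-form metadata entry
                continue
            begin, end = info["data_offsets"]
            f.seek(data_start + begin)
            tensors[name] = {
                "dtype": info["dtype"],
                "shape": info["shape"],
                "data": f.read(end - begin),
            }
    return tensors


def demo_roundtrip():
    """Write the demo file, read it back, and decode the two floats."""
    fd, path = tempfile.mkstemp(suffix=".safetensors")
    os.close(fd)
    try:
        write_tiny_safetensors(path)
        loaded = read_safetensors_plain(path)
    finally:
        os.remove(path)
    entry = loaded["demo.weight"]
    return entry["dtype"], entry["shape"], struct.unpack("<2f", entry["data"])
```

Because every byte is copied up front, nothing later depends on a mapping remaining valid. The cost is the extra read time and memory that the reply describes as a few seconds of additional loading.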

Steps to Reproduce

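When I run LTX 2.3, the model occasionally loads successfully and the workflow runs to completion, but the backend frequently crashes on its own while loading the model. For example, after an LTX 2.3 workflow finishes, switching to another workflow such as Z-Image always crashes the backend.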

Debug Logs

FETCH ComfyRegistry Data: 10/135
Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
  File "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\asyncio\events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
    self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.
Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
  File "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\asyncio\events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
    self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.
FETCH ComfyRegistry Data: 15/135
FETCH ComfyRegistry Data: 20/135
FETCH ComfyRegistry Data: 25/135
FETCH ComfyRegistry Data: 30/135
FETCH ComfyRegistry Data: 35/135
FETCH ComfyRegistry Data: 40/135
FETCH ComfyRegistry Data: 45/135
got prompt
Failed to validate prompt for output 704:
* (prompt):
  - Required input is missing: images
* VHS_VideoCombine 704:
  - Required input is missing: images
Output will be ignored
FETCH ComfyRegistry Data: 50/135
[ReservedVRAM]Running pre-execution GPU VRAM cleanup...
[ReservedVRAM]GPU VRAM cleanup complete
[ReservedVRAM]set EXTRA_RESERVED_VRAM=2.59GB (auto mode: total VRAM=24.00GB, used=1.99GB)
[DynamicRAMCache] Switched mode: CLASSIC -> RAM_PRESSURE (Headroom: 4.0GB)
[ReservedVRAM]Running pre-execution GPU VRAM cleanup...
[ReservedVRAM]GPU VRAM cleanup complete
[ReservedVRAM]set EXTRA_RESERVED_VRAM=2.54GB (auto mode: total VRAM=24.00GB, used=1.94GB)
FETCH ComfyRegistry Data: 55/135
FETCH ComfyRegistry Data: 60/135
FETCH ComfyRegistry Data: 65/135
FETCH ComfyRegistry Data: 70/135
FETCH ComfyRegistry Data: 75/135
FETCH ComfyRegistry Data: 80/135
FETCH ComfyRegistry Data: 85/135
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
[ReservedVRAM]Running pre-execution GPU VRAM cleanup...
[ReservedVRAM]GPU VRAM cleanup complete
[ReservedVRAM]set EXTRA_RESERVED_VRAM=2.53GB (auto mode: total VRAM=24.00GB, used=1.93GB)
FETCH ComfyRegistry Data: 90/135
Found quantization metadata version 1
Detected mixed precision quantization
Using mixed precision operations
model weight dtype torch.bfloat16, manual cast: torch.bfloat16
model_type FLUX
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
no CLIP/text encoder weights in checkpoint, the text encoder model will not be loaded.
FETCH ComfyRegistry Data: 95/135
Requested to load LTXAVTEModel_
FETCH ComfyRegistry Data: 100/135
loaded partially; 18774.56 MB usable, 18212.06 MB loaded, 7228.54 MB offloaded, 562.50 MB buffer reserved, lowvram patches: 0
loaded partially; 18474.69 MB usable, 17912.19 MB loaded, 7528.57 MB offloaded, 562.50 MB buffer reserved, lowvram patches: 0
Requested to load VideoVAE
FETCH ComfyRegistry Data: 105/135
loaded completely; 19823.23 MB usable, 1384.94 MB loaded, full load: True
FETCH ComfyRegistry Data: 110/135
Windows fatal exception: access violation

Stack (most recent call first):
  File "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\storage.py", line 466 in __getitem__
  File "H:\ai\comfyui2\ComfyUI_windows_portable\ComfyUI\comfy\utils.py", line 136 in load_torch_file
  File "H:\ai\comfyui2\ComfyUI_windows_portable\ComfyUI\comfy_extras\nodes_lt_audio.py", line 30 in execute
  File "H:\ai\comfyui2\ComfyUI_windows_portable\ComfyUI\comfy_api\latest\_io.py", line 1764 in EXECUTE_NORMALIZED
  File "H:\ai\comfyui2\ComfyUI_windows_portable\ComfyUI\comfy_api\internal\__init__.py", line 149 in wrapped_func
  File "H:\ai\comfyui2\ComfyUI_windows_portable\ComfyUI\execution.py", line 296 in process_inputs
  File "H:\ai\comfyui2\ComfyUI_windows_portable\ComfyUI\execution.py", line 308 in _async_map_node_over_list
  File "H:\ai\comfyui2\ComfyUI_windows_portable\ComfyUI\execution.py", line 334 in get_output_data
  File "H:\ai\comfyui2\ComfyUI_windows_portable\ComfyUI\execution.py", line 525 in execute
  File "H:\ai\comfyui2\ComfyUI_windows_portable\ComfyUI\execution.py", line 758 in execute_async
  File "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\asyncio\events.py", line 88 in _run
  File "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\asyncio\base_events.py", line 1986 in _run_once
  File "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\asyncio\base_events.py", line 641 in run_forever
  File "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\asyncio\windows_events.py", line 322 in run_forever
  File "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\asyncio\base_events.py", line 674 in run_until_complete
  File "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\asyncio\runners.py", line 118 in run
  File "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\asyncio\runners.py", line 194 in run
  File "H:\ai\comfyui2\ComfyUI_windows_portable\ComfyUI\execution.py", line 702 in execute
  File "H:\ai\comfyui2\ComfyUI_windows_portable\ComfyUI\main.py", line 276 in prompt_worker
  File "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\threading.py", line 1012 in run
  File "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\threading.py", line 1075 in _bootstrap_inner
  File "H:\ai\comfyui2\ComfyUI_windows_portable\python_embeded\Lib\threading.py", line 1032 in _bootstrap

H:\ai\comfyui2\ComfyUI_windows_portable>pause
Press any key to continue . . .
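
The `Windows fatal exception: access violation` line followed by `Stack (most recent call first):` in the log above is the format written by Python's standard `faulthandler` module when the process hits a native fault. The trace therefore shows which Python frame was active (here `torch\storage.py` reached from `load_torch_file`), not the native instruction that faulted. The sketch below produces the same style of dump on demand without crashing anything; the function name `capture_stack_dump` is hypothetical.

```python
import faulthandler
import tempfile


def capture_stack_dump():
    """Return the stack text faulthandler writes for the current thread."""
    # faulthandler writes straight to a file descriptor, so a real file
    # (not an in-memory buffer) is needed to capture its output.
    with tempfile.TemporaryFile(mode="w+") as f:
        faulthandler.dump_traceback(file=f, all_threads=False)
        f.seek(0)
        return f.read()
```

In a real session the handler is typically installed once at startup with `faulthandler.enable()` or by launching Python with `-X faulthandler`, so that a fatal error prints this kind of stack before the process exits.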

Other

No response

Labels

Potential Bug: User is reporting a bug. This should be tested.
