Description
I am attempting to run a model on a second GPU using a local ComfyUI worker, but I hit a CUDA error. The master GPU generates images without issue. The model I am testing is realismSDXL.
Additionally, I am exploring running the Flux-Dev model in FP8, whose weights are slightly larger than a single P100's 16 GB of VRAM, so I would like to know whether there is a recommended way to split the model across two P100 GPUs.
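For reference, this is the quick sanity check I run inside the worker's environment to confirm which devices the process can see (plain PyTorch, nothing ComfyUI-specific). Note that in the log below the second P100 appears as cuda:0, presumably because the launcher restricts the visible devices:

```python
import torch

# Enumerate the CUDA devices visible to this process and their total VRAM.
# Inside the worker, device indices are relative to CUDA_VISIBLE_DEVICES,
# so the second physical P100 can show up as cuda:0.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"cuda:{i}: {props.name}, {props.total_memory / 2**30:.1f} GiB")
```

The full worker log from the failed run follows.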
==================================================
=== ComfyUI Worker Session Started ===
Worker: Worker 1
Port: 8189
CUDA Device: 1
Started: 2025-12-02 01:42:15
Command: /usr/local/bin/python /app/main.py --port 8189 --enable-cors-header --base-dir /app --listen
Note: Worker will stop when master shuts down
==============================
[Worker Monitor] Monitoring master PID: 1
[Distributed] Started worker PID: 76
[Distributed] Monitoring master PID: 1
Checkpoint files will always be loaded safely.
Total VRAM 16384 MB, total RAM 100554 MB
pytorch version: 2.4.1+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 GRID P100-16Q : cudaMallocAsync
Using pytorch attention
Python version: 3.12.12 (main, Nov 18 2025, 05:56:04) [GCC 14.2.0]
ComfyUI version: 0.3.36
ComfyUI frontend version: 1.20.5
[Prompt Server] web root: /usr/local/lib/python3.12/site-packages/comfyui_frontend_package/static
Import times for custom nodes:
0.0 seconds: /app/custom_nodes/websocket_image_save.py
0.0 seconds: /app/custom_nodes/ComfyUI-Distributed
Starting server
To see the GUI go to: http://0.0.0.0:8189
To see the GUI go to: http://[::]:8189
got prompt
model weight dtype torch.float16, manual cast: torch.float32
model_type EPS
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.float32
Requested to load SDXLClipModel
loaded completely 9.5367431640625e+25 1560.802734375 True
CLIP/text encoder model load device: cpu, offload device: cpu, current: cpu, dtype: torch.float16
Token indices sequence length is longer than the specified maximum sequence length for this model (82 > 77). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (82 > 77). Running this sequence through the model will result in indexing errors
!!! Exception during processing !!! CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "/app/execution.py", line 349, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/execution.py", line 224, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/execution.py", line 196, in _map_node_over_list
process_inputs(input_dict, i)
File "/app/execution.py", line 185, in process_inputs
results.append(getattr(obj, func)(**inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/nodes.py", line 1516, in sample
return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/nodes.py", line 1483, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/comfy/sample.py", line 43, in sample
sampler = comfy.samplers.KSampler(model, steps=steps, device=model.load_device, sampler=sampler_name, scheduler=scheduler, denoise=denoise, model_options=model.model_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/comfy/samplers.py", line 1083, in __init__
self.set_steps(steps, denoise)
File "/app/comfy/samplers.py", line 1104, in set_steps
self.sigmas = self.calculate_sigmas(steps).to(self.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Prompt executed in 24.10 seconds
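Since the traceback fails on the very first tensor transfer to the GPU (calculate_sigmas(...).to(self.device)), I put together a minimal repro sketch to run inside the worker's environment. My assumption: if this also raises "operation not supported", the problem is in device/allocator setup rather than in the workflow or the model:

```python
import torch

# Mirror the failing call: build a small CPU tensor (like the sampler's
# sigma schedule) and move it to the worker's GPU.
print(torch.cuda.get_device_name(0))
sigmas = torch.linspace(1.0, 0.0, 20)
print(sigmas.to("cuda:0"))  # raises RuntimeError if the device rejects the transfer
```

One detail in the log that stands out is "Device: cuda:0 GRID P100-16Q : cudaMallocAsync". As far as I know, ComfyUI has a --disable-cuda-malloc launch flag that switches off the cudaMallocAsync allocator, and the async memory pool is reportedly unsupported on some virtualized GPUs, so that flag seems worth trying here. This is a guess on my part, not a confirmed diagnosis.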
Environment Details:
- ComfyUI version: 0.3.36
- Frontend version: 1.20.5
- PyTorch: 2.4.1+cu118
- Python: 3.12.12
- GPU: NVIDIA GRID P100-16Q (16 GB VRAM)
- Worker port: 8189
Additional Information:
- The master GPU can successfully generate images.
- I am using a local worker for the second GPU (CUDA Device: 1).
- I would like guidance on running larger models (like Flux-Dev in FP8) across multiple P100 GPUs, if possible.
Questions:
- Is there a recommended way to split a model across two P100 GPUs for ComfyUI? (A naive sketch of the kind of split I mean follows below.)
- Are there any known issues with running FP16/FP8 models on GRID P100 vGPUs that could cause "CUDA error: operation not supported"?
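For context, the kind of split I have in mind is a naive pipeline split, sketched below in plain PyTorch. Everything here is hypothetical (TwoGPUPipeline and the blocks list are illustrative, not ComfyUI or Flux internals):

```python
import torch
import torch.nn as nn

class TwoGPUPipeline(nn.Module):
    """Naive pipeline split: early blocks on cuda:0, late blocks on cuda:1."""

    def __init__(self, blocks: nn.ModuleList, split: int):
        super().__init__()
        self.first = blocks[:split].to("cuda:0")   # first half of the weights
        self.second = blocks[split:].to("cuda:1")  # second half of the weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.to("cuda:0")
        for blk in self.first:
            x = blk(x)
        x = x.to("cuda:1")  # activations cross the PCIe bus here
        for blk in self.second:
            x = blk(x)
        return x
```

Whether something like this is feasible for the diffusion model inside ComfyUI, or whether ComfyUI-Distributed already supports some form of weight offloading or sharding, is what I am hoping to learn.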