benchmark/serve OOM under wgpu when auto-detect picks all adapters on multi-GPU Windows #61

@adamtpang

Environment

  • Miner version: miner-cli 2.1.2+0a07bb2d
  • OS: Windows 11 Home, 10.0.26200, x64
  • Hardware:
    • AMD Radeon Graphics (integrated, shares system RAM)
    • NVIDIA GeForce RTX 3070 Laptop GPU (discrete)
  • CPU: 16 cores

Symptom

Running benchmark with no flags auto-detects 5 GPU "devices": wgpu enumerates each physical GPU once per backend (Vulkan and DX12), plus the Microsoft Basic Render Driver software fallback:

[engine_gpu] GPU engine initialized with 5 devices
[engine_gpu] 📊 GPU Device 0: AMD Radeon(TM) Graphics    | Backend: Vulkan | IntegratedGpu
[engine_gpu] 📊 GPU Device 1: NVIDIA GeForce RTX 3070 LP | Backend: Vulkan | DiscreteGpu
[engine_gpu] 📊 GPU Device 2: AMD Radeon(TM) Graphics    | Backend: Dx12   | IntegratedGpu
[engine_gpu] 📊 GPU Device 3: NVIDIA GeForce RTX 3070 LP | Backend: Dx12   | DiscreteGpu
[engine_gpu] 📊 GPU Device 4: Microsoft Basic Render Driver | Backend: Dx12 | Cpu
[miner_service] Auto-detected 5 GPU device(s). Using all available GPUs.

The miner spawns a worker on each, then panics during chunk allocation:

thread '<unnamed>' (24268) panicked at wgpu-27.0.1\src\backend\wgpu_core.rs:1570:18:
wgpu error: Out of Memory

The panic kills the whole process. serve fails the same way.

Root cause (likely)

Three problems compound:

  1. Backend duplication — each physical GPU appears twice (Vulkan + DX12). Workers on both backends compete for the same VRAM.
  2. Software adapter included — the Microsoft Basic Render Driver (CPU-emulated) is enumerated as a GPU and given the same chunk size as a real GPU.
  3. Integrated GPU sharing system RAM — the AMD iGPU shares the 16 GB system pool with everything else (browser, IDE, etc.). The default chunk size (100M hashes per the serve --chunk-size help text) overflows fast.

Suggested fixes

  1. Filter out DeviceType::Cpu adapters by default. A CPU-emulated software fallback is never useful for PoW mining and confuses auto-detect.
  2. Deduplicate by physical adapter, preferring one backend per device. Vulkan is usually the right pick on NVIDIA + Linux/Windows; on macOS only Metal is available. A simple (vendor_id, device_id) dedupe would do it.
  3. Lower default chunk size when integrated GPU is selected, or skip integrated GPUs by default and let users opt in via flag.
  4. Document MINER_GPU_DEVICES semantics — currently it's a count, not an explicit selector. On a heterogeneous machine, "use 1 GPU" picks the first enumerated, which is usually the integrated one. A flag like --gpu-adapter <name-substring> or --gpu-device-index <i> would let users target the discrete card directly.
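
A rough sketch of what fixes 1 and 2 could look like, for discussion only. It is not taken from the miner source. The types below are local stand-ins that mirror the name, vendor, device, device_type and backend fields of wgpu's AdapterInfo so the example compiles on its own. The backend ranking, the discrete-first ordering, the select_adapters and sample_adapters names, and the numeric IDs in the sample data are all assumptions.

```rust
use std::collections::HashMap;

// Local stand-ins for the wgpu types, so the sketch compiles without the crate.
// A real patch would use wgpu's own DeviceType, Backend and AdapterInfo.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeviceType { IntegratedGpu, DiscreteGpu, Cpu, Other }

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend { Vulkan, Dx12, Metal, Gl }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AdapterInfo {
    pub name: String,
    pub vendor: u32,
    pub device: u32,
    pub device_type: DeviceType,
    pub backend: Backend,
}

// Lower rank wins when one physical GPU shows up under several backends.
// The ordering is an assumed policy, not something the project has decided.
fn backend_rank(backend: Backend) -> u8 {
    match backend {
        Backend::Vulkan => 0,
        Backend::Metal => 1,
        Backend::Dx12 => 2,
        Backend::Gl => 3,
    }
}

/// Drops CPU-emulated adapters, keeps one entry per (vendor, device) pair,
/// and lists discrete GPUs first.
pub fn select_adapters(all: &[AdapterInfo]) -> Vec<AdapterInfo> {
    let mut best: HashMap<(u32, u32), AdapterInfo> = HashMap::new();
    let mut order: Vec<(u32, u32)> = Vec::new(); // first-seen order of keys

    for adapter in all.iter().filter(|a| a.device_type != DeviceType::Cpu) {
        let key = (adapter.vendor, adapter.device);
        let replace = match best.get(&key) {
            None => {
                order.push(key);
                true
            }
            Some(current) => backend_rank(adapter.backend) < backend_rank(current.backend),
        };
        if replace {
            best.insert(key, adapter.clone());
        }
    }

    let mut picked: Vec<AdapterInfo> = order.iter().map(|k| best[k].clone()).collect();
    // Stable sort: `false` sorts before `true`, so discrete GPUs come first.
    picked.sort_by_key(|a| a.device_type != DeviceType::DiscreteGpu);
    picked
}

/// The five adapters from the log above. The numeric IDs are placeholders.
pub fn sample_adapters() -> Vec<AdapterInfo> {
    let mk = |name: &str, vendor, device, device_type, backend| AdapterInfo {
        name: name.to_string(), vendor, device, device_type, backend,
    };
    vec![
        mk("AMD Radeon(TM) Graphics", 0x1002, 0x0001, DeviceType::IntegratedGpu, Backend::Vulkan),
        mk("NVIDIA GeForce RTX 3070 Laptop GPU", 0x10DE, 0x0002, DeviceType::DiscreteGpu, Backend::Vulkan),
        mk("AMD Radeon(TM) Graphics", 0x1002, 0x0001, DeviceType::IntegratedGpu, Backend::Dx12),
        mk("NVIDIA GeForce RTX 3070 Laptop GPU", 0x10DE, 0x0002, DeviceType::DiscreteGpu, Backend::Dx12),
        mk("Microsoft Basic Render Driver", 0x1414, 0x0003, DeviceType::Cpu, Backend::Dx12),
    ]
}
```

With the five adapters from the log, select_adapters(&sample_adapters()) keeps two entries: the RTX 3070 on Vulkan first, then the AMD iGPU on Vulkan. One caveat with a plain (vendor_id, device_id) key: two identical discrete cards in the same machine would collapse into one entry, so a real implementation probably needs an extra discriminator for that case.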

Workaround attempted

  • WGPU_BACKEND=vulkan ... --gpu-devices 1 — produced 0 bytes of output, process appeared to hang or fail silently before any progress line. Did not OOM, but did not work either.
  • --cpu-workers 16 --gpu-devices 0 (CPU-only) — works fine, 360 KH/s sustained on 16 cores. Confirms the OOM is GPU-pathway-specific.

Why this matters

The OOM affects the most common Windows laptop hardware profile (one integrated + one discrete GPU). Right now a user with a strong NVIDIA card cannot run the GPU benchmark without manual tuning, and silent-fail cases like the Vulkan-only attempt above leave them with no signal at all.

Happy to PR

If a maintainer can confirm the desired filtering policy (drop CPU-emulated, prefer one backend per physical device), I can put up a PR adding the filter + a --gpu-adapter selector and tests.
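
To make the selector part concrete, here is a minimal sketch of how the proposed --gpu-adapter and --gpu-device-index values could be resolved against the enumerated adapter names. Neither flag exists today, and GpuSelector and resolve_selector are hypothetical names. Case-insensitive substring matching and first-match-wins are assumptions open to review.

```rust
/// Hypothetical selector behind the proposed `--gpu-adapter <name-substring>`
/// and `--gpu-device-index <i>` flags.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum GpuSelector {
    NameContains(String),
    Index(usize),
}

/// Resolves a selector to an index into the enumerated adapter names.
/// Returns an error message instead of silently falling back to adapter 0.
pub fn resolve_selector(names: &[&str], selector: &GpuSelector) -> Result<usize, String> {
    match selector {
        GpuSelector::Index(i) if *i < names.len() => Ok(*i),
        GpuSelector::Index(i) => Err(format!(
            "GPU index {} out of range; {} adapter(s) enumerated",
            i,
            names.len()
        )),
        GpuSelector::NameContains(needle) => {
            // Case-insensitive match; the first adapter that matches wins.
            let needle = needle.to_lowercase();
            names
                .iter()
                .position(|name| name.to_lowercase().contains(&needle))
                .ok_or_else(|| {
                    format!(
                        "no adapter name contains {:?}; available: {}",
                        needle,
                        names.join(", ")
                    )
                })
        }
    }
}
```

On the machine above, NameContains("nvidia") would resolve to the discrete card regardless of where it lands in enumeration order. Returning an explicit error for a bad index or an unmatched name would also address the silent-failure case, since the user gets the list of available adapters back.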
