benchmark and serve OOM under wgpu when auto-detect picks all adapters on multi-GPU Windows
Environment
- Miner version: miner-cli 2.1.2+0a07bb2d
- OS: Windows 11 Home, 10.0.26200, x64
- Hardware:
  - AMD Radeon Graphics (integrated, shares system RAM)
  - NVIDIA GeForce RTX 3070 Laptop GPU (discrete)
- CPU: 16 cores
Symptom
Default benchmark (no flags) auto-detects 5 GPU "devices" because wgpu enumerates each physical GPU once per backend (Vulkan + DX12) plus a Microsoft Basic Render Driver software fallback:
[engine_gpu] GPU engine initialized with 5 devices
[engine_gpu] 📊 GPU Device 0: AMD Radeon(TM) Graphics | Backend: Vulkan | IntegratedGpu
[engine_gpu] 📊 GPU Device 1: NVIDIA GeForce RTX 3070 LP | Backend: Vulkan | DiscreteGpu
[engine_gpu] 📊 GPU Device 2: AMD Radeon(TM) Graphics | Backend: Dx12 | IntegratedGpu
[engine_gpu] 📊 GPU Device 3: NVIDIA GeForce RTX 3070 LP | Backend: Dx12 | DiscreteGpu
[engine_gpu] 📊 GPU Device 4: Microsoft Basic Render Driver | Backend: Dx12 | Cpu
[miner_service] Auto-detected 5 GPU device(s). Using all available GPUs.
The miner spawns a worker on each, then panics during chunk allocation:
thread '<unnamed>' (24268) panicked at wgpu-27.0.1\src\backend\wgpu_core.rs:1570:18:
wgpu error: Out of Memory
This kills the whole process. Same failure under serve.
Root cause (likely)
Three problems compound:
- Backend duplication: each physical GPU appears twice (Vulkan + DX12). Workers on both backends compete for the same VRAM.
- Software adapter included: the Microsoft Basic Render Driver (CPU-emulated) is enumerated as a GPU and given the same chunk size as a real GPU.
- Integrated GPU sharing system RAM: the AMD iGPU shares the 16 GB system pool with everything else (browser, IDE, etc.). The default chunk size (100M hashes per the serve --chunk-size help text) overflows fast.
Suggested fixes
- Filter out DeviceType::Cpu adapters by default. A CPU-emulated software fallback is never useful for PoW mining and confuses auto-detect.
- Deduplicate by physical adapter, preferring one backend per device. Vulkan is usually the right pick for NVIDIA on Linux/Windows; on macOS only Metal is available. A simple (vendor_id, device_id) dedupe would do it.
- Lower the default chunk size when an integrated GPU is selected, or skip integrated GPUs by default and let users opt in via a flag.
- Document MINER_GPU_DEVICES semantics: currently it's a count, not an explicit selector. On a heterogeneous machine, "use 1 GPU" picks the first enumerated, which is usually the integrated one. A flag like --gpu-adapter <name-substring> or --gpu-device-index <i> would let users target the discrete card directly.
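To make the first three suggestions concrete, here is a rough std-only sketch of the filtering policy, not the miner's actual code. The AdapterInfo struct is a local stand-in loosely modeled on the fields wgpu reports per adapter. select_adapters, default_chunk_size, the device IDs and the 25M integrated-GPU chunk size are all hypothetical placeholders; only the adapter names, backends, device types and the 100M default come from the report above.

```rust
// Local stand-ins for the adapter fields used by the policy (assumption:
// the real code would read these from wgpu's adapter info instead).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DeviceType { IntegratedGpu, DiscreteGpu, Cpu }

#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend { Vulkan, Dx12 }

#[derive(Debug, Clone, PartialEq)]
struct AdapterInfo {
    name: String,
    vendor: u32,
    device: u32,
    device_type: DeviceType,
    backend: Backend,
}

fn adapter(name: &str, vendor: u32, device: u32, device_type: DeviceType, backend: Backend) -> AdapterInfo {
    AdapterInfo { name: name.to_string(), vendor, device, device_type, backend }
}

/// The five entries from the log in this report. Device IDs are placeholders.
fn enumerated_from_report() -> Vec<AdapterInfo> {
    vec![
        adapter("AMD Radeon(TM) Graphics", 0x1002, 0x0001, DeviceType::IntegratedGpu, Backend::Vulkan),
        adapter("NVIDIA GeForce RTX 3070 LP", 0x10DE, 0x0002, DeviceType::DiscreteGpu, Backend::Vulkan),
        adapter("AMD Radeon(TM) Graphics", 0x1002, 0x0001, DeviceType::IntegratedGpu, Backend::Dx12),
        adapter("NVIDIA GeForce RTX 3070 LP", 0x10DE, 0x0002, DeviceType::DiscreteGpu, Backend::Dx12),
        adapter("Microsoft Basic Render Driver", 0x1414, 0x0003, DeviceType::Cpu, Backend::Dx12),
    ]
}

/// Lower rank wins when the same physical device shows up under two backends.
fn backend_rank(backend: Backend) -> u8 {
    match backend {
        Backend::Vulkan => 0,
        Backend::Dx12 => 1,
    }
}

/// Drop CPU-emulated adapters, then keep one entry per (vendor, device) pair,
/// preferring the better-ranked backend. Enumeration order is preserved.
fn select_adapters(all: &[AdapterInfo]) -> Vec<AdapterInfo> {
    let mut picked: Vec<AdapterInfo> = Vec::new();
    for a in all {
        if a.device_type == DeviceType::Cpu {
            continue;
        }
        let seen = picked
            .iter()
            .position(|p| p.vendor == a.vendor && p.device == a.device);
        match seen {
            Some(i) => {
                if backend_rank(a.backend) < backend_rank(picked[i].backend) {
                    picked[i] = a.clone();
                }
            }
            None => picked.push(a.clone()),
        }
    }
    picked
}

/// Hypothetical per-device default: 100M is the documented default,
/// the smaller integrated-GPU value is an arbitrary illustration.
fn default_chunk_size(device_type: DeviceType) -> u64 {
    match device_type {
        DeviceType::IntegratedGpu => 25_000_000,
        _ => 100_000_000,
    }
}

fn main() {
    for a in select_adapters(&enumerated_from_report()) {
        println!(
            "{} | {:?} | {:?} | chunk size {}",
            a.name,
            a.backend,
            a.device_type,
            default_chunk_size(a.device_type)
        );
    }
}
```

One caveat with keying on (vendor, device) alone: two identical physical cards in the same machine would collapse into one entry, so a real implementation would probably need an extra discriminator.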
Workaround attempted
- WGPU_BACKEND=vulkan ... --gpu-devices 1: produced 0 bytes of output; the process appeared to hang or fail silently before any progress line. Did not OOM, but did not work either.
- --cpu-workers 16 --gpu-devices 0 (CPU-only): works fine, 360 KH/s sustained on 16 cores. Confirms the OOM is GPU-pathway-specific.
Why this matters
The OOM affects the most common Windows laptop hardware profile (one integrated + one discrete GPU). Right now a user with a strong NVIDIA card cannot run the GPU benchmark without manual tuning, and silent-fail cases like the Vulkan-only attempt above leave them with no signal at all.
Happy to PR
If a maintainer can confirm the desired filtering policy (drop CPU-emulated, prefer one backend per physical device), I can put up a PR adding the filter + a --gpu-adapter selector and tests.
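For discussion, the selector could look roughly like the sketch below. It assumes the two proposed flags (--gpu-adapter <name-substring> and --gpu-device-index <i>) are parsed into a hypothetical GpuSelector enum and resolved against the already-filtered adapter names. GpuSelector and resolve are illustrative names, not existing miner APIs. It returns an explicit error instead of failing silently when nothing matches.

```rust
/// Hypothetical parsed form of the two proposed flags.
#[derive(Debug, Clone)]
enum GpuSelector {
    /// --gpu-adapter <name-substring>, matched case-insensitively
    NameSubstring(String),
    /// --gpu-device-index <i>, an index into the filtered adapter list
    Index(usize),
}

/// Resolve a selector to an index into `adapter_names`, or explain why not.
fn resolve(selector: &GpuSelector, adapter_names: &[&str]) -> Result<usize, String> {
    match selector {
        GpuSelector::Index(i) => {
            if *i < adapter_names.len() {
                Ok(*i)
            } else {
                Err(format!(
                    "adapter index {} out of range ({} adapters available)",
                    i,
                    adapter_names.len()
                ))
            }
        }
        GpuSelector::NameSubstring(needle) => {
            let needle = needle.to_lowercase();
            adapter_names
                .iter()
                .position(|name| name.to_lowercase().contains(needle.as_str()))
                .ok_or_else(|| format!("no adapter name contains {:?}", needle))
        }
    }
}

fn main() {
    // Adapter names as they appear in the log from this report.
    let names = ["AMD Radeon(TM) Graphics", "NVIDIA GeForce RTX 3070 LP"];
    let selector = GpuSelector::NameSubstring("nvidia".to_string());
    match resolve(&selector, &names) {
        Ok(i) => println!("selected adapter {}: {}", i, names[i]),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

Picking the first match keeps the behaviour predictable; whether an ambiguous substring should instead be an error is a policy question for the maintainers.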