fix(gpu): filter CPU-emulated adapters and dedupe per physical GPU #62
Open
adamtpang wants to merge 1 commit into
On Windows, wgpu enumerates each physical GPU once per backend (Vulkan
plus DX12) and additionally exposes a CPU-emulated software fallback
("Microsoft Basic Render Driver"). The current init path spawns a
worker context for every enumerated entry, which on a typical laptop
(NVIDIA dGPU + AMD iGPU) yields 5 contexts:
Device 0: AMD Radeon Graphics | Vulkan | IntegratedGpu
Device 1: NVIDIA RTX 3070 | Vulkan | DiscreteGpu
Device 2: AMD Radeon Graphics | Dx12 | IntegratedGpu
Device 3: NVIDIA RTX 3070 | Dx12 | DiscreteGpu
Device 4: Microsoft Basic Render | Dx12 | Cpu
Allocating a chunk on every entry causes VRAM contention and OOMs the
process during benchmark/serve startup.
This commit adds two surgical filters before context construction:
1. Drop adapters whose `device_type == DeviceType::Cpu` — software
fallbacks are never useful for PoW mining.
2. Dedupe per `(vendor, device)` physical-adapter key, keeping the
highest-priority backend (Vulkan or Metal first, then DX12).
The filtered list is then sorted discrete > integrated > virtual so
device enumeration is stable across runs. Skipped/dropped adapters
are logged at info level for diagnosability.
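The two filters and the final sort can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the `Backend`, `DeviceType`, and `AdapterInfo` types are simplified stand-ins that only mirror the wgpu field names used above, `device_type_rank` and `sample_laptop_adapters` are hypothetical helpers, the function signature is assumed, and the device IDs in the sample data are made up. Logging of dropped adapters is omitted.

```rust
use std::collections::HashMap;

// Stand-in types (assumption): simplified mirrors of the wgpu enums and
// AdapterInfo so the sketch compiles without the wgpu crate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Backend { Vulkan, Metal, Dx12, Gl }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DeviceType { Other, IntegratedGpu, DiscreteGpu, VirtualGpu, Cpu }

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct AdapterInfo {
    pub name: String,
    pub vendor: u32,
    pub device: u32,
    pub device_type: DeviceType,
    pub backend: Backend,
}

// Lower value = preferred backend: Vulkan or Metal first, then DX12.
fn backend_priority(b: Backend) -> u8 {
    match b {
        Backend::Vulkan | Backend::Metal => 0,
        Backend::Dx12 => 1,
        _ => 2,
    }
}

// Hypothetical helper: discrete > integrated > virtual > anything else.
fn device_type_rank(t: DeviceType) -> u8 {
    match t {
        DeviceType::DiscreteGpu => 0,
        DeviceType::IntegratedGpu => 1,
        DeviceType::VirtualGpu => 2,
        _ => 3,
    }
}

pub fn filter_and_dedupe_adapters(adapters: Vec<AdapterInfo>) -> Vec<AdapterInfo> {
    let mut best: HashMap<(u32, u32), AdapterInfo> = HashMap::new();
    for a in adapters {
        // Filter 1: drop CPU-emulated software fallbacks.
        if a.device_type == DeviceType::Cpu {
            continue;
        }
        // Filter 2: keep one entry per (vendor, device), preferring the
        // backend with the lowest priority value.
        let key = (a.vendor, a.device);
        let replace = match best.get(&key) {
            Some(cur) => backend_priority(a.backend) < backend_priority(cur.backend),
            None => true,
        };
        if replace {
            best.insert(key, a);
        }
    }
    // Sort so the order does not depend on HashMap iteration order.
    let mut out: Vec<AdapterInfo> = best.into_values().collect();
    out.sort_by_key(|a| (device_type_rank(a.device_type), a.vendor, a.device));
    out
}

// The five entries from the listing above; device IDs are illustrative only.
pub fn sample_laptop_adapters() -> Vec<AdapterInfo> {
    let mk = |name: &str, vendor: u32, device: u32, device_type: DeviceType, backend: Backend| {
        AdapterInfo { name: name.to_string(), vendor, device, device_type, backend }
    };
    vec![
        mk("AMD Radeon Graphics", 0x1002, 0x0001, DeviceType::IntegratedGpu, Backend::Vulkan),
        mk("NVIDIA RTX 3070", 0x10DE, 0x0002, DeviceType::DiscreteGpu, Backend::Vulkan),
        mk("AMD Radeon Graphics", 0x1002, 0x0001, DeviceType::IntegratedGpu, Backend::Dx12),
        mk("NVIDIA RTX 3070", 0x10DE, 0x0002, DeviceType::DiscreteGpu, Backend::Dx12),
        mk("Microsoft Basic Render", 0x1414, 0x0003, DeviceType::Cpu, Backend::Dx12),
    ]
}
```

Calling `filter_and_dedupe_adapters(sample_laptop_adapters())` in this sketch leaves two entries, the NVIDIA adapter followed by the AMD adapter, both on the Vulkan backend.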
No public API change; `GpuEngine::try_new` still takes only a batch
size, and `device_count()` now reflects unique physical adapters.
Closes Quantus-Network#61
adamtpang pushed a commit to adamtpang/quantus.com that referenced this pull request May 16, 2026
PR Quantus-Network#62 already filters Cpu adapters, dedupes each physical GPU
across backends, and sorts discrete-first. It still *returns* the
integrated GPU, so the worker pool round-robins onto it on hybrid laptops.

This commit drops the parallel structure from the previous commit and
restructures the change as a minimal follow-up that layers on
Quantus-Network#62:

* prefer_discrete_adapters(): runs after Quantus-Network#62's
  filter_and_dedupe_adapters and drops integrated/virtual/other adapters
  whenever any discrete GPU is present (falls back to keeping all if
  there is no discrete GPU, so integrated-only machines still mine).
* discrete_preference_indices(): pure policy split out for unit testing
  with synthetic AdapterInfo; 4 tests cover hybrid-laptop, dual-discrete,
  integrated-only, and discrete+integrated+virtual.

backend_priority/filter_and_dedupe_adapters are reconstructed here as a
clearly-marked STAND-IN block so the branch compiles and tests run off
main; that block is deleted when rebasing onto the merged
Quantus-Network#62.

The --gpu-devices N cap (breaking try_new signature change) is
intentionally NOT included here; it is a separate follow-up so
Quantus-Network#62 stays a tight single-purpose fix. try_new keeps its
original signature, so miner-service / benches / example are reverted to
their main state.

https://claude.ai/code/session_01M8TsvfAHfST4D8zTDevK4x
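The discrete-preference policy described in this commit could look roughly like the sketch below. It is an assumption-laden illustration rather than the commit's code: the signature is guessed, `DeviceType` is a local stand-in for `wgpu::DeviceType`, and the function takes a slice of device types instead of the synthetic AdapterInfo values the commit message mentions.

```rust
// Stand-in for wgpu::DeviceType (assumption), so the sketch needs no crates.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DeviceType { Other, IntegratedGpu, DiscreteGpu, VirtualGpu, Cpu }

/// Returns the indices of the adapters to keep.
/// If at least one discrete GPU is present, only discrete GPUs are kept;
/// otherwise every adapter is kept so integrated-only machines still mine.
pub fn discrete_preference_indices(types: &[DeviceType]) -> Vec<usize> {
    let discrete: Vec<usize> = types
        .iter()
        .enumerate()
        .filter(|(_, t)| **t == DeviceType::DiscreteGpu)
        .map(|(i, _)| i)
        .collect();

    if discrete.is_empty() {
        // No discrete GPU: fall back to keeping all adapters.
        (0..types.len()).collect()
    } else {
        discrete
    }
}
```

Keeping the policy as a pure function over plain values is what makes the four scenarios named above (hybrid laptop, dual discrete, integrated only, discrete plus integrated plus virtual) checkable without any GPU present.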
Summary

On Windows, wgpu enumerates each physical GPU once per backend (Vulkan + DX12) and additionally exposes a CPU-emulated software fallback (`Microsoft Basic Render Driver`). The current `GpuEngine::init` builds a worker context for every enumerated entry, which on a typical laptop (NVIDIA dGPU + AMD iGPU) yields 5 contexts and OOMs at chunk allocation. Repro and full benchmark logs are in #61.

This PR filters and deduplicates adapters before context construction:

1. Drop adapters whose `device_type == DeviceType::Cpu`: software fallbacks are never useful for PoW mining.
2. Dedupe per `(vendor, device)` physical-adapter key, keeping the entry with the highest-priority backend (Vulkan/Metal first, then DX12, then GL, then BrowserWebGpu).

Skipped and dropped adapters are logged at `info` level so users can see what's happening on a multi-GPU box.

What's not in this PR

I deliberately left out a `--gpu-adapter <substring>` selector flag for explicit per-machine targeting. That's a useful follow-up but adds CLI surface and crosses crate boundaries; this PR aims to fix the OOM with zero new flags so it can land surgically.

Tested by reading; flagging for build verification

I do not have a Rust toolchain on this machine and have not run `cargo build` / `cargo test` locally. The change is single-file (`crates/engine-gpu/src/lib.rs`), uses no new dependencies, and adds no public API surface: `GpuEngine::try_new` and `device_count()` are unchanged for callers. Both `match` arms use `_ =>` fallbacks so they're forward-compatible with future non-exhaustive enum variants in `wgpu::Backend` / `wgpu::DeviceType`.

I'd appreciate a maintainer running `cargo check -p engine-gpu` and the existing GPU benches to confirm, and I'm happy to iterate on naming/log levels/style.

How to repro the original OOM

On Windows 10/11 with one integrated and one discrete GPU:

Default auto-detect picks all 5 wgpu entries, spawns workers on each, and panics with `wgpu error: Out of Memory` during chunk allocation. After this PR, the same command should select 2 contexts (one per physical GPU) on the same machine.

Test plan

* `cargo check -p engine-gpu`
* `cargo clippy -p engine-gpu`
* `cargo bench -p engine-gpu` on a multi-GPU Windows machine; expect 1 context per physical GPU rather than per backend, and no OOM
* `device_count()` is unchanged (1)

Closes #61