Follow-up on quantus-miner v3.1.0 (RTX 50-series, 2x RTX 5060, Ubuntu)
Environment:
- quantus-node: v0.6.4-phase-alignment
- quantus-miner: v3.1.0 (8de756f)
- quantus-cli: v1.3.3
- OS: Ubuntu Linux
- GPUs: 2x NVIDIA GeForce RTX 5060
- Driver: 590.48.01
- Backend: Vulkan
Good news: the RTX 50-series tier is now correctly detected in v3.1.0.
Startup log (benchmark) shows:
GPU detected: NVIDIA GeForce RTX 5060 | tier: NVIDIA RTX 50 (Blackwell) | workgroups: 9362 (max: 65535)
This is a clear improvement over the v3.0.1 behavior reported in #55,
where the same hardware was dispatched at 3276 workgroups (fallback path).
End-to-end works now:
- miner_gpu_hash_rate / miner_hash_rate / miner_hashes_total are exposed
- GPU utilization reaches 100% on both GPUs
- Power draw ~85-90W per GPU
- Reward collection via the CLI succeeds (82 transfers claimed)
Performance numbers, however, still look lower than expected.
Benchmark (10s) on 2x RTX 5060, v3.1.0:
- Average rate: 13.77 MH/s (total)
- Per-worker: 983.90 KH/s
- Per-GPU steady: ~6.3 MH/s (GPU0), ~6.45 MH/s (GPU1)
Runtime (Prometheus, 30s delta):
- ~22.7 MH/s estimated total
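For reference, the 30s-delta estimate can be reproduced from two scrapes of the miner's /metrics output. The sketch below is only an illustration: it assumes miner_hashes_total is a monotonically increasing counter in the standard Prometheus text format, and that any labeled series under that name are disjoint (so summing them gives the total). The sample values in the usage lines are made up, not taken from the miner.

```python
import re

# Matches one sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")

def counter_value(metrics_text: str, name: str) -> float:
    """Sum every sample of `name` in a Prometheus text-format scrape."""
    total = 0.0
    found = False
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match and match.group(1) == name:
            total += float(match.group(3))
            found = True
    if not found:
        raise KeyError(f"metric {name!r} not present in scrape")
    return total

def rate_mhs(before: str, after: str, seconds: float,
             name: str = "miner_hashes_total") -> float:
    """Average hash rate in MH/s between two scrapes taken `seconds` apart."""
    delta = counter_value(after, name) - counter_value(before, name)
    return delta / seconds / 1e6

# Synthetic scrapes 30 s apart (illustrative numbers only).
scrape_a = "# TYPE miner_hashes_total counter\nminer_hashes_total 1000000\n"
scrape_b = "# TYPE miner_hashes_total counter\nminer_hashes_total 61000000\n"
print(f"{rate_mhs(scrape_a, scrape_b, 30.0):.2f} MH/s")
```

If the miner exposes both an aggregate and per-GPU series under the same metric name, the sum would double count and the label filter would need to be tightened.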
The docs give a range of 500-1000 MH/s per GPU, scaled by CUDA cores
and memory bandwidth (RTX 5060: 3840 cores, 448 GB/s GDDR7), so
a rough expectation for 2x RTX 5060 would be ~800-1400 MH/s total.
The observed 13.77-22.7 MH/s is roughly 1-3% of that band.
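The 1-3% figure follows directly from the numbers above. A small sketch of the arithmetic, where the 800-1400 MH/s band is the rough expectation stated here and not a measured value:

```python
# Observed totals reported above, in MH/s.
observed_mhs = {"benchmark (10s)": 13.77, "runtime (30s delta)": 22.7}

# Rough expected band for 2x RTX 5060, in MH/s (an estimate, not a measurement).
expected_band_mhs = (800.0, 1400.0)

def percent_of_band(observed: float, band: tuple[float, float]) -> tuple[float, float]:
    """Return observed as a percentage of the band's high end and low end."""
    low, high = band
    return 100.0 * observed / high, 100.0 * observed / low

for label, value in observed_mhs.items():
    pct_min, pct_max = percent_of_band(value, expected_band_mhs)
    print(f"{label}: {pct_min:.1f}% to {pct_max:.1f}% of expected")
```

This gives about 1.0-1.7% for the benchmark figure and 1.6-2.8% for the runtime figure.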
Given that workgroup dispatch is no longer on the fallback path (9362 vs. the
old 3276) but per-GPU throughput is still flat around 6.4 MH/s, the remaining
ceiling may be in the shader/kernel path or in batch size (logs show a fixed
1,000,000 nonces per batch, ~0.15s each), rather than in dispatch sizing.
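As a sanity check, the batch figures from the logs are consistent with the per-GPU rate. The sketch below assumes one batch in flight per GPU at a time, which is an assumption on my side and not confirmed from the miner source:

```python
def batch_rate_mhs(nonces_per_batch: int, seconds_per_batch: float) -> float:
    """Throughput in MH/s implied by a fixed batch size and batch duration."""
    return nonces_per_batch / seconds_per_batch / 1e6

def batch_seconds_at(nonces_per_batch: int, target_mhs: float) -> float:
    """Batch duration a GPU would need to sustain `target_mhs`."""
    return nonces_per_batch / (target_mhs * 1e6)

# Values from the logs: 1,000,000 nonces per batch, ~0.15 s each.
implied = batch_rate_mhs(1_000_000, 0.15)
print(f"implied per-GPU rate: {implied:.2f} MH/s")

# Duration the same batch would need at the low end of the docs range.
needed = batch_seconds_at(1_000_000, 500.0)
print(f"batch time needed for 500 MH/s: {needed * 1000:.1f} ms")
```

1,000,000 nonces in ~0.15s works out to about 6.67 MH/s, close to the ~6.4 MH/s seen per GPU. To reach 500 MH/s with the same batch size, each batch would have to finish in about 2 ms, so any fixed per-batch overhead could matter a lot at that scale.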
One minor observation: an additional adapter is listed as
"llvmpipe (LLVM 20.1.2, 256 bits)" and is assigned a worker thread alongside
the two real GPUs. With --gpu-devices 2, three devices end up initialized
(2 Blackwell + 1 llvmpipe fallback). Not sure if this is intended.
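To illustrate what I would have expected from --gpu-devices 2, here is a purely hypothetical sketch of filtering out software rasterizers before applying the device limit. The Adapter type, the device_type values, and select_gpu_adapters are made up for illustration and are not the miner's actual code or API; the "discrete" type for the RTX 5060 entries is also an assumption.

```python
from dataclasses import dataclass

# Name fragments of common software Vulkan implementations.
SOFTWARE_MARKERS = ("llvmpipe", "lavapipe", "swiftshader")

@dataclass
class Adapter:
    """Hypothetical adapter record (not the miner's real type)."""
    name: str
    device_type: str  # e.g. "discrete", "integrated", "cpu" (assumed values)

def is_software(adapter: Adapter) -> bool:
    """True for CPU-backed adapters such as llvmpipe."""
    lowered = adapter.name.lower()
    return adapter.device_type == "cpu" or any(m in lowered for m in SOFTWARE_MARKERS)

def select_gpu_adapters(adapters: list[Adapter], limit: int) -> list[Adapter]:
    """Drop software adapters first, then apply the requested device limit."""
    hardware = [a for a in adapters if not is_software(a)]
    return hardware[:limit]

# Adapter list as seen on this machine (device types are assumed).
adapters = [
    Adapter("NVIDIA GeForce RTX 5060", "discrete"),
    Adapter("NVIDIA GeForce RTX 5060", "discrete"),
    Adapter("llvmpipe (LLVM 20.1.2, 256 bits)", "cpu"),
]
for adapter in select_gpu_adapters(adapters, limit=2):
    print(adapter.name)
```

With a filter along these lines, only the two RTX 5060 adapters would get worker threads.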
Happy to provide:
- full benchmark stdout+stderr
- quantus-miner /metrics output
- nvidia-smi pmon output
- startup log of quantus-miner serve