Blackwell RTX 5060 detected correctly in v3.1.0, but throughput remains very low

Follow-up on quantus-miner v3.1.0 (RTX 50-series, 2x RTX 5060, Ubuntu)

Environment:
- quantus-node: v0.6.4-phase-alignment
- quantus-miner: v3.1.0 (8de756f4)
- quantus-cli: v1.3.3
- OS: Ubuntu Linux
- GPUs: 2x NVIDIA GeForce RTX 5060
- Driver: 590.48.01
- Backend: Vulkan

Good news: the RTX 50-series tier is now correctly detected in v3.1.0.
Startup log (benchmark) shows:

  GPU detected: NVIDIA GeForce RTX 5060 | tier: NVIDIA RTX 50 (Blackwell) | workgroups: 9362 (max: 65535)

This is a clear improvement over the v3.0.1 behavior reported in #55,
where the same hardware was dispatched with 3276 workgroups (fallback path).

End-to-end works now:
- miner_gpu_hash_rate / miner_hash_rate / miner_hashes_total are exposed
- GPU utilization reaches 100% on both GPUs
- Power draw ~85-90W per GPU
- Reward collect via CLI succeeds (82 transfers claimed successfully)

Performance numbers, however, still look lower than expected.

Benchmark (10s) on 2x RTX 5060, v3.1.0:
- Average rate: 13.77 MH/s (total)
- Per-worker: 983.90 KH/s
- Per-GPU steady: ~6.3 MH/s (GPU0), ~6.45 MH/s (GPU1)

Runtime (Prometheus, 30s delta):
- ~22.7 MH/s estimated total
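
For reference, a minimal sketch of how a 30s delta figure like this can be computed from two scrapes of a monotonically increasing counter. It assumes the estimate is taken from miner_hashes_total; the sample counter values below are hypothetical and were chosen only so that the result reproduces the ~22.7 MH/s figure.

```python
from typing import Optional


def rate_mhs(count_start: float, count_end: float, seconds: float) -> Optional[float]:
    """Average hash rate in MH/s between two counter samples.

    Returns None if the counter went backwards (e.g. the miner restarted
    between scrapes) or if the interval is not positive.
    """
    delta = count_end - count_start
    if seconds <= 0 or delta < 0:
        return None
    return delta / seconds / 1e6


# Hypothetical samples of miner_hashes_total taken 30 seconds apart.
sample_t0 = 1_000_000_000
sample_t30 = 1_681_000_000

print(f"{rate_mhs(sample_t0, sample_t30, 30.0):.1f} MH/s")
```

The same delta taken per GPU (if the exporter labels the counter per device, which I have not verified) would make it easier to compare against the benchmark's per-GPU numbers.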

The docs give a range of 500-1000 MH/s per GPU. Scaling that by CUDA cores
and memory bandwidth (RTX 5060: 3840 cores, 448 GB/s GDDR7),
a rough expectation for 2x RTX 5060 would be ~800-1400 MH/s total.
The observed 13.77-22.7 MH/s is roughly 1-3% of that band.
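
The percentage can be checked with a few lines of arithmetic. This sketch uses only the numbers quoted in this report; the function name is just for illustration.

```python
# Observed totals from this report, in MH/s.
observed = {
    "benchmark (10s)": 13.77,
    "runtime (Prometheus, 30s delta)": 22.7,
}

# Rough expected band for 2x RTX 5060 as estimated above, in MH/s.
EXPECTED_LOW = 800.0
EXPECTED_HIGH = 1400.0


def percent_of_band(rate_mhs, low, high):
    """Return the rate as a percentage of the high end and of the low end."""
    return 100.0 * rate_mhs / high, 100.0 * rate_mhs / low


for label, rate in observed.items():
    vs_high, vs_low = percent_of_band(rate, EXPECTED_LOW, EXPECTED_HIGH)
    print(f"{label}: {vs_high:.1f}% to {vs_low:.1f}% of the expected band")
```

Across both measurements this gives roughly 1.0% to 2.8%, so the 1-3% estimate holds even when taking the more favorable runtime figure.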

Given that workgroup dispatch is no longer on the fallback path (9362 vs. the
old 3276) but per-GPU throughput is still flat at around 6.4 MH/s, the remaining
ceiling may be in the shader/kernel path or in batch size rather than in
dispatch sizing. Logs show a fixed 1,000,000 nonces per batch at ~0.15s each,
i.e. about 6.7 MH/s, which is in line with the observed per-GPU rate.
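
A quick sketch of the batch arithmetic, using only the batch size and timing seen in the logs. The helper names are illustrative and not part of quantus-miner; the 500 MH/s target is the low end of the docs range.

```python
def batch_rate_mhs(nonces_per_batch, seconds_per_batch):
    """Throughput in MH/s implied by one batch size and its wall time."""
    return nonces_per_batch / seconds_per_batch / 1e6


def seconds_per_batch_for(target_mhs, nonces_per_batch):
    """Wall time per batch needed to sustain target_mhs at a fixed batch size."""
    return nonces_per_batch / (target_mhs * 1e6)


NONCES_PER_BATCH = 1_000_000   # from the logs
SECONDS_PER_BATCH = 0.15       # approximate, from the logs

print(f"implied rate: {batch_rate_mhs(NONCES_PER_BATCH, SECONDS_PER_BATCH):.2f} MH/s")
print(f"batch time needed for 500 MH/s: "
      f"{seconds_per_batch_for(500, NONCES_PER_BATCH) * 1000:.1f} ms")
```

So at a fixed 1,000,000 nonces per batch, each batch would have to complete in about 2 ms to reach the low end of the documented range, versus the ~150 ms observed. This does not say where the time goes (kernel execution, per-batch submission overhead, or readback), only that the per-batch timing accounts for the observed per-GPU rate.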

One minor observation: an additional adapter is listed as
"llvmpipe (LLVM 20.1.2, 256 bits)" and is assigned a worker thread alongside
the two real GPUs. With --gpu-devices 2, three devices end up initialized
(2 Blackwell + 1 llvmpipe fallback). Not sure if this is intended.

Happy to provide:
- full benchmark stdout+stderr
- quantus-miner /metrics output
- nvidia-smi pmon output
- startup log of `quantus-miner serve`

