Blackwell RTX 5060 detected correctly in v3.1.0, but throughput remains very low #60

@kyorozaka-dot

Follow-up on quantus-miner v3.1.0 (RTX 50-series, 2x RTX 5060, Ubuntu)

Environment:

  • quantus-node: v0.6.4-phase-alignment
  • quantus-miner: v3.1.0 (8de756f)
  • quantus-cli: v1.3.3
  • OS: Ubuntu Linux
  • GPUs: 2x NVIDIA GeForce RTX 5060
  • Driver: 590.48.01
  • Backend: Vulkan

Good news: the RTX 50-series tier is now correctly detected in v3.1.0.
Startup log (benchmark) shows:

GPU detected: NVIDIA GeForce RTX 5060 | tier: NVIDIA RTX 50 (Blackwell) | workgroups: 9362 (max: 65535)

This is a clear improvement over the v3.0.1 behavior reported in #55,
where the same hardware was dispatched at 3276 workgroups (fallback path).

End-to-end works now:

  • miner_gpu_hash_rate / miner_hash_rate / miner_hashes_total are exposed
  • GPU utilization reaches 100% on both GPUs
  • Power draw ~85-90W per GPU
  • Reward collect via CLI succeeds (82 transfers claimed successfully)

Performance numbers, however, still look lower than expected.

Benchmark (10s) on 2x RTX 5060, v3.1.0:

  • Average rate: 13.77 MH/s (total)
  • Per-worker: 983.90 KH/s
  • Per-GPU steady: ~6.3 MH/s (GPU0), ~6.45 MH/s (GPU1)

Runtime (Prometheus, 30s delta):

  • ~22.7 MH/s estimated total
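
For anyone reproducing the runtime figure, a 30s delta like this can be derived from two scrapes of /metrics. This is a minimal sketch, assuming miner_hashes_total is a monotonically increasing counter in Prometheus text format. Whether it carries per-GPU labels is not known here, so the sketch sums every sample with that name. The helper names and the sample values are illustrative, not captured output.

```python
def counter_total(metrics_text: str, name: str) -> float:
    """Sum every sample of `name` in a Prometheus text-format scrape."""
    total = 0.0
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # labelled sample: name{label="..."} value [timestamp]
            metric = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :]
        else:
            # unlabelled sample: name value [timestamp]
            metric, _, rest = line.partition(" ")
        if metric == name:
            total += float(rest.split()[0])
    return total


def rate_mhs(before: str, after: str, seconds: float,
             name: str = "miner_hashes_total") -> float:
    """Average hash rate in MH/s between two scrapes taken `seconds` apart."""
    delta = counter_total(after, name) - counter_total(before, name)
    return delta / seconds / 1e6


if __name__ == "__main__":
    # Illustrative scrapes 30 s apart (made-up counter values).
    scrape_a = "# TYPE miner_hashes_total counter\nminer_hashes_total 1000000000\n"
    scrape_b = "# TYPE miner_hashes_total counter\nminer_hashes_total 1681000000\n"
    print(f"{rate_mhs(scrape_a, scrape_b, 30.0):.1f} MH/s")
```

Using the cumulative counter rather than the instantaneous miner_hash_rate gauge averages out batch-boundary jitter over the sampling window.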

The docs give a range of 500-1000 MH/s per GPU, scaled by CUDA cores
and memory bandwidth (RTX 5060: 3840 cores, 448 GB/s GDDR7), so
a rough expectation for 2x RTX 5060 would be ~800-1400 MH/s total.
The observed 13.77-22.7 MH/s is roughly 1-3% of that band.
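
The percentage can be checked with a few lines of arithmetic. This is a sketch using only the figures quoted in this report. The 800-1400 MH/s band is the rough estimate above, not a number taken from the docs, and the function name is made up for illustration.

```python
def percent_of_band(observed_mhs: float, band_low_mhs: float,
                    band_high_mhs: float) -> tuple[float, float]:
    """Return (min %, max %) of the expected band that `observed_mhs` represents.

    The minimum compares against the top of the band, the maximum
    against the bottom of the band.
    """
    return (100.0 * observed_mhs / band_high_mhs,
            100.0 * observed_mhs / band_low_mhs)


if __name__ == "__main__":
    BAND = (800.0, 1400.0)  # rough 2x RTX 5060 expectation from this report
    for label, observed in (("benchmark", 13.77), ("runtime", 22.7)):
        lo, hi = percent_of_band(observed, *BAND)
        print(f"{label}: {lo:.1f}% to {hi:.1f}% of expected")
```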

Given that workgroup dispatch is no longer on the fallback path (9362 vs. the
old 3276) but per-GPU throughput is still flat at around 6.4 MH/s, the
remaining ceiling may be in the shader/kernel path or in the batch size rather
than in dispatch sizing. Logs show a fixed 1,000,000 nonces per batch at
~0.15s each, which works out to ~6.7 MH/s and matches the per-GPU rate.
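
The batch figures from the logs can be compared directly against the measured per-GPU rates. This is a minimal sketch, assuming each GPU processes one batch at a time, back to back. That scheduling model is an assumption, not something confirmed from the miner source.

```python
def batch_rate_mhs(nonces_per_batch: int, seconds_per_batch: float) -> float:
    """Hash rate in MH/s implied by a fixed batch size and batch duration."""
    return nonces_per_batch / seconds_per_batch / 1e6


if __name__ == "__main__":
    implied = batch_rate_mhs(1_000_000, 0.15)  # figures from the logs
    print(f"implied by batch timing: {implied:.2f} MH/s")
    # Steady per-GPU rates reported by the benchmark.
    for gpu, measured in (("GPU0", 6.3), ("GPU1", 6.45)):
        print(f"{gpu}: measured {measured} MH/s, "
              f"{100.0 * measured / implied:.0f}% of batch-implied rate")
```

If the two numbers agree this closely, the per-batch wall time (kernel execution plus any submit/readback overhead) accounts for essentially all of the observed rate, which is why a larger or adaptive batch size seems worth testing.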

One minor observation: an additional adapter is listed as
"llvmpipe (LLVM 20.1.2, 256 bits)" and is assigned a worker thread alongside
the two real GPUs. With --gpu-devices 2, three devices end up initialized
(2 Blackwell + 1 llvmpipe fallback). Not sure if this is intended.
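
To illustrate the behavior I would have expected, here is a hypothetical sketch of filtering software adapters by name before applying the device limit. This is not quantus-miner code. The marker list and function names are my own assumptions, and the adapter names are the ones from my startup log.

```python
# Name fragments of common software Vulkan implementations (assumed list).
SOFTWARE_MARKERS = ("llvmpipe", "lavapipe", "swiftshader")


def is_software_adapter(name: str) -> bool:
    """True if the adapter name looks like a CPU-based Vulkan implementation."""
    lowered = name.lower()
    return any(marker in lowered for marker in SOFTWARE_MARKERS)


def select_hardware(adapter_names: list[str], limit: int) -> list[str]:
    """Drop software adapters, then keep at most `limit` devices in order."""
    hardware = [n for n in adapter_names if not is_software_adapter(n)]
    return hardware[:limit]


if __name__ == "__main__":
    adapters = [
        "NVIDIA GeForce RTX 5060",
        "NVIDIA GeForce RTX 5060",
        "llvmpipe (LLVM 20.1.2, 256 bits)",
    ]
    print(select_hardware(adapters, 2))
```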

Happy to provide:

  • full benchmark stdout+stderr
  • quantus-miner /metrics output
  • nvidia-smi pmon output
  • startup log of quantus-miner serve
