Confirmed on quantus-miner v3.0.1 in an NVIDIA RTX 50-series (Blackwell) environment on Ubuntu Linux. #55

@kyorozaka-dot

Confirmed on quantus-miner v3.0.1 in an NVIDIA RTX 50-series (Blackwell) environment on Ubuntu Linux.

Environment

  • quantus-miner: v3.0.1
  • quantus-node: v0.6.3
  • quantus-cli: v1.3.3
  • GPU generation: NVIDIA RTX 50-series (Blackwell)
  • Driver: NVIDIA 590.48.01
  • Backend: Vulkan
  • OS: Ubuntu 24.04 LTS
  • Node state: fully synced, peers=12, isSyncing=false

Setup

  • External miner connected to the node via QUIC on 127.0.0.1:9833
  • Node started with:
    • --validator
    • --chain planck
    • --miner-listen-port 9833
    • --prometheus-port 9616
    • --rewards-inner-hash
  • Miner started with:
    • serve
    • --node-addr 127.0.0.1:9833
    • --gpu-devices 2
    • --cpu-workers 0
    • --metrics-port 9900

Node side looks healthy

  • chain_height increases normally
  • isSyncing=false
  • peers=12
  • block_time_ema around 7-10 seconds
  • last_block_duration around 2-5 seconds
  • chain_height advanced by 2 blocks over a 30-second observation window

This does not appear to be a sync or connectivity issue.

What I observed on the miner side

  • The miner connects successfully to the node over QUIC.
  • Mining jobs are received continuously (hundreds of jobs over tens of minutes).
  • Jobs are dispatched to GPU workers for every detected GPU.
  • All expected GPUs are detected.
  • The miner process is visible as "quantus-miner" in nvidia-smi pmon on every GPU.

However, effective GPU mining performance is far below expectations (see the benchmark result below).

The miner's Prometheus endpoint exposes only a minimal set of metrics. Only these appear:

  • miner_active_jobs 1
  • miner_cpu_workers 0
  • miner_effective_cpus 12
  • miner_gpu_devices (number of GPUs)
  • miner_workers (number of workers)

Metrics that I would expect but do NOT see:

  • miner_gpu_hash_rate
  • miner_hash_rate
  • miner_hashes_total
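
A quick way to confirm which metrics are missing is to compare the endpoint's output against an expected list. The following is a minimal sketch that parses Prometheus text exposition format from a string. `exposed_metric_names` and `missing_metrics` are hypothetical helpers, not part of quantus-miner, and the sample text only mirrors the metric names listed above with placeholder values.

```python
def exposed_metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text exposition output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        # A sample line looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split()[0]
        names.add(name)
    return names


def missing_metrics(exposition_text, expected):
    """Return the expected metric names that were not exposed, sorted."""
    return sorted(set(expected) - exposed_metric_names(exposition_text))


# Placeholder sample shaped like the output described above (values are illustrative).
SAMPLE = """\
# TYPE miner_active_jobs gauge
miner_active_jobs 1
miner_cpu_workers 0
miner_effective_cpus 12
miner_gpu_devices 2
miner_workers 2
"""

EXPECTED = ["miner_gpu_hash_rate", "miner_hash_rate", "miner_hashes_total"]

print(missing_metrics(SAMPLE, EXPECTED))
```

In practice the text would be fetched from the miner's metrics port (9900 in this setup). The exact URL path is an assumption and is not shown here.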

GPU utilization observed via nvidia-smi

  • GPU compute utilization is consistently around 0%.
  • Power draw stays very low (single-digit to low double-digit watts).
  • Core clock stays in low hundreds of MHz.
  • Memory usage is small.
  • GPUs never spin up to a real mining load, even though jobs are being received very frequently.
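
The idle state described above can be checked automatically. The sketch below parses the CSV produced by `nvidia-smi --query-gpu=index,utilization.gpu,power.draw,clocks.sm --format=csv,noheader,nounits` and flags GPUs that look idle. The thresholds and the sample rows are illustrative assumptions, not measured values from this report, and `idle_gpus` is a hypothetical helper.

```python
import csv
import io


def idle_gpus(csv_text, max_util_pct=5.0, max_power_w=30.0):
    """Return indices of GPUs whose utilization and power draw both look idle.

    Expects rows of: index, utilization.gpu (%), power.draw (W), clocks.sm (MHz).
    Thresholds are arbitrary assumptions; tune them for the hardware in question.
    Rows with non-numeric fields (e.g. "[N/A]") are not handled in this sketch.
    """
    idle = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:
            continue
        index, util, power, _sm_clock = (field.strip() for field in row)
        if float(util) <= max_util_pct and float(power) <= max_power_w:
            idle.append(int(index))
    return idle


# Illustrative rows only: near-zero utilization, low power, low SM clock.
SAMPLE = "0, 0, 11.52, 210\n1, 0, 9.87, 210\n"

print(idle_gpus(SAMPLE))
```

If every GPU assigned to the miner is reported idle while jobs keep arriving, that matches the symptom in this issue.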

Relevant miner log excerpts

  • QUIC connection established to 127.0.0.1:9833
  • Connected to node
  • Waiting for mining jobs
  • Received job: id=...
  • Received job: id=...
  • Received job: id=...
  • Job dispatched to GPU workers
  • Worker thread assigned to GPU device 0
  • Worker thread assigned to GPU device 1
  • GPU dispatch config:
    • 3276 workgroups × 256 threads

So the miner is clearly talking to the node and receiving jobs. The problem appears to be on the GPU dispatch side.

Benchmark result

I also ran the built-in GPU benchmark separately:

    ./quantus-miner benchmark --gpu-devices 2 --duration 30

Detected adapters included the NVIDIA RTX 50-series GPUs and a CPU fallback adapter (llvmpipe). Benchmark output consistently shows:

  • Max hardware workgroups: 65535
  • Optimal workgroups: 3276
  • Per-GPU throughput: around 5-6 MH/s
  • Total throughput: around 12-13 MH/s

This is far below the GPU mining range described in the official guide
(roughly 500-1000 MH/s for GPU mining).
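
To put a number on the gap, the sketch below divides the guide's range by the measured total. It assumes the guide's 500-1000 MH/s figure is comparable to the two-GPU total; if the guide's figure is per GPU, the gap is larger. `shortfall_factor` is a hypothetical helper.

```python
def shortfall_factor(measured_mhs, expected_mhs):
    """How many times lower the measured rate is than the expected rate."""
    return expected_mhs / measured_mhs


measured_total = (12.0, 13.0)   # MH/s, total benchmark throughput reported above
guide_range = (500.0, 1000.0)   # MH/s, range quoted from the official guide

# Smallest gap: best measurement against the low end of the guide's range.
best_case = shortfall_factor(max(measured_total), min(guide_range))   # 500 / 13
# Largest gap: worst measurement against the high end of the guide's range.
worst_case = shortfall_factor(min(measured_total), max(guide_range))  # 1000 / 12

print(f"{best_case:.0f}x to {worst_case:.0f}x below the guide's range")
```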

Why this looks like Issue #52

Issue #52 explains that RTX 50-series cards may fall through to the generic fallback dispatch path because there is no explicit "rtx 50" match in the vendor-specific dispatch logic.

The reported values

  • Max hardware workgroups: 65535
  • Optimal workgroups: 3276

exactly match the generic fallback branch:

  • 65535 / 20 = 3276 (integer division; the exact quotient is 3276.75)

So this RTX 50-series setup appears to be using the fallback dispatch path rather than an RTX 40-class dispatch, which would cap effective performance well below the hardware capability.
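
The arithmetic can be checked directly. This sketch uses only numbers reported above (65535 max workgroups, the divisor of 20, and 256 threads per workgroup from the dispatch log). The constant names are made up for illustration and are not taken from the miner's source.

```python
MAX_HARDWARE_WORKGROUPS = 65535   # "Max hardware workgroups" from the benchmark
FALLBACK_DIVISOR = 20             # divisor of the generic fallback branch
THREADS_PER_WORKGROUP = 256       # from "3276 workgroups × 256 threads"

# Integer division truncates 3276.75 down to 3276.
optimal_workgroups = MAX_HARDWARE_WORKGROUPS // FALLBACK_DIVISOR
threads_per_dispatch = optimal_workgroups * THREADS_PER_WORKGROUP
share_of_max = optimal_workgroups / MAX_HARDWARE_WORKGROUPS

print(optimal_workgroups)          # 3276
print(threads_per_dispatch)        # 838656
print(f"{share_of_max:.1%}")       # 5.0%
```

Each dispatch therefore covers about 5% of the workgroup count the hardware reports as its maximum.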

This would also explain why:

  • jobs are received at a very high rate,
  • GPU utilization never really climbs,
  • miner_gpu_hash_rate / miner_hashes_total do not appear at all.

My reading is that the miner issues only small dispatches and exhausts each range quickly. I have not confirmed whether the missing metrics share this cause or are a separate issue.

Impact

  • The miner looks "connected and mining" from logs, but actual GPU throughput is dramatically lower than expected.
  • There is no Prometheus warning or metric that clearly signals this state.
  • From the operator side, this is very hard to diagnose without running the built-in benchmark.

Suggested fix

Please add an explicit "rtx 50" branch before the existing "rtx 40" branch, or replace the current substring matching with a more future-proof GPU generation parser.

For example, logic equivalent to:

  • if the adapter name indicates RTX 50-series, use the same dispatch class as RTX 40
  • otherwise keep the existing behavior
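
As an illustration of the generation-parser variant, here is a minimal Python sketch. It is not the miner's actual code: the class names, the lookup table, and the adapter name strings are all hypothetical, and any existing branches for other GPU families are not shown.

```python
import re

# Hypothetical dispatch class labels, for illustration only.
GENERIC_FALLBACK = "generic-fallback"
RTX_40_CLASS = "rtx-40-class"

# Hypothetical table: RTX 50-series reuses the RTX 40 dispatch class.
DISPATCH_CLASS_BY_GENERATION = {40: RTX_40_CLASS, 50: RTX_40_CLASS}

# Matches "RTX 4090", "rtx 5070 ti", etc. and captures the first two digits.
_RTX_MODEL = re.compile(r"rtx\s*(\d{2})\d{2}", re.IGNORECASE)


def rtx_generation(adapter_name):
    """Return the RTX generation (e.g. 40, 50) or None if the name has no RTX model."""
    match = _RTX_MODEL.search(adapter_name)
    return int(match.group(1)) if match else None


def dispatch_class(adapter_name):
    """Pick a dispatch class; unknown adapters keep the generic fallback."""
    generation = rtx_generation(adapter_name)
    return DISPATCH_CLASS_BY_GENERATION.get(generation, GENERIC_FALLBACK)


# Illustrative adapter names, not copied from this report's logs.
for name in ("NVIDIA GeForce RTX 5090", "NVIDIA GeForce RTX 4090", "llvmpipe"):
    print(name, "->", dispatch_class(name))
```

One caveat with name-based matching of any kind: workstation cards such as "Quadro RTX 5000" would also match, so a real implementation may need a stricter check.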

Additional improvements that would help operators

  • Emit a warning when an unknown GPU generation falls back to the generic dispatch path.
  • Export proper Prometheus metrics for actual GPU hashrate and total hashes during external mining, not only during benchmark.
  • Optionally log when dispatched jobs are completed far faster than expected, as a hint that dispatch is under-sized.
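
The first and third suggestions could look roughly like the sketch below. Everything in it is an assumption for illustration: the function name, the logger name, and especially the 0.5-second threshold, which would need to be derived from real job timings.

```python
import logging
from statistics import median

log = logging.getLogger("miner.dispatch")  # hypothetical logger name


def check_dispatch_health(adapter_name, used_fallback, job_seconds,
                          min_expected_seconds=0.5):
    """Log and return warnings about a possibly under-sized GPU dispatch.

    job_seconds: recent per-job completion times in seconds.
    min_expected_seconds: arbitrary placeholder threshold, not a measured value.
    """
    warnings = []
    if used_fallback:
        warnings.append(
            f"adapter {adapter_name!r} not recognized; using generic fallback dispatch"
        )
    if job_seconds and median(job_seconds) < min_expected_seconds:
        warnings.append(
            f"median job time {median(job_seconds):.3f}s is below "
            f"{min_expected_seconds}s; dispatch may be under-sized"
        )
    for message in warnings:
        log.warning(message)
    return warnings


# Illustrative call: an unrecognized adapter finishing jobs almost instantly.
print(len(check_dispatch_health("example adapter", True, [0.01, 0.02, 0.015])))
```

Using the median rather than the mean keeps a single slow job from hiding the pattern.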

If it helps, I can provide:

  • more detailed miner debug logs,
  • more detailed benchmark output,
  • full node Prometheus metrics,
  • nvidia-smi query/pmon output,
  • node system_health output.

Happy to help test a candidate fix on RTX 50-series hardware once it is available.
