Confirmed on quantus-miner v3.0.1 in an NVIDIA RTX 50-series (Blackwell) environment on Ubuntu Linux.
Environment
- quantus-miner: v3.0.1
- quantus-node: v0.6.3
- quantus-cli: v1.3.3
- GPU generation: NVIDIA RTX 50-series (Blackwell)
- Driver: NVIDIA 590.48.01
- Backend: Vulkan
- OS: Ubuntu 24.04 LTS
- Node state: fully synced, peers=12, isSyncing=false
Setup
- External miner connected to the node via QUIC on 127.0.0.1:9833
- Node started with:
  - --validator
  - --chain planck
  - --miner-listen-port 9833
  - --prometheus-port 9616
  - --rewards-inner-hash
- Miner started with:
  - serve
  - --node-addr 127.0.0.1:9833
  - --gpu-devices 2
  - --cpu-workers 0
  - --metrics-port 9900
Node side looks healthy
- chain_height increases normally
- isSyncing=false
- peers=12
- block_time_ema around 7-10 seconds
- last_block_duration around 2-5 seconds
- Over 30 seconds, chain_height advanced by 2 blocks while I watched.
This does not appear to be a sync or connectivity issue.
What I observed on the miner side
- The miner connects successfully to the node over QUIC.
- Mining jobs are received continuously (hundreds of jobs over tens of minutes).
- Jobs are dispatched to GPU workers for every detected GPU.
- All expected GPUs are detected.
- The miner process is visible as "quantus-miner" in nvidia-smi pmon on every GPU.
However, effective GPU mining performance is extremely low: the built-in benchmark reports only about 5-6 MH/s per GPU (see "Benchmark result").
The miner exposes only a minimal set of Prometheus metrics:
- miner_active_jobs 1
- miner_cpu_workers 0
- miner_effective_cpus 12
- miner_gpu_devices (number of GPUs)
- miner_workers (number of workers)
Metrics that I would expect but do NOT see:
- miner_gpu_hash_rate
- miner_hash_rate
- miner_hashes_total
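One way to confirm which series are missing is to scrape the miner's metrics endpoint and diff it against the expected names. This is a minimal sketch, not part of quantus-miner. It assumes the exporter serves the standard Prometheus text format at http://127.0.0.1:9900/metrics (the port comes from --metrics-port above; the /metrics path is an assumption), and the helper names are hypothetical.

```python
import urllib.request

# Metric names this report expected to see but did not.
EXPECTED = {"miner_gpu_hash_rate", "miner_hash_rate", "miner_hashes_total"}


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        # A sample line is either "name{labels} value" or "name value".
        names.add(line.split("{", 1)[0].split()[0])
    return names


def missing_metrics(exposition_text, expected=EXPECTED):
    """Return the expected metric names absent from the scrape, sorted."""
    return sorted(set(expected) - metric_names(exposition_text))


def scrape(url="http://127.0.0.1:9900/metrics"):
    """Fetch the raw exposition text. The URL is an assumption (see above)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Calling missing_metrics(scrape()) while the miner is running should list whichever of the three hash-rate metrics are absent.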
GPU utilization observed via nvidia-smi
- GPU compute utilization is consistently around 0%.
- Power draw stays very low (single-digit to low double-digit watts).
- Core clock stays in the low hundreds of MHz.
- Memory usage is small.
- GPUs never spin up to a real mining load, even though jobs are being received very frequently.
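The readings above can be captured in a repeatable way with nvidia-smi's CSV query mode. The sketch below is one possible way to do that, under the assumption that the installed driver supports the listed --query-gpu fields. The function names and the 5% idle threshold are illustrative choices, not anything the miner defines.

```python
import csv
import io
import subprocess

FIELDS = ["index", "utilization.gpu", "power.draw", "clocks.sm", "memory.used"]


def query_gpus():
    """Run nvidia-smi once and return its raw CSV output (no header, no units)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def to_number(text):
    """Convert a CSV cell to float, or None for values such as "[N/A]"."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_gpus(csv_text):
    """Parse nvidia-smi CSV rows into one dict per GPU."""
    rows = []
    for rec in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if rec:
            rows.append({k: to_number(v) for k, v in zip(FIELDS, rec)})
    return rows


def looks_idle(gpu, util_threshold=5.0):
    """True when compute utilization (percent) is known and below the threshold."""
    util = gpu["utilization.gpu"]
    return util is not None and util < util_threshold
```

Running parse_gpus(query_gpus()) in a loop while jobs are arriving, and checking looks_idle for each GPU, gives a log that can be attached to the issue.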
Relevant miner log excerpts
- QUIC connection established to 127.0.0.1:9833
- Connected to node
- Waiting for mining jobs
- Received job: id=...
- Received job: id=...
- Received job: id=...
- Job dispatched to GPU workers
- Worker thread assigned to GPU device 0
- Worker thread assigned to GPU device 1
- GPU dispatch config:
  - 3276 workgroups × 256 threads
So the miner is clearly talking to the node and receiving jobs. The problem appears to be on the GPU dispatch side.
Benchmark result
I also ran the built-in GPU benchmark separately:
./quantus-miner benchmark --gpu-devices 2 --duration 30
Detected adapters included the NVIDIA RTX 50-series GPUs and a CPU fallback adapter (llvmpipe). Benchmark output consistently shows:
- Max hardware workgroups: 65535
- Optimal workgroups: 3276
- Per-GPU throughput: around 5-6 MH/s
- Total throughput: around 12-13 MH/s
This is far below the GPU mining range described in the official guide (roughly 500-1000 MH/s for GPU mining).
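To put the numbers side by side, here is the arithmetic as a short script. It uses only figures quoted in this report. Two caveats: the guide range is compared against the total measured throughput because the guide's per-GPU versus total basis is not stated here, and the observation that 3276 equals 65535 // 20 is plain arithmetic, not a claim about how the miner derives the value.

```python
WORKGROUPS = 3276          # "Optimal workgroups" from the benchmark
MAX_WORKGROUPS = 65535     # "Max hardware workgroups" from the benchmark
THREADS_PER_GROUP = 256    # from the miner's dispatch log line

# Threads launched by a single dispatch at the reported size.
threads_per_dispatch = WORKGROUPS * THREADS_PER_GROUP

measured_mhs = (12.0, 13.0)    # total throughput reported by the benchmark
guide_mhs = (500.0, 1000.0)    # range quoted from the official guide

# Smallest and largest shortfall factors implied by the two ranges.
min_gap = guide_mhs[0] / measured_mhs[1]
max_gap = guide_mhs[1] / measured_mhs[0]

print(threads_per_dispatch)                    # 838656
print(round(min_gap, 1), round(max_gap, 1))    # 38.5 83.3
print(MAX_WORKGROUPS // 20 == WORKGROUPS)      # True: about 5% of the maximum
```

So the measured total is roughly 38 to 83 times below the quoted range, and each dispatch uses about one twentieth of the workgroup limit the hardware reports.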
Why this looks like Issue #52
Issue #52 explains that RTX 50-series cards may fall through to the generic fallback dispatch path because there is no explicit "rtx 50" match in the vendor-specific dispatch logic.
The reported values
- Max hardware workgroups: 65535
- Optimal workgroups: 3276
exactly match the generic fallback branch.
So this RTX 50-series setup appears to be using the fallback dispatch path rather than an RTX 40-class dispatch, which would cap effective performance well below the hardware capability.
This would also explain why:
- jobs are received at a very high rate,
- GPU utilization never really climbs, and
- miner_gpu_hash_rate / miner_hashes_total do not appear at all.
The miner is doing small dispatch work and exhausting each range quickly.
Impact
- The miner looks "connected and mining" from logs, but actual GPU throughput is dramatically lower than expected.
- There is no Prometheus warning or metric that clearly signals this state.
- From the operator side, this is very hard to diagnose without running the built-in benchmark.
Suggested fix
Please add an explicit "rtx 50" branch before the existing "rtx 40" branch, or replace the current substring matching with a more future-proof GPU generation parser.
For example, logic equivalent to:
- if the adapter name indicates RTX 50-series, use the same dispatch class as RTX 40
- otherwise keep the existing behavior
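The logic above could look roughly like the sketch below. It is written in Python purely for illustration and is not the miner's code. The class names, function names, and the regular expression are all hypothetical, and any other vendor-specific branches the miner already has are omitted. It parses the generation number from the adapter name instead of matching a fixed substring, and logs a warning whenever the generic path is taken.

```python
import logging
import re

log = logging.getLogger("dispatch")

# Hypothetical dispatch classes; the real names and tuning values live in the miner.
RTX_40_CLASS = "rtx40"
GENERIC_FALLBACK = "generic"


def rtx_series(adapter_name):
    """Return the RTX generation (e.g. 40 or 50) parsed from an adapter name, or None."""
    # Matches model numbers such as "RTX 4070" or "RTX 5090" (generation digit + "0" + two digits).
    m = re.search(r"\brtx\s*(\d)0\d{2}\b", adapter_name.lower())
    return int(m.group(1)) * 10 if m else None


def dispatch_class(adapter_name):
    """Pick a dispatch class for the adapter, warning when falling back."""
    series = rtx_series(adapter_name)
    if series == 50:
        # Suggested change: reuse the RTX 40 class for RTX 50-series cards.
        return RTX_40_CLASS
    if series == 40:
        return RTX_40_CLASS
    # Other existing branches would go here; unknown adapters fall through.
    log.warning("unknown GPU %r, using generic fallback dispatch", adapter_name)
    return GENERIC_FALLBACK
```

With this shape, an adapter named "NVIDIA GeForce RTX 5090" would select the RTX 40 class, while llvmpipe would still take the generic path and leave a warning in the log.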
Additional improvements that would help operators
- Emit a warning when an unknown GPU generation falls back to the generic dispatch path.
- Export proper Prometheus metrics for actual GPU hashrate and total hashes during external mining, not only during benchmark.
- Optionally log when dispatched jobs are completed far faster than expected, as a hint that dispatch is under-sized.
If it helps, I can provide:
- more detailed miner debug logs,
- more detailed benchmark output,
- full node Prometheus metrics,
- nvidia-smi query/pmon output,
- node system_health output.
Happy to help test a candidate fix on RTX 50-series hardware once it is available.