This page is a shared results board for BenchmarkGPU.
If you benchmark your system and want to contribute a data point, please add a row to the table below based on your generated report. Feel free to open a pull request with your results.
- Use one row per tested device and runtime combination.
- If a column does not apply, write `N/A`.
- Prefer values from the plain-text report generated by the benchmark.
- If you changed important CLI settings such as matrix size or sample count, mention that in `Notes`.

| Contributor | GPU Vendor | GPU Model | GPU Class | Runtime | Device Index | OS | PyTorch Version | Driver / Runtime Version | Matrix Size | FP32 (TFLOPS) | FP16 (TFLOPS) | BF16 (TFLOPS) | Mixed Precision (TFLOPS) | FP64 (TFLOPS) | TF32 (TFLOPS) | Stability Status | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BinaryOutlook | Apple | M4 - 10 Core | Integrated | MPS | N/A | macOS 26.2 arm64 | 2.10.0 | N/A | 4096 x 4096 | 2.83 | 3.25 | 3.25 | 2.80 | N/A | N/A | Stable | N/A |
| BinaryOutlook | NVIDIA | GeForce RTX 5070 | Discrete | CUDA | 0 | Windows 11 | 2.10.0 | 591.86 | 6144 x 6144 | 19.72 | 63.61 | 66.74 | 58.22 | 0.48 | 58.55 | Stable | N/A |

- GPU Class: `Integrated`, `Discrete`, or `External`
- Runtime: `CUDA`, `ROCm`, `Intel XPU`, `MPS`, or `CPU`
- Stability Status: `Stable`, `Best-effort`, `Partial`, or `Needs review`
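
If you would rather not type a row by hand, the sketch below shows one way to assemble it. It is not part of BenchmarkGPU: `format_row`, `COLUMNS`, and `ALLOWED` are hypothetical names introduced here for illustration, and you still copy the values yourself from the plain-text report. The column order and allowed values are taken from the table and legend on this page.

```python
# Column order copied from the table header on this page.
COLUMNS = [
    "Contributor", "GPU Vendor", "GPU Model", "GPU Class", "Runtime",
    "Device Index", "OS", "PyTorch Version", "Driver / Runtime Version",
    "Matrix Size", "FP32 (TFLOPS)", "FP16 (TFLOPS)", "BF16 (TFLOPS)",
    "Mixed Precision (TFLOPS)", "FP64 (TFLOPS)", "TF32 (TFLOPS)",
    "Stability Status", "Notes",
]

# Allowed values as listed in the legend on this page.
ALLOWED = {
    "GPU Class": {"Integrated", "Discrete", "External"},
    "Runtime": {"CUDA", "ROCm", "Intel XPU", "MPS", "CPU"},
    "Stability Status": {"Stable", "Best-effort", "Partial", "Needs review"},
}


def format_row(values):
    """Hypothetical helper: build one Markdown table row from a dict.

    Columns missing from `values` are filled with "N/A".
    """
    unknown = set(values) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    cells = []
    for column in COLUMNS:
        cell = str(values.get(column, "N/A"))
        # Reject values outside the legend for the constrained columns.
        if column in ALLOWED and cell != "N/A" and cell not in ALLOWED[column]:
            raise ValueError(f"{column}: unexpected value {cell!r}")
        cells.append(cell)
    return "| " + " | ".join(cells) + " |"


# Example using values from the Apple M4 row already in the table.
# Numbers are passed as strings so trailing zeros such as "2.80" survive.
print(format_row({
    "Contributor": "BinaryOutlook",
    "GPU Vendor": "Apple",
    "GPU Model": "M4 - 10 Core",
    "GPU Class": "Integrated",
    "Runtime": "MPS",
    "OS": "macOS 26.2 arm64",
    "PyTorch Version": "2.10.0",
    "Matrix Size": "4096 x 4096",
    "FP32 (TFLOPS)": "2.83",
    "FP16 (TFLOPS)": "3.25",
    "BF16 (TFLOPS)": "3.25",
    "Mixed Precision (TFLOPS)": "2.80",
    "Stability Status": "Stable",
}))
```

Paste the printed line at the bottom of the table. A misspelled column name or a value outside the legend raises `ValueError`, which catches the most common formatting slips before you open the pull request.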