A cross-platform, terminal-based hardware benchmark tool written in Python.
Measures CPU, Memory, Disk, and GPU performance — with real-time system monitoring and structured JSON export.
PyBench is a lightweight benchmark suite designed to stress-test and profile hardware components using pure Python and OpenCL. It provides reproducible, comparable results across machines with a clean terminal UI powered by Rich.
| Module | Tests |
|---|---|
| CPU | Single-thread math, multi-thread parallel, zlib compression, PBKDF2 encryption, prime sieve |
| Memory | Sequential bandwidth, random access speed, memoryview copy speed, allocation stress |
| Disk | SEQ1M Q8T1 read/write, RND4K Q32T1 read/write, total IOPS |
| GPU | OpenCL compute kernel (sqrt · log · sin), host↔device memory bandwidth |
| Monitor | Real-time CPU %, frequency, RAM usage, GPU %, temperature, VRAM, disk I/O, network — with Avg / Min / Max / Std Dev |
| Scoring | Weighted score per module + overall composite score |
| Export | Full results saved as structured JSON with timestamp-based run ID |
```text
╭──────────────────────────────────────────────────────────────╮
│                                                              │
│ PyBench                                                      │
│ --------------------------------------------                 │
│ Available Benchmarks:                                        │
│ 1. CPU Benchmark                                             │
│ 2. Memory Benchmark                                          │
│ 3. Disk Benchmark                                            │
│ 4. GPU Benchmark                                             │
│ --------------------------------------------                 │
│ Enter numbers separated by comma (e.g. 1,3) or 'all' to run. │
│                                                              │
╰──────────────────────────────────────────────────────────────╯
Select benchmarks (e.g. 1,2,3) or 'all' (all): all
⠋ Running CPU Benchmark...    ████████████████████ 100%
⠋ Running Memory Benchmark... ████████████████████ 100%
⠋ Running Disk Benchmark...   ████████████████████ 100%
⠋ Running GPU Benchmark...    ████████████████████ 100%
Benchmark Completed Successfully!

🔍 RAW BENCHMARK DETAILS
┌───────────┬─────────────────────────┬──────────────────┐
│ Component │ Test Case               │ Result           │
├───────────┼─────────────────────────┼──────────────────┤
│ DISK      │ Sequential Read (Q8T1)  │ 8.20 GB/s        │
│ DISK      │ Sequential Write (Q8T1) │ 626.00 MB/s      │
│ DISK      │ Random Read (Q32T1)     │ 40787.11 IOPS    │
│ DISK      │ Random Write (Q32T1)    │ 17604.28 IOPS    │
│ MEMORY    │ Sequential Bandwidth    │ 4.09 GB/s        │
│ MEMORY    │ Random Access Speed     │ 529,456 IOPS     │
│ MEMORY    │ Memory Copy Speed       │ 5.63 MB/s        │
│ CPU       │ Multi-thread (Math)     │ 9,882 Ops        │
│ CPU       │ Single-thread (Math)    │ 4,319 Ops        │
│ GPU       │ Compute Performance     │ 2.02 MOps/s      │
│ GPU       │ VRAM Bandwidth          │ N/A              │
└───────────┴─────────────────────────┴──────────────────┘

🏆 SCORE SUMMARY
┌───────────────┬───────────┐
│ Category      │ Score     │
├───────────────┼───────────┤
│ CPU           │ 9,260     │
│ MEMORY        │ 4,594     │
│ DISK          │ 70,520    │
│ GPU           │ 10        │
│ OVERALL SCORE │ 21,096    │
└───────────────┴───────────┘

📊 HARDWARE MONITORING SUMMARY
┌───────┬──────────────────────────────────────────────────┐
│ Group │ Metric Details (AVG / MIN-MAX)                   │
├───────┼──────────────────────────────────────────────────┤
│ CPU   │ Usage: 36.17% | Clock: 763MHz | Temp: 64.6°      │
│ RAM   │ Usage: 61.27% (61.1% - 61.9%)                    │
│ GPU   │ N/A (No GPU Sensor detected)                     │
└───────┴──────────────────────────────────────────────────┘

📂 Report: results/run_20260501_175030.json
```
```text
pybench/
│
├── main.py                  # Entry point: CLI selector & benchmark runner
├── config.py                # Duration, thread count, result path settings
├── requirements.txt         # Standard pip dependencies
├── pyproject.toml           # Modern Python packaging (uv/build)
├── uv.lock                  # Lockfile for reproducible environments
├── CONTRIBUTING.md          # Guidelines for contributing to PyBench
├── CODE_OF_CONDUCT.md       # Rules for community behavior
│
├── modules/
│   ├── cpu_benchmark.py     # Single/multi-thread, compression, encryption, prime sieve
│   ├── memory_benchmark.py  # Bandwidth, latency, copy speed, allocation
│   ├── disk_benchmark.py    # Sequential & random I/O, IOPS
│   └── gpu_benchmark.py     # OpenCL compute kernel + VRAM bandwidth
│
├── monitor/
│   ├── system_monitor.py    # Background thread: polls psutil & nvidia-smi every 0.5s
│   └── aggregator.py        # Computes Avg / Min / Max / Std Dev from raw samples
│
├── scoring/
│   └── scorer.py            # Score formulas per module + overall composite
│
├── reporter/
│   └── exporter.py          # Serializes results to timestamped JSON
│
├── ui/
│   ├── dashboard.py         # Live stats panel (Rich Live)
│   ├── formatter.py         # Welcome screen & log helpers
│   └── results_view.py      # Final results table renderer
│
└── results/                 # Auto-created; stores run_YYYYMMDD_HHMMSS.json
```
PyBench uses uv for lightning-fast and reproducible environment management.
Requirements: Python 3.12+, uv (recommended)
```bash
# 1. Clone the repository
git clone https://github.com/dxnz-id/pybench.git
cd pybench

# 2. Set up the environment and install dependencies
uv venv
uv sync

# 3. Run the application
uv run python main.py
```

Alternatively, you can use standard Python tooling: create a virtual environment with `python -m venv .venv`, activate it, then run `pip install -r requirements.txt`.
Note (OpenCL): GPU benchmarking requires OpenCL drivers.
- NVIDIA: install the CUDA Toolkit (the OpenCL runtime itself ships with the NVIDIA GPU driver)
- Intel / AMD integrated: install the vendor's OpenCL runtime
- If OpenCL is unavailable, the GPU compute test automatically falls back to a CPU scalar implementation.
Run these inside the activated virtual environment (or prefix them with `uv run`):

```bash
# Run interactively (recommended)
python main.py

# Run with verbose logging
python main.py --verbose
```

At the prompt, enter the benchmarks you want to run:

```text
Select benchmarks (e.g. 1,2,3) or 'all' (all): all

1   → CPU Benchmark
2   → Memory Benchmark
3   → Disk Benchmark
4   → GPU Benchmark
all → Run everything
```
Results are automatically saved to results/run_<timestamp>.json.
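The file naming above can be sketched as follows. This is a minimal illustration, not PyBench's `reporter/exporter.py`: the helper name `export_results` and its `payload` argument are hypothetical, and the sketch only assumes that `run_id` is the local time formatted as `YYYYMMDD_HHMMSS` and that `timestamp` is the ctime-style string seen in the sample JSON.

```python
import json
import time
from pathlib import Path


def export_results(payload: dict, results_dir: str = "results") -> Path:
    """Write payload to <results_dir>/run_<YYYYMMDD_HHMMSS>.json and return the path.

    Hypothetical helper: it mirrors the naming and the run_id/timestamp fields
    described in this README, not PyBench's actual exporter.
    """
    now = time.localtime()
    run_id = time.strftime("%Y%m%d_%H%M%S", now)
    record = {
        "run_id": run_id,
        "timestamp": time.asctime(now),  # ctime-style, e.g. "Tue Apr 28 12:28:52 2026"
        **payload,                       # hardware, system_monitor, results, scores...
    }
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create results/ on first run
    path = out_dir / f"run_{run_id}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```

Because the run ID has one-second resolution, two runs started within the same second would map to the same file name; a real exporter may need to guard against that.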
All tests use a time-bounded loop pattern: each sub-test runs for a fixed duration (DEFAULT_DURATION, default 10 seconds) and scores by counting how many operations were completed — not by measuring how long a fixed task takes.
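A minimal sketch of this pattern, assuming a monotonic clock (`time.perf_counter`) that is checked between workload calls. `run_timed` is a hypothetical stand-in for the `_run_timed` helper used in the multi-thread snippet, and `DEFAULT_DURATION` here is a local constant rather than the value read from `config.py`.

```python
import time
from collections.abc import Callable

DEFAULT_DURATION = 10.0  # seconds; stands in for the config.py setting


def run_timed(workload: Callable[[], object], duration: float = DEFAULT_DURATION) -> int:
    """Call workload() repeatedly until `duration` seconds have elapsed; return the call count."""
    ops = 0
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        workload()
        ops += 1
    return ops
```

For example, `run_timed(lambda: sum(range(1_000)), duration=0.5)` returns how many of those sums fit in half a second. Because the clock is only checked between calls, a sub-test can overrun its duration by up to one workload call, so keeping each call short keeps that overrun small.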
Focuses on raw processor throughput across floating-point math, cryptography, compression, and parallel workloads.
Single-thread: Measures pure single-core performance by repeatedly executing an FPU-heavy loop of 499 iterations (i = 1 to 499) computing sqrt · log · sin. Stresses the Floating Point Unit (FPU) without any parallelism.

```python
import math

def workload():
    # range(1, 500) yields i = 1..499: 499 FPU-heavy iterations per call
    x = 0.0
    for i in range(1, 500):
        x += math.sqrt(i) * math.log(i + 1) * math.sin(i)
```

Multi-thread: Dispatches the same workload across all available logical cores via ThreadPoolExecutor. The score is the sum of completions across all threads, reflecting total parallel throughput.
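```python
import os
from concurrent.futures import ThreadPoolExecutor
cores = os.cpu_count() or 1  # number of logical cores
with ThreadPoolExecutor(max_workers=cores) as ex:
    futures = [ex.submit(self._run_timed, workload) for _ in range(cores)]
    total = sum(f.result() for f in futures)  # completions summed across all threads
```

Compression: Repeatedly compresses a 64 KB random buffer with zlib at level 6. Stresses integer instruction throughput and memory manipulation.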
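The compression sub-test can be sketched as below, assuming `os.urandom` supplies the 64 KB random buffer and the same time-bounded counting as the other sub-tests; the function name `compress_ops` is hypothetical.

```python
import os
import time
import zlib

BLOCK = os.urandom(64 * 1024)  # 64 KB of random bytes


def compress_ops(duration: float) -> int:
    """Count zlib level-6 compressions of BLOCK completed within `duration` seconds."""
    ops = 0
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        zlib.compress(BLOCK, 6)
        ops += 1
    return ops
```

Random bytes are essentially incompressible, so this measures how fast deflate searches for matches in high-entropy input rather than how well it compresses.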
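Encryption: Runs `hashlib.pbkdf2_hmac` with SHA-256 over 64 KB of data per iteration. PBKDF2 is a key-derivation function rather than a cipher, so this test effectively measures SHA-256/HMAC throughput, similar to the cryptographic workloads found in real-world security applications.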
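A sketch of this sub-test follows. The README does not say which salt or iteration count PyBench passes, so `SALT` and `ITERATIONS = 1` below are placeholder assumptions; only the `hashlib.pbkdf2_hmac("sha256", ...)` call and the 64 KB input come from the description above. The function name is hypothetical.

```python
import hashlib
import os
import time

DATA = os.urandom(64 * 1024)  # 64 KB input, passed as the PBKDF2 "password"
SALT = b"pybench-salt"        # hypothetical fixed salt
ITERATIONS = 1                # hypothetical; the real iteration count is not stated


def pbkdf2_ops(duration: float) -> int:
    """Count PBKDF2-HMAC-SHA256 derivations completed within `duration` seconds."""
    ops = 0
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        hashlib.pbkdf2_hmac("sha256", DATA, SALT, ITERATIONS)  # returns a 32-byte key
        ops += 1
    return ops
```

HMAC hashes any key longer than the hash's 64-byte block size down to a digest first, so with a 64 KB password and few iterations most of each call's cost is that one SHA-256 pass over the input.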
Prime Sieve: Implements the Sieve of Eratosthenes up to n = 100,000 using a bytearray. Tests CPU branching logic and array traversal patterns.

```python
def sieve(n):
    s = bytearray([1]) * (n + 1)  # s[k] == 1 means "k is still a prime candidate"
    s[0] = s[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if s[i]:
            # clear every multiple of i from i*i upward in one slice assignment
            s[i*i::i] = bytearray(len(s[i*i::i]))
    return s
```

Focuses on RAM throughput and latency: how fast the system can move large blocks of data and how quickly it responds to non-sequential access patterns.
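Sequential Bandwidth: Allocates a 128 MB bytearray, copies it into a new buffer (write pass), then reads back a single byte (read pass). Each iteration is counted as 2 × BUF_SIZE bytes, and the result is reported as raw bulk transfer speed in MB/s.

```python
BUF_SIZE = 128 * 1024 * 1024  # 128 MB
buf = bytearray(BUF_SIZE)     # source buffer, allocated once
total_bytes = 0

# one iteration of the timed loop:
buf2 = bytearray(BUF_SIZE)
buf2[:] = buf                 # write pass: bulk copy of the whole buffer
_ = buf2[0]                   # read pass: touches a single byte
total_bytes += BUF_SIZE * 2   # counted as one write plus one read of BUF_SIZE
```

Random Access Speed: Populates a 64 MB in-memory list of floats, then performs random index reads. Because the accesses do not follow a predictable pattern, this simulates cache-miss pressure. The result is reported in IOPS.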
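A minimal sketch of random index reads over a list of floats. The element count is a parameter because the README does not say how the 64 MB figure maps to a list length, and pre-drawing a pool of random indices (the pool size is arbitrary) is an assumption made to keep random-number generation out of the timed loop. The function name is hypothetical.

```python
import random
import time


def random_access_ops(n_items: int, duration: float, pool_size: int = 1 << 16) -> float:
    """Return random-index list reads per second, measured over `duration` seconds."""
    data = [float(i) for i in range(n_items)]
    # Pre-drawn random indices keep RNG cost out of the timed loop.
    indices = [random.randrange(n_items) for _ in range(pool_size)]
    reads = 0
    sink = 0.0
    start = time.perf_counter()
    deadline = start + duration
    while time.perf_counter() < deadline:
        for idx in indices:
            sink += data[idx]  # one random-index read
        reads += len(indices)
    return reads / (time.perf_counter() - start)
```

On 64-bit CPython each list slot is an 8-byte pointer to a separate float object, so a list of N floats occupies well over 8 × N bytes in total; each read touches both the slot and the object it points to.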
Copy Speed: Uses Python's memoryview to copy a 256 MB buffer directly at the memory level, minimizing Python object overhead. Reported in GB/s.

```python
mv_dst[:] = mv_src  # low-level bulk copy via memoryview
```

Allocation Stress: Continuously allocates and discards 1 MB bytearray objects, then forces a `gc.collect()`. Tests OS memory management and the Python garbage collector under sustained pressure.
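The allocation stress loop can be sketched as follows. Allocating in batches and collecting once per batch is an assumption (the README does not say how often `gc.collect()` runs), and the function name is hypothetical.

```python
import gc
import time

CHUNK = 1024 * 1024  # 1 MB per allocation


def allocation_ops(duration: float, batch: int = 64) -> int:
    """Allocate and drop 1 MB bytearrays in batches, forcing a GC pass after each batch."""
    allocs = 0
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        blocks = [bytearray(CHUNK) for _ in range(batch)]  # `batch` MB live at once
        del blocks                                          # drop every reference
        gc.collect()                                        # force a full collection pass
        allocs += batch
    return allocs
```

In CPython the buffers are freed by reference counting as soon as `del` runs; `gc.collect()` adds the cost of a full cycle-collection pass on top of that.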
Modeled after the CrystalDiskMark methodology. A 1 GB temporary file (CDM_test.tmp) is created before testing and deleted automatically on completion.
| Sub-test | Block Size | Threads | Use case simulated |
|---|---|---|---|
| SEQ1M Q8T1 | 1 MB | 8 | Large file transfers (video, ISO, game installs) |
| RND4K Q32T1 | 4 KB | 32 | OS boot, app launch, small file I/O (IOPS-bound) |
Sequential tests move through the file in ordered offsets. Random tests use random.randint to seek to arbitrary 4 KB-aligned positions, stressing the drive's IOPS capability. All writes call os.fsync() to ensure data is committed to hardware rather than OS cache.
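A sketch of a sequential write pass with `os.fsync()` after every write, as the paragraph above describes. It writes to a caller-supplied path instead of a fixed 1 GB `CDM_test.tmp`, treats 1 MB as 1,048,576 bytes, and the function name is hypothetical.

```python
import os
import time

BLOCK_SIZE = 1024 * 1024  # 1 MB (1,048,576 bytes), as in SEQ1M


def seq_write_mb_s(path: str, total_mb: int) -> float:
    """Write `total_mb` 1 MB blocks at increasing offsets, fsync after each, return MB/s."""
    block = os.urandom(BLOCK_SIZE)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mb):
            f.write(block)        # sequential: each block follows the previous one
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to commit the data to the device
    elapsed = time.perf_counter() - start
    return total_mb / elapsed
```

Without the `fsync` calls, the writes could complete in the OS page cache and report RAM speed rather than drive speed.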
```python
import random

# RND4K: seek to a random 4 KB-aligned offset before every operation
pos = random.randint(0, self.file_size // block_size - 1) * block_size
f.seek(pos)
f.write(data)  # or f.read(block_size)
```

Uses PyOpenCL to execute compute workloads on the GPU. If no OpenCL platform is detected, the compute test automatically falls back to a CPU scalar loop, and `vram_bw` is recorded as null.

GPU Compute: Compiles an OpenCL C kernel on the fly and dispatches it to apply sqrt · log · sin to every element of a 1M-element float32 array in parallel across all GPU compute units. The result is reported in MOps/s (million operations per second).

```c
__kernel void compute(__global float* src, __global float* dst) {
    int i = get_global_id(0);
    dst[i] = sqrt(src[i]) * log(src[i] + 1.0f) * sin(src[i]);
}
```

VRAM Bandwidth: Repeatedly transfers a 256 MB float32 NumPy array from host RAM to device VRAM (`cl.Buffer` with `COPY_HOST_PTR`) and back (`cl.enqueue_copy`). Measures PCIe bus throughput and VRAM read/write speed in MB/s.
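The CPU scalar fallback mentioned above can be sketched in pure Python as below. This is a hypothetical stand-in, not PyBench's fallback code: it counts one "operation" per element (an assumption about how MOps/s is defined) and uses a smaller default array than the GPU test.

```python
import math
import time


def fallback_compute_mops(n_elements: int = 100_000, duration: float = 1.0) -> float:
    """Apply sqrt * log * sin element by element in pure Python; return million elements/s."""
    src = [float(i) for i in range(n_elements)]  # non-negative inputs keep sqrt() real
    dst = [0.0] * n_elements
    processed = 0
    start = time.perf_counter()
    deadline = start + duration
    while time.perf_counter() < deadline:
        for i, x in enumerate(src):
            dst[i] = math.sqrt(x) * math.log(x + 1.0) * math.sin(x)  # same formula as the kernel
        processed += n_elements
    elapsed = time.perf_counter() - start
    return processed / elapsed / 1e6
```

A loop like this processes elements one at a time on a single core, which is why the fallback's MOps/s figure is typically far below what the OpenCL kernel reaches on a GPU.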
A background daemon thread polls system state every 0.5s for the entire duration of the benchmark run using psutil, nvidia-smi (via subprocess), and WMI fallbacks. Raw samples are collected into lists and aggregated into Avg / Min / Max / Std Dev by monitor/aggregator.py after the run completes. Monitored metrics include: CPU usage %, CPU frequency, RAM usage, GPU usage %, GPU temperature, VRAM used, disk I/O speed, and network throughput.
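The aggregation step can be sketched as below. Whether PyBench uses the population or the sample standard deviation is not stated; this sketch assumes the population form (`statistics.pstdev`), rounds to two decimals to match the sample JSON, and returns `None` (exported as `null`) for a metric with no samples. The function name is hypothetical.

```python
import statistics
from typing import Optional


def aggregate(samples: list[float]) -> Optional[dict]:
    """Summarize raw monitor samples into avg / min / max / std_dev (2 decimal places)."""
    if not samples:
        return None  # metric unavailable, e.g. no sensor detected
    return {
        "avg": round(statistics.fmean(samples), 2),
        "min": round(min(samples), 2),
        "max": round(max(samples), 2),
        "std_dev": round(statistics.pstdev(samples), 2),
    }
```

For example, `aggregate([1.0, 2.0, 3.0, 4.0])` returns `{"avg": 2.5, "min": 1.0, "max": 4.0, "std_dev": 1.12}`.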
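Higher score = better performance. Scores are unitless integers. The per-module weights aim to keep mid-range hardware in a similar range across modules, but module scores can still differ by orders of magnitude (in the sample JSON below, CPU scores 218,741 while Memory scores 6,254), so compare scores module by module.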
```text
CPU Score    = single_ops + (multi_ops × 0.5)
Memory Score = (seq_bw × 0.4) + (copy_gb × 500) + (rand_lat / 5,000)
Disk Score   = (seq_total_MB × 0.05) + (rnd_total_IOPS × 1.2)
GPU Score    = (compute_MOps × 5) + (vram_bw_MB × 0.1)
Overall      = average of all module scores > 0, truncated to an integer
```
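The formulas above can be checked with a short sketch. The key names follow `benchmark_results` in the sample JSON; truncating each module score with `int()` and averaging the truncated scores is inferred from that sample (for example, the memory formula gives about 6254.89 and the JSON reports 6254), not taken from `scoring/scorer.py`. The sketch assumes all four modules ran, and `module_scores` is a hypothetical name.

```python
def module_scores(r: dict) -> dict[str, int]:
    """Apply the README's score formulas to a benchmark_results-style dict."""
    cpu, mem, disk, gpu = r["cpu"], r["memory"], r["disk"], r["gpu"]
    scores = {
        "cpu": int(cpu["single"] + cpu["multi"] * 0.5),
        "memory": int(mem["seq_bw"] * 0.4 + mem["copy"] * 500 + mem["rand_lat"] / 5_000),
        "disk": int((disk["seq_read"] + disk["seq_write"]) * 0.05
                    + (disk["rand_read"] + disk["rand_write"]) * 1.2),
        # vram_bw is null when OpenCL is unavailable; treat it as 0
        "gpu": int(gpu["compute"] * 5 + (gpu.get("vram_bw") or 0) * 0.1),
    }
    positive = [s for s in scores.values() if s > 0]
    scores["overall"] = int(sum(positive) / len(positive)) if positive else 0
    return scores
```

Fed the `benchmark_results` values from the sample JSON below, this returns the same `scores` block: 218741, 6254, 175184, 17848, and an overall score of 104506.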
Each run produces a JSON file in results/ with the following structure:
```json
{
  "run_id": "20260428_122852",
  "timestamp": "Tue Apr 28 12:28:52 2026",
  "hardware": {
    "cpu": "13th Gen Intel(R) Core(TM) i5-13450HX",
    "cpu_cores": 10,
    "cpu_threads": 16,
    "cpu_base_clock": "2400.00 MHz",
    "ram": "11.71 GB",
    "gpu": "NVIDIA GeForce RTX 3050 6GB Laptop GPU",
    "gpu_vram": "6144.0 MB",
    "disk_total": "237.41 GB",
    "os": "Windows 11"
  },
  "system_monitor": {
    "cpu": { "avg": 20.32, "min": 0.0, "max": 73.7, "std_dev": 15.02 },
    "cpu_freq": {
      "avg": 2171.2,
      "min": 1520.0,
      "max": 2400.0,
      "std_dev": 386.97
    },
    "ram": { "avg": 59.56, "min": 57.0, "max": 63.6, "std_dev": 2.05 },
    "gpu": { "avg": 4.11, "min": 0.0, "max": 48.0, "std_dev": 9.59 },
    "gpu_temp": { "avg": 44.21, "min": 43.0, "max": 49.0, "std_dev": 1.77 }
  },
  "benchmark_results": {
    "cpu": {
      "single": 126391,
      "multi": 184701,
      "compress": 5843,
      "encrypt": 2506329,
      "prime": 45899
    },
    "memory": {
      "seq_bw": 4634.33,
      "rand_lat": 1430777.29,
      "copy": 8.23,
      "alloc": 3077.19
    },
    "disk": {
      "seq_write": 3566.22,
      "seq_read": 5960.72,
      "rand_write": 70188.45,
      "rand_read": 75401.3,
      "iops": 145589.76
    },
    "gpu": { "compute": 3545.31, "vram_bw": 1216.46, "opencl_ok": true }
  },
  "scores": {
    "cpu": 218741,
    "memory": 6254,
    "disk": 175184,
    "gpu": 17848,
    "overall": 104506
  }
}
```

| Package | Purpose |
|---|---|
| rich | Terminal UI: progress bars, live panels, tables |
| psutil | CPU, RAM, and disk metrics |
| nvidia-ml-py | Hardware identification (NVML) in exporter.py |
| py-cpuinfo | Detailed CPU model information |
| pyopencl | GPU compute kernel execution |
| numpy | Array operations for GPU benchmark |
- CPU temperature requires platform-specific drivers (e.g., OpenHardwareMonitor on Windows, `lm-sensors` on Linux). PyBench reports `null` if unavailable.
- GPU monitoring via `nvidia-smi` only supports NVIDIA GPUs. Intel/AMD integrated graphics are not monitored via this method (though hardware info might still be detected via OpenCL), and temperature/usage stats may fall back to WMI on Windows if `nvidia-smi` is unavailable.
- Disk benchmark creates a 1 GB temporary file. Ensure the target directory has sufficient free space.
- Python GIL affects multi-thread CPU scores: results reflect Python-level concurrency, not raw hardware throughput. Results are still consistent and comparable across machines running the same Python version.
- Score comparability is only valid between runs using the same `DEFAULT_DURATION` setting.
Roadmap:
- Detailed Reporting: HTML / PDF export for benchmark results.
- Network Benchmark: Latency and bandwidth tests for internet/LAN.
- Hardware Database: Online leaderboard to compare your scores.
- macOS Support: Apple Silicon (M1/M2/M3)-specific fallback sensors.
Contributions from the community are welcome, whether that means adding a new benchmark module, optimizing existing code, or fixing bugs. Please read the Contributing Guidelines (CONTRIBUTING.md) before opening a Pull Request to make sure your changes align with PyBench's architecture.
Please note that this project is released with a Code of Conduct. By participating in this project you agree to abide by its terms.
PyBench is a hardware benchmarking utility built to explore and demonstrate systems-level programming performance in Python.
Results are relative benchmarks and should not be compared to native-compiled tools such as CrystalDiskMark, Cinebench, or 3DMark.
Disclaimer: This program was primarily written with the assistance of AI.