Official artifact accompanying the arXiv preprint: GDEV-AI: A Generalized Evaluation of Deep Learning Inference Scaling and Architectural Saturation
Available on arXiv: https://arxiv.org/abs/2602.16858
This repository provides the exact CPU-only benchmarking script used to generate the inference scaling results reported in the paper.
This benchmark quantifies:
- Throughput scaling (images/sec)
- Median latency
- P99 tail latency
- Thread-level saturation behavior
- Oversubscription effects
- Batch-size tradeoffs
The goal is to establish a reproducible CPU inference baseline under modern deep learning workloads.
Two models are benchmarked:
- ResNet-18
- ResNet-50
All measurements are performed using:
- PyTorch (Eager mode)
- CPU-only execution
- No CUDA
- No quantization
- No TorchScript
- No TensorRT
Requirements:
- Linux (Ubuntu 22.04 recommended)
- Python 3.8+
Create and activate a virtual environment:
python3 -m venv gdevai_env
source gdevai_env/bin/activate

Install dependencies:
pip install --upgrade pip
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install numpy

Verify installation:
python -c "import torch; print(torch.__version__)"

To reproduce the paper exactly:
- Warmup iterations: 20
- Timed iterations: 100
- Batch sizes: 1, 2, 4, 8, 16
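The measurement protocol above can be sketched as a generic timing helper. This is a simplified illustration, not the repository's exact script; `fn` stands in for a model's forward pass:

```python
import time
import statistics

def benchmark(fn, x, batch_size, warmup=20, iters=100):
    """Time fn(x) per the protocol above: 20 warmup calls, 100 timed calls."""
    for _ in range(warmup):              # warmup iterations are discarded
        fn(x)
    latencies_ms = []
    for _ in range(iters):               # timed iterations
        t0 = time.perf_counter()
        fn(x)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    latencies_ms.sort()
    median = statistics.median(latencies_ms)
    # nearest-rank P99 over the sorted per-iteration latencies
    p99 = latencies_ms[min(iters - 1, int(round(0.99 * iters)) - 1)]
    return {
        "median_ms": median,
        "p99_ms": p99,
        "throughput_img_s": batch_size * 1000.0 / median,
    }
```

With a PyTorch model, wrap the call in torch.no_grad() and pass a torch.randn(batch_size, 3, 224, 224) input.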
Thread sweeps:
Legacy platform: 1, 2, 3, 4
Modern platform: 1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 40, 48
Additional configuration: torch.set_num_interop_threads(1)
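In PyTorch this corresponds to pinning inter-op parallelism to one thread while sweeping the intra-op thread count. A sketch (the literal list stands in for THREADS_LIST, which is defined by whichever sweep is enabled below):

```python
import torch

torch.set_num_interop_threads(1)  # fixed; must be called before any parallel work starts
for t in [1, 2, 4]:               # stand-in for THREADS_LIST
    torch.set_num_threads(t)      # intra-op thread count being swept
    # ... run warmup + timed iterations at this thread count ...
```

Note that torch.set_num_interop_threads raises an error if called after inter-op parallel work has already begun, so it is set once at startup.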
Pretrained ImageNet weights are used.
python run_cpu_inference_benchmark.py

To reduce scheduling noise and to match the results reported in the preprint, run the benchmark with core pinning. For example, on a 24-core system:
taskset -c 0-23 python run_cpu_inference_benchmark.py

The benchmark script supports two thread sweep configurations. Only one configuration should be enabled at a time inside the script.
Uncomment the following and comment out the modern configuration:
# Legacy server setting
TMAX = int(sysinfo["cpu_logical_cpus"]) if sysinfo["cpu_logical_cpus"] else 4
THREADS_LIST = list(range(1, TMAX + 1))

This performs a full sweep from 1 up to the maximum number of logical CPUs. Use this for older or low-core-count systems.
Comment out the legacy configuration and enable:
# Modern server setting
logical_cpus = int(sysinfo["cpu_logical_cpus"]) if sysinfo["cpu_logical_cpus"] else 1
THREADS_LIST = [t for t in [1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 40, 48] if t <= logical_cpus]

This controlled sweep matches the configuration used in the paper and avoids excessively noisy oversubscription on high-core-count systems.
Results are written to:
benchmark_results_final.csv
Each row contains:
- Timestamp
- Model
- Batch size
- Thread count
- Median latency (ms)
- P99 latency (ms)
- Throughput (images/sec)
- System metadata (CPU, memory, OS, PyTorch version)
