Official artifact accompanying the arXiv preprint: GDEV-AI: A Generalized Evaluation of Deep Learning Inference Scaling and Architectural Saturation
Available on arXiv: https://arxiv.org/abs/2602.16858
This repository provides the exact CPU-only benchmarking script used to generate the inference scaling results reported in the paper.
This benchmark quantifies:
- Throughput scaling (images/sec)
- Median latency
- P99 tail latency
- Thread-level saturation behavior
- Oversubscription effects
- Batch-size tradeoffs
The goal is to establish a reproducible CPU inference baseline under modern deep learning workloads.
Two models are benchmarked:
- ResNet-18
- ResNet-50
All measurements are performed using:
- PyTorch (Eager mode)
- CPU-only execution
- No CUDA
- No quantization
- No TorchScript
- No TensorRT
Requirements:
- Linux (Ubuntu 22.04 recommended)
- Python 3.8+
Create and activate a virtual environment:
python3 -m venv gdevai_env
source gdevai_env/bin/activate

Install dependencies:
pip install --upgrade pip
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install numpy

Verify installation:
python -c "import torch; print(torch.__version__)"

To reproduce the paper exactly:
- Warmup iterations: 20
- Timed iterations: 100
- Batch sizes: 1, 2, 4, 8, 16
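The measurement protocol above can be sketched as a generic timing helper. This is a simplified illustration, not the repository's exact script; `fn` stands in for a model's forward pass:

```python
import time
import statistics

def benchmark(fn, x, batch_size, warmup=20, iters=100):
    """Time fn(x) per the protocol above: 20 warmup calls, 100 timed calls."""
    for _ in range(warmup):              # warmup iterations are discarded
        fn(x)
    latencies_ms = []
    for _ in range(iters):               # timed iterations
        t0 = time.perf_counter()
        fn(x)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    latencies_ms.sort()
    median = statistics.median(latencies_ms)
    # nearest-rank P99 over the sorted per-iteration latencies
    p99 = latencies_ms[min(iters - 1, int(round(0.99 * iters)) - 1)]
    return {
        "median_ms": median,
        "p99_ms": p99,
        "throughput_img_s": batch_size * 1000.0 / median,
    }
```

With a PyTorch model, wrap the call in torch.no_grad() and pass a torch.randn(batch_size, 3, 224, 224) input.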
Thread sweeps:
Legacy platform: 1, 2, 3, 4
Modern platform: 1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 40, 48
Additional configuration: torch.set_num_interop_threads(1)
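In PyTorch this corresponds to pinning inter-op parallelism to one thread while sweeping the intra-op thread count. A sketch (the literal list stands in for THREADS_LIST, which is defined by whichever sweep is enabled below):

```python
import torch

torch.set_num_interop_threads(1)  # fixed; must be called before any parallel work starts
for t in [1, 2, 4]:               # stand-in for THREADS_LIST
    torch.set_num_threads(t)      # intra-op thread count being swept
    # ... run warmup + timed iterations at this thread count ...
```

Note that torch.set_num_interop_threads raises an error if called after inter-op parallel work has already begun, so it is set once at startup.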
Pretrained ImageNet weights are used.
python run_cpu_inference_benchmark.py

To reduce scheduling noise and to match the results reported in the preprint, run the benchmark with core pinning. For example, on a 24-core system:
taskset -c 0-23 python run_cpu_inference_benchmark.py

The benchmark script supports two thread sweep configurations. Only one configuration should be enabled at a time inside the script.
Uncomment the following and comment out the modern configuration:
# Legacy server setting
TMAX = int(sysinfo["cpu_logical_cpus"]) if sysinfo["cpu_logical_cpus"] else 4
THREADS_LIST = list(range(1, TMAX + 1))

This performs a full sweep from 1 up to the maximum number of logical CPUs. Use this for older or low-core-count systems.
Comment out the legacy configuration and enable:
# Modern server setting
logical_cpus = int(sysinfo["cpu_logical_cpus"]) if sysinfo["cpu_logical_cpus"] else 1
THREADS_LIST = [t for t in [1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 40, 48] if t <= logical_cpus]

This controlled sweep matches the configuration used in the paper and avoids excessively noisy oversubscription on high-core-count systems.
Results are written to:
benchmark_results_final.csv
Each row contains:
- Timestamp
- Model
- Batch size
- Thread count
- Median latency (ms)
- P99 latency (ms)
- Throughput (images/sec)
- System metadata (CPU, memory, OS, PyTorch version)
