policies/MLPerf_Compatibility_Table.adoc at master · FileSystemGuy/policies

Table of Contents

Training
HPC
Inference
Mobile
Tiny

N/A : Benchmark not present in a round

X: The benchmark changed in this round. Submission results can be compared across rounds only where the benchmark has not changed.
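As a rough illustration of how the legend can be read, the following Python sketch checks whether results from two rounds of one benchmark are comparable. It rests on assumptions that the legend does not state explicitly: a blank cell means "unchanged since the previous round", `N/A` persists until the next `X`, and a footnoted marker such as `X*` counts as a change. The function name `comparable` and the list-based row encoding are hypothetical and not part of the MLPerf policies. The sample rows are copied from the HPC table.

```python
def comparable(rounds, cells, a, b):
    """Return True if results from rounds `a` and `b` can be compared.

    rounds: list of round labels, e.g. ["0.7", "1.0", "2.0"]
    cells:  table cells for one model, one per round ("X", "N/A" or "").
            Trailing cells may be omitted; they are treated as blank.
    """
    i, j = sorted((rounds.index(a), rounds.index(b)))
    cells = list(cells) + [""] * (len(rounds) - len(cells))

    # Assumption: the benchmark is present from an "X" until the next "N/A".
    present = False
    presence = []
    for cell in cells:
        if cell == "N/A":
            present = False
        elif cell.startswith("X"):
            present = True
        presence.append(present)

    if not (presence[i] and presence[j]):
        return False

    # Any change marker after the earlier round, up to and including the
    # later round, breaks comparability.
    return not any(c.startswith("X") or c == "N/A" for c in cells[i + 1 : j + 1])


hpc_rounds = ["0.7", "1.0", "2.0"]
deepcam = ["X", "", "X"]
open_catalyst = ["N/A", "X", "X"]

print(comparable(hpc_rounds, deepcam, "0.7", "1.0"))        # unchanged in 1.0
print(comparable(hpc_rounds, deepcam, "1.0", "2.0"))        # changed in 2.0
print(comparable(hpc_rounds, open_catalyst, "0.7", "1.0"))  # absent in 0.7
```

Under these assumptions, only the first call reports a comparable pair. Encoding each row as a list keeps the sketch independent of how the table itself is stored.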

Training

Model	0.5	0.6	0.7	1.0	1.1	2.0	2.1	3.0	3.1	4.0	4.1	5.0

ResNet-50 v1.5

N/A

SSD-ResNet34

N/A

RetinaNet-ResNeXt50

N/A

MaskRCNN

N/A

NCF

N/A

NMT

N/A

Transformer

N/A

MiniGo

N/A

DLRM

N/A

DLRM-dcnv2

N/A

BERT

N/A

RNN-T

N/A

3D U-Net

N/A

GPT3

N/A

Stable Diffusion v2

N/A

LLama70B-LoRA

N/A

RGAT

N/A

Llama3.1 405b

N/A

Metric: Time-to-train (measured in minutes)

Note: In v0.6, ResNet-50 v1.5, SSD-ResNet34, and NMT increased their accuracy targets, and all v0.6 benchmarks changed initialization timing. In v0.7, MiniGo moved to a 19x19 board.

HPC

Model	0.7	1.0	2.0
CosmoFlow	X	X	X
DeepCAM	X		X
Open Catalyst	N/A	X	X

Metrics: Time-to-train (measured in minutes) and throughput (weak scaling - measured in models/minute)

Inference

Datacenter

Model	0.5	0.7	1.0	1.1	2.0	2.1	3.0	3.1	4.0	4.1	5.0
MobileNet-v1	X	N/A
ResNet-50 v1.5	X
SSD-MobileNets	X								N/A
SSD-ResNet34	X					N/A
RetinaNet-ResNeXt50	N/A					X
NMT	X	N/A
DLRM	N/A	X						N/A
DLRM-v2	N/A							X
BERT	N/A	X									N/A
RNN-T	N/A	X								N/A
3D U-Net	N/A	X
GPT-J	N/A							X
Llama2-70b	N/A								X
Stable-diffusion-xl	N/A								X
Mixtral-8x7b	N/A									X
Llama 3.1 405B	N/A										X
Llama 2 70B interactive	N/A										X
RGAT	N/A										X

Edge

Model	0.5	0.7	1.0	1.1	2.0	2.1	3.0	3.1	4.0	4.1	5.0
MobileNet-v1	X	N/A
ResNet-50 v1.5	X
SSD-MobileNets	X								N/A
SSD-ResNet34	X					N/A
RetinaNet-ResNeXt50	N/A					X
NMT	X	N/A
DLRM	N/A	X						N/A
DLRM-v2	N/A							X	N/A
BERT	N/A	X
RNN-T	N/A	X								N/A
3D U-Net	N/A	X
GPT-J	N/A							X
Stable-diffusion-xl	N/A								X
Automotive PointPainting	N/A										X

Metrics: Queries/second (server), Samples/second (offline), Latency (measured in milliseconds) (single stream), Streams (multi-stream v0.5-v1.1), Latency (measured in milliseconds) (multi-stream 2.0+)

Additional power metrics: System power (measured in watts) (server and offline), system energy per stream (measured in joules) (single stream and multi-stream)

Note: Performance metrics for inference and power submissions are not comparable

Note: Multistream v0.5-v1.1 is not compatible with v2.0 and newer

Note: Inference over Network scenario introduced in v2.1
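The multi-stream note above can be sketched as a small Python check. It covers only the metric change described in this section (streams in v0.5-v1.1, latency in milliseconds from v2.0 onward) and says nothing about whether the benchmark itself changed between rounds. The helper names `parse_version`, `multistream_metric`, and `multistream_comparable` are hypothetical.

```python
def parse_version(version):
    """Turn a round label such as "v1.1" or "2.0" into a tuple like (1, 1)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def multistream_metric(version):
    """Return the multi-stream metric used in a given Inference round."""
    # Rounds before v2.0 report a stream count; v2.0 and newer report latency.
    return "streams" if parse_version(version) < (2, 0) else "latency_ms"


def multistream_comparable(version_a, version_b):
    """True if two rounds report multi-stream results in the same metric."""
    return multistream_metric(version_a) == multistream_metric(version_b)


print(multistream_comparable("v0.7", "v1.1"))
print(multistream_comparable("v1.1", "v2.0"))
```

Comparing parsed tuples rather than raw strings avoids lexical ordering surprises in round labels.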

Mobile

Model	0.7	1.0	1.1	2.0	2.1	3.0
MobileNetEdge	X
SSD-MobileNetsV2	X	N/A
MobileDET	N/A	X
DeeplabV3	X				N/A
MOSAIC	N/A			X
MobileBERT	X
EDSR	N/A					X

Primary metrics: Latency (measured in milliseconds) (single stream), Samples/second (offline)

Note: A submission must include all benchmarks in the single-stream scenario, and MobileNetEdge in both the single-stream and offline scenarios.

Tiny

Model	0.5	0.7	1.0
MobileNetV1	X		X
ResNet-V1	X*		X
DSCNN	X		X
FC Autoencoder	X		X

Primary metric: Latency (measured in milliseconds)

Secondary metric: Energy per inference (measured in microjoules)

*Latency-compatible but not accuracy-compatible: v0.5 and v0.7 use the same model, but the evaluation set was changed to improve balance.
