8/2021 Optimize for a specific CPU
The wgrib2 makefile assumes a generic CPU. For 64-bit AMD or Intel
CPUs on a 64-bit OS, that means wgrib2 will be compiled for the 64-bit
instruction set first introduced by AMD in 1999, so newer instructions
such as AVX and its successors will not be used. The advantage of using
the older instruction set is that the executable doesn't have to be
recompiled when it is moved to another machine with a compatible OS.
In my experience, optimizing wgrib2 for newer instructions provides limited
gains. The speed-up from recompiling is limited because
(1) Wgrib2 is often limited by I/O speeds.
(2) Wgrib2 has limited parallelization in packing and unpacking.
(3) Wgrib2 uses OpenMP to parallelize the loops, and this limits the
    parallelization that can be achieved by modern SIMD instructions.
(4) Wgrib2 limits itself to OpenMP 3.1. Later versions of OpenMP
    can enable SIMD instructions and run on GPUs. However, portability
    is an issue.
If you want to try optimizing wgrib2, set the following before compiling:
   export CPPFLAGS='compiler-directives'    (a previous version of this README had a typo, CFFLAGS=..)
   export FFLAGS='compiler-directives'
   export CC=C-compiler
   export FC=Fortran-compiler
   make
For compiler directives, the -march=(CPU) option is easy to use.
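
As a minimal sketch of these settings, assume the GNU compilers (gcc and
gfortran) and an executable that will only run on the machine that builds it.
With GCC, -march=native targets the CPU of the build machine. The compiler
names and the choice of "native" are assumptions for illustration; substitute
your own compilers and CPU name.

```shell
# Assumption: gcc and gfortran are the compilers in use (example names only).
# -march=native asks GCC to generate code for the CPU of the build machine.
export CPPFLAGS='-march=native'
export FFLAGS='-march=native'
export CC=gcc
export FC=gfortran

# Show the settings that make will inherit from the environment.
echo "CC=$CC FC=$FC CPPFLAGS=$CPPFLAGS FFLAGS=$FFLAGS"

# Optional, GCC only: show which CPU "native" resolves to on this machine.
#   gcc -march=native -Q --help=target | grep -- '-march='

# Then, from the wgrib2 source directory:
#   make
```

An executable built this way may stop with an illegal-instruction error on an
older CPU, so keep a generic build for machines that differ from the build host.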