
Commit a186743

Madeeks, RMeli, and bcumming authored
CE: Added pages with guidelines for images on Alps (#272)
As agreed in VCUE, I have added a set of images with foundational resources (CUDA, MPI, NCCL, NVSHMEM), derived from material I use myself, and demonstrated how to run them through the CE on Alps. The intent of this material is to offer guidelines and suggestions on versions, building, and running of foundational components on Alps, without committing to officially supported resources.

---------

Co-authored-by: Rocco Meli <r.meli@bluemail.ch>
Co-authored-by: Ben Cumming <bcumming@cscs.ch>
1 parent 5a33290 commit a186743

File tree: 18 files changed, +1087 −71 lines changed


.github/actions/spelling/allow.txt

Lines changed: 5 additions & 0 deletions
```diff
@@ -17,9 +17,11 @@ CWP
 CXI
 Ceph
 Containerfile
+Containerfiles
 DNS
 Dockerfiles
 Dufourspitze
+EFA
 EMPA
 ETHZ
 Ehrenfest
@@ -76,6 +78,8 @@ MeteoSwiss
 NAMD
 NICs
 NVMe
+NVSHMEM
+NVLINK
 Nordend
 OpenFabrics
 OAuth
@@ -102,6 +106,7 @@ ROCm
 RPA
 Roboto
 Roothaan
+SHMEM
 SSHService
 STMV
 Scopi
```

docs/software/communication/cray-mpich.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -28,7 +28,7 @@ This means that Cray MPICH will automatically be linked to the GTL library, whic
 $ ldd myexecutable | grep gtl
 libmpi_gtl_cuda.so => /user-environment/linux-sles15-neoverse_v2/gcc-13.2.0/cray-gtl-8.1.30-fptqzc5u6t4nals5mivl75nws2fb5vcq/lib/libmpi_gtl_cuda.so (0x0000ffff82aa0000)
 ```
-
+
 The path may be different, but the `libmpi_gtl_cuda.so` library should be printed when using CUDA.
 In ROCm environments the `libmpi_gtl_hsa.so` library should be linked.
 If the GTL library is not linked, nothing will be printed.
@@ -40,7 +40,7 @@ See [this page][ref-slurm-gh200] for more information on configuring Slurm to us
 !!! warning "Segmentation faults when trying to communicate GPU buffers without `MPICH_GPU_SUPPORT_ENABLED=1`"
     If you attempt to communicate GPU buffers through MPI without setting `MPICH_GPU_SUPPORT_ENABLED=1`, it will lead to segmentation faults, usually without any specific indication that it is the communication that fails.
     Make sure that the option is set if you are communicating GPU buffers through MPI.
-
+
 !!! warning "Error: "`GPU_SUPPORT_ENABLED` is requested, but GTL library is not linked""
     If `MPICH_GPU_SUPPORT_ENABLED` is set to `1` and your application does not link against one of the GTL libraries you will get an error similar to the following during MPI initialization:
     ```bash
````
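For reference, a minimal sketch of how this flag is typically set when launching a job step; the executable name and node/task counts are placeholders, not part of this commit:

```bash
# Hypothetical job step: enable GPU-aware MPI in Cray MPICH before launching.
export MPICH_GPU_SUPPORT_ENABLED=1
srun -N 2 -n 8 ./myexecutable
```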
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
1+
ARG ubuntu_version=24.04
2+
ARG cuda_version=12.8.1
3+
FROM docker.io/nvidia/cuda:${cuda_version}-cudnn-devel-ubuntu${ubuntu_version}
4+
5+
RUN apt-get update \
6+
&& DEBIAN_FRONTEND=noninteractive \
7+
apt-get install -y \
8+
build-essential \
9+
ca-certificates \
10+
pkg-config \
11+
automake \
12+
autoconf \
13+
libtool \
14+
cmake \
15+
gdb \
16+
strace \
17+
wget \
18+
git \
19+
bzip2 \
20+
python3 \
21+
gfortran \
22+
rdma-core \
23+
numactl \
24+
libconfig-dev \
25+
libuv1-dev \
26+
libfuse-dev \
27+
libfuse3-dev \
28+
libyaml-dev \
29+
libnl-3-dev \
30+
libnuma-dev \
31+
libsensors-dev \
32+
libcurl4-openssl-dev \
33+
libjson-c-dev \
34+
libibverbs-dev \
35+
--no-install-recommends \
36+
&& rm -rf /var/lib/apt/lists/*
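As a sketch of how an image built from this Containerfile is typically brought onto Alps for use with the Container Engine; the image name, tag, and output path are placeholders and should be adapted to your setup:

```bash
# Hypothetical build-and-import workflow: build the image with podman,
# then convert it to a squashfs file that an EDF can reference.
podman build -f Containerfile -t comm-base:cuda12.8 .
enroot import -o $SCRATCH/comm-base-cuda12.8.sqsh podman://comm-base:cuda12.8
```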
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
ARG gdrcopy_version=2.5.1
2+
RUN git clone --depth 1 --branch v${gdrcopy_version} https://github.com/NVIDIA/gdrcopy.git \
3+
&& cd gdrcopy \
4+
&& export CUDA_PATH=/usr/local/cuda \
5+
&& make CC=gcc CUDA=$CUDA_PATH lib \
6+
&& make lib_install \
7+
&& cd ../ && rm -rf gdrcopy
8+
9+
# Install libfabric
10+
ARG libfabric_version=1.22.0
11+
RUN git clone --branch v${libfabric_version} --depth 1 https://github.com/ofiwg/libfabric.git \
12+
&& cd libfabric \
13+
&& ./autogen.sh \
14+
&& ./configure --prefix=/usr --with-cuda=/usr/local/cuda --enable-cuda-dlopen \
15+
--enable-gdrcopy-dlopen --enable-efa \
16+
&& make -j$(nproc) \
17+
&& make install \
18+
&& ldconfig \
19+
&& cd .. \
20+
&& rm -rf libfabric
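A hedged runtime check of the resulting libfabric installation, assuming the container is run through the CE so that the site's Slingshot provider can be injected:

```bash
# Hypothetical check: list the libfabric providers visible inside the
# container; on Alps the Slingshot (cxi) provider should be listed if the
# site-provided libfabric is injected by the Container Engine hooks.
fi_info -l
```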
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
ARG nccl_tests_version=2.17.1
2+
RUN wget -O nccl-tests-${nccl_tests_version}.tar.gz https://github.com/NVIDIA/nccl-tests/archive/refs/tags/v${nccl_tests_version}.tar.gz \
3+
&& tar xf nccl-tests-${nccl_tests_version}.tar.gz \
4+
&& cd nccl-tests-${nccl_tests_version} \
5+
&& MPI=1 make -j$(nproc) \
6+
&& cd .. \
7+
&& rm -rf nccl-tests-${nccl_tests_version}.tar.gz
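A possible way to exercise the resulting binaries from a job step; the binary path assumes the default nccl-tests build directory and the source tree kept by the snippet above, and the node/GPU counts are placeholders:

```bash
# Hypothetical run: one task per GPU, exercising NCCL allreduce across nodes.
srun -N 2 --ntasks-per-node=4 --gpus-per-task=1 \
    /nccl-tests-2.17.1/build/all_reduce_perf -b 8 -e 4G -f 2 -g 1
```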
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
1+
RUN apt-get update \
2+
&& DEBIAN_FRONTEND=noninteractive \
3+
apt-get install -y \
4+
python3-venv \
5+
python3-dev \
6+
--no-install-recommends \
7+
&& rm -rf /var/lib/apt/lists/* \
8+
&& rm /usr/lib/python3.12/EXTERNALLY-MANAGED
9+
10+
# Build NVSHMEM from source
11+
ARG nvshmem_version=3.4.5
12+
RUN wget -q https://developer.download.nvidia.com/compute/redist/nvshmem/${nvshmem_version}/source/nvshmem_src_cuda12-all-all-${nvshmem_version}.tar.gz \
13+
&& tar -xvf nvshmem_src_cuda12-all-all-${nvshmem_version}.tar.gz \
14+
&& cd nvshmem_src \
15+
&& NVSHMEM_BUILD_EXAMPLES=0 \
16+
NVSHMEM_BUILD_TESTS=1 \
17+
NVSHMEM_DEBUG=0 \
18+
NVSHMEM_DEVEL=0 \
19+
NVSHMEM_DEFAULT_PMI2=0 \
20+
NVSHMEM_DEFAULT_PMIX=1 \
21+
NVSHMEM_DISABLE_COLL_POLL=1 \
22+
NVSHMEM_ENABLE_ALL_DEVICE_INLINING=0 \
23+
NVSHMEM_GPU_COLL_USE_LDST=0 \
24+
NVSHMEM_LIBFABRIC_SUPPORT=1 \
25+
NVSHMEM_MPI_SUPPORT=1 \
26+
NVSHMEM_MPI_IS_OMPI=1 \
27+
NVSHMEM_NVTX=1 \
28+
NVSHMEM_PMIX_SUPPORT=1 \
29+
NVSHMEM_SHMEM_SUPPORT=1 \
30+
NVSHMEM_TEST_STATIC_LIB=0 \
31+
NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
32+
NVSHMEM_TRACE=0 \
33+
NVSHMEM_USE_DLMALLOC=0 \
34+
NVSHMEM_USE_NCCL=1 \
35+
NVSHMEM_USE_GDRCOPY=1 \
36+
NVSHMEM_VERBOSE=0 \
37+
NVSHMEM_DEFAULT_UCX=0 \
38+
NVSHMEM_UCX_SUPPORT=0 \
39+
NVSHMEM_IBGDA_SUPPORT=0 \
40+
NVSHMEM_IBGDA_SUPPORT_GPUMEM_ONLY=0 \
41+
NVSHMEM_IBDEVX_SUPPORT=0 \
42+
NVSHMEM_IBRC_SUPPORT=0 \
43+
LIBFABRIC_HOME=/usr \
44+
NCCL_HOME=/usr \
45+
GDRCOPY_HOME=/usr/local \
46+
MPI_HOME=/usr \
47+
SHMEM_HOME=/usr \
48+
NVSHMEM_HOME=/usr \
49+
cmake . \
50+
&& make -j$(nproc) \
51+
&& make install \
52+
&& ldconfig \
53+
&& cd .. \
54+
&& rm -r nvshmem_src nvshmem_src_cuda12-all-all-${nvshmem_version}.tar.gz
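Since the build enables libfabric support, a launch typically selects the libfabric transport at run time. A hypothetical sketch follows; the environment variable names are what recent NVSHMEM releases document, not settings taken from this commit, and should be checked against the installed version:

```bash
# Hypothetical launch settings for NVSHMEM over Slingshot (libfabric/cxi);
# application name and task layout are placeholders.
export NVSHMEM_REMOTE_TRANSPORT=libfabric
export NVSHMEM_LIBFABRIC_PROVIDER=cxi
srun -N 2 --ntasks-per-node=4 --gpus-per-task=1 ./my_nvshmem_app
```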
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
1+
ARG OMPI_VER=5.0.8
2+
RUN wget -q https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-${OMPI_VER}.tar.gz \
3+
&& tar xf openmpi-${OMPI_VER}.tar.gz \
4+
&& cd openmpi-${OMPI_VER} \
5+
&& ./configure --prefix=/usr --with-ofi=/usr --with-ucx=/usr \
6+
--enable-oshmem --with-cuda=/usr/local/cuda \
7+
--with-cuda-libdir=/usr/local/cuda/lib64/stubs \
8+
&& make -j$(nproc) \
9+
&& make install \
10+
&& ldconfig \
11+
&& cd .. \
12+
&& rm -rf openmpi-${OMPI_VER}.tar.gz openmpi-${OMPI_VER}
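A quick sanity check that the OFI and CUDA support requested at configure time actually ended up in the build; the grep pattern is only a suggestion:

```bash
# Hypothetical check: confirm OFI (libfabric) components and CUDA support
# are reported by the installed Open MPI.
ompi_info | grep -i -e ofi -e cuda
```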
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
ARG omb_version=7.5.1
2+
RUN wget -q http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-${omb_version}.tar.gz \
3+
&& tar xf osu-micro-benchmarks-${omb_version}.tar.gz \
4+
&& cd osu-micro-benchmarks-${omb_version} \
5+
&& ldconfig /usr/local/cuda/targets/sbsa-linux/lib/stubs \
6+
&& ./configure --prefix=/usr/local CC=$(which mpicc) CFLAGS="-O3 -lcuda -lnvidia-ml" \
7+
--enable-cuda --with-cuda-include=/usr/local/cuda/include \
8+
--with-cuda-libpath=/usr/local/cuda/lib64 \
9+
CXXFLAGS="-lmpi -lcuda" \
10+
&& make -j$(nproc) \
11+
&& make install \
12+
&& ldconfig \
13+
&& cd .. \
14+
&& rm -rf osu-micro-benchmarks-${omb_version} osu-micro-benchmarks-${omb_version}.tar.gz
15+
16+
WORKDIR /usr/local/libexec/osu-micro-benchmarks/mpi
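A possible two-node bandwidth measurement with GPU (device) buffers, using the install prefix set above; node counts and benchmark parameters are placeholders:

```bash
# Hypothetical run: point-to-point bandwidth between two nodes with
# CUDA device buffers on both sender and receiver.
srun -N 2 --ntasks-per-node=1 --gpus-per-task=1 \
    /usr/local/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw -d cuda D D
```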
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
# Install UCX
2+
ARG UCX_VERSION=1.19.0
3+
RUN wget https://github.com/openucx/ucx/releases/download/v${UCX_VERSION}/ucx-${UCX_VERSION}.tar.gz \
4+
&& tar xzf ucx-${UCX_VERSION}.tar.gz \
5+
&& cd ucx-${UCX_VERSION} \
6+
&& mkdir build \
7+
&& cd build \
8+
&& ../configure --prefix=/usr --with-cuda=/usr/local/cuda --with-gdrcopy=/usr/local \
9+
--enable-mt --enable-devel-headers \
10+
&& make -j$(nproc) \
11+
&& make install \
12+
&& cd ../.. \
13+
&& rm -rf ucx-${UCX_VERSION}.tar.gz ucx-${UCX_VERSION}
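A short check that the CUDA and GDRCopy transports requested at configure time are detected by the installed UCX; the grep is only illustrative:

```bash
# Hypothetical check: list UCX transports/devices and look for the
# cuda_copy / gdr_copy memory transports.
ucx_info -d | grep -i -e cuda -e gdr
```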
Lines changed: 56 additions & 9 deletions
```diff
@@ -1,20 +1,67 @@
 [](){#ref-software-communication}
 # Communication Libraries
 
-CSCS provides common communication libraries optimized for the [Slingshot 11 network on Alps][ref-alps-hsn].
+Communication libraries, like MPI and NCCL, are among the building blocks for high-performance scientific and ML workloads.
+Broadly speaking, there are two levels of communication:
+
+* **Intra-node** communication between two processes on the same node.
+* **Inter-node** communication between different nodes, over the [Slingshot 11 network][ref-alps-hsn] that connects nodes on Alps.
+
+To get the best inter-node performance, communication libraries need to be configured to use the [libfabric][ref-communication-libfabric] library, which has an optimised back end for the Slingshot 11 network on Alps.
+
+As such, communication libraries are part of the "base layer" of libraries and tools used by all workloads to fully utilise the hardware on Alps.
+They comprise the *network* layer in the following stack:
+
+* **CPU**: compilers with support for building applications optimised for the CPU architecture on the node.
+* **GPU**: CUDA and ROCm provide compilers and runtime libraries for NVIDIA and AMD GPUs respectively.
+* **Network**: libfabric, MPI, NCCL, and NVSHMEM need to be configured for the Slingshot network.
+
+CSCS provides communication libraries optimised for libfabric and Slingshot in uenv, along with guidance on how to create container images that use them.
+This section of the documentation provides advice on how to build and install software that uses these libraries, and how to deploy it.
 
 For most scientific applications relying on MPI, [Cray MPICH][ref-communication-cray-mpich] is recommended.
 [MPICH][ref-communication-mpich] and [OpenMPI][ref-communication-openmpi] may also be used, with limitations.
 Cray MPICH, MPICH, and OpenMPI make use of [libfabric][ref-communication-libfabric] to interact with the underlying network.
 
-Most machine learning applications rely on [NCCL][ref-communication-nccl] or [RCCL][ref-communication-rccl] for high-performance implementations of collectives.
-NCCL and RCCL have to be configured with a plugin using [libfabric][ref-communication-libfabric] to make full use of the Slingshot network.
+Most machine learning applications rely on [NCCL][ref-communication-nccl] for high-performance implementations of collectives.
+NCCL has to be configured with a plugin that uses [libfabric][ref-communication-libfabric] to make full use of the Slingshot network.
 
 See the individual pages for each library for information on how to use and best configure the libraries.
 
-* [Cray MPICH][ref-communication-cray-mpich]
-* [MPICH][ref-communication-mpich]
-* [OpenMPI][ref-communication-openmpi]
-* [NCCL][ref-communication-nccl]
-* [RCCL][ref-communication-rccl]
-* [libfabric][ref-communication-libfabric]
+<div class="grid cards" markdown>
+
+-   __Low Level__
+
+    Learn about the low-level networking library libfabric, and how to use it in uenv and containers.
+
+    [:octicons-arrow-right-24: libfabric][ref-communication-libfabric]
+
+</div>
+<div class="grid cards" markdown>
+
+-   __MPI__
+
+    Cray MPICH is the most optimised and best-tested MPI implementation on Alps, and is used by uenv.
+
+    [:octicons-arrow-right-24: Cray MPICH][ref-communication-cray-mpich]
+
+    For compatibility in containers:
+
+    [:octicons-arrow-right-24: MPICH][ref-communication-mpich]
+
+    OpenMPI can also be built in containers or in uenv:
+
+    [:octicons-arrow-right-24: OpenMPI][ref-communication-openmpi]
+
+</div>
+<div class="grid cards" markdown>
+
+-   __Machine Learning__
+
+    Communication libraries used by ML tools like Torch, and by some simulation codes.
+
+    [:octicons-arrow-right-24: NCCL][ref-communication-nccl]
+
+    [:octicons-arrow-right-24: NVSHMEM][ref-communication-nvshmem]
+
+</div>
```
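To tie the pieces together, a hypothetical end-to-end sketch of using one of these images through the Container Engine: the EDF location, image path, and hook annotation are assumptions to be checked against the Container Engine documentation, not content of this commit.

```bash
# Hypothetical: write an EDF that points at a locally imported image and
# enables the Slingshot-aware NCCL plugin hook, then launch a job with it.
cat > $HOME/.edf/comm-demo.toml <<'EOF'
image = "/capstor/scratch/cscs/<user>/comm-base-cuda12.8.sqsh"

[annotations]
com.hooks.aws_ofi_nccl.enabled = "true"
EOF

srun -N 2 --ntasks-per-node=4 --environment=comm-demo ./my_app
```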
