A collection of HPC implementations covering MPI, OpenMP, and hybrid parallelism, developed as part of the P1.5 Parallel Computing course in the HPC Master programme at SISSA/ICTP (2025-26).
| Technology | Usage |
|---|---|
| MPI | Distributed-memory communication (point-to-point, collective, non-blocking) |
| OpenMP | Shared-memory thread parallelism |
| Hybrid MPI+OpenMP | Two-level parallelism targeting multi-core cluster nodes |
| HDF5 (parallel) | Scalable parallel I/O for large datasets |
| FFTW3-MPI | Distributed 3D FFTs for spectral PDE solving |
| C++20 / C | Implementation languages |
| CMake / Make | Build systems |
An N×N identity matrix is distributed in block rows across MPI processes using a templated `CMatrix<T>` class. The project explores three communication strategies for collecting and printing the matrix on the root process:
- Blocking `MPI_Send`/`MPI_Recv`
- Non-blocking `MPI_Isend`/`MPI_Irecv` with double buffering (overlapping communication and printing)
- Binary file I/O via parallel writes and an `MPI_Recv` gather
Load imbalance from sizes not divisible by the process count is handled via the remainder term, distributing the extra rows one each to the lowest-ranked processes.
Build:

```bash
cd 01_identity_matrix && mkdir build && cd build
cmake .. && make
mpirun -np 4 ./idMat
```

Parallel dense matrix multiplication (A × B = C) with a 1D block-row data layout using MPI collectives (`MPI_Scatterv`, `MPI_Allgather`). The inner loop is further parallelised with OpenMP to exploit shared memory within each node (hybrid MPI+OpenMP).
Performance is benchmarked against the serial baseline on a 50 000 × 50 000 matrix across multiple node counts and thread configurations:
| Configuration | Plot |
|---|---|
| Pure MPI scaling | data/PureMPI.png |
| Pure MPI efficiency | data/efficiency.png |
| Hybrid 4 tasks/node | data/hybrid4npes.png |
| Hybrid 8 tasks/node | data/hybrid8npes.png |
| Hybrid efficiency | data/hybrideff.png |
| Cannon vs 1D comparison | data/cannonAlg.png |
A Google Test suite (gtest.cpp) validates correctness by comparing the parallel result against a serial reference multiplication.
Build:

```bash
cd 02_matrix_multiplication && mkdir build && cd build
cmake .. && make
mpirun -np 4 ./matMul
```

Implementation of Cannon's algorithm for matrix multiplication on a 2D process grid. Each process owns a square subblock; blocks are cyclically shifted along rows and columns to compute the product, so each process stores only O(N²/P) elements rather than the full O(N²) operand gathered in the 1D layout.
Requires the number of MPI processes to be a perfect square. Supports hybrid execution (OpenMP inner loop). Benchmark scripts compare Cannon vs. 1D distribution performance and efficiency.
Build:

```bash
cd 03_cannon_algorithm && mkdir build && cd build
cmake .. && make
mpirun -np 16 ./cannon   # P must be a perfect square
```

Two standalone programs illustrating OpenMP fundamentals:
- `hello_threads.cpp` — spawns a fixed number of threads and prints a greeting from each, demonstrating `#pragma omp parallel` and `omp_get_thread_num()`.
- `matmul_omp.c` — benchmarks five OpenMP strategies for the inner loop of matrix multiplication: `collapse(3)` with `atomic`, `collapse(3)` with `reduction`, `collapse(2)`, `collapse(1)`, and `collapse(3)` with `critical`, comparing wall-clock time across strategies.
Parallel iterative Jacobi solver for the 2D steady-state heat equation on a square domain, using a 1D (row-wise) domain decomposition. Boundary conditions are applied via an injected lambda.
Each iteration requires exchanging halo rows with neighbouring processes. Two communication variants are implemented and compared:
- Blocking `MPI_Sendrecv` — simple and deadlock-free
- Non-blocking `MPI_Isend`/`MPI_Irecv` — overlaps halo exchange with interior computation
Scaling results on the Leonardo HPC cluster (CINECA):
| Plot | Description |
|---|---|
| plots/pureMPI.png | Time vs. process count, blocking |
| plots/MPINonBlock.png | Time vs. process count, non-blocking |
| plots/hybrid8procs.png | Hybrid, 8 processes |
| plots/hybrid16procs.png | Hybrid, 16 processes |
Build:

```bash
cd 05_jacobi_mpi && mkdir build && cd build
cmake .. && make
mpirun -np 8 ./jacobi
```

Extension of the pure-MPI Jacobi solver to a two-level hybrid model. The MPI halo exchange is non-blocking (`MPI_Isend`/`MPI_Irecv`) and the inner Jacobi sweep is parallelised with `#pragma omp parallel for collapse(2)`, allowing communication and interior computation to overlap at the thread level.
Benchmarked at 8 and 16 MPI processes with varying OMP thread counts per rank on Leonardo, illustrating the trade-off between MPI granularity and shared-memory efficiency.
Build:

```bash
cd 06_jacobi_hybrid && mkdir build && cd build
cmake .. && make
OMP_NUM_THREADS=4 mpirun -np 4 ./hybJacobi
```

Two variants of the Jacobi solver that checkpoint the solution field to HDF5 files using parallel HDF5: each MPI process writes its subdomain collectively without gathering data to rank 0, enabling scalable I/O and post-processing/visualisation at scale.
| Sub-project | Description |
|---|---|
| 1d_jacobi/ | 1D domain decomposition, HDF5 checkpoint every 100 iterations |
| 2d_jacobi/ | 2D domain decomposition (process grid), HDF5 output per block |
I/O performance benchmarks in the plots/ directories show write throughput scaling with process count.
Dependencies: parallel HDF5 library.
Build:

```bash
cd 07_jacobi_hdf5/1d_jacobi && mkdir build && cd build
cmake .. && make
mpirun -np 8 ./jacobi_hdf5
```

Parallel solution of the 3D diffusion equation with spatially varying diffusivity, using spectral (Fourier) spatial derivatives and forward Euler time integration:
∂c/∂t = ∇·(D(r) ∇c)
The 3D domain is distributed with a 1D slab decomposition along the first dimension via fftw_mpi_local_size_3d. Forward and inverse FFTs use fftw_mpi_plan_dft_3d / fftw_mpi_execute_dft. Global reductions (MPI_Allreduce) maintain concentration normalisation at each diagnostic step.
The serial_reference/ subdirectory contains the equivalent single-process code.
Scaling benchmarks on grids of 256³, 512³, and 1024³ points:
| Plot | Grid |
|---|---|
| plots/n256.png | 256³ |
| plots/n512.png | 512³ |
| plots/n1024.png | 1024³ |
Dependencies: FFTW3 with MPI support.
Build:

```bash
cd 08_fftw_diffusion
make         # edit MPI_CC / FFTW_DIR variables as needed
mpirun -np 8 ./diffusion
```

- `alltoall.c` — demonstrates `MPI_Alltoall` performing a distributed matrix transpose, sending equal-sized blocks between all process pairs.
- `par_identity.c` — a lightweight C implementation of the parallel identity matrix using non-blocking point-to-point communication.
```
.
├── 01_identity_matrix/       # MPI parallel identity matrix (blocking / non-blocking / binary I/O)
├── 02_matrix_multiplication/ # 1D MPI+OpenMP matrix multiply with scaling benchmarks
├── 03_cannon_algorithm/      # Cannon's 2D algorithm for matrix multiplication
├── 04_openmp_intro/          # OpenMP thread basics and matmul strategy comparison
├── 05_jacobi_mpi/            # Pure MPI Jacobi heat equation solver
├── 06_jacobi_hybrid/         # Hybrid MPI+OpenMP Jacobi solver
├── 07_jacobi_hdf5/           # Jacobi solver with parallel HDF5 checkpoint I/O
│   ├── 1d_jacobi/
│   └── 2d_jacobi/
├── 08_fftw_diffusion/        # FFTW-MPI 3D spectral diffusion solver
│   └── serial_reference/
└── extras/                   # MPI_Alltoall transpose, C identity matrix
```
Projects 01–07 use CMake:

```bash
cd <project_dir>
mkdir build && cd build
cmake ..
make
```

Project 08 uses a plain Makefile — edit `MPI_CC` and `FFTW_DIR` to match your system installation.
All MPI executables accept standard mpirun / srun invocations:

```bash
# OpenMPI / generic
mpirun -np <N> ./<executable>

# SLURM (e.g. Leonardo @ CINECA)
srun --mpi=pmix -n <N> ./<executable>

# Hybrid MPI+OpenMP
export OMP_NUM_THREADS=<T>
mpirun -np <N> --map-by socket:PE=<T> ./<executable>
```

| Library | Required by |
|---|---|
| MPI (OpenMPI ≥ 4 or MPICH ≥ 3) | all projects |
| OpenMP (GCC / Clang) | 02, 03, 04, 06 |
| CMake ≥ 3.21 | 01–07 |
| HDF5 with parallel support | 07 |
| FFTW3 with MPI support | 08 |
MIT — see LICENSE.