Merged
64 commits
9d29aae
Add CSCS CI
havogt Feb 19, 2025
0cf96ed
set timelimit
havogt Feb 19, 2025
07fe6ed
Update cscs.yml
havogt Feb 19, 2025
99faef3
add build step
havogt Feb 19, 2025
32a7854
Merge branch 'cscsci' of https://github.com/GridTools/gridtools into …
havogt Feb 19, 2025
889c8af
fix stage name
havogt Feb 19, 2025
9673248
fix base image
havogt Feb 19, 2025
975f8bc
add python to base
havogt Feb 19, 2025
2d9f7cc
fix build_type
havogt Feb 19, 2025
900b057
add pip
havogt Feb 19, 2025
4ff2ffe
path to requirements
havogt Feb 19, 2025
fddd70d
why default no working?
havogt Feb 19, 2025
6dd126b
???
havogt Feb 19, 2025
2979f05
...
havogt Feb 19, 2025
955944f
...
havogt Feb 19, 2025
9962d26
...
havogt Feb 19, 2025
bf98eb4
cleanup
havogt Feb 19, 2025
27ef834
set build command
havogt Feb 20, 2025
b73a78f
update gcc
havogt Feb 20, 2025
e0a680a
update ubuntu
havogt Feb 20, 2025
68fd8e0
use uv
havogt Feb 20, 2025
fb5bf00
...
havogt Feb 20, 2025
19c6461
path
havogt Feb 20, 2025
7a0faf1
cuda 12.5.1
havogt Feb 20, 2025
2a1e4f8
disable test
havogt Feb 20, 2025
e0cf0fe
add test step
havogt Feb 20, 2025
d3d6fe2
.
havogt Feb 20, 2025
32228b7
fix condition
havogt Feb 20, 2025
dcd4e84
fix name
havogt Feb 21, 2025
c055054
fix dir
havogt Feb 21, 2025
ee12f6c
explicit run
havogt Feb 21, 2025
5b1262c
use test runscript with no slurm option
havogt Feb 22, 2025
2244e10
fix run_with_slurm check
havogt Feb 22, 2025
4d17119
test that ci fails on failure
havogt Feb 22, 2025
437ad45
set more env vars
havogt Feb 22, 2025
4763e00
remove c_bindings example
havogt Feb 22, 2025
069fed0
add mpich
havogt Feb 22, 2025
8a3612a
mpich-dev
havogt Feb 22, 2025
ba59aac
...
havogt Feb 22, 2025
977a1ef
other mpich...
havogt Feb 22, 2025
3d61ceb
other mpich...
havogt Feb 22, 2025
c9155da
use compiler wrappers as compilers
havogt Feb 22, 2025
2712068
change mpi location
havogt Feb 22, 2025
e916a25
fix path
havogt Feb 22, 2025
ad28349
play with mpi options
havogt Feb 22, 2025
dcf3cad
separate mpi job
havogt Feb 22, 2025
f33a0ca
try direct ctest
havogt Feb 22, 2025
9d61cb9
try stuff
havogt Feb 22, 2025
9d9f762
remove mpi runner
havogt Feb 24, 2025
4b9c53a
gpus per task
havogt Feb 24, 2025
78c28ba
add some debug prints
havogt Feb 24, 2025
432a0a4
another debug print
havogt Feb 24, 2025
47de3d2
more debug
havogt Feb 24, 2025
7a3c6cd
debug mpi
havogt Feb 24, 2025
e93158d
compiler...
havogt Feb 24, 2025
e57518f
try again
havogt Feb 25, 2025
53440c8
fix mpich path
havogt Feb 25, 2025
a03a141
gpu aware mpi
havogt Feb 25, 2025
112b875
cleanup
havogt Feb 25, 2025
29a8de3
no mpi executable
havogt Feb 25, 2025
3cc6954
cleanup, re-enable github actions
havogt Feb 25, 2025
df91cec
fix no mpi exec
havogt Feb 25, 2025
3d0ab85
remove match for old python versions
havogt Feb 25, 2025
b80856a
address review comments
havogt Feb 26, 2025
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -25,6 +25,7 @@ __pycache__
 /*

 # except
+!ci
 !cmake
 !docs
 !docs_src
1 change: 1 addition & 0 deletions .python_package/.gitignore
@@ -4,3 +4,4 @@ dist/
 setup.cfg
 *.egg-info/
 src/gridtools_cpp/data
+build/
63 changes: 63 additions & 0 deletions ci/base.Dockerfile
@@ -0,0 +1,63 @@
ARG UBUNTU_VERSION=24.04
ARG CUDA_VERSION
FROM docker.io/nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8

ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update -qq && \
    apt-get install -qq -y --no-install-recommends \
        gfortran \
        g++ \
        gcc \
        strace \
        build-essential \
        tar \
        wget \
        curl \
        cmake \
        ca-certificates \
        zlib1g-dev \
        libssl-dev \
        libbz2-dev \
        libsqlite3-dev \
        llvm \
        libncurses5-dev \
        libncursesw5-dev \
        xz-utils \
        tk-dev \
        libffi-dev \
        liblzma-dev \
        libreadline-dev \
        python3-dev \
        python3-pip \
        git \
        rustc \
        htop && \
    rm -rf /var/lib/apt/lists/*

ARG MPICH_VERSION=3.3.2
ARG MPICH_PATH=/usr/local
# Clean up with relative paths: the tarball is downloaded and unpacked in the
# current working directory, which need not be /root.
RUN wget -q https://www.mpich.org/static/downloads/${MPICH_VERSION}/mpich-${MPICH_VERSION}.tar.gz && \
    tar -xzf mpich-${MPICH_VERSION}.tar.gz && \
    cd mpich-${MPICH_VERSION} && \
    ./configure \
        --disable-fortran \
        --prefix=$MPICH_PATH && \
    make install -j32 && \
    cd .. && \
    rm -rf mpich-${MPICH_VERSION}.tar.gz mpich-${MPICH_VERSION}
RUN echo "${MPICH_PATH}/lib" >> /etc/ld.so.conf.d/cscs.conf && ldconfig

ENV CXX=${MPICH_PATH}/bin/mpicxx
ENV CC=${MPICH_PATH}/bin/mpicc

RUN wget --quiet https://archives.boost.io/release/1.85.0/source/boost_1_85_0.tar.gz && \
    echo be0d91732d5b0cc6fbb275c7939974457e79b54d6f07ce2e3dfdd68bef883b0b boost_1_85_0.tar.gz > boost_hash.txt && \
    sha256sum -c boost_hash.txt && \
Comment on lines +55 to +56

Contributor:

Minor: just as a shortcut you can also do this:

Suggested change
-    echo be0d91732d5b0cc6fbb275c7939974457e79b54d6f07ce2e3dfdd68bef883b0b boost_1_85_0.tar.gz > boost_hash.txt && \
-    sha256sum -c boost_hash.txt && \
+    (echo be0d91732d5b0cc6fbb275c7939974457e79b54d6f07ce2e3dfdd68bef883b0b boost_1_85_0.tar.gz | sha256sum --check) && \

? (Also minor: I much prefer the long versions of options like --check since there's less guessing what c happens to stand for in this particular case).

Then again, I guess this will be removed soon enough anyway...

Contributor Author:

I'll not touch it, because I want to delete it later today.


    tar xzf boost_1_85_0.tar.gz && \
    mv boost_1_85_0/boost /usr/local/include/ && \
    rm boost_1_85_0.tar.gz boost_hash.txt
ENV BOOST_ROOT=/usr/local/

ENV CUDA_HOME=/usr/local/cuda
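# Declare CUDA_ARCH as a build argument so that a value passed with --build-arg
# reaches the ENV below; without the declaration the reference always expands
# to an empty string.
ARG CUDA_ARCH
ENV CUDA_ARCH=${CUDA_ARCH}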
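
Note on the checksum step discussed in the review thread above: the following is a minimal Python sketch of what `sha256sum --check` does for a single file. It is only an illustration; the helper name `verify_sha256` and the demo file are hypothetical and not part of this PR.

```python
import hashlib
import tempfile
from pathlib import Path


def verify_sha256(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the SHA-256 digest of `path` equals `expected_hex`."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so that large archives need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the downloaded Boost tarball.
        archive = Path(tmp) / "demo.tar.gz"
        archive.write_bytes(b"not a real archive")
        good = hashlib.sha256(b"not a real archive").hexdigest()
        print(verify_sha256(archive, good))      # digest matches
        print(verify_sha256(archive, "0" * 64))  # digest does not match
```

As in the Dockerfile, the point is to fail the build before unpacking when the download does not match the pinned digest.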
23 changes: 23 additions & 0 deletions ci/build.Dockerfile
@@ -0,0 +1,23 @@
ARG BASE_IMAGE
FROM $BASE_IMAGE

COPY . /gridtools

ARG BUILD_TYPE

ENV GTRUN_BUILD_COMMAND='make -j 32'
ENV GTCMAKE_Boost_NO_BOOST_CMAKE=ON
ENV GTCMAKE_Boost_NO_SYSTEM_PATHS=ON
ENV GTCMAKE_GT_TESTS_REQUIRE_FORTRAN_COMPILER=ON
ENV GTCMAKE_GT_TESTS_REQUIRE_C_COMPILER=ON
ENV GTCMAKE_GT_TESTS_REQUIRE_OpenMP=ON
ENV GTCMAKE_GT_TESTS_REQUIRE_GPU=ON
ENV GTCMAKE_GT_TESTS_MPI_WITH_MPI_EXECUTABLE=OFF
ENV GTCMAKE_GT_TESTS_REQUIRE_Python=ON
ENV GT_ENABLE_STENCIL_DUMP=ON
ENV GTCMAKE_CMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON

RUN curl -LsSf https://astral.sh/uv/install.sh | sh
ENV PATH="/root/.local/bin:${PATH}"

RUN uv run /gridtools/pyutils/driver.py -v build -b ${BUILD_TYPE} -o build -i install -t install
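
Note on the `GTCMAKE_*` variables set in this Dockerfile: `cmake_args()` in pyutils/pyutils/env.py (further down in this diff) strips the prefix and forwards each variable as a CMake cache definition. The sketch below restates that logic as a standalone function so it can be tried without the rest of pyutils; passing the environment as a parameter and the demo values are assumptions made for illustration.

```python
from typing import Dict, List, Mapping


def items_with_tag(env: Mapping[str, str], tag: str) -> Dict[str, str]:
    """Select variables starting with `tag` and strip the tag from the key."""
    return {k[len(tag):]: v for k, v in env.items() if k.startswith(tag)}


def cmake_args(env: Mapping[str, str]) -> List[str]:
    """Turn GTCMAKE_<NAME>=<value> into -D<NAME>:<TYPE>=<value>."""
    args = []
    for key, value in items_with_tag(env, "GTCMAKE_").items():
        # ON/OFF values become BOOL cache entries, everything else STRING.
        kind = "BOOL" if value.strip().upper() in ("ON", "OFF") else "STRING"
        args.append(f"-D{key}:{kind}={value}")
    return args


if __name__ == "__main__":
    demo_env = {
        "GTCMAKE_GT_TESTS_REQUIRE_GPU": "ON",
        "GTCMAKE_GT_TESTS_MPI_WITH_MPI_EXECUTABLE": "OFF",
        "GTRUN_BUILD_COMMAND": "make -j 32",  # no GTCMAKE_ prefix, so it is skipped
    }
    for arg in cmake_args(demo_env):
        print(arg)
    # -DGT_TESTS_REQUIRE_GPU:BOOL=ON
    # -DGT_TESTS_MPI_WITH_MPI_EXECUTABLE:BOOL=OFF
```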
81 changes: 81 additions & 0 deletions ci/cscs.yml
@@ -0,0 +1,81 @@
include:
  - remote: "https://gitlab.com/cscs-ci/recipes/-/raw/master/templates/v2/.ci-ext.yml"

stages:
  - baseimage
  - build
  - test

.build_baseimage:
  stage: baseimage
  # we create a tag that depends on the SHA value of ci/base.Dockerfile, this way
  # a new base image is only built when the SHA of this file changes
  # If there are more dependency files that should change the tag-name of the base container
  # image, they can be added too.
  # Since the base image name is runtime dependent, we need to carry the value of it to
  # the following jobs via a dotenv file.
  before_script:
    # include build arguments in hash since we use a parameterized Docker file
    - DOCKER_TAG=`echo "$(cat $DOCKERFILE) $DOCKER_BUILD_ARGS" | sha256sum | head -c 16`
    - export PERSIST_IMAGE_NAME=$CSCS_REGISTRY_PATH/public/$ARCH/base/gridtools-ci:$DOCKER_TAG
    - echo "BASE_IMAGE=$PERSIST_IMAGE_NAME" >> build.env
  artifacts:
    reports:
      dotenv: build.env
  variables:
    DOCKERFILE: ci/base.Dockerfile
    # change to 'always' if you want to rebuild, even if target tag exists already (if-not-exists is the default, i.e. we could also skip the variable)
    CSCS_REBUILD_POLICY: if-not-exists
    DOCKER_BUILD_ARGS: '["CUDA_VERSION=$CUDA_VERSION", "UBUNTU_VERSION=$UBUNTU_VERSION"]'
build_baseimage_aarch64:
  extends: [.container-builder-cscs-gh200, .build_baseimage]
  variables:
    CUDA_VERSION: 12.6.2
    CUDA_ARCH: sm_90
    UBUNTU_VERSION: 24.04
    SLURM_TIMELIMIT: 10


.build_image:
  stage: build
  variables:
    # make sure we use a unique name here, otherwise we could create a race condition, when multiple pipelines
    # are running.
    PERSIST_IMAGE_NAME: $CSCS_REGISTRY_PATH/public/$ARCH/gridtools/gridtools-ci:$CI_COMMIT_SHA
    DOCKERFILE: ci/build.Dockerfile
    DOCKER_BUILD_ARGS: '["BASE_IMAGE=${BASE_IMAGE}", "BUILD_TYPE=release"]'
build_image_aarch64:
  extends: [.container-builder-cscs-gh200, .build_image]
  variables:
    SLURM_TIMELIMIT: 10

.test_helper:
  stage: test
  image: $CSCS_REGISTRY_PATH/public/$ARCH/gridtools/gridtools-ci:$CI_COMMIT_SHA
  variables:
    GTRUN_WITH_SLURM: False # since we are already in a SLURM job
    SLURM_JOB_NUM_NODES: 1
    SLURM_TIMELIMIT: 10
    CSCS_CUDA_MPS: 0

test_aarch64:
  extends: [.container-runner-daint-gh200, .test_helper]
  script:
    - cd /build && ctest -LE mpi --output-on-failure
  variables:
    SLURM_NTASKS: 1

test_aarch64_mpi:
  extends: [.container-runner-daint-gh200, .test_helper]
  script:
    - export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH
    - export LD_PRELOAD=/usr/lib64/libmpi_gtl_cuda.so
    - cd /build && ctest -L mpi --output-on-failure
  variables:
    NVIDIA_VISIBLE_DEVICES: all
    SLURM_NTASKS: 4
    SLURM_GPUS_PER_TASK: 1
    MPICH_GPU_SUPPORT_ENABLED: 1
    USE_MPI: "YES"
    SLURM_MPI_TYPE: cray_shasta
    CSCS_ADDITIONAL_MOUNTS: '["/opt/cray/pe/mpich/8.1.28/ofi/gnu/12.3/lib/libmpi.so:/usr/local/lib/libmpi.so.12.1.8", "/opt/cray/pe/lib64/libpmi.so.0:/usr/lib64/libpmi.so.0", "/opt/cray/pe/lib64/libpmi2.so.0:/usr/lib64/libpmi2.so.0", "/opt/cray/pals/1.4/lib/libpals.so.0:/usr/lib64/libpals.so.0", "/usr/lib64/libgfortran.so.5:/usr/lib64/libgfortran.so.5", "/opt/cray/pe/mpich/8.1.28/gtl/lib/libmpi_gtl_cuda.so:/usr/lib64/libmpi_gtl_cuda.so"]'
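
Note on the base-image tag computed in `.build_baseimage`: the `before_script` hashes the Dockerfile contents together with the build arguments and keeps the first 16 hex characters. A rough Python equivalent is sketched below. It assumes a shell whose `echo` does not interpret backslash escapes, and that `build_args` is the already-expanded value of `DOCKER_BUILD_ARGS`; the function name `base_image_tag` and the demo Dockerfile text are made up for illustration.

```python
import hashlib


def base_image_tag(dockerfile_text: str, build_args: str, length: int = 16) -> str:
    """Mimic: echo "$(cat $DOCKERFILE) $DOCKER_BUILD_ARGS" | sha256sum | head -c 16"""
    # Command substitution strips trailing newlines; echo appends exactly one.
    payload = dockerfile_text.rstrip("\n") + " " + build_args + "\n"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:length]


if __name__ == "__main__":
    dockerfile = "ARG CUDA_VERSION\nFROM docker.io/nvidia/cuda:${CUDA_VERSION}-devel-ubuntu24.04\n"
    tag_a = base_image_tag(dockerfile, '["CUDA_VERSION=12.6.2", "UBUNTU_VERSION=24.04"]')
    tag_b = base_image_tag(dockerfile, '["CUDA_VERSION=12.5.1", "UBUNTU_VERSION=24.04"]')
    # Different build arguments give different tags, so a new base image is built.
    print(tag_a, tag_b, tag_a != tag_b)
```

Because the tag depends only on these two inputs, an unchanged Dockerfile with unchanged arguments maps to an existing image name, which is what lets the `if-not-exists` rebuild policy skip the base-image build.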
11 changes: 0 additions & 11 deletions examples/CMakeLists.txt
@@ -36,17 +36,6 @@ if(GT_INSTALL_EXAMPLES)

     install_example(DIRECTORY boundaries SOURCES boundaries boundaries_provided)

-    configure_file(c_bindings/CMakeLists.txt.in
-        ${CMAKE_CURRENT_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/c_bindings/CMakeLists.txt @ONLY)
-    install(FILES ${CMAKE_CURRENT_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/c_bindings/CMakeLists.txt
-        DESTINATION ${GT_INSTALL_EXAMPLES_PATH}/c_bindings)
-    install(
-        DIRECTORY c_bindings
-        DESTINATION ${GT_INSTALL_EXAMPLES_PATH}
-        PATTERN "CMakeLists.txt.in" EXCLUDE
-    )
-    list(APPEND enabled_examples c_bindings)
-
     configure_file(cmake_skeletons/CMakeLists.txt.driver.in ${CMAKE_CURRENT_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/CMakeLists.txt @ONLY)
     install(FILES ${CMAKE_CURRENT_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/CMakeLists.txt DESTINATION ${GT_INSTALL_EXAMPLES_PATH})

8 changes: 8 additions & 0 deletions pyutils/driver.py
@@ -1,5 +1,13 @@
 #!/usr/bin/env python3

+# /// script
+# dependencies = [
+#   "matplotlib",
+#   "numpy",
+#   "python-dateutil",
+# ]
+# ///
+
 import json
 import os
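
Note on the `# /// script` block added here: this is inline script metadata in the PEP 723 format, which `uv run` (used in ci/build.Dockerfile) reads to install the listed dependencies before running the script. The sketch below shows how such a block can be located and turned back into TOML text, using the regular expression given in the specification; the function name `script_metadata` is only illustrative.

```python
import re
from typing import Optional

# Regular expression from the inline script metadata specification (PEP 723).
BLOCK_RE = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def script_metadata(source: str) -> Optional[str]:
    """Return the TOML text of the `script` block, or None if there is none."""
    for match in BLOCK_RE.finditer(source):
        if match.group("type") == "script":
            # Drop the leading "# " (or bare "#") from every content line.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:]
                for line in match.group("content").splitlines(keepends=True)
            )
    return None


if __name__ == "__main__":
    demo = (
        "#!/usr/bin/env python3\n"
        "\n"
        "# /// script\n"
        "# dependencies = [\n"
        '#   "matplotlib",\n'
        '#   "numpy",\n'
        '#   "python-dateutil",\n'
        "# ]\n"
        "# ///\n"
        "\n"
        "import json\n"
    )
    print(script_metadata(demo))
```

The returned text is plain TOML, so on Python 3.11 or newer it could be handed to `tomllib.loads` to obtain the dependency list.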

65 changes: 43 additions & 22 deletions pyutils/pyutils/env.py
@@ -10,20 +10,36 @@
 env = os.environ.copy()


+def env_flag_to_bool(name: str, default: bool) -> bool:
+    """Recognize true or false signaling string values."""
+    flag_value = None
+    if name in env:
+        flag_value = env[name].lower()
+    if flag_value is None:
+        return default
+    elif flag_value in ("0", "false", "off"):
+        return False
+    elif flag_value in ("1", "true", "on"):
+        return True
+    else:
+        raise ValueError(
+            "Invalid environment flag value: use '0 | false | off' or '1 | true | on'."
+        )
+
+
 def load(envfile):
     if not os.path.exists(envfile):
         raise FileNotFoundError(f'Could not find environment file "{envfile}"')
-    env['GTCMAKE_PYUTILS_ENVFILE'] = os.path.abspath(envfile)
+    env["GTCMAKE_PYUTILS_ENVFILE"] = os.path.abspath(envfile)

     envdir, envfile = os.path.split(envfile)
     output = runtools.run(
-        ['bash', '-c', f'set -e && source {envfile} && env -0'],
-        cwd=envdir).strip('\0')
-    env.update(line.split('=', 1) for line in output.split('\0'))
+        ["bash", "-c", f"set -e && source {envfile} && env -0"], cwd=envdir
+    ).strip("\0")
+    env.update(line.split("=", 1) for line in output.split("\0"))

-    log.info(f'Loaded environment from {os.path.join(envdir, envfile)}')
-    log.debug(f'New environment',
-              '\n'.join(f'{k}={v}' for k, v in sorted(env.items())))
+    log.info(f"Loaded environment from {os.path.join(envdir, envfile)}")
+    log.debug(f"New environment", "\n".join(f"{k}={v}" for k, v in sorted(env.items())))


 try:
@@ -36,39 +52,43 @@ def load(envfile):


 def _items_with_tag(tag):
-    return {k[len(tag):]: v for k, v in env.items() if k.startswith(tag)}
+    return {k[len(tag) :]: v for k, v in env.items() if k.startswith(tag)}


 def cmake_args():
     args = []
-    for k, v in _items_with_tag('GTCMAKE_').items():
-        if v.strip().upper() in ('ON', 'OFF'):
-            k += ':BOOL'
+    for k, v in _items_with_tag("GTCMAKE_").items():
+        if v.strip().upper() in ("ON", "OFF"):
+            k += ":BOOL"
         else:
-            k += ':STRING'
-        args.append(f'-D{k}={v}')
+            k += ":STRING"
+        args.append(f"-D{k}={v}")
     return args


 def set_cmake_arg(arg, value):
     if isinstance(value, bool):
-        value = 'ON' if value else 'OFF'
-    env['GTCMAKE_' + arg] = value
+        value = "ON" if value else "OFF"
+    env["GTCMAKE_" + arg] = value


 def sbatch_options(mpi):
-    options = _items_with_tag('GTRUN_SBATCH_')
+    options = _items_with_tag("GTRUN_SBATCH_")
     if mpi:
-        options.update(_items_with_tag('GTRUNMPI_SBATCH_'))
+        options.update(_items_with_tag("GTRUNMPI_SBATCH_"))

     return [
-        '--' + k.lower().replace('_', '-') + ('=' + v if v else '')
+        "--" + k.lower().replace("_", "-") + ("=" + v if v else "")
         for k, v in options.items()
     ]


 def build_command():
-    return env.get('GTRUN_BUILD_COMMAND', 'make').split()
+    return env.get("GTRUN_BUILD_COMMAND", "make").split()


+def run_with_slurm() -> bool:
+    return env_flag_to_bool("GTRUN_WITH_SLURM", True)
+
+
 def hostname():
Expand All @@ -90,9 +110,10 @@ def clustername():
'kesch'
"""
try:
output = runtools.run(['scontrol', 'show', 'config'])
m = re.compile(r'.*ClusterName\s*=\s*(\S*).*',
re.MULTILINE | re.DOTALL).match(output)
output = runtools.run(["scontrol", "show", "config"])
m = re.compile(r".*ClusterName\s*=\s*(\S*).*", re.MULTILINE | re.DOTALL).match(
output
)
if m:
return m.group(1)
except FileNotFoundError:
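
Note on `env_flag_to_bool` and `run_with_slurm` added in this file: the flag parsing is case-insensitive, falls back to the default when the variable is unset, and rejects anything outside the two accepted sets. The standalone sketch below restates that behaviour with the environment passed in explicitly so it can be tried in isolation; the name `flag_to_bool` and the explicit `env` parameter are assumptions for illustration, not the module's API.

```python
from typing import Mapping

FALSE_VALUES = ("0", "false", "off")
TRUE_VALUES = ("1", "true", "on")


def flag_to_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Interpret env[name] as a boolean flag, or return `default` if unset."""
    if name not in env:
        return default
    value = env[name].lower()
    if value in FALSE_VALUES:
        return False
    if value in TRUE_VALUES:
        return True
    raise ValueError(f"Invalid value {env[name]!r} for environment flag {name}")


if __name__ == "__main__":
    # A value such as "False" disables SLURM submission.
    print(flag_to_bool({"GTRUN_WITH_SLURM": "False"}, "GTRUN_WITH_SLURM", True))
    # Unset variable: the default (run with SLURM) applies.
    print(flag_to_bool({}, "GTRUN_WITH_SLURM", True))
```

Values such as "yes" or "no" are not in either set and therefore raise, which makes a mistyped flag fail loudly instead of silently selecting the default.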