Run WarpX on Apple Silicon GPUs through:
WarpX -> AMReX (SYCL) -> AdaptiveCpp SSCP -> Metal
This repo packages the patches, build scripts, validation tests, and benchmark workflow needed to get that stack working on macOS.
- End-to-end WarpX runs on Apple GPU through Metal.
- SYCL smoke tests pass on the AdaptiveCpp Metal backend.
- AMReX validation passes on Metal with the included patches.
- WarpX Langmuir tests pass in 2D and 3D on the Metal path.
- Benchmarks on an Apple M4 Pro show the GPU ahead of the 12-thread CPU run only at the largest tested 3D size (128^3, 1.39x); smaller cases are still faster on the CPU.
The work has been tested on an Apple M4 Pro with 16 GPU compute units.
- Apple Silicon Mac
- macOS 14+
- Xcode 16+ and command-line tools
- Homebrew
- Internet access to clone upstream repos and download metal-cpp
The scripts install and use:
- llvm@20 for the AdaptiveCpp Metal backend build
- llvm@18 for ld64.lld, which is not shipped in the Homebrew llvm@20 bottle
- boost, cmake, ninja, and libomp
Run the full GPU build and validation flow from the repo root:
./scripts/00-install-deps.sh
./scripts/01-build-adaptivecpp.sh
./scripts/02-validate-metal.sh
./scripts/03-build-amrex.sh
./scripts/04-validate-amrex.sh
./scripts/05-build-warpx.sh
./scripts/06-validate-warpx.sh

What you should expect:
- First GPU runs are slow because AdaptiveCpp JIT-compiles LLVM IR to Metal.
- JIT artifacts are cached under ~/.acpp/apps/global/jit-cache/.
- Build products and cloned sources live under opt/ and extern/.
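To see whether a slow run is paying the JIT cost, it can help to check if the cache has been populated yet. The helper below is a minimal sketch, not part of the repo: the function name `jit_cache_report` is hypothetical, and the only fact it relies on is the cache path stated above.

```shell
# Hypothetical helper: report how many files sit in the AdaptiveCpp JIT cache.
# The default path is the one named in this README; pass another directory
# as the first argument to override it.
jit_cache_report() (
    cache_dir="${1:-$HOME/.acpp/apps/global/jit-cache}"
    if [ -d "$cache_dir" ]; then
        # tr strips the padding that some wc implementations add
        count=$(find "$cache_dir" -type f | wc -l | tr -d ' ')
        echo "$cache_dir: $count cached file(s)"
    else
        echo "$cache_dir: no cache yet (the next GPU run will JIT-compile)"
    fi
)
```

Calling `jit_cache_report` with no arguments inspects the default location. An empty or missing cache suggests the next GPU run will be a slow first run.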
scripts/00-install-deps.sh installs Homebrew
packages, checks the Metal toolchain, and downloads metal-cpp headers into
opt/metal-cpp.
If xcrun metal exists but is not functional, the script will point you to:
xcodebuild -downloadComponent MetalToolchain

scripts/01-build-adaptivecpp.sh:
- clones AdaptiveCpp into extern/AdaptiveCpp
- applies patches from patches/adaptivecpp/
- configures CMake with WITH_METAL_BACKEND=ON
- installs into opt/adaptivecpp
scripts/02-validate-metal.sh compiles and
runs the SYCL smoke tests in tests/sycl/:
- device_query
- vector_add
- usm_test
- reduction_test
These confirm device discovery, basic kernels, shared USM, and atomics.
scripts/03-build-amrex.sh:
- clones AMReX into extern/amrex
- applies the AMReX patch set and file replacements from patches/amrex/
- configures a SYCL build with single precision and MPI/Fortran disabled
- installs into opt/amrex
scripts/04-validate-amrex.sh builds and runs
the AMReX HeatEquation test from tests/amrex/heat_equation.
scripts/05-build-warpx.sh:
- clones WarpX into extern/warpx
- re-applies the Metal-compatible AMReX source changes used by the WarpX subbuild
- configures WarpX with:
  - WarpX_COMPUTE=SYCL
  - WarpX_PRECISION=SINGLE
  - WarpX_PARTICLE_PRECISION=SINGLE
  - WarpX_MPI=OFF
  - WarpX_FFT=OFF
  - WarpX_QED=OFF
  - WarpX_OPENPMD=OFF
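For readers who want to configure WarpX by hand, the options listed above map onto `-D` flags on the CMake command line. The sketch below only prints such a command. The function name, the build directory name, and the choice of the Ninja generator are assumptions for illustration, and the real script very likely passes further flags (compiler selection, AdaptiveCpp and AMReX paths) that are not shown here.

```shell
# Hypothetical helper: print a CMake configure command that carries the
# WarpX options listed in this README. It does not run CMake.
# Args: $1 = source dir (default extern/warpx),
#       $2 = build dir (default build-warpx-sycl, an assumed name).
warpx_cmake_command() (
    src="${1:-extern/warpx}"
    build="${2:-build-warpx-sycl}"
    printf 'cmake -S %s -B %s -G Ninja' "$src" "$build"
    for opt in \
        WarpX_COMPUTE=SYCL WarpX_PRECISION=SINGLE \
        WarpX_PARTICLE_PRECISION=SINGLE WarpX_MPI=OFF \
        WarpX_FFT=OFF WarpX_QED=OFF WarpX_OPENPMD=OFF
    do
        printf ' -D%s' "$opt"
    done
    printf '\n'
)
```

Printing the command rather than running it keeps the sketch safe to paste. Compare its output against scripts/05-build-warpx.sh before relying on it.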
scripts/06-validate-warpx.sh runs reduced
Langmuir tests on the Metal backend:
- 2D: inputs_test_2d_langmuir_multi
- 3D: inputs_test_3d_langmuir_multi
Logs are written under tests/warpx/results/.
Build a CPU baseline and compare it to the GPU build:
./scripts/07-build-warpx-cpu.sh
./scripts/08-benchmark.sh

Useful variants:
./scripts/08-benchmark.sh --quick
./scripts/08-benchmark.sh --gpu-only
./scripts/08-benchmark.sh --cpu-only

Current benchmark summary from benchmarks/RESULTS.md:
| Test | CPU, 12 threads (s/step) | GPU (s/step) | GPU speedup (CPU / GPU) |
|---|---|---|---|
| 128x128 2D | 0.0027 | 0.0172 | 0.16x |
| 512x512 2D | 0.0173 | 0.0208 | 0.83x |
| 64^3 3D | 0.0081 | 0.0181 | 0.45x |
| 128^3 3D | 0.0560 | 0.0402 | 1.39x |
On the measured M4 Pro system, the GPU only pulls ahead once the problem is large enough to amortize Metal submission overhead.
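The speedup column is the CPU time per step divided by the GPU time per step, so values above 1 mean the GPU is faster. A small sketch of that arithmetic, using two rows from the table (the `speedup` function name is illustrative and not part of the benchmark script):

```shell
# speedup = CPU seconds per step / GPU seconds per step
speedup() {
    awk -v cpu="$1" -v gpu="$2" 'BEGIN { printf "%.2fx\n", cpu / gpu }'
}

speedup 0.0560 0.0402   # 128^3 3D    -> 1.39x (GPU faster)
speedup 0.0027 0.0172   # 128x128 2D  -> 0.16x (CPU faster)
```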
Capture a Metal System Trace with Instruments:
./scripts/09-profile-metal.sh
./scripts/09-profile-metal.sh langmuir_3d_large

This records traces under benchmarks/profiles/.
- Apple GPUs do not support FP64, so all builds are single precision.
- __int128 is disabled for the Metal path.
- host_task is not available in AdaptiveCpp here; the AMReX integration uses alternate cleanup paths.
- PSATD / FFT-based solvers are disabled. This setup supports the FDTD path.
- The first execution of a new binary pays JIT compilation cost.
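One rough way to observe the JIT cost from the last item is to run the same command twice and compare wall-clock times. The helper below is a sketch with a hypothetical name (`time_twice`). It uses whole-second resolution from `date +%s`, so it only shows differences of a second or more.

```shell
# Hypothetical helper: run the given command twice and print the elapsed
# wall-clock seconds of each run. The command's own output is discarded
# so that only the two timing lines are printed.
time_twice() (
    for run in first second; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1
        end=$(date +%s)
        echo "$run run: $((end - start)) s"
    done
)
```

Invoke it with the WarpX executable and an inputs file of your choice as arguments. If the second run is much shorter than the first, the difference is likely JIT compilation that has since been cached.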
This repo keeps the portability and performance fixes as patches instead of as a forked monorepo snapshot.
Key patches:
- patches/adaptivecpp/0008-metal-all-warpx-fixes.patch: Fixes Metal codegen issues needed for WarpX correctness, including thread address-space handling.
- patches/adaptivecpp/0009-metal-batch-command-buffer.patch: Batches Metal command buffer submission to remove large per-kernel overhead.
- patches/adaptivecpp/0010-metal-dtoh-fast-path.patch: Adds a faster device-to-host path for CPU-accessible USM allocations.
- patches/amrex/0002-amrex-sscp-atomic-fix.patch: Restores correct atomic behavior under AdaptiveCpp SSCP.
- patches/amrex/0003-amrex-redistribute-no-mpi-sync.patch: Removes redundant synchronizations in the no-MPI redistribute path.
The AMReX directory also includes patched replacement files for SYCL CMake and RNG-related sources.
scripts/ Build, validation, benchmark, and profiling entry points
patches/ AdaptiveCpp and AMReX patches plus AMReX replacement files
tests/ SYCL, AMReX, and local WarpX validation inputs
benchmarks/ Benchmark inputs, raw results, and summary report
docs/ Notes, issues, and implementation background
extern/ Cloned upstream source trees (created by scripts)
opt/ Installed toolchains and local build artifacts (created by scripts)
- scripts/env.sh defines the shared paths used by all other scripts.
- The build scripts reset the cloned extern/ trees before re-applying patches. Treat those directories as generated workspace state.
- More detailed debugging notes and historical context live in docs/known-issues.md.
BSD-3-Clause. See LICENSE.