floatingspeed

Various tests of numerical performance, comparing languages, compilers, flags, etc.

Alex Barnett 2016--2026

Contributions by:
Jeremy Magland
Libin Lu
many others

arraysinglethread: compare C/C++/Fortran speeds for complex and real double-precision arithmetic, on a 1D array.
lap3dkernel: compare various languages speed in a multithreaded implementation of direct summation of the 1/r kernel.

Open in a containerized development environment using vscode

You can use Visual Studio Code to open a containerized development environment. This creates a Docker container with all of the compilers and tools pre-installed, including Fortran, Python, Octave, g++, and Julia. You can then run the tests and seamlessly edit the code all within this environment, and the source files are automatically sync'd with the host environment.

Note: MATLAB is not installed within the development container.

Prerequisites

Instructions

# Clone this repo
git clone [repo-url] floatingspeed
cd floatingspeed

### Open in Visual Studio Code
code .

Install the Remote-Containers vscode extension (if not already installed)

In the lower-left corner of the vscode window you should see a little green icon. Click that and select "Reopen in Container"

Now the container will be built on your system (may take several minutes). And then your project will appear -- inside the development environment container.

If you are curious, the Docker recipe for installing all the compilers can be found at .devcontainer/Dockerfile

You can now either run the tests from the built-in vscode terminal, or you can run them using the built-in tasks for this project.

To run a task, launch the vscode command menu (Ctrl+Shift+P) and search for "Run Task" and press enter. Then select one of the two tasks that appear (e.g., "Run arraysinglethread test") and press enter. Then enter again to continue. A new vscode terminal will open and run your test.

If you are curious, the tasks are configured at .vscode/tasks.json.

Summary of this repo produced by codex CLI using ChatGPT 5.2

I'm preparing a concise high-level explanation covering the two main benchmarks—arraysinglethread testing complex arithmetic in a single thread, and lap3dkernel performing 3D 1/r sums across languages with SIMD and OpenMP. I'll describe key files, script entry points, output locations, and note VCL inclusion, while briefly addressing the non-critical display errors. I'll also ─ Worked for 1m 11s ────────────────────────────────────────────────────────────

• Here’s the shape of the repo and what each part does, based on the files in / home/alex/numerics/floatingspeed:

arraysinglethread/ — single‑core array arithmetic

Purpose: compare real vs complex arithmetic performance in C, C++, and Fortran on large 1D arrays, mostly memory‑bound. See arraysinglethread/ README.
Runner script: arraysinglethread/do_complextimings.sh compiles and runs three binaries (Fortran, C++ with C complex, C++ with std::complex). You tweak FLAGS, GF, and GXX here.
C++ test: arraysinglethread/complexmulttiming.cpp allocates large arrays (default M=1e8), fills with fixed random data, and times multiply‑add loops across several representations:
- real doubles
- “by hand” complex (separate real/imag arrays and interleaved)
- C double complex (if USE_C_TYPE_COMPLEX)
- C++ std::complex (if USE_CPP_TYPE_COMPLEX)
Fortran test: arraysinglethread/complexmulttiming.f does the same for real8 and complex16.
Timing utility: arraysinglethread/utils.h + arraysinglethread/utils.cpp provide a simple CNTime timer.
Results: arraysinglethread/results/*.txt are saved benchmark outputs for various machines/flags.

lap3dkernel/ — multithreaded 3D 1/r kernel

Purpose: benchmark dense direct N‑body summation (1/r kernel) from ns=1e4 sources to nt=1e4 targets. See lap3dkernel/README.
Runner script: lap3dkernel/run_all_tests.sh compiles/runs C++ SIMD + OpenMP, Fortran + OpenMP, Julia, Python+Numba, MATLAB, Octave. It expects those toolchains to be installed. What the benchmarks actually measure
arraysinglethread: raw array traversal + multiply‑add with accumulation; highlights compiler flags and complex number representation overhead.
lap3dkernel: “source‑target pairs per second” for 3D 1/r kernel; highlights threading, SIMD, and compiler differences across languages.

If you want, tell me which part you want to go deeper on (e.g., “walk through the C++ SIMD kernel,” “what flags matter most,” or “how to run just one benchmark”), and I’ll zoom in.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

floatingspeed

Contents

Open in a containerized development environment using vscode

Prerequisites

Instructions

Summary of this repo produced by codex CLI using ChatGPT 5.2

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

floatingspeed

Contents

Open in a containerized development environment using vscode

Prerequisites

Instructions

Summary of this repo produced by codex CLI using ChatGPT 5.2