Port to nanobind and restructure for performance by janbridley · Pull Request #71 · glotzerlab/spatula

janbridley · 2025-11-22T00:20:27Z

Description

This is a combined rewrite and refactor that aims to (1) replace the pybind11 code with nanobind and (2) restructure the data types and code layout to interface better with cymmetry (an upcoming library and publication) and new optimizer developments.

The biggest change is the separation of the Python interface from the C++ source code: bindings are consolidated into separate files, and much of the library is now available as headers for use in other tools. The interface surface is also smaller, with nanobind's ndarray replacing the Python exports of the Vec3 and Quaternion types from previous versions. Finally, the data classes (PGOPStore and BOOSOPStore) have been removed. These adapters were added in previous versions for performance reasons, but nanobind allows copy-free array translation, which removes this necessity. These changes, combined with a few other optimizations, make the benchmarks below roughly 3.5-5x faster (about 6-11x faster with mode="boo").

Changed

The project now uses C++20 (primarily for std::span). This improves ergonomics and makes a future Eigen3 port much easier, since Eigen::Map is very similar.
Nanobind exports replace pybind11
Optimizers are now header-only
Locality code is now header-only
Vec3 and Quaternion are now header-only
BondOrder is now header-only
Metrics and Utils (excluding QlmEval) are now header-only
The BOOSOP code from pgop.py is now in a separate file, boosop.py
Many std::vector<std::vector<...>> are now vectors of pointers, allowing copy- and move-free access to Python data. Matrix elements are accessed through std::span and cast to statically allocated types for performance.
py::array arguments are now replaced with std::vector or type* pointers
The implied rotation matrix type (std::vector<double>) is now a type alias, RotationMatrix = std::array<double, 9>
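
As a rough illustration of the non-owning-view idea behind the std::span and RotationMatrix changes, the sketch below uses Python's memoryview as a stand-in for std::span: slices of a flat buffer are views, not copies. The row-major layout of the 9-element matrix and the helper names (point_view, rotate) are assumptions made for this example and are not taken from the spatula source.

```python
from array import array

# Flat, contiguous buffer of xyz coordinates for two points (6 doubles).
points = array("d", [1.0, 0.0, 0.0,
                     0.0, 1.0, 0.0])
buf = memoryview(points)


def point_view(buffer, i):
    """Return a non-owning view of point i (no copy), similar in spirit to std::span.

    Hypothetical helper for illustration only.
    """
    return buffer[3 * i : 3 * i + 3]


def rotate(R, v):
    """Apply a flat 9-element rotation matrix (assumed row-major) to a 3-vector."""
    return tuple(
        R[3 * row] * v[0] + R[3 * row + 1] * v[1] + R[3 * row + 2] * v[2]
        for row in range(3)
    )


# 90-degree rotation about z, stored as 9 contiguous doubles.
R_z90 = (0.0, -1.0, 0.0,
         1.0,  0.0, 0.0,
         0.0,  0.0, 1.0)

p0 = point_view(buf, 0)
rotated = rotate(R_z90, p0)  # the x-axis maps onto the y-axis

# Writing through the view changes the underlying buffer: the view owns nothing.
p0[0] = 2.0
```

A fixed-size flat matrix keeps all nine elements contiguous and lets the size be known statically, which is presumably part of why std::array<double, 9> was chosen over std::vector<double>.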

Removed

PGOPStore
BOOSOPStore
Unused Python bindings (quaternion, vec3, QLMEval, metrics)

Added

m_group_sizes class method for PGOP, which stores the size of each group (currently (group order - 1) * 9). Previous code used vector.size(), which required copies and allocations for both individual elements and entire groups.
RotationMatrix std::array wrapper for fast and strongly typed vector rotations
-DENABLE_PROFILING flag to allow for easy profiling
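
To make the "(group order - 1) * 9" bookkeeping concrete, here is a small sketch of how per-group sizes and offsets into one flat buffer of rotation matrices could be computed. The group orders are the standard point-group values; the reading that the identity operation is skipped (hence "order - 1") and the function names are assumptions for this example, not the PGOP implementation.

```python
from itertools import accumulate

# Standard point-group orders for the groups used in the benchmarks.
GROUP_ORDERS = {"C2": 2, "D5d": 20, "T": 12, "Ih": 120}


def group_sizes(orders):
    """Number of doubles each group occupies: (order - 1) matrices of 9 doubles.

    Assumes the identity operation is not stored (hypothetical reading of the PR text).
    """
    return [(order - 1) * 9 for order in orders]


def group_offsets(sizes):
    """Start index of each group in the flat buffer (exclusive prefix sum)."""
    return [0, *accumulate(sizes)][:-1]


sizes = group_sizes(GROUP_ORDERS.values())
offsets = group_offsets(sizes)

for name, size, offset in zip(GROUP_ORDERS, sizes, offsets):
    print(f"{name}: {size} doubles starting at index {offset}")
```

Storing the sizes once means a group's extent can be looked up without touching or copying the matrices themselves.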

Benchmarking

uv pip install . --config-settings=cmake.args="-DENABLE_PROFILING=ON"  --config-settings=cmake.build-type="RelWithDebInfo"

Before this PR

PGOP computed over a mesh of 600 points for an icosahedron (N=12, N_query=1):

--- Benchmarking C2 symmetry ---
  PGOP: 0.9031 ± 0.0057 (mean ± std. dev.)
  Time: 0.59 μs ± 0.01 per trial(mean ± std. dev. of 10 runs, 50 orientations each)

--- Benchmarking D5d symmetry ---
  PGOP: 0.8648 ± 0.0413 (mean ± std. dev.)
  Time: 8.40 μs ± 0.03 per trial(mean ± std. dev. of 10 runs, 50 orientations each)

--- Benchmarking T symmetry ---
  PGOP: 0.8623 ± 0.0286 (mean ± std. dev.)
  Time: 4.91 μs ± 0.02 per trial(mean ± std. dev. of 10 runs, 50 orientations each)

--- Benchmarking Ih symmetry ---
  PGOP: 0.8365 ± 0.0506 (mean ± std. dev.)
  Time: 51.63 μs ± 0.45 per trial(mean ± std. dev. of 10 runs, 50 orientations each)

Same approach, but with mode="boo" and sigma=177.7 (kappa ≈ 0.075):

 ~ spatula-analysis==0.1.1 (from file:///Users/jenna/github/spatula)
--- Benchmarking C2 symmetry ---
  PGOP: 0.8981 ± 0.0104 (mean ± std. dev.)
  Time: 0.97 μs ± 0.05 per trial(mean ± std. dev. of 10 runs, 50 orientations each)

--- Benchmarking D5d symmetry ---
  PGOP: 0.8231 ± 0.0657 (mean ± std. dev.)
  Time: 16.03 μs ± 0.11 per trial(mean ± std. dev. of 10 runs, 50 orientations each)

--- Benchmarking T symmetry ---
  PGOP: 0.8263 ± 0.0487 (mean ± std. dev.)
  Time: 9.30 μs ± 0.10 per trial(mean ± std. dev. of 10 runs, 50 orientations each)

--- Benchmarking Ih symmetry ---
  PGOP: 0.7805 ± 0.0816 (mean ± std. dev.)
  Time: 99.19 μs ± 0.69 per trial(mean ± std. dev. of 10 runs, 50 orientations each)

After this PR

PGOP computed over a mesh of 600 points for an icosahedron (N=12, N_query=1):

--- Benchmarking C2 symmetry ---
  PGOP: 0.90312034 ± 0.00567270 (mean ± std. dev.)
  Time: 0.17 μs ± 0.02 per trial(mean ± std. dev. of 10 runs, 50 orientations each)

--- Benchmarking D5d symmetry ---
  PGOP: 0.86482608 ± 0.04126538 (mean ± std. dev.)
  Time: 1.77 μs ± 0.02 per trial(mean ± std. dev. of 10 runs, 50 orientations each)

--- Benchmarking T symmetry ---
  PGOP: 0.86229689 ± 0.02857537 (mean ± std. dev.)
  Time: 1.04 μs ± 0.02 per trial(mean ± std. dev. of 10 runs, 50 orientations each)

--- Benchmarking Ih symmetry ---
  PGOP: 0.83650060 ± 0.05056483 (mean ± std. dev.)
  Time: 10.67 μs ± 0.10 per trial(mean ± std. dev. of 10 runs, 50 orientations each)

With mode="boo" and the same sigma/kappa conversion:

--- Benchmarking C2 symmetry ---
  PGOP: 0.89811896 ± 0.01037498 (mean ± std. dev.)
  Time: 0.17 μs ± 0.01 per trial(mean ± std. dev. of 10 runs, 50 orientations each)

--- Benchmarking D5d symmetry ---
  PGOP: 0.82308454 ± 0.06566185 (mean ± std. dev.)
  Time: 1.50 μs ± 0.03 per trial(mean ± std. dev. of 10 runs, 50 orientations each)

--- Benchmarking T symmetry ---
  PGOP: 0.82627869 ± 0.04871954 (mean ± std. dev.)
  Time: 0.91 μs ± 0.02 per trial(mean ± std. dev. of 10 runs, 50 orientations each)

--- Benchmarking Ih symmetry ---
  PGOP: 0.78053632 ± 0.08155272 (mean ± std. dev.)
  Time: 8.90 μs ± 0.07 per trial(mean ± std. dev. of 10 runs, 50 orientations each)
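
For reference, the per-group speedups implied by the timings above can be worked out directly. The snippet below only divides the reported mean times; the label "default" for the first configuration is a placeholder chosen for this example, since the log does not name that mode.

```python
# Mean time per trial in microseconds, copied from the benchmark output above.
BEFORE = {
    "default": {"C2": 0.59, "D5d": 8.40, "T": 4.91, "Ih": 51.63},
    "boo": {"C2": 0.97, "D5d": 16.03, "T": 9.30, "Ih": 99.19},
}
AFTER = {
    "default": {"C2": 0.17, "D5d": 1.77, "T": 1.04, "Ih": 10.67},
    "boo": {"C2": 0.17, "D5d": 1.50, "T": 0.91, "Ih": 8.90},
}


def speedups(before, after):
    """Ratio of old to new mean time for every mode and symmetry group."""
    return {
        mode: {group: before[mode][group] / after[mode][group] for group in before[mode]}
        for mode in before
    }


SPEEDUP = speedups(BEFORE, AFTER)

for mode, groups in SPEEDUP.items():
    for group, factor in groups.items():
        print(f"{mode:>7} {group:>3}: {factor:.1f}x")
```

By these numbers the first configuration is about 3.5x to 4.8x faster, and mode="boo" is about 5.7x to 11.1x faster, ignoring the reported standard deviations.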

Motivation and Context

Resolves: #???

How Has This Been Tested?

Checklist:

I have reviewed the Contributor Guidelines.
I agree with the terms of the spatula Contributor Agreement.
My name is on the list of contributors in the pull request source branch.
I have updated the Change log.

janbridley · 2026-02-10T21:51:53Z

Note that 46affaf was a big merge containing the 2026 header updates -- I need to make sure I didn't break anything

janbridley · 2026-02-10T22:02:49Z

Should be solved in 5287386

This reverts commit 980dc90.

janbridley · 2026-02-11T19:52:19Z

@DomFijan Somehow this works only on macOS -- do you get more debug output on your end?

janbridley added 30 commits November 17, 2025 16:37

Remove from src/*.cc

6371141

WIP

d828a90

wip

eb7f8b6

Working but tests fail

c46987a

Flatten

e4f1724

Working?

2029468

More shape and size

87ee168

Remove from header

a4f0364

FOrmat

9866dc4

wip

c5d0774

Undo commits

c66382c

Refactor threads

eefba9a

Threads header only

9854891

Metrics -> header

1600939

util pybind free

53dedca

Util header only

4e25f08

BondOrder header only

c6cd983

Add nanobind

b0b1d33

Metrics -> nb

44a65e3

Refactor threads

887d343

OPtimize

b407ef1

quaternion

3e61f34

wip

9263388

Cleanup

dc918b4

Port optimize WIP

af8905b

wip

eae3fd9

wip

7b3291b

Back further

139434b

Back to working state

823d286

Fix failing test

b834782

janbridley added 4 commits December 6, 2025 22:35

Better names

aed442a

Mark spans const

0e52e94

Update notes on fastmath

7ca732a

Strip NEON code

02bb4ad

DomFijan reviewed Feb 10, 2026

Comment thread benchmarks/microbenchmark.py

Merge branch origin/main into self

46affaf

janbridley added 2 commits February 10, 2026 16:52

prek

b20b96a

re-add nanobind to lockfile

5287386