mesh: add get_mean_radius() + @collective_operation on radii accessors #206

Merged: lmoresi merged 2 commits into development from bugfix/mesh-mean-radius-collective on May 25, 2026

Conversation

@lmoresi
Member

@lmoresi lmoresi commented May 25, 2026

Summary

  • Adds mesh.get_mean_radius() to complete the get_min_radius / get_max_radius / get_mean_radius triple — parallel-safe via MPI allreduce of local sum and count.
  • Decorates all three radii accessors with @uw.collective_operation so any use inside selective_ranks() raises a clear deadlock-warning before the run hangs.

Motivation

User code (an adaptive-mesh harness, scripts/stagnant_lid_adapt_loop.py on a feature branch) reached for mesh._radii.mean() to set a smoothing-length default. That value is rank-local: each MPI rank computes its own mean → a different gradient_smoothing_length is passed into a screened-Poisson Vector_Projection on each rank → the JIT generates different C source on different ranks → the existing JIT determinism guard correctly raises RuntimeError: JIT C-source hash differs across MPI ranks.

The symptom looked like a JIT non-determinism bug. The root cause was a parallel-unsafe accessor with no documented replacement. This PR adds the replacement.
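
The difference between a rank-local mean and the sum-and-count reduction can be illustrated without MPI. The sketch below is not the underworld3 implementation: the "ranks" are plain Python lists, the radii values are made up for illustration, and `global_mean_from_partitions` is a hypothetical helper name. In real MPI code the two totals would each come from an allreduce with a sum operation.

```python
# Illustrative only: MPI ranks are simulated with plain lists.
# Hypothetical per-rank cell radii for a mesh split unevenly over two ranks.
radii_by_rank = [
    [0.025, 0.030, 0.035],  # cells owned by "rank 0"
    [0.030, 0.0375],        # cells owned by "rank 1"
]


def local_means(partitions):
    """Mean computed independently on each rank (the parallel-unsafe pattern)."""
    return [sum(r) / len(r) for r in partitions]


def global_mean_from_partitions(partitions):
    """Mean built from reduced totals (the parallel-safe pattern).

    Each rank contributes its local sum and local count; both are summed
    across ranks and only then divided, so every rank gets the same value.
    """
    total = sum(sum(r) for r in partitions)    # stands in for allreduce(local_sum)
    count = sum(len(r) for r in partitions)    # stands in for allreduce(local_count)
    return total / count


per_rank = local_means(radii_by_rank)
shared = global_mean_from_partitions(radii_by_rank)
print("rank-local means:", per_rank)
print("global mean:", shared)
```

Note that averaging the per-rank means would not give the global mean either when ranks own different numbers of cells, which is why the count is reduced alongside the sum.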

Changes

src/underworld3/discretisation/discretisation_mesh.py:

  • New Mesh.get_mean_radius() method (28-line addition). Docstring explicitly warns against falling back to self._radii.mean() and explains the JIT-leak mechanism so the trap doesn't get rebuilt in user scripts.
  • @uw.collective_operation decorator on get_min_radius, get_max_radius, get_mean_radius (consistency — the decorator existed and was used in swarm.py / mesh_variables.py but wasn't yet applied here).

No other files changed. No API breakage.

Test plan

  • Serial smoke test: Annulus(cellSize=1/16).get_mean_radius() returns a sensible value (0.0328, between the existing min 0.0247 and max 0.0375)
  • Parallel smoke test (mpirun -n 2): all ranks report identical mean_radius to ~3e-5 (small float-order variation in cell-volume computation across partitions — fundamental, not a bug; parallel-CONSISTENT which is what matters for JIT)
  • Regression: the original failing parallel harness command now runs cleanly to step 20 with 4 productive adapts and identical vrms/Nu on both ranks. Reproducer in commit message.

Underworld development team with AI support from Claude Code

Adds a parallel-safe mean-cell-radius accessor mirroring the existing
get_min_radius / get_max_radius pattern (MPI allreduce of local sum
and count). Together the three methods form the canonical
"characteristic mesh length" API.

Motivation: user code (e.g. an adaptive mesh harness) reached for
mesh._radii.mean() to set a smoothing length default, which is
rank-local: different ranks compute different means → different
gradient_smoothing_length passed to a screened-Poisson
Vector_Projection → different JIT C source generated per rank →
the JIT determinism guard correctly raised a "C-source hash differs
across ranks" RuntimeError. Symptom looked like a JIT bug but was a
rank-local-data bug being correctly caught.

Also decorates all three radii methods with @uw.collective_operation
so any future use inside selective_ranks() blocks raises a clear
deadlock-warning before the run hangs.

The methods are documented as parallel-safe and the get_mean_radius
docstring explicitly warns against falling back to self._radii.mean()
with the JIT-leak explanation, so the same trap doesn't get rebuilt
in user scripts.

Underworld development team with AI support from Claude Code
Copilot AI review requested due to automatic review settings May 25, 2026 00:33
Contributor

Copilot AI left a comment

Pull request overview

Adds a parallel-safe “mean mesh radius” accessor to Mesh, and marks the radius accessors as collective operations so they fail fast when called inside selective_ranks() (preventing MPI deadlocks and avoiding rank-divergent values leaking into JIT inputs).

Changes:

  • Added Mesh.get_mean_radius() implemented via MPI allreduce of local sum and count.
  • Decorated get_min_radius(), get_max_radius(), and get_mean_radius() with @uw.collective_operation to detect unsafe selective-rank usage early.

Comment thread src/underworld3/discretisation/discretisation_mesh.py
Comment thread src/underworld3/discretisation/discretisation_mesh.py
Comment thread src/underworld3/discretisation/discretisation_mesh.py
…ests

Three review comments from copilot-pull-request-reviewer on PR #206
addressed:

* get_mean_radius now passes op=MPI.SUM explicitly to allreduce
  instead of relying on the default (matches the codebase pattern
  used elsewhere; less brittle if mpi4py wrapper conventions change)
* docstring corrected — clarifies that _radii is the
  characteristic cell length (volume^(1/dim)) computed by
  DMPlexComputeGeometryFVM, not literally a centroid-to-face
  distance. The same correction applies to get_min/max_radius but
  those existing docstrings are not touched in this PR
* test coverage added:
  - tests/test_0008_mesh_radii_accessors.py: serial unit tests
    on Annulus and UnstructuredSimplexBox checking min/mean/max
    ordering and that all three methods carry the
    @uw.collective_operation decorator
  - tests/parallel/ptest_0008_mesh_radii_accessors.py: MPI test
    asserting all ranks see identical min/mean/max to 1e-12
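
The "characteristic cell length (volume^(1/dim))" definition in the docstring correction above can be worked through by hand for a single cell. The sketch below does not call PETSc or underworld3; `characteristic_length` and `triangle_area` are hypothetical helpers, and the unit right triangle is an arbitrary example cell.

```python
def characteristic_length(volume: float, dim: int) -> float:
    """Return volume**(1/dim): the edge of a dim-dimensional cube of equal volume."""
    if volume <= 0.0 or dim < 1:
        raise ValueError("volume must be positive and dim must be at least 1")
    return volume ** (1.0 / dim)


def triangle_area(p0, p1, p2) -> float:
    """Area of a 2D triangle from its vertices (half the absolute cross product)."""
    cross = (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])
    return 0.5 * abs(cross)


# Example cell: unit right triangle, area 0.5, in a 2D mesh.
area = triangle_area((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
h = characteristic_length(area, dim=2)
print("cell area:", area)
print("characteristic length:", h)
```

For this triangle the result is the square root of 0.5, a length comparable to the cell's edges but not equal to any centroid-to-face distance, which is the distinction the corrected docstring draws.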

Underworld development team with AI support from Claude Code
@lmoresi lmoresi merged commit 0ce05ae into development May 25, 2026
1 check passed
@lmoresi lmoresi deleted the bugfix/mesh-mean-radius-collective branch May 25, 2026 01:01