coral_anneal: 13.8× per-step speedup via incremental energy tracking#50

Merged
wock9000 merged 1 commit into trunk from coral-side-perf
May 14, 2026

Conversation

@wock9000
Contributor

Summary

One-line algorithmic fix in `coral_anneal.py`: the previous `run_mtm_anneal` called `exact_energy(spins, J)` on every accepted Metropolis move (O(N²) — a full `-½sᵀJs` recompute), then took the delta. Replaced with incremental `e_cur += dE_true` since we already compute `dE_true` exactly in int64 arithmetic on each accept; the result is bit-equivalent without the recompute.
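A minimal sketch of the pattern (illustrative names and a simplified single-replica loop; the actual `run_mtm_anneal` signature and replica handling differ): one exact O(N²) energy evaluation up front, then exact int64 deltas on every accept.

```python
import numpy as np

def anneal_incremental(J, spins, betas, rng):
    """Metropolis loop with incremental energy tracking.

    Assumes J is symmetric int8 with zero diagonal and spins are +/-1.
    Illustrative sketch only; the real run_mtm_anneal differs.
    """
    J64 = J.astype(np.int64)               # promote once, outside the loop
    s = spins.astype(np.int64).copy()
    n = s.size
    e_cur = -(s @ (J64 @ s)) // 2          # one full O(N^2) recompute, then never again
    e_best = e_cur
    for beta in betas:
        i = rng.integers(n)
        dE_true = 2 * s[i] * (J64[i] @ s)  # exact int64 delta for flipping spin i
        if dE_true <= 0 or rng.random() < np.exp(-beta * dE_true):
            s[i] = -s[i]
            e_cur += dE_true               # the fix: no exact_energy() call
            if e_cur < e_best:
                e_best = e_cur
    return s, e_cur, e_best
```

Because `dE_true` is computed in int64 against an int8 `J`, the running `e_cur` stays bit-equivalent to a full recompute at every step, which is why the e_best parity check below can demand an exact match rather than a tolerance.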

This was FINDINGS entry 6's "Coral-side per-step bookkeeping" bottleneck. The diagnosis (C/Cython rewrite of softmax) was wrong — it was an algorithmic bug in our own code, masquerading as Python/numpy overhead.

Result (same instance + seeds as FINDINGS 6: random_spin_glass N=1024, d=0.4, s=42; 4 seeds × 3200 steps)

| variant | ms/step | e_best mean | host-to-Coral ratio |
| --- | --- | --- | --- |
| Host (M3 Pro) NumPy | 0.34 | −14463 | |
| Coral over-network (PR #47) | 68.78 | (J-mismatch) | 209× |
| Coral co-located v1 (FINDINGS 6) | 43.33 | −14749 | 133× |
| Coral co-located v2 (this PR) | 3.14 | −14749 | 9.4× |

13.8× per-step speedup. Cumulative improvement from the original V3-over-network number: 22×. Quality unchanged.

What's NOT in this PR

  • FINDINGS revisions (the entry 5 over-claim about "retiring the experiment" and the new entry for this win) — separate PR so the perf number can be reviewed in isolation.
  • Wishart-J quality experiment — separate work; requires rebuilding the TPU model with a Wishart-J baked in.
  • Further Coral optimizations (TPU matmul pipelining, batched-replica matmul). The remaining headroom on this arc is ~3× more if pipelined, ~10× more if batched — but the next experiment to run is the quality one, not more optimization.

Test plan

  • e_best parity confirmed against FINDINGS 6's data (−14947 min, −14749 mean — exact match)
  • Quality unchanged (within seed variance)
  • Reproduces from `tools/coral_anneal_bench.py` end-to-end

🤖 Generated with Claude Code

The old code called exact_energy(spins, J) on every accepted move,
which is O(N²) per accept (~10–20ms at N=1024 on the Coral ARM).
Replaced with incremental e_cur += dE_true. Mathematically identical:
integer dE is exact for int8 J + int64 accumulator, no drift.

Also pre-promotes J and spins to int64 once outside the loop so each
per-step row dot product runs without per-call type coercion.
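A small illustration of why the one-time promotion matters (not the PR's code; just numpy's dtype rules): an int8 @ int8 dot accumulates in int8 and silently wraps, so the per-step row dot must run in a wider dtype, and doing the `astype` once outside the loop avoids paying that coercion on every call.

```python
import numpy as np

J_row = np.full(1024, 3, dtype=np.int8)   # one row of an int8 coupling matrix
s = np.ones(1024, dtype=np.int8)          # +/-1 spins, all +1 here

wrapped = J_row @ s                        # int8 accumulator: wraps, cannot hold 3072
exact = J_row.astype(np.int64) @ s.astype(np.int64)  # exact: 3 * 1024 = 3072

print(wrapped.dtype, int(exact))           # int8 3072
```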

Measured on the same instance + seeds as FINDINGS 6 (random_spin_glass
N=1024 d=0.4 s=42, 4 seeds × 3200 steps):

  before (FINDINGS 6): 43.3 ms/step
  after (this commit):  3.14 ms/step
  speedup:              13.8×

Quality unchanged (e_best mean −14749 both runs). The previous
"C/Cython rewrite of ARM bookkeeping" diagnosis in FINDINGS 6 was
wrong — the bottleneck was an algorithmic O(N²) bug in our own code,
not Python/numpy overhead.

Host-vs-Coral gap closes from 133× to 9.4×. Coral is now fast enough
that running real experiments on it is economical, which un-blocks
the planned Wishart truth-bench (separate work).

Reproducible: tools/coral_anneal_bench.py, data in
docs/scaling/coral_colocated_anneal_v2.jsonl.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wock9000 wock9000 merged commit 793e319 into trunk May 14, 2026
1 check passed
@wock9000 wock9000 deleted the coral-side-perf branch May 14, 2026 14:05
