FINDINGS 5/6 revisions, new entries 7 (algo-bug diagnosis) + 8 (TPU noise cracks Wishart wall) #51
Merged
Conversation
Three honest corrections to the V3 arc record:

1. FINDINGS 5: drop the "retires the planned experiment" framing. The Wishart truth-bench on TPU was uncompetitive in the v0 per-step-over-SSH architecture, but FINDINGS 6 (co-location) and FINDINGS 7 (incremental energy) bring the ratio from 209× to 9.4×. The inline revision banner now names the actual blockers (Wishart-J not baked into the model; one model per instance).
2. FINDINGS 6: the "ARM bookkeeping dominates by 20×" attribution was wrong. Almost all of the 41ms surplus was a single O(N²) recompute bug in our own coral_anneal.py code, not Python/NumPy overhead. The inline revision banner now points at FINDINGS 7.
3. FINDINGS 7 (new): the 13.8× speedup from the incremental fix, the diagnostic about over-attributing slowness to architecture when the cause is algorithmic, and the updated lever ranking (TPU matmul pipelining > batch matmul > C/Cython; C/Cython demoted to ~10% from "order of magnitude").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
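The O(N²)-recompute-vs-incremental distinction can be sketched as follows. This is an illustrative reconstruction, not the actual coral_anneal.py code: it assumes the usual Ising convention E(s) = -sᵀJs with s ∈ {-1, +1}, and every function name here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
J = rng.integers(-3, 4, size=(N, N))
J = J + J.T                               # symmetric integer couplings
s = rng.choice([-1, 1], size=N).astype(np.int64)

def delta_e_recompute(J, s, i):
    """Shape of the bug: the full field J @ s rebuilt inside the
    hot loop, making every single proposal O(N^2)."""
    h = J @ s
    return 4 * s[i] * h[i] - 4 * J[i, i]

def delta_e_incremental(h, J, s, i):
    """The fix: read against a field h = J @ s carried across steps,
    so a proposal costs O(1)."""
    return 4 * s[i] * h[i] - 4 * J[i, i]

def accept_flip(h, J, s, i):
    """After accepting flip i, refresh the field in O(N) instead of
    recomputing it from scratch."""
    h -= 2 * s[i] * J[:, i]               # update before s[i] changes sign
    s[i] = -s[i]
```

The per-step cost drops from a full N×N matvec to a vector update, which is consistent with the kind of double-digit speedup the entry reports, though the actual 13.8× figure comes from the real code, not this sketch.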
Wishart N=1024 α=0.5 seed=0, 4 seeds × 3200 steps, same MTM:

- host (M3 Pro NumPy, exact int8 ΔE): gap_rel mean 89.45%
- Coral Edge TPU (noisy ΔE): gap_rel mean 5.80%

15× better solution quality from the Coral path on the same algorithm. Roughly ties the prior best on this instance (PT + PA hybrid at 5.5%, FINDINGS 3) using strictly simpler machinery: single-replica, single-temperature MTM.

Mechanism: the TPU's TFLite int8-quantized matmul produces approximations of the exact int8 ΔE the host computes. That noise acts as stochastic dither on the Boltzmann argmax, broadening exploration through the Wishart wall. exact_energy(s_best, J) is recomputed host-side at the end of each anneal, so the reported energies are the true energies of the configurations the Coral returned.

Reframes the V3 story: not "fast accelerator for exact algorithm" but "stochastic solver where hardware-driven noise is essential to performance." FINDINGS 5/6's "Coral uncompetitive" was about throughput; on quality, the TPU is qualitatively better in the glassy regime.

Key follow-up: calibrated host stochastic-MTM. If a host variant with explicit randomization matches the Coral, the feature lives in the algorithm class. If the Coral still wins, something specific about the noise distribution is doing work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Four updates to the V3-arc record, accumulated as the algorithmic fix in PR #50 unlocked a deeper experiment.
Entry 5 revision
The original "retires the planned experiment" framing was over-claimed. The economic argument was correct for the v0 per-step-over-SSH architecture, but FINDINGS 6 (co-location) and 7 (incremental energy) take the ratio from 209× to 9.4×. Inline revision banner names the actual remaining blockers.
Entry 6 revision
The "ARM bookkeeping dominates by 20×" attribution was wrong — most of the surplus was a single O(N²) bug in our own coral_anneal.py. Inline revision banner pointing at entry 7.
Entry 7 (new)
The 13.8× algorithmic speedup, the diagnostic lesson (architecture mis-attribution), and the updated lever ranking. Pairs with PR #50.
Entry 8 (new) — this is the consequential one
With the perf fix unblocking the Wishart truth-bench, we built the Wishart-J TPU model and ran the long-promised quality experiment. Same algorithm, same instance, same step budget:

- host (M3 Pro NumPy, exact int8 ΔE): gap_rel mean 89.45%
- Coral Edge TPU (noisy ΔE): gap_rel mean 5.80%
15× better solution quality from the Coral path. Roughly ties the previous Wishart-best (PT + PA hybrid at 5.5%) using strictly simpler machinery — single-replica, single-temperature MTM. The TPU's int8 quantization-aware matmul produces noisy ΔE values that act as stochastic dither on the Boltzmann argmax, broadening exploration through the Wishart wall. `exact_energy(s_best, J)` is computed host-side on the int8 J so the reported energies are the true energies of the configurations the Coral returned.
Reframes V3 from "fast accelerator for exact algorithm" to "stochastic solver where hardware-driven noise is essential."
The named follow-up: calibrated host stochastic-MTM. If a host variant with explicit randomization matches the Coral, the feature lives in the algorithm class. If the Coral still wins, something specific about the noise distribution is doing work.
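That follow-up could be sketched like this: fit a noise scale from paired exact/TPU ΔE samples, then run the same single-temperature MTM on exact ΔE with explicit Gaussian noise injected at that scale. Everything here (names, the Gaussian noise model, the always-accept-the-winner rule, the incremental-field bookkeeping) is an assumption for illustration, not the actual experiment code.

```python
import numpy as np

def calibrate_sigma(exact_de, tpu_de):
    """Noise scale fitted from paired exact/TPU dE measurements."""
    return float(np.std(np.asarray(tpu_de, dtype=np.float64)
                        - np.asarray(exact_de, dtype=np.float64)))

def host_stochastic_mtm_step(J, s, h, T, k, sigma, rng):
    """One single-temperature MTM step on exact dE (convention
    E = -s^T J s, field h = J @ s maintained incrementally) plus
    injected Gaussian noise at the calibrated scale. If this matches
    the Coral's gap_rel, the quality win lives in the algorithm
    class; if the Coral still wins, something about the TPU's actual
    noise distribution is doing work."""
    cand = rng.choice(len(s), size=k, replace=False)
    de = 4 * s[cand] * h[cand] - 4 * J[cand, cand]   # exact dE per candidate
    noisy = de + rng.normal(0.0, sigma, size=k)
    i = int(cand[np.argmin(noisy / T)])              # dithered Boltzmann argmax
    h -= 2 * s[i] * J[:, i]                          # O(N) field update
    s[i] = -s[i]
    return i
```

Setting sigma=0 recovers a deterministic host baseline, so the same loop serves both arms of the comparison.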
Test plan
🤖 Generated with Claude Code