FINDINGS 5/6 revisions, new entries 7 (algo-bug diagnosis) + 8 (TPU noise cracks Wishart wall)#51

Merged
wock9000 merged 2 commits into trunk from findings-coral-honest on May 14, 2026
Conversation

wock9000 (Contributor) commented May 14, 2026

Summary

Four updates to the V3-arc record, accumulated as the algorithmic fix in PR #50 unlocked a deeper experiment.

Entry 5 revision

The original "retires the planned experiment" framing was overclaimed. The economic argument was correct for the v0 per-step-over-SSH architecture, but FINDINGS 6 (co-location) and 7 (incremental energy) bring the ratio from 209× to 9.4×. An inline revision banner names the actual remaining blockers (Wishart-J not baked into the model; one model per instance).

Entry 6 revision

The "ARM bookkeeping dominates by 20×" attribution was wrong — most of the surplus was a single O(N²) recompute bug in our own coral_anneal.py, not Python/NumPy overhead. An inline revision banner points at entry 7.
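For context on the O(N²) bug class: recomputing all local fields from scratch on every step costs O(N²), while maintaining them incrementally costs O(N) per accepted flip. A minimal sketch of the incremental pattern, assuming an Ising-style energy E = -½ sᵀJs with zero-diagonal symmetric J; the function name and loop structure are illustrative, not the actual coral_anneal.py API:

```python
import numpy as np

def anneal_incremental(J, s, beta, steps, rng):
    """Single-spin Metropolis loop with O(N) incremental ΔE.

    The buggy pattern recomputes h = J @ s on every step (O(N^2));
    here the local fields are built once and patched after each
    accepted flip. Illustrative sketch, not the project's API.
    """
    N = len(s)
    h = J @ s                       # local fields, computed once: O(N^2)
    E = -0.5 * s @ h                # E = -1/2 s^T J s, zero-diagonal J
    for _ in range(steps):
        i = rng.integers(N)
        dE = 2.0 * s[i] * h[i]      # exact ΔE for flipping spin i: O(1)
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            h -= 2.0 * s[i] * J[:, i]   # patch fields before the flip: O(N)
            s[i] = -s[i]
            E += dE
    return s, E
```

The tracked energy stays consistent with a from-scratch recomputation, which is the property the full-recompute version was (expensively) guaranteeing.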

Entry 7 (new)

Covers the 13.8× algorithmic speedup, the diagnostic lesson (architecture mis-attribution), and the updated lever ranking. Pairs with PR #50.

Entry 8 (new) — this is the consequential one

With the perf fix unblocking the Wishart truth-bench, we built the Wishart-J TPU model and ran the long-promised quality experiment. Same algorithm, same instance, same step budget:

| backend | algorithm | gap_rel mean | gap_rel min |
| --- | --- | --- | --- |
| Host NumPy | MTM single-spin | 89.45% | 86.85% |
| Coral TPU per-step | MTM single-spin | 5.80% | 5.60% |
| (reference: FINDINGS 3) | PT + PA hybrid | 5.5% | — |

15× better solution quality from the Coral path. Roughly ties the previous Wishart-best (PT + PA hybrid at 5.5%) using strictly simpler machinery — single-replica, single-temperature MTM. The TPU's int8 quantization-aware matmul produces noisy ΔE values that act as stochastic dither on the Boltzmann argmax, broadening exploration through the Wishart wall. `exact_energy(s_best, J)` is computed host-side on the int8 J so the reported energies are the true energies of the configurations the Coral returned.
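The dither mechanism can be sketched in a few lines. Since the argmax of the Boltzmann weight exp(-β·ΔE) over candidates is just the argmin of ΔE, additive noise on ΔE is what broadens an otherwise deterministic pick. Here `sigma` is a hypothetical stand-in for the int8 quantization error, not a measured model of the TPU's actual noise:

```python
import numpy as np

def dithered_pick(dE, sigma, rng):
    """Boltzmann-argmax candidate selection with stochastic dither.

    argmax of exp(-beta * dE) over candidates == argmin of dE, so the
    pick is written as an argmin; the additive noise (standing in for
    int8 quantization error) is what broadens the selection.
    """
    dE_noisy = dE + sigma * rng.standard_normal(np.shape(dE))
    return int(np.argmin(dE_noisy))
```

At `sigma=0` this reduces to the exact host behavior; with noise comparable to the gaps between candidate ΔE values, near-ties get explored rather than resolved the same way every step.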

Reframes V3 from "fast accelerator for exact algorithm" to "stochastic solver where hardware-driven noise is essential."

The named follow-up: calibrated host stochastic-MTM. If a host variant with explicit randomization matches the Coral, the feature lives in the algorithm class. If the Coral still wins, something specific about the noise distribution is doing work.
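A minimal sketch of what that calibration could look like, assuming the Coral's ΔE error is well modeled as additive Gaussian noise — which is itself the assumption the follow-up is designed to test. The helper names are hypothetical:

```python
import numpy as np

def calibrate_sigma(dE_host, dE_coral):
    """Estimate the TPU path's ΔE noise scale from paired measurements.

    dE_host: exact ΔE values; dE_coral: the TPU's noisy ΔE for the
    same (state, spin) pairs. Hypothetical helper for the proposed
    calibrated-host-stochastic-MTM experiment.
    """
    resid = np.asarray(dE_coral, dtype=float) - np.asarray(dE_host, dtype=float)
    return float(resid.std(ddof=1))

def stochastic_dE(dE_exact, sigma, rng):
    """Host-side ΔE with explicit calibrated dither (Gaussian assumed)."""
    return dE_exact + sigma * rng.standard_normal(np.shape(dE_exact))
```

If the host MTM run with `stochastic_dE` at the calibrated sigma matches the Coral's gap_rel, the effect lives in the algorithm class; if not, the Gaussian assumption (or some other property of the real noise distribution) is where to look next.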

Test plan

  • FINDINGS reads honestly across all four updates
  • Bench data committed (`docs/scaling/wishart_n1024_{host,coral}_v2.jsonl`)
  • Wishart-J input committed (`coral/wishart_J_n1024_a0.50_s0.npy`)
  • Reproduction commands in the entry's "Reproducible" section
  • Follow-up calibrated-host-stochastic-MTM experiment (separate work)

🤖 Generated with Claude Code

wock9000 and others added 2 commits May 14, 2026 02:39
Three honest corrections to the V3 arc record:

1. FINDINGS 5: drop the "retires the planned experiment" framing.
   The Wishart truth-bench on TPU was uncompetitive in the v0
   per-step-over-SSH architecture, but FINDINGS 6 (co-location)
   and FINDINGS 7 (incremental energy) bring the ratio from 209×
   to 9.4×. Inline revision banner names the actual blockers now
   (Wishart-J not baked into the model; one model per instance).

2. FINDINGS 6: the "ARM bookkeeping dominates by 20×" attribution
   was wrong. Almost all of the 41ms surplus was a single O(N²)
   recompute bug in our own coral_anneal.py code, not Python /
   numpy overhead. Inline revision banner pointing at FINDINGS 7.

3. FINDINGS 7 (new): the 13.8× speedup from the incremental fix,
   the diagnostic about over-attributing slowness to architecture
   when the cause is algorithmic, and the updated lever ranking
   (TPU matmul pipelining > batch matmul > C/Cython; C/Cython
   demoted to ~10% from "order of magnitude").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wishart N=1024 α=0.5 seed=0, 4 seeds × 3200 steps, same MTM:
- host (M3 Pro NumPy, exact int8 ΔE):  gap_rel mean 89.45%
- Coral Edge TPU (noisy ΔE):           gap_rel mean  5.80%

15× better solution quality from the Coral path on the same
algorithm. Roughly ties the prior best on this instance (PT + PA
hybrid at 5.5%, FINDINGS 3) using strictly simpler machinery —
single-replica, single-temperature MTM.

Mechanism: TPU's TFLite int8-quantized matmul produces approximations
of the exact int8 ΔE the host computes. That noise acts as
stochastic dither on the Boltzmann argmax, broadening exploration
through the Wishart wall. exact_energy(s_best, J) is recomputed
host-side at the end of each anneal so the reported energies are
the true energies of the configurations the Coral returned.

Reframes the V3 story: not "fast accelerator for exact algorithm",
but "stochastic solver where hardware-driven noise is essential to
performance." FINDINGS 5/6's "Coral uncompetitive" was about
throughput; on quality, the TPU is qualitatively better on the
glassy regime.

Key follow-up named: calibrated host stochastic-MTM. If a host
variant with explicit randomization matches the Coral, the feature
lives in the algorithm class. If the Coral still wins, something
specific about the noise distribution is doing work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wock9000 wock9000 changed the title FINDINGS: revise entries 5 + 6 with corrections, add entry 7 (algorithmic-bug diagnosis) FINDINGS 5/6 revisions, new entries 7 (algo-bug diagnosis) + 8 (TPU noise cracks Wishart wall) May 14, 2026
@wock9000 wock9000 merged commit 1948663 into trunk May 14, 2026
1 check passed
@wock9000 wock9000 deleted the findings-coral-honest branch May 14, 2026 14:06