cybertronai · gabrielnan · May 25, 2026 · May 21, 2026 · May 25, 2026 · May 22, 2026
diff --git a/.gitignore b/.gitignore
@@ -21,3 +21,10 @@ env/
 # Env / OS
 .env
 .DS_Store
+
+# Local dev notes / scratch — explainers, plans, idea logs, experiment journals.
+.scratch/
+
+# Internal slot-claim metadata written by claim_slot.sh for cross-session
+# coordination (session id + heartbeat). Not for upstream.
+submissions/*/.CLAIMED
diff --git a/MAINTAINING.md b/MAINTAINING.md
@@ -0,0 +1,54 @@
+# Maintaining the leaderboard
+
+Notes for whoever has push access to `cybertronai/wikitext`.
+
+## Branching
+
+- **`main`** — stable. Every row of `README.md`'s Record History was scored
+  under the same setup.
+- **`dev`** — staging. Feature PRs (new submissions, new paradigms, harness
+  tweaks) target `dev` and merge as soon as review is green.
+- **`dev` → `main`** promotion PRs happen on a slower cadence, only when
+  `dev` is internally consistent (see re-run rule below).
+
+## The setup-change re-run rule
+
+If a PR changes anything that can move where existing submissions land on
+the leaderboard, the **prior leaderboard rows in `README.md` must be re-run
+on the new setup before that PR merges to `main`**. Otherwise the
+half-old/half-new comparison is meaningless.
+
+| Change | Triggers re-run? |
+|---|---|
+| `EnergyMeter` semantics, idle-baseline default, scoring formula | **Yes** |
+| Hardware pin (PCIe ↔ SXM4, A100 ↔ H100) | **Yes** |
+| `MAX_TRAIN_SECONDS`, `ACC_MIN`, eval window | **Yes** |
+| Container-image bump with numerical drift | **Maybe** — re-run if anything visibly drifts |
+| New submission, doc/typo, `.scratch/`, internal refactor | No |
+| Additive optional field on `result.json` (existing semantics intact) | No — but new field is `null` on old entries; mention in PR |
+
+When in doubt, re-run. ~$0.50/submission on Modal A100 is cheaper than a
+broken leaderboard.
+
+## Process
+
+1. Push the setup change to a branch (typically targeting `dev`); don't merge yet.
+2. Re-run the rows currently in `README.md`'s Record History on the new
+   harness with `python submit.py submissions/<slot> --yes`, launching the
+   runs in parallel (Modal cap: 10 concurrent).
+3. When `result.json` files all reflect the new setup, append the re-run
+   rows to `README.md` (old rows stay as history) and add a dated banner
+   above the table noting the schema change.
+4. Restate the leaderboard table in the promotion PR body, confirming all
+   rows shown are under the new setup. Then merge.
+
+Don't: ship a half-new/half-old table; claim a new leader without re-running
+the priors; silently overwrite old `result.json` files without a banner in
+`README.md`.
+
+## Reference: setup-change events
+
+| Date | Change | PR | Re-ran upstream? |
+|---|---|---|---|
+| 2026-05-18 | Hardware pin: SXM4 → PCIe A100-80GB | (n/a) | partial — older SXM4 rows kept as history |
+| 2026-05-19 | `EnergyMeter` gains `cpu_energy_J` + `total_energy_J` via CodeCarbon | #4 | yes — `lwta_k2`, `lwta_k4`, `modded_nanogpt` re-run |
diff --git a/README.md b/README.md
@@ -23,6 +23,8 @@ python submit.py submissions/modded_nanogpt
 
 ## Record History
 
+The `Energy (J)` column reports **`total_energy_J`** (GPU NVML energy net of the idle baseline + CodeCarbon CPU estimate, floored at `duration_s × 50 W`) for rows dated **2026-05-20 and later**. Earlier rows report the previous NVML-only `training_energy_J`. The change implements the total-system-energy rule from @yaroslavvb2's Telegram note; see `MAINTAINING.md` and the `EnergyMeter` source for details. Upstream leaderboard rows that predate the change have been re-run under the new harness. The re-runs appear below as the canonical entries for those submissions, and the original rows are preserved for history.
+
 | Date | Energy (J) | Val char-acc | GPU | Config | Submission | Contributor |
 |------|-----------:|-------------:|-----|--------|------------|-------------|
 | 2026-05-12 |     51,704 | 0.7374    | A100 80GB PCIe | modded_nanogpt | [dir](submissions/modded_nanogpt) | @KellerJordan |
@@ -33,6 +35,21 @@ python submit.py submissions/modded_nanogpt
 | 2026-05-18 |      3,612 |       DQ | A100 80GB PCIe | chunker_d1       | [dir](research/catalog/new_directions/chunker_d1)       | @ab-10 |
 | 2026-05-18 |        735 |       DQ | A100 80GB PCIe | ppm_c            | [dir](research/catalog/new_directions/ppm_c)            | @ab-10 |
 | 2026-05-17 |         70 |       DQ | A100 80GB SXM4 | P2-A_random_projection | [dir](research/forward-forward-deep/runs/phase2/P2-A_random_projection) | @ab-10 |
+| 2026-05-19 |     60,864 |       DQ | A100 80GB PCIe | mamba_byte           | [dir](submissions/mamba_byte)           | @gabrielnan |
+| 2026-05-20 |      1,752 |       DQ | A100 80GB SXM4 | gpu_ngram_w31_k10    | [dir](submissions/gpu_ngram_w31_k10)    | @gabrielnan |
+| 2026-05-20 |     13,936 |       DQ | A100 80GB SXM4 | chunker_phase1_v2    | [dir](submissions/chunker_phase1_v2)    | @gabrielnan |
+| 2026-05-20 |     24,417 |       DQ | A100 80GB SXM4 | bpe_internal_nn_v2   | [dir](submissions/bpe_internal_nn_v2)   | @gabrielnan |
+| 2026-05-20 |     53,683 | 0.7246    | A100 80GB PCIe | lwta_k4              | [dir](submissions/lwta_k4)              | @ab-10 (re-run on new harness; total_J = 44,329 gpu + 9,354 cpu) |
+| 2026-05-20 |     54,614 | 0.7145    | A100 80GB PCIe | lwta_k2              | [dir](submissions/lwta_k2)              | @ab-10 (re-run on new harness; total_J = 44,583 gpu + 10,031 cpu) |
+| 2026-05-21 |      2,474 | 0.7031    | A100 80GB PCIe | subset_70_mkn        | [dir](submissions/subset_70_mkn)        | @gabrielnan |
+| 2026-05-21 |      3,092 | 0.7050    | A100 80GB PCIe | gpu_ngram_w31_k11    | [dir](submissions/gpu_ngram_w31_k11)    | @gabrielnan |
+| 2026-05-21 |      4,607 | 0.7047    | A100 80GB PCIe | paq_mixer_v3         | [dir](submissions/paq_mixer_v3)         | @gabrielnan |
+| 2026-05-21 |      8,602 | 0.7184    | A100 80GB PCIe | gpu_ngram_o14_xorfix | [dir](submissions/gpu_ngram_o14_xorfix) | @gabrielnan |
+| 2026-05-21 |      9,591 | 0.7063    | A100 80GB PCIe | chunker_phase1_v1    | [dir](submissions/chunker_phase1_v1)    | @gabrielnan |
+| 2026-05-21 |     14,578 | 0.7184    | A100 80GB PCIe | deep_backoff_kn      | [dir](submissions/deep_backoff_kn)      | @gabrielnan |
+| 2026-05-21 |     19,922 | 0.7328    | A100 80GB SXM4 | lwta_k4_alpha_065    | [dir](submissions/lwta_k4_alpha_065)    | @gabrielnan |
+| 2026-05-21 |     20,743 | 0.7390    | A100 80GB SXM4 | alpha_06             | [dir](submissions/alpha_06)             | @gabrielnan |
+| 2026-05-21 |     62,006 | 0.7337    | A100 80GB SXM4 | modded_nanogpt       | [dir](submissions/modded_nanogpt)       | @ab-10 |
 
 
 ## Rules

diff --git a/requirements.txt b/requirements.txt
@@ -7,3 +7,9 @@ modal>=0.66
 # Optional: tests run with stdlib if pytest is missing, but `pytest
 # test_wikitext.py` gives nicer output.
 pytest
+# CodeCarbon: CPU energy backend for EnergyMeter's total_energy_J field.
+# EnergyMeter reads ``tracker._total_cpu_energy`` after stop, which is
+# internal to CodeCarbon, so constrain the version to limit drift there
+# (``~=3.2`` allows >=3.2, <4.0). Required on the leaderboard (raises if
+# NVML is available and CodeCarbon isn't); optional on GPU-less dev boxes.
+codecarbon~=3.2
diff --git a/run_eval.py b/run_eval.py
@@ -119,7 +119,7 @@ def main() -> None:
         if m is not None:
             print(f"training duration  : {m.duration_s:.1f}s")
             if m.energy_joules is not None:
-                print(f"training energy (J): {m.energy_joules:,.1f}  (at kill)")
+                print(f"training energy (J): {_fmt_training_energy(m)}  (at kill)")
         if args.results_json is not None:
             payload = {
                 "submission": submission_name,
@@ -128,6 +128,8 @@ def main() -> None:
                 "max_train_seconds": args.max_train_seconds,
                 "training_energy_J": m.energy_joules if m is not None else None,
                 "training_duration_s": m.duration_s if m is not None else None,
+                "cpu_energy_J": m.cpu_energy_J if m is not None else None,
+                "total_energy_J": m.total_energy_J if m is not None else None,
                 "gpu_name": _gpu_name(),
                 "date_utc": _utc_now(),
             }
@@ -153,7 +155,7 @@ def main() -> None:
               f"below floor {args.acc_min:.4f}")
         print(f"submission         : {submission_name}")
         if m.energy_joules is not None:
-            print(f"training energy (J): {m.energy_joules:,.1f}")
+            print(f"training energy (J): {_fmt_training_energy(m)}")
         print(f"training duration  : {m.duration_s:.1f}s")
         if args.results_json is not None:
             payload = {
@@ -165,6 +167,8 @@ def main() -> None:
                 "val_chars": val_result.n_chars,
                 "training_energy_J": m.energy_joules,
                 "training_duration_s": m.duration_s,
+                "cpu_energy_J": m.cpu_energy_J,
+                "total_energy_J": m.total_energy_J,
                 "gpu_name": _gpu_name(),
                 "date_utc": _utc_now(),
             }
@@ -174,10 +178,7 @@ def main() -> None:
 
     print("---")
     print(f"submission         : {submission_name}")
-    if m.energy_joules is not None:
-        print(f"training energy (J): {m.energy_joules:,.1f}")
-    else:
-        print("training energy (J): NOT MEASURED")
+    print(f"training energy (J): {_fmt_training_energy(m)}")
     print(f"training duration  : {m.duration_s:.1f}s")
     print(f"val  char-accuracy : {val_result.accuracy:.4f}")
     print(f"val  chars         : {val_result.n_chars:,}")
@@ -187,6 +188,8 @@ def main() -> None:
             "submission": submission_name,
             "training_energy_J": m.energy_joules,
             "training_duration_s": m.duration_s,
+            "cpu_energy_J": m.cpu_energy_J,
+            "total_energy_J": m.total_energy_J,
             "val_char_accuracy": val_result.accuracy,
             "val_chars": val_result.n_chars,
             "gpu_name": _gpu_name(),
@@ -211,5 +214,16 @@ def _utc_now() -> str:
             .replace(microsecond=0).isoformat().replace("+00:00", "Z"))
 
 
+def _fmt_training_energy(m) -> str:
+    """Format energy as 'total (GPU + CPU)', GPU-only, or 'NOT MEASURED'."""
+    parts = (m.total_energy_J, m.energy_joules, m.cpu_energy_J)
+    if all(p is not None for p in parts):
+        return (f"{m.total_energy_J:,.1f} "
+                f"({m.energy_joules:,.1f} GPU + {m.cpu_energy_J:,.1f} CPU)")
+    if m.energy_joules is not None:
+        return f"{m.energy_joules:,.1f}"
+    return "NOT MEASURED"
+
+
 if __name__ == "__main__":
     main()