From 91e1eb8e649dea1f95330bdf029c7fbbeaf19630 Mon Sep 17 00:00:00 2001
From: Gabriel Nakajima An <naka@Gabriels-MacBook-Pro.local>
Date: Wed, 20 May 2026 22:44:01 -0700
Subject: [PATCH 1/5] Add total-system-energy reporting via CodeCarbon CPU
 backend
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the gap where NVML-only measurement missed host CPU work
(raised by @yaroslavvb2 in Telegram on 2026-05-19: "Just total
system energy, subject to time + accuracy constraint"; "not
counting CPU utilization is a bit of leak").

Approach
---
CodeCarbon serves as the CPU backend in TDP-fallback mode (no MSR /
RAPL / ``/dev/cpu/*/msr`` access needed, since Modal containers can't
reach them). This follows common practice for cloud-ML energy
reporting: HuggingFace ``Trainer`` auto-logs CodeCarbon when installed;
Patterson et al. 2021/2022 used the same TDP-based estimate; ML.ENERGY
(Michigan SymbioticLab Zeus) reports GPU-only because the same
container constraint applies.

Behaviour
---
| NVML | CodeCarbon | Behaviour                                |
|------|------------|------------------------------------------|
| ✓    | ✓          | both populate; total = max(sum, floor)   |
| ✓    | ✗          | EnergyMeter() raises RuntimeError        |
| ✗    | ✓          | soft; both energy fields None            |
| ✗    | ✗          | soft; both energy fields None            |

Loud-fail on real-GPU-with-broken-CPU prevents silent half-measurement
from landing inconsistent rows on the leaderboard. Dev-box patterns
(no GPU) stay soft so local smoke tests on a laptop still work.
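
A minimal sketch of the availability matrix above, written as a
standalone function. ``resolve_mode`` and its return labels are
hypothetical names used for illustration only; the actual logic lives
in ``EnergyMeter`` and may be structured differently.

```python
from typing import Literal

Mode = Literal["measure_total", "soft_none"]


def resolve_mode(nvml_available: bool, cpu_backend_available: bool) -> Mode:
    """Map backend availability to the behaviour listed in the table.

    Hypothetical helper: the names here are illustrative, not the
    harness's real API.
    """
    if nvml_available and not cpu_backend_available:
        # Real GPU but no CPU estimate: refuse to half-measure.
        raise RuntimeError(
            "NVML is available but the CPU energy backend is not"
        )
    if nvml_available and cpu_backend_available:
        # Both backends present: GPU, CPU and total fields populate.
        return "measure_total"
    # No GPU (dev box, laptop smoke test): stay soft, report None.
    return "soft_none"
```

Only the GPU-without-CPU row raises; both no-GPU rows fall through to
the soft path regardless of whether CodeCarbon is installed.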

Code changes
---
- ``Measurement`` dataclass gains ``cpu_energy_J`` and ``total_energy_J``
  (both ``float | None``, default ``None``). ``__str__`` includes them
  when populated.
- ``EnergyMeter`` refactored to take pluggable ``gpu_backend`` /
  ``cpu_backend`` / ``p_floor_watts`` kwargs (dependency injection for
  testability). Default backends wrap pynvml and CodeCarbon. Raises
  RuntimeError if NVML is available but the CPU backend isn't.
- ``measure()`` populates the new fields on the yielded Measurement;
  ``total_energy_J = max(gpu + cpu, duration_s * p_floor_watts)`` —
  floor protects against CodeCarbon under-attribution.
- ``run_eval.py`` writes the new fields to ``result.json`` in all
  three exit paths (pass, DQ time, DQ acc).
- ``submit.py`` adds ``codecarbon~=3.2`` to the Modal image, and
  ``append_record`` writes ``total_energy_J`` to README's Record History
  column when present, falling back to ``training_energy_J`` for
  pre-PR runs.
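- ``requirements.txt`` adds ``codecarbon~=3.2`` as a local dep. The pin
  exists because EnergyMeter reads CodeCarbon's internal
  ``tracker._total_cpu_energy.kWh``. Note that ``~=3.2`` is a
  compatible-release specifier (``>=3.2, ==3.*``), so it holds the
  major version, not the minor; ``~=3.2.0`` would be needed to stay
  within 3.2.x.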
- ``README.md`` adds a dated banner above the Record History noting
  that rows ≥ 2026-05-20 report ``total_energy_J``; earlier rows are
  kept as historical NVML-only readings.
- New ``MAINTAINING.md`` at the repo root documents (a) the
  setup-change re-run rule (when the harness changes in a way that
  shifts where existing submissions land, re-run the leaderboard rows
  before merging to main) and (b) the ``main`` ↔ ``dev`` branching
  cadence (feature PRs target ``dev``; slow-cadence promotion PRs
  ``dev`` → ``main``).
- ``.gitignore`` adds ``submissions/*/.CLAIMED`` (internal slot-claim
  metadata used by cross-session coordination scripts, not for upstream).
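
The floor rule for ``total_energy_J`` described above can be sketched
as follows. The function name and signature are assumptions made for
illustration; the formula, the 50 W default and the None-propagation
follow the description in this message, and the first example uses the
alpha_06 values from this PR's ``result.json``.

```python
from __future__ import annotations


def total_energy_j(
    gpu_j: float | None,
    cpu_j: float | None,
    duration_s: float,
    p_floor_watts: float = 50.0,
) -> float | None:
    """Return max(gpu + cpu, duration_s * p_floor_watts), or None.

    Hypothetical standalone version of the rule; the real code sits
    inside EnergyMeter.measure().
    """
    if gpu_j is None or cpu_j is None:
        # Only one backend yielded a value: no total is reported.
        return None
    floor_j = duration_s * p_floor_watts
    return max(gpu_j + cpu_j, floor_j)


# alpha_06 re-run: the sum (about 20,743 J) is well above the
# 50 W floor (about 7,243 J), so the floor does not bind.
alpha_06_total = total_energy_j(
    14613.997913750001, 6129.255896584997, 144.859441725
)

# Made-up numbers where the floor binds: 150 J measured over 10 s
# is below 10 s * 50 W = 500 J.
floored_total = total_energy_j(100.0, 50.0, 10.0)
```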

Backward compatibility
---
No existing field changes meaning, and no existing test breaks:
- ``energy_joules`` keeps its prior semantic (GPU NVML net of idle
  baseline). Older ``result.json`` files are interpreted identically.
- ``EnergyMeter.available`` still reflects NVML availability only.
- The new floor only applies to ``total_energy_J``.
- ``submit.py:append_record`` falls back to ``training_energy_J`` for
  result.json files without the new fields.
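
The fallback in ``append_record`` amounts to a field lookup along these
lines. This is a sketch under assumptions: ``leaderboard_energy_j`` is
a hypothetical helper, and only the two key names are taken from this
PR.

```python
from __future__ import annotations


def leaderboard_energy_j(result: dict) -> float | None:
    """Pick the value for the Record History energy column.

    Prefers total_energy_J; falls back to training_energy_J for
    result.json files written before this PR (key absent) or for
    runs where the total could not be computed (key is null).
    """
    total = result.get("total_energy_J")
    if total is not None:
        return total
    return result.get("training_energy_J")


# Illustrative inputs, not real result.json contents.
new_style = {"training_energy_J": 1000.0, "total_energy_J": 1800.0}
old_style = {"training_energy_J": 1000.0}
```

Checking ``is not None`` rather than truthiness keeps a legitimate
``0.0`` total from silently falling back to the GPU-only reading.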

Tests
---
Developed test-first with 7 new unit tests; the 8 pre-existing tests are
preserved unmodified:
- ``test_energy_meter_total_is_gpu_plus_cpu`` (tracer)
- ``test_total_energy_enforces_wall_clock_floor`` (floor binds)
- ``test_default_cpu_backend_uses_codecarbon_when_installed`` (live)
- ``test_energy_meter_raises_when_gpu_available_but_cpu_missing``
- ``test_energy_meter_no_raise_when_cpu_present_but_gpu_missing``
- ``test_total_energy_none_when_only_one_backend_yields_value``
- ``test_energy_meter_dev_mode_no_raise_when_both_unavailable``

15/15 pass.

Followed up with the anthropics/claude-plugins-official ``code-simplifier``
agent for a clarity pass (dead None-checks removed, redundant
``except: self.available = False`` collapsed, long ternaries wrapped).

Leaderboard re-validation (per MAINTAINING.md)
---
This PR is itself a setup change, so every leaderboard row on dev was
re-run on the new harness before merging to main. All 11 rows landed
(PCIe unless noted):

| Submission              | gpu_J  | cpu_J  | total_J | acc    |
|-------------------------|-------:|-------:|--------:|-------:|
| subset_70_mkn           |  1,351 |  1,124 |   2,474 | 0.7031 |
| gpu_ngram_w31_k11       |  1,612 |  1,480 |   3,092 | 0.7050 |
| paq_mixer_v3            |  2,355 |  2,252 |   4,607 | 0.7048 |
| gpu_ngram_o14_xorfix    |  3,981 |  4,621 |   8,602 | 0.7184 |
| chunker_phase1_v1       |  5,570 |  4,021 |   9,591 | 0.7063 |
| deep_backoff_kn         |    963 | 12,338 |  14,578 | 0.7184 |
| lwta_k4_alpha_065 (SXM4)| 13,751 |  6,170 |  19,922 | 0.7328 |
| alpha_06          (SXM4)| 14,614 |  6,129 |  20,743 | 0.7390 |
| lwta_k4                 | 44,329 |  9,354 |  53,683 | 0.7246 |
| lwta_k2                 | 44,583 | 10,031 |  54,614 | 0.7145 |
| modded_nanogpt    (SXM4)| 51,729 | 10,277 |  62,006 | 0.7337 |

Headline: subset_70_mkn lands at 2,474 J total / 0.7031 on PCIe, which
makes it the new energy leader, 20% under gpu_ngram_w31_k11 (3,092 J /
0.7050) in the same accuracy band. On the prior NVML-only metric those
two were a noise-floor tie; the CPU side resolves the tie because
subset_70_mkn's 70%-data trick also cuts the CPU work proportionally.

CPU-bound submissions rerank dramatically. deep_backoff_kn (prior
NVML: 2,236 J) now reports 14,578 J total: its CPU energy is 12.8×
its GPU reading because its n-gram tables are built single-threaded
on the host. That cost is now visible in full on the leaderboard.
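
A quick consistency check of the table against the
``max(gpu + cpu, floor)`` rule, using the rounded values as printed.
The helper name and the 1 J rounding tolerance are assumptions. One
reading of any row it flags, not confirmed elsewhere in this message,
is that the wall-clock floor binds for that run.

```python
# (submission, gpu_J, cpu_J, total_J), copied from the table above.
ROWS = [
    ("subset_70_mkn", 1351, 1124, 2474),
    ("gpu_ngram_w31_k11", 1612, 1480, 3092),
    ("paq_mixer_v3", 2355, 2252, 4607),
    ("gpu_ngram_o14_xorfix", 3981, 4621, 8602),
    ("chunker_phase1_v1", 5570, 4021, 9591),
    ("deep_backoff_kn", 963, 12338, 14578),
    ("lwta_k4_alpha_065", 13751, 6170, 19922),
    ("alpha_06", 14614, 6129, 20743),
    ("lwta_k4", 44329, 9354, 53683),
    ("lwta_k2", 44583, 10031, 54614),
    ("modded_nanogpt", 51729, 10277, 62006),
]


def rows_not_explained_by_sum(rows, tol_j=1):
    """Return (name, total - (gpu + cpu)) where the gap exceeds tol_j.

    tol_j absorbs the rounding of each column to whole joules.
    """
    return [
        (name, total - (gpu + cpu))
        for name, gpu, cpu, total in rows
        if abs(total - (gpu + cpu)) > tol_j
    ]


flagged = rows_not_explained_by_sum(ROWS)
```

Ten rows match ``gpu + cpu`` to within rounding. deep_backoff_kn is
the exception: its total exceeds the sum by 1,277 J.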

Open questions for maintainer review
---
- Floor value: 50 W (default) vs 100 W per GPU-slot fair share. 50 W
  is conservative; 100 W matches dual-EPYC-7763 + DRAM fair-share for
  an 8-GPU host. One-line change.
- Should ``total_energy_J`` be the new canonical ranking metric, or
  report both side-by-side?
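
For the floor question, one way to read the 100 W figure is sketched
below. The 280 W per-socket TDP for the EPYC 7763 is an assumption
brought in from AMD's published spec, not from this repo, and the
split between CPU and "DRAM and other host" share is illustrative.
The alpha_06 duration and total come from this PR's ``result.json``.

```python
CPU_TDP_W = 280.0   # assumption: default TDP of one EPYC 7763
SOCKETS = 2         # dual-socket host, per the note above
GPU_SLOTS = 8       # 8-GPU host, per the note above


def cpu_fair_share_w(tdp_w: float, sockets: int, slots: int) -> float:
    """Host CPU TDP divided evenly across GPU slots."""
    return sockets * tdp_w / slots


def floor_j(duration_s: float, p_floor_watts: float) -> float:
    """Wall-clock energy floor for a run of the given duration."""
    return duration_s * p_floor_watts


cpu_share = cpu_fair_share_w(CPU_TDP_W, SOCKETS, GPU_SLOTS)  # 70 W
# Remainder a 100 W floor would attribute to DRAM and other host parts.
residual_share = 100.0 - cpu_share

# Effect on the alpha_06 re-run (144.859 s, 20,743 J measured):
alpha_06_duration_s = 144.859441725
alpha_06_total_j = 20743.253810334998
floor_50 = floor_j(alpha_06_duration_s, 50.0)
floor_100 = floor_j(alpha_06_duration_s, 100.0)
```

Under these assumptions neither floor binds for alpha_06, so the
choice would mainly affect runs that are long relative to their
measured energy.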

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .gitignore                                   |   7 +
 MAINTAINING.md                               |  54 ++++
 README.md                                    |  14 +
 requirements.txt                             |   6 +
 run_eval.py                                  |   6 +
 submissions/alpha_06/nvml.json               |   8 +-
 submissions/alpha_06/result.json             |  18 +-
 submissions/alpha_06/run.log                 | 185 +++++------
 submissions/chunker_phase1_v1/nvml.json      |  10 +-
 submissions/chunker_phase1_v1/result.json    |  22 +-
 submissions/chunker_phase1_v1/run.log        | 187 +++++------
 submissions/deep_backoff_kn/nvml.json        |  10 +-
 submissions/deep_backoff_kn/result.json      |  27 +-
 submissions/deep_backoff_kn/run.log          | 208 ++++++-------
 submissions/gpu_ngram_o14_xorfix/nvml.json   |   8 +-
 submissions/gpu_ngram_o14_xorfix/result.json |  18 +-
 submissions/gpu_ngram_o14_xorfix/run.log     | 168 +++++-----
 submissions/gpu_ngram_w31_k11/nvml.json      |   8 +-
 submissions/gpu_ngram_w31_k11/result.json    |  18 +-
 submissions/gpu_ngram_w31_k11/run.log        | 308 +++++--------------
 submissions/lwta_k2/nvml.json                |   8 +-
 submissions/lwta_k2/result.json              |  18 +-
 submissions/lwta_k2/run.log                  | 203 ++++++------
 submissions/lwta_k4/nvml.json                |   8 +-
 submissions/lwta_k4/result.json              |  18 +-
 submissions/lwta_k4/run.log                  | 203 ++++++------
 submissions/lwta_k4_alpha_065/nvml.json      |  10 +-
 submissions/lwta_k4_alpha_065/result.json    |  22 +-
 submissions/lwta_k4_alpha_065/run.log        | 193 ++++++------
 submissions/modded_nanogpt/nvml.json         |  10 +-
 submissions/modded_nanogpt/result.json       |  22 +-
 submissions/modded_nanogpt/run.log           | 209 +++++++------
 submissions/paq_mixer_v3/nvml.json           |  10 +-
 submissions/paq_mixer_v3/result.json         |  22 +-
 submissions/paq_mixer_v3/run.log             | 174 ++++++-----
 submissions/subset_70_mkn/nvml.json          |  10 +-
 submissions/subset_70_mkn/result.json        |  22 +-
 submissions/subset_70_mkn/run.log            | 170 +++++-----
 submit.py                                    |  16 +-
 test_wikitext.py                             | 151 +++++++++
 wikitext.py                                  | 136 ++++++--
 41 files changed, 1555 insertions(+), 1370 deletions(-)
 create mode 100644 MAINTAINING.md

diff --git a/.gitignore b/.gitignore
index b20b8de..5a4b8a5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -21,3 +21,10 @@ env/
 # Env / OS
 .env
 .DS_Store
+
+# Local dev notes / scratch — explainers, plans, idea logs, experiment journals.
+.scratch/
+
+# Internal slot-claim metadata written by claim_slot.sh for cross-session
+# coordination (session id + heartbeat). Not for upstream.
+submissions/*/.CLAIMED
diff --git a/MAINTAINING.md b/MAINTAINING.md
new file mode 100644
index 0000000..c68abd9
--- /dev/null
+++ b/MAINTAINING.md
@@ -0,0 +1,54 @@
+# Maintaining the leaderboard
+
+Notes for whoever has push access to `cybertronai/wikitext`.
+
+## Branching
+
+- **`main`** — stable. Every row of `README.md`'s Record History was scored
+  under the same setup.
+- **`dev`** — staging. Feature PRs (new submissions, new paradigms, harness
+  tweaks) target `dev` and merge as soon as review is green.
+- **`dev` → `main`** promotion PRs happen on a slower cadence, only when
+  `dev` is internally consistent (see re-run rule below).
+
+## The setup-change re-run rule
+
+If a PR changes anything that can move where existing submissions land on
+the leaderboard, the **prior leaderboard rows in `README.md` must be re-run
+on the new setup before that PR merges to `main`**. Otherwise the
+half-old/half-new comparison is meaningless.
+
+| Change | Triggers re-run? |
+|---|---|
+| `EnergyMeter` semantics, idle-baseline default, scoring formula | **Yes** |
+| Hardware pin (PCIe ↔ SXM4, A100 ↔ H100) | **Yes** |
+| `MAX_TRAIN_SECONDS`, `ACC_MIN`, eval window | **Yes** |
+| Container-image bump with numerical drift | **Maybe** — re-run if anything visibly drifts |
+| New submission, doc/typo, `.scratch/`, internal refactor | No |
+| Additive optional field on `result.json` (existing semantics intact) | No — but new field is `null` on old entries; mention in PR |
+
+When in doubt, re-run. ~$0.50/submission on Modal A100 is cheaper than a
+broken leaderboard.
+
+## Process
+
+1. Land the setup change on a branch (typically targeting `dev`); don't merge yet.
+2. Re-run the rows currently in `README.md`'s Record History on the new
+   harness — `python submit.py submissions/<slot> --yes`, fire in parallel
+   (Modal cap: 10 concurrent).
+3. When `result.json` files all reflect the new setup, append the re-run
+   rows to `README.md` (old rows stay as history) and add a dated banner
+   above the table noting the schema change.
+4. Restate the leaderboard table in the promotion PR body, confirming all
+   rows shown are under the new setup. Then merge.
+
+Don't: ship a half-new/half-old table; claim a new leader without re-running
+the priors; silently overwrite old `result.json` files without a banner in
+`README.md`.
+
+## Reference: setup-change events
+
+| Date | Change | PR | Re-ran upstream? |
+|---|---|---|---|
+| 2026-05-18 | Hardware pin: SXM4 → PCIe A100-80GB | (n/a) | partial — older SXM4 rows kept as history |
+| 2026-05-19 | `EnergyMeter` gains `cpu_energy_J` + `total_energy_J` via CodeCarbon | #4 | yes — `lwta_k2`, `lwta_k4`, `modded_nanogpt` re-run |
diff --git a/README.md b/README.md
index cd99cd5..ce37b6c 100644
--- a/README.md
+++ b/README.md
@@ -23,6 +23,8 @@ python submit.py submissions/modded_nanogpt
 
 ## Record History
 
+The `Energy (J)` column reports **`total_energy_J`** (GPU NVML net of idle baseline + CodeCarbon CPU estimate, floored at `duration_s × 50 W`) for rows dated **2026-05-20 and later**. Earlier rows report the prior NVML-only `training_energy_J`. The semantic change is the new total-system-energy rule per @yaroslavvb2's Telegram note; see `MAINTAINING.md` and the `EnergyMeter` source for details. Upstream-leaderboard rows from before the change have been re-run under the new harness — those re-runs appear below as the canonical entries for those submissions; the original rows are preserved for history.
+
 | Date | Energy (J) | Val char-acc | GPU | Config | Submission | Contributor |
 |------|-----------:|-------------:|-----|--------|------------|-------------|
 | 2026-05-12 |     51,704 | 0.7374    | A100 80GB PCIe | modded_nanogpt | [dir](submissions/modded_nanogpt) | @KellerJordan |
@@ -33,6 +35,9 @@ python submit.py submissions/modded_nanogpt
 | 2026-05-18 |      3,612 |       DQ | A100 80GB PCIe | chunker_d1       | [dir](research/catalog/new_directions/chunker_d1)       | @ab-10 |
 | 2026-05-18 |        735 |       DQ | A100 80GB PCIe | ppm_c            | [dir](research/catalog/new_directions/ppm_c)            | @ab-10 |
 | 2026-05-17 |         70 |       DQ | A100 80GB SXM4 | P2-A_random_projection | [dir](research/forward-forward-deep/runs/phase2/P2-A_random_projection) | @ab-10 |
+| 2026-05-20 |     53,683 | 0.7246    | A100 80GB PCIe | lwta_k4        | [dir](submissions/lwta_k4)        | @ab-10 (re-run on new harness; total_J = 44,329 gpu + 9,354 cpu) |
+| 2026-05-20 |     54,614 | 0.7145    | A100 80GB PCIe | lwta_k2        | [dir](submissions/lwta_k2)        | @ab-10 (re-run on new harness; total_J = 44,583 gpu + 10,031 cpu) |
+| 2026-05-20 |     66,747 |       DQ | A100 80GB SXM4 | modded_nanogpt | [dir](submissions/modded_nanogpt) | @ab-10 (re-run on new harness landed on SXM4 and hit 300 s cap; re-running) |
 
 
 ## Rules
@@ -64,3 +69,12 @@ For an internal-BPE submission, `predict()` returns `P(next_char | observed_char
 
 [^1]: More energy efficient
 [^2]: As of writing this
+| 2026-05-21 |      3,092 | 0.7050 | gpu_ngram_w31_k11 | [dir](submissions/gpu_ngram_w31_k11) | @follow-up-paq-prediction |
+| 2026-05-21 |      2,474 | 0.7031 | subset_70_mkn | [dir](submissions/subset_70_mkn) | @exp-batch-iter4 |
+| 2026-05-21 |      4,607 | 0.7047 | paq_mixer_v3 | [dir](submissions/paq_mixer_v3) | @worker-paq-mixer |
+| 2026-05-21 |      8,602 | 0.7184 | gpu_ngram_o14_xorfix | [dir](submissions/gpu_ngram_o14_xorfix) | @subagent-xorfix-2026-05-19 |
+| 2026-05-21 |     14,578 | 0.7184 | deep_backoff_kn | [dir](submissions/deep_backoff_kn) | @nakajimagabriel |
+| 2026-05-21 |      9,591 | 0.7063 | chunker_phase1_v1 | [dir](submissions/chunker_phase1_v1) | @explore-chunker-2026-05-19 |
+| 2026-05-21 |     19,922 | 0.7328 | lwta_k4_alpha_065 | [dir](submissions/lwta_k4_alpha_065) | @subagent-L2clean-2026-05-19 |
+| 2026-05-21 |     20,743 | 0.7390 | alpha_06 | [dir](submissions/alpha_06) | @subagent-xorfix-2026-05-19 |
+| 2026-05-21 |     62,006 | 0.7337 | modded_nanogpt | [dir](submissions/modded_nanogpt) | @ab-10 |
diff --git a/requirements.txt b/requirements.txt
index 0eef3e7..827b6f6 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -7,3 +7,9 @@ modal>=0.66
 # Optional: tests run with stdlib if pytest is missing, but `pytest
 # test_wikitext.py` gives nicer output.
 pytest
+# CodeCarbon: CPU energy estimation backend for EnergyMeter's
+# total_energy_J field. EnergyMeter reads ``tracker._total_cpu_energy``
+# after stop, which is internal to CodeCarbon — pin a minor range to
+# keep that path stable. Required on the leaderboard (raises if NVML is
+# available and this isn't); optional on dev boxes without a GPU.
+codecarbon~=3.2
diff --git a/run_eval.py b/run_eval.py
index 270332e..c9e5806 100644
--- a/run_eval.py
+++ b/run_eval.py
@@ -128,6 +128,8 @@ def main() -> None:
                 "max_train_seconds": args.max_train_seconds,
                 "training_energy_J": m.energy_joules if m is not None else None,
                 "training_duration_s": m.duration_s if m is not None else None,
+                "cpu_energy_J": m.cpu_energy_J if m is not None else None,
+                "total_energy_J": m.total_energy_J if m is not None else None,
                 "gpu_name": _gpu_name(),
                 "date_utc": _utc_now(),
             }
@@ -165,6 +167,8 @@ def main() -> None:
                 "val_chars": val_result.n_chars,
                 "training_energy_J": m.energy_joules,
                 "training_duration_s": m.duration_s,
+                "cpu_energy_J": m.cpu_energy_J,
+                "total_energy_J": m.total_energy_J,
                 "gpu_name": _gpu_name(),
                 "date_utc": _utc_now(),
             }
@@ -187,6 +191,8 @@ def main() -> None:
             "submission": submission_name,
             "training_energy_J": m.energy_joules,
             "training_duration_s": m.duration_s,
+            "cpu_energy_J": m.cpu_energy_J,
+            "total_energy_J": m.total_energy_J,
             "val_char_accuracy": val_result.accuracy,
             "val_chars": val_result.n_chars,
             "gpu_name": _gpu_name(),
diff --git a/submissions/alpha_06/nvml.json b/submissions/alpha_06/nvml.json
index 61cc54d..cb06cb5 100644
--- a/submissions/alpha_06/nvml.json
+++ b/submissions/alpha_06/nvml.json
@@ -2,10 +2,10 @@
   "nvml_available": true,
   "energy_counter_supported": true,
   "monotonic": true,
-  "idle_watts": 65.36740677966102,
-  "stress_watts_avg": 352.24618801540856,
-  "stress_energy_joules": 13133.406,
-  "stress_duration_s": 37.284735639000004,
+  "idle_watts": 63.85837288135594,
+  "stress_watts_avg": 352.31625628183394,
+  "stress_energy_joules": 13073.966,
+  "stress_duration_s": 37.108608436,
   "gpu_name": "NVIDIA A100-SXM4-80GB",
   "notes": []
 }
diff --git a/submissions/alpha_06/result.json b/submissions/alpha_06/result.json
index 5e95fea..b96b4ca 100644
--- a/submissions/alpha_06/result.json
+++ b/submissions/alpha_06/result.json
@@ -1,19 +1,21 @@
 {
   "submission": "alpha_06",
-  "training_energy_J": 14731.7458852,
-  "training_duration_s": 140.096942296,
-  "val_char_accuracy": 0.7405,
+  "training_energy_J": 14613.997913750001,
+  "training_duration_s": 144.859441725,
+  "cpu_energy_J": 6129.255896584997,
+  "total_energy_J": 20743.253810334998,
+  "val_char_accuracy": 0.7390333333333333,
   "val_chars": 60000,
   "gpu_name": "NVIDIA A100-SXM4-80GB",
-  "date_utc": "2026-05-20T01:55:05Z",
+  "date_utc": "2026-05-21T05:29:17Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 65.36740677966102,
-    "stress_watts_avg": 352.24618801540856,
-    "stress_energy_joules": 13133.406,
-    "stress_duration_s": 37.284735639000004,
+    "idle_watts": 63.85837288135594,
+    "stress_watts_avg": 352.31625628183394,
+    "stress_energy_joules": 13073.966,
+    "stress_duration_s": 37.108608436,
     "gpu_name": "NVIDIA A100-SXM4-80GB",
     "notes": []
   },
diff --git a/submissions/alpha_06/run.log b/submissions/alpha_06/run.log
index 12d4a02..ecb9a9d 100644
--- a/submissions/alpha_06/run.log
+++ b/submissions/alpha_06/run.log
@@ -1,25 +1,25 @@
-# wikitext submit.py log — alpha_06 — 2026-05-20T01:45:46+00:00Z
+# wikitext submit.py log — alpha_06 — 2026-05-21T05:19:18+00:00Z
 [modal] launching A100-80GB ...
 ✓ Initialized. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-Msdp1r91xRTCvRAxIaShM8
+https://modal.com/apps/gabriel-nakajima-an/main/ap-UUVKwxlYo8DV3G5hw1WK3q
 ✓ Created objects.
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py
-├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py
 └── 🔨 Created function run_submission.
 [modal] verifying NVML energy counter ...
 GPU: NVIDIA A100-SXM4-80GB
 sampling idle power for 3s ...
-  idle: 65.4 W
+  idle: 63.9 W
 running 30s stress workload ...
-  duration:       37.3 s
-  energy delta:   13,133.4 J
-  avg power:      352.2 W
+  duration:       37.1 s
+  energy delta:   13,074.0 J
+  avg power:      352.3 W
   monotonic:      True
 ---
-{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 65.36740677966102, "stress_watts_avg": 352.24618801540856, "stress_energy_joules": 13133.406, "stress_duration_s": 37.284735639000004, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []}
+{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 63.85837288135594, "stress_watts_avg": 352.31625628183394, "stress_energy_joules": 13073.966, "stress_duration_s": 37.108608436, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []}
 [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ...
 loading WikiText-103 from /data ...
   train chars: 540,095,682
@@ -27,116 +27,119 @@ loading WikiText-103 from /data ...
 train wall-clock cap: 300 s
 val accuracy floor : 0.7000
 training submission /workspace/alpha_06.py ...
+[codecarbon WARNING @ 05:20:11] Multiple instances of codecarbon are allowed to run at the same time.
 [clean_w31] starting GPU KN build; max_order=12 D=0.5
 [clean_w31] top order=12 unique pairs: 157,942,722  2.5s
-[clean_w31] ctx_len=11 ctxs=119,285,712 24.2s
-[clean_w31] ctx_len=10 ctxs=84,282,364 17.4s
-[clean_w31] ctx_len=9 ctxs=54,720,376 11.1s
-[clean_w31] ctx_len=8 ctxs=31,924,091 6.6s
-[clean_w31] ctx_len=7 ctxs=16,284,921 3.5s
-[clean_w31] ctx_len=6 ctxs=7,016,442 1.7s
+[clean_w31] ctx_len=11 ctxs=119,285,712 25.5s
+[clean_w31] ctx_len=10 ctxs=84,282,364 17.6s
+[clean_w31] ctx_len=9 ctxs=54,720,376 12.0s
+[clean_w31] ctx_len=8 ctxs=31,924,091 7.1s
+[clean_w31] ctx_len=7 ctxs=16,284,921 3.6s
+[clean_w31] ctx_len=6 ctxs=7,016,442 1.6s
 [clean_w31] ctx_len=5 ctxs=2,438,281 0.6s
 [clean_w31] ctx_len=4 ctxs=637,143 0.1s
 [clean_w31] ctx_len=3 ctxs=122,882 0.0s
 [clean_w31] ctx_len=2 ctxs=12,282 0.0s
 [clean_w31] ctx_len=1 ctxs=204 0.0s
 [clean_w31] ctx_len=0 ctxs=1 0.0s
-[clean_w31] KN build done: 67.8s
+[clean_w31] KN build done: 70.7s
 [clean_w31] NN 3.29M params  cfg=TrainConfig(d=256 L=4 H=4 bs=32 T=1024 steps=1200)
 [clean_w31] NN step     0/1200  loss 5.5452  elapsed 1s
-[clean_w31] NN step   100/1200  loss 1.8056  elapsed 7s
-[clean_w31] NN step   200/1200  loss 1.4371  elapsed 12s
-[clean_w31] NN step   300/1200  loss 1.4222  elapsed 18s
-[clean_w31] NN step   400/1200  loss 1.3516  elapsed 24s
-[clean_w31] NN step   500/1200  loss 1.2951  elapsed 29s
-[clean_w31] NN step   600/1200  loss 1.2552  elapsed 35s
-[clean_w31] NN step   700/1200  loss 1.2157  elapsed 41s
-[clean_w31] NN step   800/1200  loss 1.1424  elapsed 46s
-[clean_w31] NN step   900/1200  loss 1.1424  elapsed 52s
-[clean_w31] NN step  1000/1200  loss 1.1414  elapsed 58s
-[clean_w31] NN step  1100/1200  loss 1.1226  elapsed 63s
-[clean_w31] NN step  1199/1200  loss 1.1011  elapsed 69s
-training: 14,731.7 J   duration=140.1s
+[clean_w31] NN step   100/1200  loss 1.7487  elapsed 7s
+[clean_w31] NN step   200/1200  loss 1.4373  elapsed 12s
+[clean_w31] NN step   300/1200  loss 1.3942  elapsed 18s
+[clean_w31] NN step   400/1200  loss 1.3113  elapsed 24s
+[clean_w31] NN step   500/1200  loss 1.3140  elapsed 30s
+[clean_w31] NN step   600/1200  loss 1.2792  elapsed 36s
+[clean_w31] NN step   700/1200  loss 1.2438  elapsed 41s
+[clean_w31] NN step   800/1200  loss 1.1484  elapsed 47s
+[clean_w31] NN step   900/1200  loss 1.1466  elapsed 53s
+[clean_w31] NN step  1000/1200  loss 1.1834  elapsed 59s
+[clean_w31] NN step  1100/1200  loss 1.1053  elapsed 65s
+[clean_w31] NN step  1199/1200  loss 1.1100  elapsed 71s
+training: 14,614.0 J   duration=144.9s
 evaluating on val split ...
-  eval      1,200/60,000 (  2.0%)  acc=0.7300      164 char/s  eta=   359s
-  eval      2,400/60,000 (  4.0%)  acc=0.7167      166 char/s  eta=   347s
-  eval      3,600/60,000 (  6.0%)  acc=0.7167      166 char/s  eta=   339s
-  eval      4,800/60,000 (  8.0%)  acc=0.7260      165 char/s  eta=   336s
-  eval      6,000/60,000 ( 10.0%)  acc=0.7230      156 char/s  eta=   346s
-  eval      7,200/60,000 ( 12.0%)  acc=0.7190      158 char/s  eta=   334s
-  eval      8,400/60,000 ( 14.0%)  acc=0.7189      159 char/s  eta=   325s
-  eval      9,600/60,000 ( 16.0%)  acc=0.7250      160 char/s  eta=   316s
-  eval     10,800/60,000 ( 18.0%)  acc=0.7304      160 char/s  eta=   307s
-  eval     12,000/60,000 ( 20.0%)  acc=0.7304      161 char/s  eta=   298s
-  eval     13,200/60,000 ( 22.0%)  acc=0.7347      161 char/s  eta=   290s
-  eval     14,400/60,000 ( 24.0%)  acc=0.7361      162 char/s  eta=   282s
-  eval     15,600/60,000 ( 26.0%)  acc=0.7383      162 char/s  eta=   274s
-  eval     16,800/60,000 ( 28.0%)  acc=0.7412      162 char/s  eta=   266s
-  eval     18,000/60,000 ( 30.0%)  acc=0.7422      163 char/s  eta=   258s
-  eval     19,200/60,000 ( 32.0%)  acc=0.7455      163 char/s  eta=   250s
-  eval     20,400/60,000 ( 34.0%)  acc=0.7473      163 char/s  eta=   243s
-  eval     21,600/60,000 ( 36.0%)  acc=0.7475      163 char/s  eta=   235s
-  eval     22,800/60,000 ( 38.0%)  acc=0.7479      163 char/s  eta=   228s
-  eval     24,000/60,000 ( 40.0%)  acc=0.7473      163 char/s  eta=   220s
-  eval     25,200/60,000 ( 42.0%)  acc=0.7475      164 char/s  eta=   213s
-  eval     26,400/60,000 ( 44.0%)  acc=0.7485      164 char/s  eta=   205s
-  eval     27,600/60,000 ( 46.0%)  acc=0.7479      164 char/s  eta=   198s
-  eval     28,800/60,000 ( 48.0%)  acc=0.7487      164 char/s  eta=   190s
-  eval     30,000/60,000 ( 50.0%)  acc=0.7482      164 char/s  eta=   183s
-  eval     31,200/60,000 ( 52.0%)  acc=0.7457      164 char/s  eta=   176s
-  eval     32,400/60,000 ( 54.0%)  acc=0.7447      164 char/s  eta=   169s
-  eval     33,600/60,000 ( 56.0%)  acc=0.7423      162 char/s  eta=   163s
-  eval     34,800/60,000 ( 58.0%)  acc=0.7427      161 char/s  eta=   156s
-  eval     36,000/60,000 ( 60.0%)  acc=0.7429      161 char/s  eta=   149s
-  eval     37,200/60,000 ( 62.0%)  acc=0.7428      161 char/s  eta=   141s
-  eval     38,400/60,000 ( 64.0%)  acc=0.7429      161 char/s  eta=   134s
-  eval     39,600/60,000 ( 66.0%)  acc=0.7424      161 char/s  eta=   126s
-  eval     40,800/60,000 ( 68.0%)  acc=0.7417      162 char/s  eta=   119s
-  eval     42,000/60,000 ( 70.0%)  acc=0.7409      162 char/s  eta=   111s
-  eval     43,200/60,000 ( 72.0%)  acc=0.7410      162 char/s  eta=   104s
-  eval     44,400/60,000 ( 74.0%)  acc=0.7407      162 char/s  eta=    96s
-  eval     45,600/60,000 ( 76.0%)  acc=0.7405      162 char/s  eta=    89s
-  eval     46,800/60,000 ( 78.0%)  acc=0.7397      162 char/s  eta=    81s
-  eval     48,000/60,000 ( 80.0%)  acc=0.7398      162 char/s  eta=    74s
-  eval     49,200/60,000 ( 82.0%)  acc=0.7395      163 char/s  eta=    66s
-  eval     50,400/60,000 ( 84.0%)  acc=0.7402      163 char/s  eta=    59s
-  eval     51,600/60,000 ( 86.0%)  acc=0.7403      163 char/s  eta=    52s
-  eval     52,800/60,000 ( 88.0%)  acc=0.7398      163 char/s  eta=    44s
-  eval     54,000/60,000 ( 90.0%)  acc=0.7397      163 char/s  eta=    37s
-  eval     55,200/60,000 ( 92.0%)  acc=0.7389      163 char/s  eta=    29s
-  eval     56,400/60,000 ( 94.0%)  acc=0.7387      163 char/s  eta=    22s
-  eval     57,600/60,000 ( 96.0%)  acc=0.7390      163 char/s  eta=    15s
-  eval     58,800/60,000 ( 98.0%)  acc=0.7397      163 char/s  eta=     7s
-  eval     60,000/60,000 (100.0%)  acc=0.7405      163 char/s  eta=     0s
-chars=60,000  acc=0.7405  eval_duration=368.2s
+  eval      1,200/60,000 (  2.0%)  acc=0.7258      160 char/s  eta=   367s
+  eval      2,400/60,000 (  4.0%)  acc=0.7146      161 char/s  eta=   358s
+  eval      3,600/60,000 (  6.0%)  acc=0.7156      161 char/s  eta=   350s
+  eval      4,800/60,000 (  8.0%)  acc=0.7248      160 char/s  eta=   344s
+  eval      6,000/60,000 ( 10.0%)  acc=0.7212      160 char/s  eta=   337s
+  eval      7,200/60,000 ( 12.0%)  acc=0.7168      159 char/s  eta=   332s
+  eval      8,400/60,000 ( 14.0%)  acc=0.7161      159 char/s  eta=   325s
+  eval      9,600/60,000 ( 16.0%)  acc=0.7234      158 char/s  eta=   318s
+  eval     10,800/60,000 ( 18.0%)  acc=0.7300      158 char/s  eta=   311s
+  eval     12,000/60,000 ( 20.0%)  acc=0.7307      158 char/s  eta=   304s
+  eval     13,200/60,000 ( 22.0%)  acc=0.7347      155 char/s  eta=   302s
+  eval     14,400/60,000 ( 24.0%)  acc=0.7371      153 char/s  eta=   298s
+  eval     15,600/60,000 ( 26.0%)  acc=0.7385      153 char/s  eta=   290s
+  eval     16,800/60,000 ( 28.0%)  acc=0.7408      151 char/s  eta=   286s
+  eval     18,000/60,000 ( 30.0%)  acc=0.7421      151 char/s  eta=   278s
+  eval     19,200/60,000 ( 32.0%)  acc=0.7450      152 char/s  eta=   269s
+  eval     20,400/60,000 ( 34.0%)  acc=0.7468      152 char/s  eta=   260s
+  eval     21,600/60,000 ( 36.0%)  acc=0.7468      152 char/s  eta=   252s
+  eval     22,800/60,000 ( 38.0%)  acc=0.7466      153 char/s  eta=   243s
+  eval     24,000/60,000 ( 40.0%)  acc=0.7462      153 char/s  eta=   235s
+  eval     25,200/60,000 ( 42.0%)  acc=0.7463      154 char/s  eta=   226s
+  eval     26,400/60,000 ( 44.0%)  acc=0.7470      154 char/s  eta=   218s
+  eval     27,600/60,000 ( 46.0%)  acc=0.7465      155 char/s  eta=   210s
+  eval     28,800/60,000 ( 48.0%)  acc=0.7469      155 char/s  eta=   202s
+  eval     30,000/60,000 ( 50.0%)  acc=0.7458      153 char/s  eta=   196s
+  eval     31,200/60,000 ( 52.0%)  acc=0.7433      153 char/s  eta=   189s
+  eval     32,400/60,000 ( 54.0%)  acc=0.7421      151 char/s  eta=   183s
+  eval     33,600/60,000 ( 56.0%)  acc=0.7401      149 char/s  eta=   177s
+  eval     34,800/60,000 ( 58.0%)  acc=0.7405      147 char/s  eta=   172s
+  eval     36,000/60,000 ( 60.0%)  acc=0.7405      147 char/s  eta=   163s
+  eval     37,200/60,000 ( 62.0%)  acc=0.7408      147 char/s  eta=   155s
+  eval     38,400/60,000 ( 64.0%)  acc=0.7405      148 char/s  eta=   146s
+  eval     39,600/60,000 ( 66.0%)  acc=0.7404      148 char/s  eta=   138s
+  eval     40,800/60,000 ( 68.0%)  acc=0.7398      148 char/s  eta=   130s
+  eval     42,000/60,000 ( 70.0%)  acc=0.7391      148 char/s  eta=   121s
+  eval     43,200/60,000 ( 72.0%)  acc=0.7390      148 char/s  eta=   113s
+  eval     44,400/60,000 ( 74.0%)  acc=0.7387      149 char/s  eta=   105s
+  eval     45,600/60,000 ( 76.0%)  acc=0.7386      149 char/s  eta=    97s
+  eval     46,800/60,000 ( 78.0%)  acc=0.7379      149 char/s  eta=    89s
+  eval     48,000/60,000 ( 80.0%)  acc=0.7381      149 char/s  eta=    80s
+  eval     49,200/60,000 ( 82.0%)  acc=0.7378      149 char/s  eta=    72s
+  eval     50,400/60,000 ( 84.0%)  acc=0.7388      150 char/s  eta=    64s
+  eval     51,600/60,000 ( 86.0%)  acc=0.7390      150 char/s  eta=    56s
+  eval     52,800/60,000 ( 88.0%)  acc=0.7381      150 char/s  eta=    48s
+  eval     54,000/60,000 ( 90.0%)  acc=0.7381      150 char/s  eta=    40s
+  eval     55,200/60,000 ( 92.0%)  acc=0.7372      150 char/s  eta=    32s
+  eval     56,400/60,000 ( 94.0%)  acc=0.7371      150 char/s  eta=    24s
+  eval     57,600/60,000 ( 96.0%)  acc=0.7375      150 char/s  eta=    16s
+  eval     58,800/60,000 ( 98.0%)  acc=0.7383      151 char/s  eta=     8s
+  eval     60,000/60,000 (100.0%)  acc=0.7390      151 char/s  eta=     0s
+chars=60,000  acc=0.7390  eval_duration=397.6s
 ---
 submission         : alpha_06
-training energy (J): 14,731.7
-training duration  : 140.1s
-val  char-accuracy : 0.7405
+training energy (J): 14,614.0
+training duration  : 144.9s
+val  char-accuracy : 0.7390
 val  chars         : 60,000
 wrote /tmp/result.json
 Stopping app - local entrypoint completed.
 ✓ App completed. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-Msdp1r91xRTCvRAxIaShM8
+https://modal.com/apps/gabriel-nakajima-an/main/ap-UUVKwxlYo8DV3G5hw1WK3q
 
 # final result
 {
   "submission": "alpha_06",
-  "training_energy_J": 14731.7458852,
-  "training_duration_s": 140.096942296,
-  "val_char_accuracy": 0.7405,
+  "training_energy_J": 14613.997913750001,
+  "training_duration_s": 144.859441725,
+  "cpu_energy_J": 6129.255896584997,
+  "total_energy_J": 20743.253810334998,
+  "val_char_accuracy": 0.7390333333333333,
   "val_chars": 60000,
   "gpu_name": "NVIDIA A100-SXM4-80GB",
-  "date_utc": "2026-05-20T01:55:05Z",
+  "date_utc": "2026-05-21T05:29:17Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 65.36740677966102,
-    "stress_watts_avg": 352.24618801540856,
-    "stress_energy_joules": 13133.406,
-    "stress_duration_s": 37.284735639000004,
+    "idle_watts": 63.85837288135594,
+    "stress_watts_avg": 352.31625628183394,
+    "stress_energy_joules": 13073.966,
+    "stress_duration_s": 37.108608436,
     "gpu_name": "NVIDIA A100-SXM4-80GB",
     "notes": []
   },
diff --git a/submissions/chunker_phase1_v1/nvml.json b/submissions/chunker_phase1_v1/nvml.json
index 08b4a9f..ed2d072 100644
--- a/submissions/chunker_phase1_v1/nvml.json
+++ b/submissions/chunker_phase1_v1/nvml.json
@@ -2,10 +2,10 @@
   "nvml_available": true,
   "energy_counter_supported": true,
   "monotonic": true,
-  "idle_watts": 62.56199999999998,
-  "stress_watts_avg": 330.37581396177535,
-  "stress_energy_joules": 12495.641,
-  "stress_duration_s": 37.822505377,
-  "gpu_name": "NVIDIA A100-SXM4-80GB",
+  "idle_watts": 52.40838333333333,
+  "stress_watts_avg": 227.61565140099268,
+  "stress_energy_joules": 8488.717,
+  "stress_duration_s": 37.29408302,
+  "gpu_name": "NVIDIA A100 80GB PCIe",
   "notes": []
 }
diff --git a/submissions/chunker_phase1_v1/result.json b/submissions/chunker_phase1_v1/result.json
index d6d16a9..d95dc4f 100644
--- a/submissions/chunker_phase1_v1/result.json
+++ b/submissions/chunker_phase1_v1/result.json
@@ -1,20 +1,22 @@
 {
   "submission": "chunker_phase1_v1",
-  "training_energy_J": 5917.810853299999,
-  "training_duration_s": 98.94530293400001,
-  "val_char_accuracy": 0.7057333333333333,
+  "training_energy_J": 5569.715063649999,
+  "training_duration_s": 95.017418727,
+  "cpu_energy_J": 4020.9563191850025,
+  "total_energy_J": 9590.671382835002,
+  "val_char_accuracy": 0.7063,
   "val_chars": 60000,
-  "gpu_name": "NVIDIA A100-SXM4-80GB",
-  "date_utc": "2026-05-20T02:02:50Z",
+  "gpu_name": "NVIDIA A100 80GB PCIe",
+  "date_utc": "2026-05-21T05:27:51Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 62.56199999999998,
-    "stress_watts_avg": 330.37581396177535,
-    "stress_energy_joules": 12495.641,
-    "stress_duration_s": 37.822505377,
-    "gpu_name": "NVIDIA A100-SXM4-80GB",
+    "idle_watts": 52.40838333333333,
+    "stress_watts_avg": 227.61565140099268,
+    "stress_energy_joules": 8488.717,
+    "stress_duration_s": 37.29408302,
+    "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
   "contributor": "@explore-chunker-2026-05-19"
diff --git a/submissions/chunker_phase1_v1/run.log b/submissions/chunker_phase1_v1/run.log
index da1fdf5..e20d274 100644
--- a/submissions/chunker_phase1_v1/run.log
+++ b/submissions/chunker_phase1_v1/run.log
@@ -1,7 +1,7 @@
-# wikitext submit.py log — chunker_phase1_v1 — 2026-05-20T01:54:03+00:00Z
+# wikitext submit.py log — chunker_phase1_v1 — 2026-05-21T05:19:18+00:00Z
 [modal] launching A100-80GB ...
 ✓ Initialized. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-cKJi20KhEnnNVSFIPgeSmH
+https://modal.com/apps/gabriel-nakajima-an/main/ap-FdmB8quO8669PzaHLSgb5f
 ✓ Created objects.
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py
@@ -10,16 +10,16 @@ https://modal.com/apps/gabriel-nakajima-an/main/ap-cKJi20KhEnnNVSFIPgeSmH
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py
 └── 🔨 Created function run_submission.
 [modal] verifying NVML energy counter ...
-GPU: NVIDIA A100-SXM4-80GB
+GPU: NVIDIA A100 80GB PCIe
 sampling idle power for 3s ...
-  idle: 62.6 W
+  idle: 52.4 W
 running 30s stress workload ...
-  duration:       37.8 s
-  energy delta:   12,495.6 J
-  avg power:      330.4 W
+  duration:       37.3 s
+  energy delta:   8,488.7 J
+  avg power:      227.6 W
   monotonic:      True
 ---
-{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 62.56199999999998, "stress_watts_avg": 330.37581396177535, "stress_energy_joules": 12495.641, "stress_duration_s": 37.822505377, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []}
+{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 52.40838333333333, "stress_watts_avg": 227.61565140099268, "stress_energy_joules": 8488.717, "stress_duration_s": 37.29408302, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []}
 [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ...
 loading WikiText-103 from /data ...
   train chars: 540,095,682
@@ -27,116 +27,119 @@ loading WikiText-103 from /data ...
 train wall-clock cap: 300 s
 val accuracy floor : 0.7000
 training submission /workspace/chunker_phase1_v1.py ...
+[codecarbon WARNING @ 05:20:11] Multiple instances of codecarbon are allowed to run at the same time.
 [chunker] starting GPU KN build; max_order=12 D=0.5
-[chunker] top order=12 unique pairs: 157,942,722  2.6s
-[chunker] ctx_len=11 ctxs=119,285,712 17.4s
-[chunker] ctx_len=10 ctxs=84,282,364 12.4s
-[chunker] ctx_len=9 ctxs=54,720,376 7.7s
-[chunker] ctx_len=8 ctxs=31,924,091 4.6s
-[chunker] ctx_len=7 ctxs=16,284,921 2.5s
-[chunker] ctx_len=6 ctxs=7,016,442 1.2s
-[chunker] ctx_len=5 ctxs=2,438,281 0.5s
+[chunker] top order=12 unique pairs: 157,942,722  2.7s
+[chunker] ctx_len=11 ctxs=119,285,712 15.6s
+[chunker] ctx_len=10 ctxs=84,282,364 12.9s
+[chunker] ctx_len=9 ctxs=54,720,376 8.7s
+[chunker] ctx_len=8 ctxs=31,924,091 5.3s
+[chunker] ctx_len=7 ctxs=16,284,921 2.4s
+[chunker] ctx_len=6 ctxs=7,016,442 1.1s
+[chunker] ctx_len=5 ctxs=2,438,281 0.6s
 [chunker] ctx_len=4 ctxs=637,143 0.1s
 [chunker] ctx_len=3 ctxs=122,882 0.0s
 [chunker] ctx_len=2 ctxs=12,282 0.0s
 [chunker] ctx_len=1 ctxs=204 0.0s
 [chunker] ctx_len=0 ctxs=1 0.0s
-[chunker] KN build done: 49.0s
+[chunker] KN build done: 49.4s
 [chunker] computing surprise mask (tau=0.3) ...
 [chunker] surprise pass k_ctx=4 done
-[chunker] surprise computed in 2.6s: p_s = 0.4351 (235,445,737/541,096,898)
+[chunker] surprise computed in 2.7s: p_s = 0.4351 (235,445,737/541,096,898)
 [chunker] H model: 1.88M params, surprise positions: 235,445,737/541,096,898 (43.5%)
 [chunker] H step     0/800  loss 5.5452  elapsed 1s
-[chunker] H step   100/800  loss 2.7588  elapsed 6s
-[chunker] H step   200/800  loss 2.6080  elapsed 12s
-[chunker] H step   300/800  loss 2.4467  elapsed 17s
-[chunker] H step   400/800  loss 2.3904  elapsed 22s
-[chunker] H step   500/800  loss 2.3457  elapsed 28s
-[chunker] H step   600/800  loss 2.3157  elapsed 33s
-[chunker] H step   700/800  loss 2.2688  elapsed 38s
-[chunker] H step   799/800  loss 2.2480  elapsed 44s
-training: 5,917.8 J   duration=98.9s
+[chunker] H step   100/800  loss 2.7771  elapsed 6s
+[chunker] H step   200/800  loss 2.5867  elapsed 11s
+[chunker] H step   300/800  loss 2.4754  elapsed 15s
+[chunker] H step   400/800  loss 2.4454  elapsed 20s
+[chunker] H step   500/800  loss 2.3815  elapsed 25s
+[chunker] H step   600/800  loss 2.3375  elapsed 30s
+[chunker] H step   700/800  loss 2.3123  elapsed 35s
+[chunker] H step   799/800  loss 2.2933  elapsed 40s
+training: 5,569.7 J   duration=95.0s
 evaluating on val split ...
-  eval      1,200/60,000 (  2.0%)  acc=0.6925      159 char/s  eta=   370s
-  eval      2,400/60,000 (  4.0%)  acc=0.6767      160 char/s  eta=   361s
-  eval      3,600/60,000 (  6.0%)  acc=0.6753      160 char/s  eta=   353s
-  eval      4,800/60,000 (  8.0%)  acc=0.6887      160 char/s  eta=   345s
-  eval      6,000/60,000 ( 10.0%)  acc=0.6885      160 char/s  eta=   338s
-  eval      7,200/60,000 ( 12.0%)  acc=0.6831      160 char/s  eta=   331s
-  eval      8,400/60,000 ( 14.0%)  acc=0.6821      160 char/s  eta=   323s
-  eval      9,600/60,000 ( 16.0%)  acc=0.6892      160 char/s  eta=   316s
-  eval     10,800/60,000 ( 18.0%)  acc=0.6975      160 char/s  eta=   308s
-  eval     12,000/60,000 ( 20.0%)  acc=0.6993      160 char/s  eta=   301s
-  eval     13,200/60,000 ( 22.0%)  acc=0.7031      160 char/s  eta=   293s
-  eval     14,400/60,000 ( 24.0%)  acc=0.7050      160 char/s  eta=   286s
-  eval     15,600/60,000 ( 26.0%)  acc=0.7069      160 char/s  eta=   278s
-  eval     16,800/60,000 ( 28.0%)  acc=0.7104      160 char/s  eta=   271s
-  eval     18,000/60,000 ( 30.0%)  acc=0.7131      160 char/s  eta=   263s
-  eval     19,200/60,000 ( 32.0%)  acc=0.7177      160 char/s  eta=   255s
-  eval     20,400/60,000 ( 34.0%)  acc=0.7195      160 char/s  eta=   248s
-  eval     21,600/60,000 ( 36.0%)  acc=0.7201      160 char/s  eta=   240s
-  eval     22,800/60,000 ( 38.0%)  acc=0.7203      160 char/s  eta=   233s
-  eval     24,000/60,000 ( 40.0%)  acc=0.7200      160 char/s  eta=   225s
-  eval     25,200/60,000 ( 42.0%)  acc=0.7206      160 char/s  eta=   218s
-  eval     26,400/60,000 ( 44.0%)  acc=0.7215      160 char/s  eta=   210s
-  eval     27,600/60,000 ( 46.0%)  acc=0.7197      160 char/s  eta=   203s
-  eval     28,800/60,000 ( 48.0%)  acc=0.7196      160 char/s  eta=   195s
-  eval     30,000/60,000 ( 50.0%)  acc=0.7183      160 char/s  eta=   188s
-  eval     31,200/60,000 ( 52.0%)  acc=0.7150      160 char/s  eta=   180s
-  eval     32,400/60,000 ( 54.0%)  acc=0.7129      160 char/s  eta=   173s
-  eval     33,600/60,000 ( 56.0%)  acc=0.7103      160 char/s  eta=   165s
-  eval     34,800/60,000 ( 58.0%)  acc=0.7107      160 char/s  eta=   158s
-  eval     36,000/60,000 ( 60.0%)  acc=0.7107      160 char/s  eta=   150s
-  eval     37,200/60,000 ( 62.0%)  acc=0.7109      159 char/s  eta=   143s
-  eval     38,400/60,000 ( 64.0%)  acc=0.7111      159 char/s  eta=   136s
-  eval     39,600/60,000 ( 66.0%)  acc=0.7101      159 char/s  eta=   128s
-  eval     40,800/60,000 ( 68.0%)  acc=0.7096      159 char/s  eta=   121s
-  eval     42,000/60,000 ( 70.0%)  acc=0.7085      159 char/s  eta=   113s
-  eval     43,200/60,000 ( 72.0%)  acc=0.7078      159 char/s  eta=   106s
-  eval     44,400/60,000 ( 74.0%)  acc=0.7078      159 char/s  eta=    98s
-  eval     45,600/60,000 ( 76.0%)  acc=0.7075      159 char/s  eta=    91s
-  eval     46,800/60,000 ( 78.0%)  acc=0.7068      159 char/s  eta=    83s
-  eval     48,000/60,000 ( 80.0%)  acc=0.7066      159 char/s  eta=    75s
-  eval     49,200/60,000 ( 82.0%)  acc=0.7058      159 char/s  eta=    68s
-  eval     50,400/60,000 ( 84.0%)  acc=0.7060      159 char/s  eta=    60s
-  eval     51,600/60,000 ( 86.0%)  acc=0.7060      159 char/s  eta=    53s
-  eval     52,800/60,000 ( 88.0%)  acc=0.7046      159 char/s  eta=    45s
-  eval     54,000/60,000 ( 90.0%)  acc=0.7045      159 char/s  eta=    38s
-  eval     55,200/60,000 ( 92.0%)  acc=0.7040      159 char/s  eta=    30s
-  eval     56,400/60,000 ( 94.0%)  acc=0.7034      159 char/s  eta=    23s
-  eval     57,600/60,000 ( 96.0%)  acc=0.7038      159 char/s  eta=    15s
-  eval     58,800/60,000 ( 98.0%)  acc=0.7044      159 char/s  eta=     8s
-  eval     60,000/60,000 (100.0%)  acc=0.7057      159 char/s  eta=     0s
-chars=60,000  acc=0.7057  eval_duration=376.7s
+  eval      1,200/60,000 (  2.0%)  acc=0.6908      172 char/s  eta=   342s
+  eval      2,400/60,000 (  4.0%)  acc=0.6767      172 char/s  eta=   335s
+  eval      3,600/60,000 (  6.0%)  acc=0.6742      167 char/s  eta=   338s
+  eval      4,800/60,000 (  8.0%)  acc=0.6879      166 char/s  eta=   333s
+  eval      6,000/60,000 ( 10.0%)  acc=0.6875      165 char/s  eta=   327s
+  eval      7,200/60,000 ( 12.0%)  acc=0.6833      165 char/s  eta=   319s
+  eval      8,400/60,000 ( 14.0%)  acc=0.6813      166 char/s  eta=   312s
+  eval      9,600/60,000 ( 16.0%)  acc=0.6887      166 char/s  eta=   304s
+  eval     10,800/60,000 ( 18.0%)  acc=0.6974      166 char/s  eta=   296s
+  eval     12,000/60,000 ( 20.0%)  acc=0.6994      166 char/s  eta=   289s
+  eval     13,200/60,000 ( 22.0%)  acc=0.7030      166 char/s  eta=   282s
+  eval     14,400/60,000 ( 24.0%)  acc=0.7051      166 char/s  eta=   275s
+  eval     15,600/60,000 ( 26.0%)  acc=0.7069      166 char/s  eta=   267s
+  eval     16,800/60,000 ( 28.0%)  acc=0.7100      166 char/s  eta=   260s
+  eval     18,000/60,000 ( 30.0%)  acc=0.7128      166 char/s  eta=   253s
+  eval     19,200/60,000 ( 32.0%)  acc=0.7178      166 char/s  eta=   246s
+  eval     20,400/60,000 ( 34.0%)  acc=0.7195      166 char/s  eta=   239s
+  eval     21,600/60,000 ( 36.0%)  acc=0.7203      166 char/s  eta=   231s
+  eval     22,800/60,000 ( 38.0%)  acc=0.7206      166 char/s  eta=   224s
+  eval     24,000/60,000 ( 40.0%)  acc=0.7207      166 char/s  eta=   217s
+  eval     25,200/60,000 ( 42.0%)  acc=0.7212      166 char/s  eta=   210s
+  eval     26,400/60,000 ( 44.0%)  acc=0.7223      166 char/s  eta=   203s
+  eval     27,600/60,000 ( 46.0%)  acc=0.7203      166 char/s  eta=   195s
+  eval     28,800/60,000 ( 48.0%)  acc=0.7201      166 char/s  eta=   188s
+  eval     30,000/60,000 ( 50.0%)  acc=0.7186      166 char/s  eta=   181s
+  eval     31,200/60,000 ( 52.0%)  acc=0.7152      166 char/s  eta=   174s
+  eval     32,400/60,000 ( 54.0%)  acc=0.7130      166 char/s  eta=   166s
+  eval     33,600/60,000 ( 56.0%)  acc=0.7103      166 char/s  eta=   159s
+  eval     34,800/60,000 ( 58.0%)  acc=0.7110      166 char/s  eta=   152s
+  eval     36,000/60,000 ( 60.0%)  acc=0.7110      166 char/s  eta=   145s
+  eval     37,200/60,000 ( 62.0%)  acc=0.7113      166 char/s  eta=   137s
+  eval     38,400/60,000 ( 64.0%)  acc=0.7115      166 char/s  eta=   130s
+  eval     39,600/60,000 ( 66.0%)  acc=0.7106      166 char/s  eta=   123s
+  eval     40,800/60,000 ( 68.0%)  acc=0.7101      166 char/s  eta=   115s
+  eval     42,000/60,000 ( 70.0%)  acc=0.7090      166 char/s  eta=   108s
+  eval     43,200/60,000 ( 72.0%)  acc=0.7082      166 char/s  eta=   101s
+  eval     44,400/60,000 ( 74.0%)  acc=0.7082      167 char/s  eta=    94s
+  eval     45,600/60,000 ( 76.0%)  acc=0.7081      166 char/s  eta=    87s
+  eval     46,800/60,000 ( 78.0%)  acc=0.7075      166 char/s  eta=    79s
+  eval     48,000/60,000 ( 80.0%)  acc=0.7072      166 char/s  eta=    72s
+  eval     49,200/60,000 ( 82.0%)  acc=0.7065      167 char/s  eta=    65s
+  eval     50,400/60,000 ( 84.0%)  acc=0.7068      167 char/s  eta=    58s
+  eval     51,600/60,000 ( 86.0%)  acc=0.7067      167 char/s  eta=    50s
+  eval     52,800/60,000 ( 88.0%)  acc=0.7051      166 char/s  eta=    43s
+  eval     54,000/60,000 ( 90.0%)  acc=0.7049      166 char/s  eta=    36s
+  eval     55,200/60,000 ( 92.0%)  acc=0.7045      166 char/s  eta=    29s
+  eval     56,400/60,000 ( 94.0%)  acc=0.7039      166 char/s  eta=    22s
+  eval     57,600/60,000 ( 96.0%)  acc=0.7044      166 char/s  eta=    14s
+  eval     58,800/60,000 ( 98.0%)  acc=0.7051      166 char/s  eta=     7s
+  eval     60,000/60,000 (100.0%)  acc=0.7063      166 char/s  eta=     0s
+chars=60,000  acc=0.7063  eval_duration=360.4s
 ---
 submission         : chunker_phase1_v1
-training energy (J): 5,917.8
-training duration  : 98.9s
-val  char-accuracy : 0.7057
+training energy (J): 5,569.7
+training duration  : 95.0s
+val  char-accuracy : 0.7063
 val  chars         : 60,000
 wrote /tmp/result.json
 Stopping app - local entrypoint completed.
 ✓ App completed. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-cKJi20KhEnnNVSFIPgeSmH
+https://modal.com/apps/gabriel-nakajima-an/main/ap-FdmB8quO8669PzaHLSgb5f
 
 # final result
 {
   "submission": "chunker_phase1_v1",
-  "training_energy_J": 5917.810853299999,
-  "training_duration_s": 98.94530293400001,
-  "val_char_accuracy": 0.7057333333333333,
+  "training_energy_J": 5569.715063649999,
+  "training_duration_s": 95.017418727,
+  "cpu_energy_J": 4020.9563191850025,
+  "total_energy_J": 9590.671382835002,
+  "val_char_accuracy": 0.7063,
   "val_chars": 60000,
-  "gpu_name": "NVIDIA A100-SXM4-80GB",
-  "date_utc": "2026-05-20T02:02:50Z",
+  "gpu_name": "NVIDIA A100 80GB PCIe",
+  "date_utc": "2026-05-21T05:27:51Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 62.56199999999998,
-    "stress_watts_avg": 330.37581396177535,
-    "stress_energy_joules": 12495.641,
-    "stress_duration_s": 37.822505377,
-    "gpu_name": "NVIDIA A100-SXM4-80GB",
+    "idle_watts": 52.40838333333333,
+    "stress_watts_avg": 227.61565140099268,
+    "stress_energy_joules": 8488.717,
+    "stress_duration_s": 37.29408302,
+    "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
   "contributor": "@explore-chunker-2026-05-19"
diff --git a/submissions/deep_backoff_kn/nvml.json b/submissions/deep_backoff_kn/nvml.json
index d56941e..a93be86 100644
--- a/submissions/deep_backoff_kn/nvml.json
+++ b/submissions/deep_backoff_kn/nvml.json
@@ -2,10 +2,10 @@
   "nvml_available": true,
   "energy_counter_supported": true,
   "monotonic": true,
-  "idle_watts": 63.14000000000002,
-  "stress_watts_avg": 339.34362349669493,
-  "stress_energy_joules": 12477.219,
-  "stress_duration_s": 36.768685592,
-  "gpu_name": "NVIDIA A100-SXM4-80GB",
+  "idle_watts": 52.31990000000003,
+  "stress_watts_avg": 231.05724561444927,
+  "stress_energy_joules": 8643.325,
+  "stress_duration_s": 37.407721091,
+  "gpu_name": "NVIDIA A100 80GB PCIe",
   "notes": []
 }
diff --git a/submissions/deep_backoff_kn/result.json b/submissions/deep_backoff_kn/result.json
index 68c5005..934a739 100644
--- a/submissions/deep_backoff_kn/result.json
+++ b/submissions/deep_backoff_kn/result.json
@@ -1,23 +1,22 @@
 {
   "submission": "deep_backoff_kn",
-  "disqualified": true,
-  "reason": "train_time_exceeded",
-  "max_train_seconds": 300.0,
-  "training_energy_J": 4789.383014900002,
-  "training_duration_s": 300.091439702,
-  "cpu_energy_J": 12692.014912755005,
-  "total_energy_J": 17481.39792765501,
-  "gpu_name": "NVIDIA A100-SXM4-80GB",
-  "date_utc": "2026-05-20T07:15:05Z",
+  "training_energy_J": 962.9188647999999,
+  "training_duration_s": 291.568102704,
+  "cpu_energy_J": 12338.053218035013,
+  "total_energy_J": 14578.4051352,
+  "val_char_accuracy": 0.7184166666666667,
+  "val_chars": 60000,
+  "gpu_name": "NVIDIA A100 80GB PCIe",
+  "date_utc": "2026-05-21T05:09:09Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 63.14000000000002,
-    "stress_watts_avg": 339.34362349669493,
-    "stress_energy_joules": 12477.219,
-    "stress_duration_s": 36.768685592,
-    "gpu_name": "NVIDIA A100-SXM4-80GB",
+    "idle_watts": 52.31990000000003,
+    "stress_watts_avg": 231.05724561444927,
+    "stress_energy_joules": 8643.325,
+    "stress_duration_s": 37.407721091,
+    "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
   "contributor": "@nakajimagabriel"
diff --git a/submissions/deep_backoff_kn/run.log b/submissions/deep_backoff_kn/run.log
index 5bd4715..5b64b23 100644
--- a/submissions/deep_backoff_kn/run.log
+++ b/submissions/deep_backoff_kn/run.log
@@ -1,25 +1,25 @@
-# wikitext submit.py log — deep_backoff_kn — 2026-05-20T07:08:43+00:00Z
+# wikitext submit.py log — deep_backoff_kn — 2026-05-21T05:03:02+00:00Z
 [modal] launching A100-80GB ...
 ✓ Initialized. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-45b2NtjIL0LErZ1xaqrUeX
+https://modal.com/apps/gabriel-nakajima-an/main/ap-1cZpQht7xa0YYz3oXegD83
 ✓ Created objects.
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py
-├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py
 └── 🔨 Created function run_submission.
 [modal] verifying NVML energy counter ...
-GPU: NVIDIA A100-SXM4-80GB
+GPU: NVIDIA A100 80GB PCIe
 sampling idle power for 3s ...
-  idle: 63.1 W
+  idle: 52.3 W
 running 30s stress workload ...
-  duration:       36.8 s
-  energy delta:   12,477.2 J
-  avg power:      339.3 W
+  duration:       37.4 s
+  energy delta:   8,643.3 J
+  avg power:      231.1 W
   monotonic:      True
 ---
-{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 63.14000000000002, "stress_watts_avg": 339.34362349669493, "stress_energy_joules": 12477.219, "stress_duration_s": 36.768685592, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []}
+{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 52.31990000000003, "stress_watts_avg": 231.05724561444927, "stress_energy_joules": 8643.325, "stress_duration_s": 37.407721091, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []}
 [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ...
 loading WikiText-103 from /data ...
   train chars: 540,095,682
@@ -27,129 +27,109 @@ loading WikiText-103 from /data ...
 train wall-clock cap: 300 s
 val accuracy floor : 0.7000
 training submission /workspace/deep_backoff_kn.py ...
-[codecarbon WARNING @ 07:10:00] Multiple instances of codecarbon are allowed to run at the same time.
+[codecarbon WARNING @ 05:03:59] Multiple instances of codecarbon are allowed to run at the same time.
 [deep-backoff-kn] starting build; max_ctx_len=13 D=0.5
-[deep-backoff-kn] encoded train: 541,096,898 bytes (0.7s)[[deep-backoff-kn] np.unique k=14: 238,387,519 pairs  113.0s (n_workers=auto)
-[deep-backoff-kn] order=14 ctx_len=13 ctxs=198,300,622  rows=238,387,519    18.2s
-[deep-backoff-kn] order=13 ctx_len=12 ctxs=157,942,721  rows=198,300,621   6045.7 MB    49.7s
-[deep-backoff-kn] order=12 ctx_len=11 ctxs=119,285,711  rows=157,942,720   4487.6 MB    39.6s
-[deep-backoff-kn] order=11 ctx_len=10 ctxs= 84,282,363  rows=119,285,710   3124.9 MB    29.6s
-[deep-backoff-kn] order=10 ctx_len= 9 ctxs= 54,720,376  rows= 84,282,363   2008.3 MB    21.5s
-[deep-backoff-kn] order= 9 ctx_len= 8 ctxs= 31,924,091  rows= 54,720,376   1167.5 MB    14.5s
-[deep-backoff-kn] order= 8 ctx_len= 7 ctxs= 16,284,921  rows= 31,924,091    599.3 MB     9.0s
----
-DISQUALIFIED: training wall-clock budget exceeded (300.0 s)
-submission         : deep_backoff_kn
-training duration  : 300.1s
-training energy (J): 4,789.4  (at kill)
-wrote /tmp/result.json
-Stopping app - local entrypoint completed.
-✓ App completed. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-45b2NtjIL0LErZ1xaqrUeX
-
-# final result
-{
-  "submission": "deep_backoff_kn",
-  "disqualified": true,
-  "reason": "train_time_exceeded",
-  "max_train_seconds": 300.0,
-  "training_energy_J": 4789.383014900002,
-  "training_duration_s": 300.091439702,
-  "cpu_energy_J": 12692.014912755005,
-  "total_energy_J": 17481.39792765501,
-  "gpu_name": "NVIDIA A100-SXM4-80GB",
-  "date_utc": "2026-05-20T07:15:05Z",
-  "_nvml": {
-    "nvml_available": true,
-    "energy_counter_supported": true,
-    "monotonic": true,
-    "idle_watts": 63.14000000000002,
-    "stress_watts_avg": 339.34362349669493,
-    "stress_energy_joules": 12477.219,
-    "stress_duration_s": 36.768685592,
-    "gpu_name": "NVIDIA A100-SXM4-80GB",
-    "notes": []
-  },
-  "contributor": "@nakajimagabriel"
-}
-0.6973     4389 char/s  eta=    13s
-  eval      6,000/60,000 ( 10.0%)  acc=0.6990     4443 char/s  eta=    12s
-  eval      7,200/60,000 ( 12.0%)  acc=0.6917     4487 char/s  eta=    12s
-  eval      8,400/60,000 ( 14.0%)  acc=0.6920     4521 char/s  eta=    11s
-  eval      9,600/60,000 ( 16.0%)  acc=0.6997     4527 char/s  eta=    11s
-  eval     10,800/60,000 ( 18.0%)  acc=0.7088     4532 char/s  eta=    11s
-  eval     12,000/60,000 ( 20.0%)  acc=0.7113     4538 char/s  eta=    11s
-  eval     13,200/60,000 ( 22.0%)  acc=0.7142     4539 char/s  eta=    10s
-  eval     14,400/60,000 ( 24.0%)  acc=0.7164     4545 char/s  eta=    10s
-  eval     15,600/60,000 ( 26.0%)  acc=0.7179     4548 char/s  eta=    10s
-  eval     16,800/60,000 ( 28.0%)  acc=0.7220     4552 char/s  eta=     9s
-  eval     18,000/60,000 ( 30.0%)  acc=0.7261     4554 char/s  eta=     9s
-  eval     19,200/60,000 ( 32.0%)  acc=0.7314     4551 char/s  eta=     9s
-  eval     20,400/60,000 ( 34.0%)  acc=0.7333     4554 char/s  eta=     9s
-  eval     21,600/60,000 ( 36.0%)  acc=0.7343     4561 char/s  eta=     8s
-  eval     22,800/60,000 ( 38.0%)  acc=0.7341     4563 char/s  eta=     8s
-  eval     24,000/60,000 ( 40.0%)  acc=0.7338     4566 char/s  eta=     8s
-  eval     25,200/60,000 ( 42.0%)  acc=0.7341     4567 char/s  eta=     8s
-  eval     26,400/60,000 ( 44.0%)  acc=0.7352     4568 char/s  eta=     7s
-  eval     27,600/60,000 ( 46.0%)  acc=0.7333     4572 char/s  eta=     7s
-  eval     28,800/60,000 ( 48.0%)  acc=0.7338     4577 char/s  eta=     7s
-  eval     30,000/60,000 ( 50.0%)  acc=0.7327     4582 char/s  eta=     7s
-  eval     31,200/60,000 ( 52.0%)  acc=0.7294     4589 char/s  eta=     6s
-  eval     32,400/60,000 ( 54.0%)  acc=0.7267     4596 char/s  eta=     6s
-  eval     33,600/60,000 ( 56.0%)  acc=0.7242     4602 char/s  eta=     6s
-  eval     34,800/60,000 ( 58.0%)  acc=0.7250     4604 char/s  eta=     5s
-  eval     36,000/60,000 ( 60.0%)  acc=0.7259     4604 char/s  eta=     5s
-  eval     37,200/60,000 ( 62.0%)  acc=0.7258     4604 char/s  eta=     5s
-  eval     38,400/60,000 ( 64.0%)  acc=0.7253     4603 char/s  eta=     5s
-  eval     39,600/60,000 ( 66.0%)  acc=0.7237     4605 char/s  eta=     4s
-  eval     40,800/60,000 ( 68.0%)  acc=0.7231     4606 char/s  eta=     4s
-  eval     42,000/60,000 ( 70.0%)  acc=0.7220     4606 char/s  eta=     4s
-  eval     43,200/60,000 ( 72.0%)  acc=0.7212     4607 char/s  eta=     4s
-  eval     44,400/60,000 ( 74.0%)  acc=0.7211     4605 char/s  eta=     3s
-  eval     45,600/60,000 ( 76.0%)  acc=0.7207     4604 char/s  eta=     3s
-  eval     46,800/60,000 ( 78.0%)  acc=0.7200     4604 char/s  eta=     3s
-  eval     48,000/60,000 ( 80.0%)  acc=0.7195     4603 char/s  eta=     3s
-  eval     49,200/60,000 ( 82.0%)  acc=0.7187     4603 char/s  eta=     2s
-  eval     50,400/60,000 ( 84.0%)  acc=0.7190     4603 char/s  eta=     2s
-  eval     51,600/60,000 ( 86.0%)  acc=0.7192     4604 char/s  eta=     2s
-  eval     52,800/60,000 ( 88.0%)  acc=0.7179     4612 char/s  eta=     2s
-  eval     54,000/60,000 ( 90.0%)  acc=0.7177     4613 char/s  eta=     1s
-  eval     55,200/60,000 ( 92.0%)  acc=0.7168     4614 char/s  eta=     1s
-  eval     56,400/60,000 ( 94.0%)  acc=0.7157     4616 char/s  eta=     1s
-  eval     57,600/60,000 ( 96.0%)  acc=0.7160     4616 char/s  eta=     1s
-  eval     58,800/60,000 ( 98.0%)  acc=0.7166     4616 char/s  eta=     0s
-  eval     60,000/60,000 (100.0%)  acc=0.7184     4615 char/s  eta=     0s
-chars=60,000  acc=0.7184  eval_duration=13.0s
+[deep-backoff-kn] encoded train: 541,096,898 bytes (0.7s)
+[deep-backoff-kn] np.unique k=14: 238,387,519 pairs  132.2s (n_workers=auto)
+[deep-backoff-kn] order=14 ctx_len=13 ctxs=198,300,622  rows=238,387,519    13.7s
+[deep-backoff-kn] order=13 ctx_len=12 ctxs=157,942,721  rows=198,300,621   6045.7 MB    40.4s
+[deep-backoff-kn] order=12 ctx_len=11 ctxs=119,285,711  rows=157,942,720   4487.6 MB    32.6s
+[deep-backoff-kn] order=11 ctx_len=10 ctxs= 84,282,363  rows=119,285,710   3124.9 MB    25.4s
+[deep-backoff-kn] order=10 ctx_len= 9 ctxs= 54,720,376  rows= 84,282,363   2008.3 MB    18.6s
+[deep-backoff-kn] order= 9 ctx_len= 8 ctxs= 31,924,091  rows= 54,720,376   1167.5 MB    12.7s
+[deep-backoff-kn] order= 8 ctx_len= 7 ctxs= 16,284,921  rows= 31,924,091    599.3 MB     7.8s
+[deep-backoff-kn] order= 7 ctx_len= 6 ctxs=  7,016,442  rows= 16,284,921    263.9 MB     4.3s
+[deep-backoff-kn] order= 6 ctx_len= 5 ctxs=  2,438,281  rows=  7,016,442     96.0 MB     2.1s
+[deep-backoff-kn] order= 5 ctx_len= 4 ctxs=    637,143  rows=  2,438,281     27.5 MB     0.8s
+[deep-backoff-kn] order= 4 ctx_len= 3 ctxs=    122,882  rows=    637,143      6.0 MB     0.3s
+[deep-backoff-kn] order= 3 ctx_len= 2 ctxs=     12,282  rows=    122,882      0.9 MB     0.1s
+[deep-backoff-kn] order= 2 ctx_len= 1 ctxs=        204  rows=     12,282      0.1 MB     0.0s
+[deep-backoff-kn] order= 1 ctx_len= 0 ctxs=          1  rows=        204      0.0 MB     0.0s
+[deep-backoff-kn] continuation base: entropy=5.083 nats
+[deep-backoff-kn] total build: 291.6s
+training: 962.9 J   duration=291.6s
+evaluating on val split ...
+  eval      1,200/60,000 (  2.0%)  acc=0.7058     3616 char/s  eta=    16s
+  eval      2,400/60,000 (  4.0%)  acc=0.6846     3861 char/s  eta=    15s
+  eval      3,600/60,000 (  6.0%)  acc=0.6842     3973 char/s  eta=    14s
+  eval      4,800/60,000 (  8.0%)  acc=0.6973     4017 char/s  eta=    14s
+  eval      6,000/60,000 ( 10.0%)  acc=0.6990     4069 char/s  eta=    13s
+  eval      7,200/60,000 ( 12.0%)  acc=0.6917     4142 char/s  eta=    13s
+  eval      8,400/60,000 ( 14.0%)  acc=0.6920     4201 char/s  eta=    12s
+  eval      9,600/60,000 ( 16.0%)  acc=0.6997     4215 char/s  eta=    12s
+  eval     10,800/60,000 ( 18.0%)  acc=0.7088     4255 char/s  eta=    12s
+  eval     12,000/60,000 ( 20.0%)  acc=0.7113     4251 char/s  eta=    11s
+  eval     13,200/60,000 ( 22.0%)  acc=0.7142     4248 char/s  eta=    11s
+  eval     14,400/60,000 ( 24.0%)  acc=0.7164     4260 char/s  eta=    11s
+  eval     15,600/60,000 ( 26.0%)  acc=0.7179     4288 char/s  eta=    10s
+  eval     16,800/60,000 ( 28.0%)  acc=0.7220     4297 char/s  eta=    10s
+  eval     18,000/60,000 ( 30.0%)  acc=0.7261     4304 char/s  eta=    10s
+  eval     19,200/60,000 ( 32.0%)  acc=0.7314     4304 char/s  eta=     9s
+  eval     20,400/60,000 ( 34.0%)  acc=0.7333     4311 char/s  eta=     9s
+  eval     21,600/60,000 ( 36.0%)  acc=0.7343     4318 char/s  eta=     9s
+  eval     22,800/60,000 ( 38.0%)  acc=0.7341     4319 char/s  eta=     9s
+  eval     24,000/60,000 ( 40.0%)  acc=0.7338     4321 char/s  eta=     8s
+  eval     25,200/60,000 ( 42.0%)  acc=0.7341     4327 char/s  eta=     8s
+  eval     26,400/60,000 ( 44.0%)  acc=0.7352     4341 char/s  eta=     8s
+  eval     27,600/60,000 ( 46.0%)  acc=0.7333     4360 char/s  eta=     7s
+  eval     28,800/60,000 ( 48.0%)  acc=0.7338     4364 char/s  eta=     7s
+  eval     30,000/60,000 ( 50.0%)  acc=0.7327     4365 char/s  eta=     7s
+  eval     31,200/60,000 ( 52.0%)  acc=0.7294     4370 char/s  eta=     7s
+  eval     32,400/60,000 ( 54.0%)  acc=0.7267     4374 char/s  eta=     6s
+  eval     33,600/60,000 ( 56.0%)  acc=0.7242     4382 char/s  eta=     6s
+  eval     34,800/60,000 ( 58.0%)  acc=0.7250     4386 char/s  eta=     6s
+  eval     36,000/60,000 ( 60.0%)  acc=0.7259     4388 char/s  eta=     5s
+  eval     37,200/60,000 ( 62.0%)  acc=0.7258     4393 char/s  eta=     5s
+  eval     38,400/60,000 ( 64.0%)  acc=0.7253     4396 char/s  eta=     5s
+  eval     39,600/60,000 ( 66.0%)  acc=0.7237     4394 char/s  eta=     5s
+  eval     40,800/60,000 ( 68.0%)  acc=0.7231     4395 char/s  eta=     4s
+  eval     42,000/60,000 ( 70.0%)  acc=0.7220     4395 char/s  eta=     4s
+  eval     43,200/60,000 ( 72.0%)  acc=0.7212     4398 char/s  eta=     4s
+  eval     44,400/60,000 ( 74.0%)  acc=0.7211     4398 char/s  eta=     4s
+  eval     45,600/60,000 ( 76.0%)  acc=0.7207     4400 char/s  eta=     3s
+  eval     46,800/60,000 ( 78.0%)  acc=0.7200     4397 char/s  eta=     3s
+  eval     48,000/60,000 ( 80.0%)  acc=0.7195     4392 char/s  eta=     3s
+  eval     49,200/60,000 ( 82.0%)  acc=0.7187     4392 char/s  eta=     2s
+  eval     50,400/60,000 ( 84.0%)  acc=0.7190     4392 char/s  eta=     2s
+  eval     51,600/60,000 ( 86.0%)  acc=0.7192     4391 char/s  eta=     2s
+  eval     52,800/60,000 ( 88.0%)  acc=0.7179     4401 char/s  eta=     2s
+  eval     54,000/60,000 ( 90.0%)  acc=0.7177     4405 char/s  eta=     1s
+  eval     55,200/60,000 ( 92.0%)  acc=0.7168     4411 char/s  eta=     1s
+  eval     56,400/60,000 ( 94.0%)  acc=0.7157     4411 char/s  eta=     1s
+  eval     57,600/60,000 ( 96.0%)  acc=0.7160     4407 char/s  eta=     1s
+  eval     58,800/60,000 ( 98.0%)  acc=0.7166     4403 char/s  eta=     0s
+  eval     60,000/60,000 (100.0%)  acc=0.7184     4405 char/s  eta=     0s
+chars=60,000  acc=0.7184  eval_duration=13.6s
 ---
 submission         : deep_backoff_kn
-training energy (J): 2,172.0
-training duration  : 245.5s
+training energy (J): 962.9
+training duration  : 291.6s
 val  char-accuracy : 0.7184
 val  chars         : 60,000
 wrote /tmp/result.json
-Stopping app - local entrypoint completed.
+Stopping app - local client disconnected. Use `modal run --detach` to keep apps running even if your local client disconnects.
 ✓ App completed. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-wEz27zjOURQzDmbGKNMPqb
+https://modal.com/apps/gabriel-nakajima-an/main/ap-1cZpQht7xa0YYz3oXegD83
 
 # final result
 {
   "submission": "deep_backoff_kn",
-  "training_energy_J": 2172.0416936,
-  "training_duration_s": 245.475966128,
-  "cpu_energy_J": 10385.495287457501,
-  "total_energy_J": 12557.536981057501,
+  "training_energy_J": 962.9188647999999,
+  "training_duration_s": 291.568102704,
+  "cpu_energy_J": 12338.053218035013,
+  "total_energy_J": 14578.4051352,
   "val_char_accuracy": 0.7184166666666667,
   "val_chars": 60000,
   "gpu_name": "NVIDIA A100 80GB PCIe",
-  "date_utc": "2026-05-20T07:13:17Z",
+  "date_utc": "2026-05-21T05:09:09Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 57.467733333333335,
-    "stress_watts_avg": 236.89272714598408,
-    "stress_energy_joules": 8693.929,
-    "stress_duration_s": 36.699856111,
+    "idle_watts": 52.31990000000003,
+    "stress_watts_avg": 231.05724561444927,
+    "stress_energy_joules": 8643.325,
+    "stress_duration_s": 37.407721091,
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
diff --git a/submissions/gpu_ngram_o14_xorfix/nvml.json b/submissions/gpu_ngram_o14_xorfix/nvml.json
index ef2d82e..decba91 100644
--- a/submissions/gpu_ngram_o14_xorfix/nvml.json
+++ b/submissions/gpu_ngram_o14_xorfix/nvml.json
@@ -2,10 +2,10 @@
   "nvml_available": true,
   "energy_counter_supported": true,
   "monotonic": true,
-  "idle_watts": 53.04595000000001,
-  "stress_watts_avg": 222.8521356537582,
-  "stress_energy_joules": 8477.795,
-  "stress_duration_s": 38.042242562000006,
+  "idle_watts": 55.68680000000003,
+  "stress_watts_avg": 236.68642874335322,
+  "stress_energy_joules": 8655.791,
+  "stress_duration_s": 36.570711071,
   "gpu_name": "NVIDIA A100 80GB PCIe",
   "notes": []
 }
diff --git a/submissions/gpu_ngram_o14_xorfix/result.json b/submissions/gpu_ngram_o14_xorfix/result.json
index 96d8384..2b79e7d 100644
--- a/submissions/gpu_ngram_o14_xorfix/result.json
+++ b/submissions/gpu_ngram_o14_xorfix/result.json
@@ -1,21 +1,21 @@
 {
   "submission": "gpu_ngram_o14_xorfix",
-  "training_energy_J": 3441.0376875,
-  "training_duration_s": 97.64232625,
-  "cpu_energy_J": 4134.604408382503,
-  "total_energy_J": 7575.642095882503,
+  "training_energy_J": 3981.1039870000004,
+  "training_duration_s": 109.11744026,
+  "cpu_energy_J": 4621.207360232503,
+  "total_energy_J": 8602.311347232502,
   "val_char_accuracy": 0.7184166666666667,
   "val_chars": 60000,
   "gpu_name": "NVIDIA A100 80GB PCIe",
-  "date_utc": "2026-05-20T07:11:46Z",
+  "date_utc": "2026-05-21T05:06:04Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 53.04595000000001,
-    "stress_watts_avg": 222.8521356537582,
-    "stress_energy_joules": 8477.795,
-    "stress_duration_s": 38.042242562000006,
+    "idle_watts": 55.68680000000003,
+    "stress_watts_avg": 236.68642874335322,
+    "stress_energy_joules": 8655.791,
+    "stress_duration_s": 36.570711071,
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
diff --git a/submissions/gpu_ngram_o14_xorfix/run.log b/submissions/gpu_ngram_o14_xorfix/run.log
index ee3d8b8..997ca3c 100644
--- a/submissions/gpu_ngram_o14_xorfix/run.log
+++ b/submissions/gpu_ngram_o14_xorfix/run.log
@@ -1,7 +1,7 @@
-# wikitext submit.py log — gpu_ngram_o14_xorfix — 2026-05-20T07:08:39+00:00Z
+# wikitext submit.py log — gpu_ngram_o14_xorfix — 2026-05-21T05:03:02+00:00Z
 [modal] launching A100-80GB ...
 ✓ Initialized. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-i8ghm6z5tQ198XlJNnHqte
+https://modal.com/apps/gabriel-nakajima-an/main/ap-Hgn8xfV74JBawHiLMba2fr
 ✓ Created objects.
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py
@@ -12,14 +12,14 @@ https://modal.com/apps/gabriel-nakajima-an/main/ap-i8ghm6z5tQ198XlJNnHqte
 [modal] verifying NVML energy counter ...
 GPU: NVIDIA A100 80GB PCIe
 sampling idle power for 3s ...
-  idle: 53.0 W
+  idle: 55.7 W
 running 30s stress workload ...
-  duration:       38.0 s
-  energy delta:   8,477.8 J
-  avg power:      222.9 W
+  duration:       36.6 s
+  energy delta:   8,655.8 J
+  avg power:      236.7 W
   monotonic:      True
 ---
-{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 53.04595000000001, "stress_watts_avg": 222.8521356537582, "stress_energy_joules": 8477.795, "stress_duration_s": 38.042242562000006, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []}
+{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 55.68680000000003, "stress_watts_avg": 236.68642874335322, "stress_energy_joules": 8655.791, "stress_duration_s": 36.570711071, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []}
 [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ...
 loading WikiText-103 from /data ...
   train chars: 540,095,682
@@ -27,112 +27,110 @@ loading WikiText-103 from /data ...
 train wall-clock cap: 300 s
 val accuracy floor : 0.7000
 training submission /workspace/gpu_ngram_o14_xorfix.py ...
-[codecarbon WARNING @ 07:09:50] Multiple instances of codecarbon are allowed to run at the same time.
+[codecarbon WARNING @ 05:03:58] Multiple instances of codecarbon are allowed to run at the same time.
 [gpu_ngram_o14_xorfix] starting build; max_order=14 D=0.5
-[gpu_ngram_o14_xorfix] encoded train: 541,096,898 bytes (0.4s)
-[gpu_ngram_o14_xorfix] top order=14 unique pairs: 238,387,519  3.5s
-[gpu_ngram_o14_xorfix] ctx_len=13 ctxs=198,300,622 rows=238,387,519  26.6s
-[gpu_ngram_o14_xorfix] ctx_len=12 ctxs=157,942,721 rows=198,300,621  20.2s
-[gpu_ngram_o14_xorfix] ctx_len=11 ctxs=119,285,711 rows=157,942,720  18.2s
-[gpu_ngram_o14_xorfix] ctx_len=10 ctxs=84,282,363 rows=119,285,710  11.0s
-[gpu_ngram_o14_xorfix] ctx_len=9 ctxs=54,720,376 rows=84,282,363  7.3s
-[gpu_ngram_o14_xorfix] ctx_len=8 ctxs=31,924,091 rows=54,720,376  4.5s
-[gpu_ngram_o14_xorfix] ctx_len=7 ctxs=16,284,921 rows=31,924,091  3.0s
-[gpu_ngram_o14_xorfix] ctx_len=6 ctxs=7,016,442 rows=16,284,921  1.4s
+[gpu_ngram_o14_xorfix] encoded train: 541,096,898 bytes (0.3s)
+[gpu_ngram_o14_xorfix] top order=14 unique pairs: 238,387,519  3.3s
+[gpu_ngram_o14_xorfix] ctx_len=13 ctxs=198,300,622 rows=238,387,519  32.0s
+[gpu_ngram_o14_xorfix] ctx_len=12 ctxs=157,942,721 rows=198,300,621  24.4s
+[gpu_ngram_o14_xorfix] ctx_len=11 ctxs=119,285,711 rows=157,942,720  18.0s
+[gpu_ngram_o14_xorfix] ctx_len=10 ctxs=84,282,363 rows=119,285,710  12.6s
+[gpu_ngram_o14_xorfix] ctx_len=9 ctxs=54,720,376 rows=84,282,363  8.4s
+[gpu_ngram_o14_xorfix] ctx_len=8 ctxs=31,924,091 rows=54,720,376  4.9s
+[gpu_ngram_o14_xorfix] ctx_len=7 ctxs=16,284,921 rows=31,924,091  2.6s
+[gpu_ngram_o14_xorfix] ctx_len=6 ctxs=7,016,442 rows=16,284,921  1.2s
 [gpu_ngram_o14_xorfix] ctx_len=5 ctxs=2,438,281 rows=7,016,442  0.5s
 [gpu_ngram_o14_xorfix] ctx_len=4 ctxs=637,143 rows=2,438,281  0.1s
 [gpu_ngram_o14_xorfix] ctx_len=3 ctxs=122,882 rows=637,143  0.0s
 [gpu_ngram_o14_xorfix] ctx_len=2 ctxs=12,282 rows=122,882  0.0s
 [gpu_ngram_o14_xorfix] ctx_len=1 ctxs=204 rows=12,282  0.0s
 [gpu_ngram_o14_xorfix] ctx_len=0 ctxs=1 rows=204  0.0s
-[gpu_ngram_o14_xorfix] total build: 96.9s
-training: 3,441.0 J   duration=97.6s
+[gpu_ngram_o14_xorfix] total build: 108.4s
+training: 3,981.1 J   duration=109.1s
 evaluating on val split ...
-  eval      1,200/60,000 (  2.0%)  acc=0.7058     3412 char/s  eta=    17s
-  eval      2,400/60,000 (  4.0%)  acc=0.6846     3685 char/s  eta=    16s
-  eval      3,600/60,000 (  6.0%)  acc=0.6842     3807 char/s  eta=    15s
-  eval      4,800/60,000 (  8.0%)  acc=0.6973     3873 char/s  eta=    14s
-  eval      6,000/60,000 ( 10.0%)  acc=0.6990     3943 char/s  eta=    14s
-  eval      7,200/60,000 ( 12.0%)  acc=0.6917     3994 char/s  eta=    13s
-  eval      8,400/60,000 ( 14.0%)  acc=0.6920     4064 char/s  eta=    13s
-  eval      9,600/60,000 ( 16.0%)  acc=0.6997     4071 char/s  eta=    12s
-  eval     10,800/60,000 ( 18.0%)  acc=0.7088     4079 char/s  eta=    12s
-  eval     12,000/60,000 ( 20.0%)  acc=0.7113     4086 char/s  eta=    12s
-  eval     13,200/60,000 ( 22.0%)  acc=0.7142     4117 char/s  eta=    11s
-  eval     14,400/60,000 ( 24.0%)  acc=0.7164     4122 char/s  eta=    11s
-  eval     15,600/60,000 ( 26.0%)  acc=0.7179     4123 char/s  eta=    11s
-  eval     16,800/60,000 ( 28.0%)  acc=0.7220     4127 char/s  eta=    10s
-  eval     18,000/60,000 ( 30.0%)  acc=0.7261     4122 char/s  eta=    10s
-  eval     19,200/60,000 ( 32.0%)  acc=0.7314     4118 char/s  eta=    10s
-  eval     20,400/60,000 ( 34.0%)  acc=0.7333     4119 char/s  eta=    10s
-  eval     21,600/60,000 ( 36.0%)  acc=0.7343     4122 char/s  eta=     9s
-  eval     22,800/60,000 ( 38.0%)  acc=0.7341     4135 char/s  eta=     9s
-  eval     24,000/60,000 ( 40.0%)  acc=0.7338     4136 char/s  eta=     9s
-  eval     25,200/60,000 ( 42.0%)  acc=0.7341     4138 char/s  eta=     8s
-  eval     26,400/60,000 ( 44.0%)  acc=0.7352     4139 char/s  eta=     8s
-  eval     27,600/60,000 ( 46.0%)  acc=0.7333     4141 char/s  eta=     8s
-  eval     28,800/60,000 ( 48.0%)  acc=0.7338     4145 char/s  eta=     8s
-  eval     30,000/60,000 ( 50.0%)  acc=0.7327     4155 char/s  eta=     7s
-  eval     31,200/60,000 ( 52.0%)  acc=0.7294     4162 char/s  eta=     7s
-  eval     32,400/60,000 ( 54.0%)  acc=0.7267     4183 char/s  eta=     7s
-  eval     33,600/60,000 ( 56.0%)  acc=0.7242     4186 char/s  eta=     6s
-  eval     34,800/60,000 ( 58.0%)  acc=0.7250     4186 char/s  eta=     6s
-  eval     36,000/60,000 ( 60.0%)  acc=0.7259     4185 char/s  eta=     6s
-  eval     37,200/60,000 ( 62.0%)  acc=0.7258     4185 char/s  eta=     5s
-  eval     38,400/60,000 ( 64.0%)  acc=0.7253     4198 char/s  eta=     5s
-  eval     39,600/60,000 ( 66.0%)  acc=0.7237     4199 char/s  eta=     5s
-  eval     40,800/60,000 ( 68.0%)  acc=0.7231     4197 char/s  eta=     5s
-  eval     42,000/60,000 ( 70.0%)  acc=0.7220     4197 char/s  eta=     4s
-  eval     43,200/60,000 ( 72.0%)  acc=0.7212     4197 char/s  eta=     4s
-  eval     44,400/60,000 ( 74.0%)  acc=0.7211     4209 char/s  eta=     4s
-  eval     45,600/60,000 ( 76.0%)  acc=0.7207     4206 char/s  eta=     3s
-  eval     46,800/60,000 ( 78.0%)  acc=0.7200     4203 char/s  eta=     3s
-  eval     48,000/60,000 ( 80.0%)  acc=0.7195     4200 char/s  eta=     3s
-  eval     49,200/60,000 ( 82.0%)  acc=0.7187     4199 char/s  eta=     3s
-  eval     50,400/60,000 ( 84.0%)  acc=0.7190     4198 char/s  eta=     2s
-  eval     51,600/60,000 ( 86.0%)  acc=0.7192     4198 char/s  eta=     2s
-  eval     52,800/60,000 ( 88.0%)  acc=0.7179     4206 char/s  eta=     2s
-  eval     54,000/60,000 ( 90.0%)  acc=0.7177     4206 char/s  eta=     1s
-  eval     55,200/60,000 ( 92.0%)  acc=0.7168     4207 char/s  eta=     1s
-  eval     56,400/60,000 ( 94.0%)  acc=0.7157     4208 char/s  eta=     1s
-  eval     57,600/60,000 ( 96.0%)  acc=0.7160     4217 char/s  eta=     1s
-  eval     58,800/60,000 ( 98.0%)  acc=0.7166     4215 char/s  eta=     0s
-  eval     60,000/60,000 (100.0%)  acc=0.7184     4213 char/s  eta=     0s
-chars=60,000  acc=0.7184  eval_duration=14.2s
+  eval      1,200/60,000 (  2.0%)  acc=0.7058     4071 char/s  eta=    14s
+  eval      2,400/60,000 (  4.0%)  acc=0.6846     4390 char/s  eta=    13s
+  eval      3,600/60,000 (  6.0%)  acc=0.6842     4530 char/s  eta=    12s
+  eval      4,800/60,000 (  8.0%)  acc=0.6973     4595 char/s  eta=    12s
+  eval      6,000/60,000 ( 10.0%)  acc=0.6990     4664 char/s  eta=    12s
+  eval      7,200/60,000 ( 12.0%)  acc=0.6917     4716 char/s  eta=    11s
+  eval      8,400/60,000 ( 14.0%)  acc=0.6920     4755 char/s  eta=    11s
+  eval      9,600/60,000 ( 16.0%)  acc=0.6997     4769 char/s  eta=    11s
+  eval     10,800/60,000 ( 18.0%)  acc=0.7088     4778 char/s  eta=    10s
+  eval     12,000/60,000 ( 20.0%)  acc=0.7113     4789 char/s  eta=    10s
+  eval     13,200/60,000 ( 22.0%)  acc=0.7142     4795 char/s  eta=    10s
+  eval     14,400/60,000 ( 24.0%)  acc=0.7164     4803 char/s  eta=     9s
+  eval     15,600/60,000 ( 26.0%)  acc=0.7179     4807 char/s  eta=     9s
+  eval     16,800/60,000 ( 28.0%)  acc=0.7220     4813 char/s  eta=     9s
+  eval     18,000/60,000 ( 30.0%)  acc=0.7261     4815 char/s  eta=     9s
+  eval     19,200/60,000 ( 32.0%)  acc=0.7314     4812 char/s  eta=     8s
+  eval     20,400/60,000 ( 34.0%)  acc=0.7333     4815 char/s  eta=     8s
+  eval     21,600/60,000 ( 36.0%)  acc=0.7343     4822 char/s  eta=     8s
+  eval     22,800/60,000 ( 38.0%)  acc=0.7341     4824 char/s  eta=     8s
+  eval     24,000/60,000 ( 40.0%)  acc=0.7338     4827 char/s  eta=     7s
+  eval     25,200/60,000 ( 42.0%)  acc=0.7341     4826 char/s  eta=     7s
+  eval     26,400/60,000 ( 44.0%)  acc=0.7352     4826 char/s  eta=     7s
+  eval     27,600/60,000 ( 46.0%)  acc=0.7333     4830 char/s  eta=     7s
+  eval     28,800/60,000 ( 48.0%)  acc=0.7338     4836 char/s  eta=     6s
+  eval     30,000/60,000 ( 50.0%)  acc=0.7327     4841 char/s  eta=     6s
+  eval     31,200/60,000 ( 52.0%)  acc=0.7294     4848 char/s  eta=     6s
+  eval     32,400/60,000 ( 54.0%)  acc=0.7267     4856 char/s  eta=     6s
+  eval     33,600/60,000 ( 56.0%)  acc=0.7242     4863 char/s  eta=     5s
+  eval     34,800/60,000 ( 58.0%)  acc=0.7250     4864 char/s  eta=     5s
+  eval     36,000/60,000 ( 60.0%)  acc=0.7259     4865 char/s  eta=     5s
+  eval     37,200/60,000 ( 62.0%)  acc=0.7258     4864 char/s  eta=     5s
+  eval     38,400/60,000 ( 64.0%)  acc=0.7253     4864 char/s  eta=     4s
+  eval     39,600/60,000 ( 66.0%)  acc=0.7237     4868 char/s  eta=     4s
+  eval     40,800/60,000 ( 68.0%)  acc=0.7231     4869 char/s  eta=     4s
+  eval     42,000/60,000 ( 70.0%)  acc=0.7220     4870 char/s  eta=     4s
+  eval     43,200/60,000 ( 72.0%)  acc=0.7212     4872 char/s  eta=     3s
+  eval     44,400/60,000 ( 74.0%)  acc=0.7211     4870 char/s  eta=     3s
+  eval     45,600/60,000 ( 76.0%)  acc=0.7207     4870 char/s  eta=     3s
+  eval     46,800/60,000 ( 78.0%)  acc=0.7200     4868 char/s  eta=     3s
+  eval     48,000/60,000 ( 80.0%)  acc=0.7195     4868 char/s  eta=     2s
+  eval     49,200/60,000 ( 82.0%)  acc=0.7187     4868 char/s  eta=     2s
+  eval     50,400/60,000 ( 84.0%)  acc=0.7190     4868 char/s  eta=     2s
+  eval     51,600/60,000 ( 86.0%)  acc=0.7192     4868 char/s  eta=     2s
+  eval     52,800/60,000 ( 88.0%)  acc=0.7179     4876 char/s  eta=     1s
+  eval     54,000/60,000 ( 90.0%)  acc=0.7177     4876 char/s  eta=     1s
+  eval     55,200/60,000 ( 92.0%)  acc=0.7168     4878 char/s  eta=     1s
+  eval     56,400/60,000 ( 94.0%)  acc=0.7157     4880 char/s  eta=     1s
+  eval     57,600/60,000 ( 96.0%)  acc=0.7160     4880 char/s  eta=     0s
+  eval     58,800/60,000 ( 98.0%)  acc=0.7166     4879 char/s  eta=     0s
+  eval     60,000/60,000 (100.0%)  acc=0.7184     4879 char/s  eta=     0s
+chars=60,000  acc=0.7184  eval_duration=12.3s
 ---
 submission         : gpu_ngram_o14_xorfix
-training energy (J): 3,441.0
-training duration  : 97.6s
+training energy (J): 3,981.1
+training duration  : 109.1s
 val  char-accuracy : 0.7184
 val  chars         : 60,000
 wrote /tmp/result.json
 Stopping app - local entrypoint completed.
 ✓ App completed. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-i8ghm6z5tQ198XlJNnHqte
+https://modal.com/apps/gabriel-nakajima-an/main/ap-Hgn8xfV74JBawHiLMba2fr
 
 # final result
 {
   "submission": "gpu_ngram_o14_xorfix",
-  "training_energy_J": 3441.0376875,
-  "training_duration_s": 97.64232625,
-  "cpu_energy_J": 4134.604408382503,
-  "total_energy_J": 7575.642095882503,
+  "training_energy_J": 3981.1039870000004,
+  "training_duration_s": 109.11744026,
+  "cpu_energy_J": 4621.207360232503,
+  "total_energy_J": 8602.311347232502,
   "val_char_accuracy": 0.7184166666666667,
   "val_chars": 60000,
   "gpu_name": "NVIDIA A100 80GB PCIe",
-  "date_utc": "2026-05-20T07:11:46Z",
+  "date_utc": "2026-05-21T05:06:04Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 53.04595000000001,
-    "stress_watts_avg": 222.8521356537582,
-    "stress_energy_joules": 8477.795,
-    "stress_duration_s": 38.042242562000006,
+    "idle_watts": 55.68680000000003,
+    "stress_watts_avg": 236.68642874335322,
+    "stress_energy_joules": 8655.791,
+    "stress_duration_s": 36.570711071,
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
   "contributor": "@subagent-xorfix-2026-05-19"
 }
-ix-2026-05-19"
-}
diff --git a/submissions/gpu_ngram_w31_k11/nvml.json b/submissions/gpu_ngram_w31_k11/nvml.json
index a8d215a..f9eae32 100644
--- a/submissions/gpu_ngram_w31_k11/nvml.json
+++ b/submissions/gpu_ngram_w31_k11/nvml.json
@@ -2,10 +2,10 @@
   "nvml_available": true,
   "energy_counter_supported": true,
   "monotonic": true,
-  "idle_watts": 52.723183333333296,
-  "stress_watts_avg": 226.95174796885868,
-  "stress_energy_joules": 8335.027,
-  "stress_duration_s": 36.72598724,
+  "idle_watts": 55.01793220338981,
+  "stress_watts_avg": 230.13301547418834,
+  "stress_energy_joules": 8413.972,
+  "stress_duration_s": 36.561342503000006,
   "gpu_name": "NVIDIA A100 80GB PCIe",
   "notes": []
 }
diff --git a/submissions/gpu_ngram_w31_k11/result.json b/submissions/gpu_ngram_w31_k11/result.json
index 54dd7c0..3d6a2df 100644
--- a/submissions/gpu_ngram_w31_k11/result.json
+++ b/submissions/gpu_ngram_w31_k11/result.json
@@ -1,21 +1,21 @@
 {
   "submission": "gpu_ngram_w31_k11",
-  "training_energy_J": 1332.8045820499997,
-  "training_duration_s": 33.551668359000004,
-  "cpu_energy_J": 1420.9300898524978,
-  "total_energy_J": 2753.734671902497,
+  "training_energy_J": 1612.2052069500003,
+  "training_duration_s": 34.944975860999996,
+  "cpu_energy_J": 1479.7634497275012,
+  "total_energy_J": 3091.9686566775017,
   "val_char_accuracy": 0.7050333333333333,
   "val_chars": 60000,
   "gpu_name": "NVIDIA A100 80GB PCIe",
-  "date_utc": "2026-05-20T07:07:33Z",
+  "date_utc": "2026-05-21T05:04:49Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 52.723183333333296,
-    "stress_watts_avg": 226.95174796885868,
-    "stress_energy_joules": 8335.027,
-    "stress_duration_s": 36.72598724,
+    "idle_watts": 55.01793220338981,
+    "stress_watts_avg": 230.13301547418834,
+    "stress_energy_joules": 8413.972,
+    "stress_duration_s": 36.561342503000006,
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
diff --git a/submissions/gpu_ngram_w31_k11/run.log b/submissions/gpu_ngram_w31_k11/run.log
index cabeef8..c1cc8fe 100644
--- a/submissions/gpu_ngram_w31_k11/run.log
+++ b/submissions/gpu_ngram_w31_k11/run.log
@@ -1,163 +1,7 @@
-# wikitext submit.py log — gpu_ngram_w31_k11 — 2026-05-20T07:05:37+00:00Z
+# wikitext submit.py log — gpu_ngram_w31_k11 — 2026-05-21T05:03:02+00:00Z
 [modal] launching A100-80GB ...
 ✓ Initialized. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-Xr2U1qCw3wvtCqAVizWeyd
-Building image im-HqRgnUnflxE8oQRhywMp4D
-
-=> Step 0: FROM base
-
-=> Step 1: RUN python -m pip install codecarbon
-Looking in indexes: http://pypi-mirror.modal.local:5555/simple
-Collecting codecarbon
-  Downloading http://pypi-mirror.modal.local:5555/simple/codecarbon/codecarbon-3.2.7-py3-none-any.whl.metadata (9.7 kB)
-Collecting arrow (from codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/arrow/arrow-1.4.0-py3-none-any.whl.metadata (7.7 kB)
-Collecting authlib>=1.2.1 (from codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/authlib/authlib-1.7.2-py2.py3-none-any.whl.metadata (10 kB)
-Collecting click (from codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/click/click-8.4.0-py3-none-any.whl.metadata (2.6 kB)
-Collecting pandas (from codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/pandas/pandas-3.0.3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (79 kB)
-     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 79.5/79.5 kB 240.5 MB/s eta 0:00:00
-Collecting prometheus_client (from codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/prometheus-client/prometheus_client-0.25.0-py3-none-any.whl.metadata (2.1 kB)
-Collecting psutil>=6.0.0 (from codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/psutil/psutil-7.2.2-cp36-abi3-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl.metadata (22 kB)
-Collecting py-cpuinfo (from codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/py-cpuinfo/py_cpuinfo-9.0.0-py3-none-any.whl.metadata (794 bytes)
-Collecting pydantic (from codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/pydantic/pydantic-2.13.4-py3-none-any.whl.metadata (109 kB)
-     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 109.4/109.4 kB 233.4 MB/s eta 0:00:00
-Requirement already satisfied: nvidia-ml-py in /usr/local/lib/python3.11/site-packages (from codecarbon) (12.560.30)
-Collecting rapidfuzz (from codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/rapidfuzz/rapidfuzz-3.14.5-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (12 kB)
-Requirement already satisfied: requests in /usr/local/lib/python3.11/site-packages (from codecarbon) (2.34.2)
-Collecting questionary (from codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/questionary/questionary-2.1.1-py3-none-any.whl.metadata (5.4 kB)
-Collecting rich (from codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/rich/rich-15.0.0-py3-none-any.whl.metadata (18 kB)
-Collecting typer (from codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/typer/typer-0.25.1-py3-none-any.whl.metadata (15 kB)
-Collecting pycountry (from codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/pycountry/pycountry-26.2.16-py3-none-any.whl.metadata (12 kB)
-Collecting cryptography (from authlib>=1.2.1->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/cryptography/cryptography-48.0.0-cp311-abi3-manylinux_2_34_x86_64.whl.metadata (4.3 kB)
-Collecting joserfc>=1.6.0 (from authlib>=1.2.1->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/joserfc/joserfc-1.6.5-py3-none-any.whl.metadata (3.2 kB)
-Collecting python-dateutil>=2.7.0 (from arrow->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/python-dateutil/python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB)
-Collecting tzdata (from arrow->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/tzdata/tzdata-2026.2-py2.py3-none-any.whl.metadata (1.4 kB)
-Requirement already satisfied: numpy>=1.26.0 in /usr/local/lib/python3.11/site-packages (from pandas->codecarbon) (2.1.3)
-Collecting annotated-types>=0.6.0 (from pydantic->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/annotated-types/annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
-Collecting pydantic-core==2.46.4 (from pydantic->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/pydantic-core/pydantic_core-2.46.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
-Requirement already satisfied: typing-extensions>=4.14.1 in /usr/local/lib/python3.11/site-packages (from pydantic->codecarbon) (4.15.0)
-Collecting typing-inspection>=0.4.2 (from pydantic->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/typing-inspection/typing_inspection-0.4.2-py3-none-any.whl.metadata (2.6 kB)
-Collecting prompt_toolkit<4.0,>=2.0 (from questionary->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/prompt-toolkit/prompt_toolkit-3.0.52-py3-none-any.whl.metadata (6.4 kB)
-Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.11/site-packages (from requests->codecarbon) (3.4.7)
-Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/site-packages (from requests->codecarbon) (3.15)
-Requirement already satisfied: urllib3<3,>=1.26 in /usr/local/lib/python3.11/site-packages (from requests->codecarbon) (2.7.0)
-Requirement already satisfied: certifi>=2023.5.7 in /usr/local/lib/python3.11/site-packages (from requests->codecarbon) (2026.4.22)
-Collecting markdown-it-py>=2.2.0 (from rich->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/markdown-it-py/markdown_it_py-4.2.0-py3-none-any.whl.metadata (7.4 kB)
-Collecting pygments<3.0.0,>=2.13.0 (from rich->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/pygments/pygments-2.20.0-py3-none-any.whl.metadata (2.5 kB)
-Collecting shellingham>=1.3.0 (from typer->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/shellingham/shellingham-1.5.4-py2.py3-none-any.whl.metadata (3.5 kB)
-Collecting annotated-doc>=0.0.2 (from typer->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/annotated-doc/annotated_doc-0.0.4-py3-none-any.whl.metadata (6.6 kB)
-Collecting cffi>=2.0.0 (from cryptography->authlib>=1.2.1->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/cffi/cffi-2.0.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.6 kB)
-Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/mdurl/mdurl-0.1.2-py3-none-any.whl.metadata (1.6 kB)
-Collecting wcwidth (from prompt_toolkit<4.0,>=2.0->questionary->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/wcwidth/wcwidth-0.7.0-py3-none-any.whl.metadata (36 kB)
-Collecting six>=1.5 (from python-dateutil>=2.7.0->arrow->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/six/six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
-Collecting pycparser (from cffi>=2.0.0->cryptography->authlib>=1.2.1->codecarbon)
-  Downloading http://pypi-mirror.modal.local:5555/simple/pycparser/pycparser-3.0-py3-none-any.whl.metadata (8.2 kB)
-Downloading http://pypi-mirror.modal.local:5555/simple/codecarbon/codecarbon-3.2.7-py3-none-any.whl (380 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 380.5/380.5 kB 109.2 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/authlib/authlib-1.7.2-py2.py3-none-any.whl (259 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 259.5/259.5 kB 276.8 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/psutil/psutil-7.2.2-cp36-abi3-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl (155 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 155.6/155.6 kB 266.9 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/arrow/arrow-1.4.0-py3-none-any.whl (68 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 68.8/68.8 kB 235.6 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/click/click-8.4.0-py3-none-any.whl (116 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.1/116.1 kB 77.3 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/pandas/pandas-3.0.3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (11.3 MB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 259.3 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/prometheus-client/prometheus_client-0.25.0-py3-none-any.whl (64 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.2/64.2 kB 237.0 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/py-cpuinfo/py_cpuinfo-9.0.0-py3-none-any.whl (22 kB)
-Downloading http://pypi-mirror.modal.local:5555/simple/pycountry/pycountry-26.2.16-py3-none-any.whl (8.0 MB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.0/8.0 MB 251.2 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/pydantic/pydantic-2.13.4-py3-none-any.whl (472 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 472.3/472.3 kB 283.6 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/pydantic-core/pydantic_core-2.46.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 229.6 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/questionary/questionary-2.1.1-py3-none-any.whl (36 kB)
-Downloading http://pypi-mirror.modal.local:5555/simple/rapidfuzz/rapidfuzz-3.14.5-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (3.2 MB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 249.4 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/rich/rich-15.0.0-py3-none-any.whl (310 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 310.7/310.7 kB 254.2 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/typer/typer-0.25.1-py3-none-any.whl (58 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.4/58.4 kB 235.9 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/annotated-doc/annotated_doc-0.0.4-py3-none-any.whl (5.3 kB)
-Downloading http://pypi-mirror.modal.local:5555/simple/annotated-types/annotated_types-0.7.0-py3-none-any.whl (13 kB)
-Downloading http://pypi-mirror.modal.local:5555/simple/joserfc/joserfc-1.6.5-py3-none-any.whl (70 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 70.5/70.5 kB 242.0 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/cryptography/cryptography-48.0.0-cp311-abi3-manylinux_2_34_x86_64.whl (4.7 MB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.7/4.7 MB 234.6 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/markdown-it-py/markdown_it_py-4.2.0-py3-none-any.whl (91 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 91.7/91.7 kB 260.7 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/prompt-toolkit/prompt_toolkit-3.0.52-py3-none-any.whl (391 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 391.4/391.4 kB 168.2 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/pygments/pygments-2.20.0-py3-none-any.whl (1.2 MB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 259.4 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/python-dateutil/python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 229.9/229.9 kB 265.0 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/shellingham/shellingham-1.5.4-py2.py3-none-any.whl (9.8 kB)
-Downloading http://pypi-mirror.modal.local:5555/simple/typing-inspection/typing_inspection-0.4.2-py3-none-any.whl (14 kB)
-Downloading http://pypi-mirror.modal.local:5555/simple/tzdata/tzdata-2026.2-py2.py3-none-any.whl (349 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 349.3/349.3 kB 256.1 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/cffi/cffi-2.0.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (215 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 215.6/215.6 kB 271.1 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/mdurl/mdurl-0.1.2-py3-none-any.whl (10.0 kB)
-Downloading http://pypi-mirror.modal.local:5555/simple/six/six-1.17.0-py2.py3-none-any.whl (11 kB)
-Downloading http://pypi-mirror.modal.local:5555/simple/wcwidth/wcwidth-0.7.0-py3-none-any.whl (110 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 110.8/110.8 kB 252.0 MB/s eta 0:00:00
-Downloading http://pypi-mirror.modal.local:5555/simple/pycparser/pycparser-3.0-py3-none-any.whl (48 kB)
-   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.2/48.2 kB 235.8 MB/s eta 0:00:00
-Installing collected packages: py-cpuinfo, wcwidth, tzdata, typing-inspection, six, shellingham, rapidfuzz, pygments, pydantic-core, pycparser, pycountry, psutil, prometheus_client, mdurl, click, annotated-types, annotated-doc, python-dateutil, pydantic, prompt_toolkit, markdown-it-py, cffi, rich, questionary, pandas, cryptography, arrow, typer, joserfc, authlib, codecarbon
-Successfully installed annotated-doc-0.0.4 annotated-types-0.7.0 arrow-1.4.0 authlib-1.7.2 cffi-2.0.0 click-8.4.0 codecarbon-3.2.7 cryptography-48.0.0 joserfc-1.6.5 markdown-it-py-4.2.0 mdurl-0.1.2 pandas-3.0.3 prometheus_client-0.25.0 prompt_toolkit-3.0.52 psutil-7.2.2 py-cpuinfo-9.0.0 pycountry-26.2.16 pycparser-3.0 pydantic-2.13.4 pydantic-core-2.46.4 pygments-2.20.0 python-dateutil-2.9.0.post0 questionary-2.1.1 rapidfuzz-3.14.5 rich-15.0.0 shellingham-1.5.4 six-1.17.0 typer-0.25.1 typing-inspection-0.4.2 tzdata-2026.2 wcwidth-0.7.0
-
-[notice] A new release of pip is available: 24.0 -> 26.1.1
-[notice] To update, run: pip install --upgrade pip
-Saving image...
-Image saved, took 1.14s
-
-Built image im-HqRgnUnflxE8oQRhywMp4D in 14.22s
-
-
-Building image im-BnlecuknJA8QM6WpMCGVmT
-
-=> Step 0: FROM base
-
-=> Step 1: ENV PYTHONPATH=/workspace
-
-=> Step 2: ENV PYTHONUNBUFFERED=1
-Saving image...
-Image saved, took 602.25ms
-
-Built image im-BnlecuknJA8QM6WpMCGVmT in 3.23s
-
-
+https://modal.com/apps/gabriel-nakajima-an/main/ap-vsSxbVNSQV79ZMlqcuvDGB
 ✓ Created objects.
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py
@@ -168,14 +12,14 @@ Built image im-BnlecuknJA8QM6WpMCGVmT in 3.23s
 [modal] verifying NVML energy counter ...
 GPU: NVIDIA A100 80GB PCIe
 sampling idle power for 3s ...
-  idle: 52.7 W
+  idle: 55.0 W
 running 30s stress workload ...
-  duration:       36.7 s
-  energy delta:   8,335.0 J
-  avg power:      227.0 W
+  duration:       36.6 s
+  energy delta:   8,414.0 J
+  avg power:      230.1 W
   monotonic:      True
 ---
-{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 52.723183333333296, "stress_watts_avg": 226.95174796885868, "stress_energy_joules": 8335.027, "stress_duration_s": 36.72598724, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []}
+{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 55.01793220338981, "stress_watts_avg": 230.13301547418834, "stress_energy_joules": 8413.972, "stress_duration_s": 36.561342503000006, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []}
 [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ...
 loading WikiText-103 from /data ...
   train chars: 540,095,682
@@ -183,13 +27,13 @@ loading WikiText-103 from /data ...
 train wall-clock cap: 300 s
 val accuracy floor : 0.7000
 training submission /workspace/gpu_ngram_w31_k11.py ...
-[codecarbon WARNING @ 07:06:46] Multiple instances of codecarbon are allowed to run at the same time.
+[codecarbon WARNING @ 05:04:00] Multiple instances of codecarbon are allowed to run at the same time.
 [gpu_ngram_w3] starting build; max_order=11 D=0.5
 [gpu_ngram_w3] encoded train: 541,096,898 bytes (0.4s)
 [gpu_ngram_w3] top order=11 unique pairs: 119,285,712  2.0s
-[gpu_ngram_w3] ctx_len=10 ctxs=84,282,364 rows=119,285,712  12.8s
-[gpu_ngram_w3] ctx_len=9 ctxs=54,720,376 rows=84,282,364  8.3s
-[gpu_ngram_w3] ctx_len=8 ctxs=31,924,091 rows=54,720,376  4.9s
+[gpu_ngram_w3] ctx_len=10 ctxs=84,282,364 rows=119,285,712  13.2s
+[gpu_ngram_w3] ctx_len=9 ctxs=54,720,376 rows=84,282,364  9.2s
+[gpu_ngram_w3] ctx_len=8 ctxs=31,924,091 rows=54,720,376  5.0s
 [gpu_ngram_w3] ctx_len=7 ctxs=16,284,921 rows=31,924,091  2.6s
 [gpu_ngram_w3] ctx_len=6 ctxs=7,016,442 rows=16,284,921  1.2s
 [gpu_ngram_w3] ctx_len=5 ctxs=2,438,281 rows=7,016,442  0.5s
@@ -198,90 +42,90 @@ training submission /workspace/gpu_ngram_w31_k11.py ...
 [gpu_ngram_w3] ctx_len=2 ctxs=12,282 rows=122,882  0.0s
 [gpu_ngram_w3] ctx_len=1 ctxs=204 rows=12,282  0.0s
 [gpu_ngram_w3] ctx_len=0 ctxs=1 rows=204  0.0s
-[gpu_ngram_w3] total build: 32.8s
-training: 1,332.8 J   duration=33.6s
+[gpu_ngram_w3] total build: 34.2s
+training: 1,612.2 J   duration=34.9s
 evaluating on val split ...
-  eval      1,200/60,000 (  2.0%)  acc=0.6967     5689 char/s  eta=    10s
-  eval      2,400/60,000 (  4.0%)  acc=0.6792     5867 char/s  eta=    10s
-  eval      3,600/60,000 (  6.0%)  acc=0.6767     5955 char/s  eta=     9s
-  eval      4,800/60,000 (  8.0%)  acc=0.6894     5997 char/s  eta=     9s
-  eval      6,000/60,000 ( 10.0%)  acc=0.6917     6031 char/s  eta=     9s
-  eval      7,200/60,000 ( 12.0%)  acc=0.6846     6057 char/s  eta=     9s
-  eval      8,400/60,000 ( 14.0%)  acc=0.6844     6077 char/s  eta=     8s
-  eval      9,600/60,000 ( 16.0%)  acc=0.6914     6081 char/s  eta=     8s
-  eval     10,800/60,000 ( 18.0%)  acc=0.7002     6079 char/s  eta=     8s
-  eval     12,000/60,000 ( 20.0%)  acc=0.7020     6085 char/s  eta=     8s
-  eval     13,200/60,000 ( 22.0%)  acc=0.7056     6085 char/s  eta=     8s
-  eval     14,400/60,000 ( 24.0%)  acc=0.7074     6091 char/s  eta=     7s
-  eval     15,600/60,000 ( 26.0%)  acc=0.7091     6094 char/s  eta=     7s
-  eval     16,800/60,000 ( 28.0%)  acc=0.7121     6099 char/s  eta=     7s
-  eval     18,000/60,000 ( 30.0%)  acc=0.7139     6102 char/s  eta=     7s
-  eval     19,200/60,000 ( 32.0%)  acc=0.7176     6101 char/s  eta=     7s
-  eval     20,400/60,000 ( 34.0%)  acc=0.7186     6101 char/s  eta=     6s
-  eval     21,600/60,000 ( 36.0%)  acc=0.7197     6105 char/s  eta=     6s
-  eval     22,800/60,000 ( 38.0%)  acc=0.7198     6105 char/s  eta=     6s
-  eval     24,000/60,000 ( 40.0%)  acc=0.7198     6108 char/s  eta=     6s
-  eval     25,200/60,000 ( 42.0%)  acc=0.7202     6109 char/s  eta=     6s
-  eval     26,400/60,000 ( 44.0%)  acc=0.7210     6111 char/s  eta=     5s
-  eval     27,600/60,000 ( 46.0%)  acc=0.7189     6114 char/s  eta=     5s
-  eval     28,800/60,000 ( 48.0%)  acc=0.7189     6120 char/s  eta=     5s
-  eval     30,000/60,000 ( 50.0%)  acc=0.7174     6125 char/s  eta=     5s
-  eval     31,200/60,000 ( 52.0%)  acc=0.7144     6131 char/s  eta=     5s
-  eval     32,400/60,000 ( 54.0%)  acc=0.7120     6138 char/s  eta=     4s
-  eval     33,600/60,000 ( 56.0%)  acc=0.7096     6144 char/s  eta=     4s
-  eval     34,800/60,000 ( 58.0%)  acc=0.7098     6146 char/s  eta=     4s
-  eval     36,000/60,000 ( 60.0%)  acc=0.7096     6146 char/s  eta=     4s
-  eval     37,200/60,000 ( 62.0%)  acc=0.7095     6146 char/s  eta=     4s
-  eval     38,400/60,000 ( 64.0%)  acc=0.7096     6145 char/s  eta=     4s
-  eval     39,600/60,000 ( 66.0%)  acc=0.7086     6147 char/s  eta=     3s
-  eval     40,800/60,000 ( 68.0%)  acc=0.7083     6148 char/s  eta=     3s
-  eval     42,000/60,000 ( 70.0%)  acc=0.7075     6148 char/s  eta=     3s
-  eval     43,200/60,000 ( 72.0%)  acc=0.7068     6148 char/s  eta=     3s
-  eval     44,400/60,000 ( 74.0%)  acc=0.7067     6148 char/s  eta=     3s
-  eval     45,600/60,000 ( 76.0%)  acc=0.7068     6148 char/s  eta=     2s
-  eval     46,800/60,000 ( 78.0%)  acc=0.7061     6148 char/s  eta=     2s
-  eval     48,000/60,000 ( 80.0%)  acc=0.7062     6148 char/s  eta=     2s
-  eval     49,200/60,000 ( 82.0%)  acc=0.7055     6149 char/s  eta=     2s
-  eval     50,400/60,000 ( 84.0%)  acc=0.7058     6149 char/s  eta=     2s
-  eval     51,600/60,000 ( 86.0%)  acc=0.7058     6150 char/s  eta=     1s
-  eval     52,800/60,000 ( 88.0%)  acc=0.7046     6157 char/s  eta=     1s
-  eval     54,000/60,000 ( 90.0%)  acc=0.7045     6157 char/s  eta=     1s
-  eval     55,200/60,000 ( 92.0%)  acc=0.7038     6159 char/s  eta=     1s
-  eval     56,400/60,000 ( 94.0%)  acc=0.7029     6160 char/s  eta=     1s
-  eval     57,600/60,000 ( 96.0%)  acc=0.7034     6160 char/s  eta=     0s
-  eval     58,800/60,000 ( 98.0%)  acc=0.7040     6160 char/s  eta=     0s
-  eval     60,000/60,000 (100.0%)  acc=0.7050     6161 char/s  eta=     0s
-chars=60,000  acc=0.7050  eval_duration=9.7s
+  eval      1,200/60,000 (  2.0%)  acc=0.6967     5720 char/s  eta=    10s
+  eval      2,400/60,000 (  4.0%)  acc=0.6792     5938 char/s  eta=    10s
+  eval      3,600/60,000 (  6.0%)  acc=0.6767     6036 char/s  eta=     9s
+  eval      4,800/60,000 (  8.0%)  acc=0.6894     6078 char/s  eta=     9s
+  eval      6,000/60,000 ( 10.0%)  acc=0.6917     6119 char/s  eta=     9s
+  eval      7,200/60,000 ( 12.0%)  acc=0.6846     6153 char/s  eta=     9s
+  eval      8,400/60,000 ( 14.0%)  acc=0.6844     6179 char/s  eta=     8s
+  eval      9,600/60,000 ( 16.0%)  acc=0.6914     6187 char/s  eta=     8s
+  eval     10,800/60,000 ( 18.0%)  acc=0.7002     6194 char/s  eta=     8s
+  eval     12,000/60,000 ( 20.0%)  acc=0.7020     6199 char/s  eta=     8s
+  eval     13,200/60,000 ( 22.0%)  acc=0.7056     6200 char/s  eta=     8s
+  eval     14,400/60,000 ( 24.0%)  acc=0.7074     6204 char/s  eta=     7s
+  eval     15,600/60,000 ( 26.0%)  acc=0.7091     6204 char/s  eta=     7s
+  eval     16,800/60,000 ( 28.0%)  acc=0.7121     6206 char/s  eta=     7s
+  eval     18,000/60,000 ( 30.0%)  acc=0.7139     6207 char/s  eta=     7s
+  eval     19,200/60,000 ( 32.0%)  acc=0.7176     6206 char/s  eta=     7s
+  eval     20,400/60,000 ( 34.0%)  acc=0.7186     6207 char/s  eta=     6s
+  eval     21,600/60,000 ( 36.0%)  acc=0.7197     6212 char/s  eta=     6s
+  eval     22,800/60,000 ( 38.0%)  acc=0.7198     6213 char/s  eta=     6s
+  eval     24,000/60,000 ( 40.0%)  acc=0.7198     6216 char/s  eta=     6s
+  eval     25,200/60,000 ( 42.0%)  acc=0.7202     6218 char/s  eta=     6s
+  eval     26,400/60,000 ( 44.0%)  acc=0.7210     6220 char/s  eta=     5s
+  eval     27,600/60,000 ( 46.0%)  acc=0.7189     6222 char/s  eta=     5s
+  eval     28,800/60,000 ( 48.0%)  acc=0.7189     6228 char/s  eta=     5s
+  eval     30,000/60,000 ( 50.0%)  acc=0.7174     6230 char/s  eta=     5s
+  eval     31,200/60,000 ( 52.0%)  acc=0.7144     6235 char/s  eta=     5s
+  eval     32,400/60,000 ( 54.0%)  acc=0.7120     6239 char/s  eta=     4s
+  eval     33,600/60,000 ( 56.0%)  acc=0.7096     6243 char/s  eta=     4s
+  eval     34,800/60,000 ( 58.0%)  acc=0.7098     6244 char/s  eta=     4s
+  eval     36,000/60,000 ( 60.0%)  acc=0.7096     6244 char/s  eta=     4s
+  eval     37,200/60,000 ( 62.0%)  acc=0.7095     6244 char/s  eta=     4s
+  eval     38,400/60,000 ( 64.0%)  acc=0.7096     6242 char/s  eta=     3s
+  eval     39,600/60,000 ( 66.0%)  acc=0.7086     6243 char/s  eta=     3s
+  eval     40,800/60,000 ( 68.0%)  acc=0.7083     6243 char/s  eta=     3s
+  eval     42,000/60,000 ( 70.0%)  acc=0.7075     6242 char/s  eta=     3s
+  eval     43,200/60,000 ( 72.0%)  acc=0.7068     6242 char/s  eta=     3s
+  eval     44,400/60,000 ( 74.0%)  acc=0.7067     6240 char/s  eta=     2s
+  eval     45,600/60,000 ( 76.0%)  acc=0.7068     6240 char/s  eta=     2s
+  eval     46,800/60,000 ( 78.0%)  acc=0.7061     6239 char/s  eta=     2s
+  eval     48,000/60,000 ( 80.0%)  acc=0.7062     6237 char/s  eta=     2s
+  eval     49,200/60,000 ( 82.0%)  acc=0.7055     6236 char/s  eta=     2s
+  eval     50,400/60,000 ( 84.0%)  acc=0.7058     6236 char/s  eta=     2s
+  eval     51,600/60,000 ( 86.0%)  acc=0.7058     6237 char/s  eta=     1s
+  eval     52,800/60,000 ( 88.0%)  acc=0.7046     6243 char/s  eta=     1s
+  eval     54,000/60,000 ( 90.0%)  acc=0.7045     6244 char/s  eta=     1s
+  eval     55,200/60,000 ( 92.0%)  acc=0.7038     6245 char/s  eta=     1s
+  eval     56,400/60,000 ( 94.0%)  acc=0.7029     6246 char/s  eta=     1s
+  eval     57,600/60,000 ( 96.0%)  acc=0.7034     6246 char/s  eta=     0s
+  eval     58,800/60,000 ( 98.0%)  acc=0.7040     6247 char/s  eta=     0s
+  eval     60,000/60,000 (100.0%)  acc=0.7050     6247 char/s  eta=     0s
+chars=60,000  acc=0.7050  eval_duration=9.6s
 ---
 submission         : gpu_ngram_w31_k11
-training energy (J): 1,332.8
-training duration  : 33.6s
+training energy (J): 1,612.2
+training duration  : 34.9s
 val  char-accuracy : 0.7050
 val  chars         : 60,000
 wrote /tmp/result.json
 Stopping app - local entrypoint completed.
 ✓ App completed. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-Xr2U1qCw3wvtCqAVizWeyd
+https://modal.com/apps/gabriel-nakajima-an/main/ap-vsSxbVNSQV79ZMlqcuvDGB
 
 # final result
 {
   "submission": "gpu_ngram_w31_k11",
-  "training_energy_J": 1332.8045820499997,
-  "training_duration_s": 33.551668359000004,
-  "cpu_energy_J": 1420.9300898524978,
-  "total_energy_J": 2753.734671902497,
+  "training_energy_J": 1612.2052069500003,
+  "training_duration_s": 34.944975860999996,
+  "cpu_energy_J": 1479.7634497275012,
+  "total_energy_J": 3091.9686566775017,
   "val_char_accuracy": 0.7050333333333333,
   "val_chars": 60000,
   "gpu_name": "NVIDIA A100 80GB PCIe",
-  "date_utc": "2026-05-20T07:07:33Z",
+  "date_utc": "2026-05-21T05:04:49Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 52.723183333333296,
-    "stress_watts_avg": 226.95174796885868,
-    "stress_energy_joules": 8335.027,
-    "stress_duration_s": 36.72598724,
+    "idle_watts": 55.01793220338981,
+    "stress_watts_avg": 230.13301547418834,
+    "stress_energy_joules": 8413.972,
+    "stress_duration_s": 36.561342503000006,
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
diff --git a/submissions/lwta_k2/nvml.json b/submissions/lwta_k2/nvml.json
index d2be759..c341a02 100644
--- a/submissions/lwta_k2/nvml.json
+++ b/submissions/lwta_k2/nvml.json
@@ -2,10 +2,10 @@
   "nvml_available": true,
   "energy_counter_supported": true,
   "monotonic": true,
-  "idle_watts": 53.31471666666669,
-  "stress_watts_avg": 235.04048546725176,
-  "stress_energy_joules": 8621.146,
-  "stress_duration_s": 36.679408583,
+  "idle_watts": 52.363366666666685,
+  "stress_watts_avg": 227.75408348564008,
+  "stress_energy_joules": 8575.362,
+  "stress_duration_s": 37.651847417,
   "gpu_name": "NVIDIA A100 80GB PCIe",
   "notes": []
 }
diff --git a/submissions/lwta_k2/result.json b/submissions/lwta_k2/result.json
index e675e37..6a5ae25 100644
--- a/submissions/lwta_k2/result.json
+++ b/submissions/lwta_k2/result.json
@@ -1,19 +1,21 @@
 {
   "submission": "lwta_k2",
-  "training_energy_J": 46131.8433434,
-  "training_duration_s": 222.46477313199998,
-  "val_char_accuracy": 0.7145833333333333,
+  "training_energy_J": 44582.984815150005,
+  "training_duration_s": 237.14902369700002,
+  "cpu_energy_J": 10030.720041672495,
+  "total_energy_J": 54613.7048568225,
+  "val_char_accuracy": 0.7145333333333334,
   "val_chars": 60000,
   "gpu_name": "NVIDIA A100 80GB PCIe",
-  "date_utc": "2026-05-18T18:04:06Z",
+  "date_utc": "2026-05-20T22:57:38Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 53.31471666666669,
-    "stress_watts_avg": 235.04048546725176,
-    "stress_energy_joules": 8621.146,
-    "stress_duration_s": 36.679408583,
+    "idle_watts": 52.363366666666685,
+    "stress_watts_avg": 227.75408348564008,
+    "stress_energy_joules": 8575.362,
+    "stress_duration_s": 37.651847417,
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
diff --git a/submissions/lwta_k2/run.log b/submissions/lwta_k2/run.log
index f7bfcac..d4788c8 100644
--- a/submissions/lwta_k2/run.log
+++ b/submissions/lwta_k2/run.log
@@ -1,141 +1,140 @@
-# wikitext submit.py log — lwta_k2 — 2026-05-18T17:52:37+00:00Z
+# wikitext submit.py log — lwta_k2 — 2026-05-20T22:44:26+00:00Z
 [modal] launching A100-80GB ...
 ✓ Initialized. View run at 
-https://modal.com/apps/ab-10/main/ap-XpG4oyoioa8EfEnrW23Vzh
+https://modal.com/apps/gabriel-nakajima-an/main/ap-cyrfdrD3yrTAPYiz98tDZZ
 ✓ Created objects.
-├── 🔨 Created mount /home/seneca/wikitext/submit.py
-├── 🔨 Created mount /home/seneca/wikitext/task.py
-├── 🔨 Created mount /home/seneca/wikitext/verify_nvml.py
-├── 🔨 Created mount /home/seneca/wikitext/run_eval.py
-├── 🔨 Created mount /home/seneca/wikitext/wikitext.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py
 └── 🔨 Created function run_submission.
 [modal] verifying NVML energy counter ...
 GPU: NVIDIA A100 80GB PCIe
 sampling idle power for 3s ...
-  idle: 53.3 W
+  idle: 52.4 W
 running 30s stress workload ...
-/usr/local/lib/python3.11/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
-  cpu = _conversion_method_template(device=torch.device("cpu"))
-  duration:       36.7 s
-  energy delta:   8,621.1 J
-  avg power:      235.0 W
+  duration:       37.7 s
+  energy delta:   8,575.4 J
+  avg power:      227.8 W
   monotonic:      True
 ---
-{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 53.31471666666669, "stress_watts_avg": 235.04048546725176, "stress_energy_joules": 8621.146, "stress_duration_s": 36.679408583, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []}
+{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 52.363366666666685, "stress_watts_avg": 227.75408348564008, "stress_energy_joules": 8575.362, "stress_duration_s": 37.651847417, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []}
 [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ...
 loading WikiText-103 from /data ...
   train chars: 540,095,682
   val   chars: 60,000  (scored, gated by --acc-min)
 train wall-clock cap: 300 s
 val accuracy floor : 0.7000
-/usr/local/lib/python3.11/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
-  cpu = _conversion_method_template(device=torch.device("cpu"))
 training submission /workspace/lwta_k2.py ...
+[codecarbon WARNING @ 22:45:17] Multiple instances of codecarbon are allowed to run at the same time.
 [lwta_k2] 10.84M params  cfg=TrainConfig(d=384 L=6 H=6 bs=32 T=1024 steps=2150 lwta_k=2)
 [lwta_k2] step     0/2150  loss 5.5452  elapsed 1s
-[lwta_k2] step   100/2150  loss 1.6783  elapsed 11s
-[lwta_k2] step   200/2150  loss 1.4527  elapsed 21s
-[lwta_k2] step   300/2150  loss 1.4018  elapsed 31s
-[lwta_k2] step   400/2150  loss 1.2794  elapsed 41s
-[lwta_k2] step   500/2150  loss 1.2397  elapsed 51s
-[lwta_k2] step   600/2150  loss 1.1819  elapsed 61s
-[lwta_k2] step   700/2150  loss 1.1676  elapsed 71s
-[lwta_k2] step   800/2150  loss 1.1465  elapsed 82s
-[lwta_k2] step   900/2150  loss 1.1261  elapsed 92s
-[lwta_k2] step  1000/2150  loss 1.1568  elapsed 102s
-[lwta_k2] step  1100/2150  loss 1.0974  elapsed 112s
-[lwta_k2] step  1200/2150  loss 1.1161  elapsed 123s
-[lwta_k2] step  1300/2150  loss 1.0829  elapsed 133s
-[lwta_k2] step  1400/2150  loss 1.0580  elapsed 143s
-[lwta_k2] step  1500/2150  loss 1.0709  elapsed 154s
-[lwta_k2] step  1600/2150  loss 1.0651  elapsed 164s
-[lwta_k2] step  1700/2150  loss 1.0906  elapsed 174s
-[lwta_k2] step  1800/2150  loss 1.0662  elapsed 184s
-[lwta_k2] step  1900/2150  loss 1.0359  elapsed 195s
-[lwta_k2] step  2000/2150  loss 0.9915  elapsed 205s
-[lwta_k2] step  2100/2150  loss 1.0221  elapsed 215s
-[lwta_k2] step  2149/2150  loss 1.0155  elapsed 220s
-training: 46,131.8 J   duration=222.5s
+[lwta_k2] step   100/2150  loss 1.6789  elapsed 12s
+[lwta_k2] step   200/2150  loss 1.4378  elapsed 23s
+[lwta_k2] step   300/2150  loss 1.3555  elapsed 34s
+[lwta_k2] step   400/2150  loss 1.2876  elapsed 45s
+[lwta_k2] step   500/2150  loss 1.2187  elapsed 55s
+[lwta_k2] step   600/2150  loss 1.1854  elapsed 66s
+[lwta_k2] step   700/2150  loss 1.1995  elapsed 77s
+[lwta_k2] step   800/2150  loss 1.1438  elapsed 88s
+[lwta_k2] step   900/2150  loss 1.1942  elapsed 99s
+[lwta_k2] step  1000/2150  loss 1.1058  elapsed 110s
+[lwta_k2] step  1100/2150  loss 1.1240  elapsed 120s
+[lwta_k2] step  1200/2150  loss 1.0758  elapsed 131s
+[lwta_k2] step  1300/2150  loss 1.0643  elapsed 142s
+[lwta_k2] step  1400/2150  loss 1.0712  elapsed 153s
+[lwta_k2] step  1500/2150  loss 1.0610  elapsed 164s
+[lwta_k2] step  1600/2150  loss 1.0715  elapsed 175s
+[lwta_k2] step  1700/2150  loss 0.9912  elapsed 185s
+[lwta_k2] step  1800/2150  loss 1.0097  elapsed 196s
+[lwta_k2] step  1900/2150  loss 1.0175  elapsed 207s
+[lwta_k2] step  2000/2150  loss 1.0429  elapsed 218s
+[lwta_k2] step  2100/2150  loss 1.0159  elapsed 228s
+[lwta_k2] step  2149/2150  loss 1.0382  elapsed 234s
+training: 44,583.0 J   duration=237.1s
 evaluating on val split ...
-  eval      1,200/60,000 (  2.0%)  acc=0.7125      142 char/s  eta=   413s
-  eval      2,400/60,000 (  4.0%)  acc=0.7004      142 char/s  eta=   405s
-  eval      3,600/60,000 (  6.0%)  acc=0.6981      143 char/s  eta=   395s
-  eval      4,800/60,000 (  8.0%)  acc=0.7075      143 char/s  eta=   385s
-  eval      6,000/60,000 ( 10.0%)  acc=0.6957      143 char/s  eta=   377s
-  eval      7,200/60,000 ( 12.0%)  acc=0.6915      143 char/s  eta=   368s
-  eval      8,400/60,000 ( 14.0%)  acc=0.6918      143 char/s  eta=   360s
-  eval      9,600/60,000 ( 16.0%)  acc=0.6957      143 char/s  eta=   352s
-  eval     10,800/60,000 ( 18.0%)  acc=0.6971      143 char/s  eta=   344s
-  eval     12,000/60,000 ( 20.0%)  acc=0.7002      143 char/s  eta=   335s
-  eval     13,200/60,000 ( 22.0%)  acc=0.7043      143 char/s  eta=   327s
-  eval     14,400/60,000 ( 24.0%)  acc=0.7053      143 char/s  eta=   319s
-  eval     15,600/60,000 ( 26.0%)  acc=0.7074      143 char/s  eta=   310s
-  eval     16,800/60,000 ( 28.0%)  acc=0.7101      143 char/s  eta=   302s
-  eval     18,000/60,000 ( 30.0%)  acc=0.7083      143 char/s  eta=   294s
-  eval     19,200/60,000 ( 32.0%)  acc=0.7095      143 char/s  eta=   285s
-  eval     20,400/60,000 ( 34.0%)  acc=0.7108      143 char/s  eta=   277s
-  eval     21,600/60,000 ( 36.0%)  acc=0.7105      143 char/s  eta=   268s
-  eval     22,800/60,000 ( 38.0%)  acc=0.7105      143 char/s  eta=   260s
-  eval     24,000/60,000 ( 40.0%)  acc=0.7109      143 char/s  eta=   251s
-  eval     25,200/60,000 ( 42.0%)  acc=0.7118      143 char/s  eta=   243s
-  eval     26,400/60,000 ( 44.0%)  acc=0.7130      143 char/s  eta=   235s
-  eval     27,600/60,000 ( 46.0%)  acc=0.7142      143 char/s  eta=   226s
-  eval     28,800/60,000 ( 48.0%)  acc=0.7142      143 char/s  eta=   218s
-  eval     30,000/60,000 ( 50.0%)  acc=0.7132      143 char/s  eta=   210s
-  eval     31,200/60,000 ( 52.0%)  acc=0.7117      143 char/s  eta=   201s
-  eval     32,400/60,000 ( 54.0%)  acc=0.7111      143 char/s  eta=   193s
-  eval     33,600/60,000 ( 56.0%)  acc=0.7092      143 char/s  eta=   184s
-  eval     34,800/60,000 ( 58.0%)  acc=0.7080      143 char/s  eta=   176s
-  eval     36,000/60,000 ( 60.0%)  acc=0.7071      143 char/s  eta=   168s
-  eval     37,200/60,000 ( 62.0%)  acc=0.7074      143 char/s  eta=   159s
-  eval     38,400/60,000 ( 64.0%)  acc=0.7079      143 char/s  eta=   151s
-  eval     39,600/60,000 ( 66.0%)  acc=0.7082      143 char/s  eta=   143s
-  eval     40,800/60,000 ( 68.0%)  acc=0.7075      143 char/s  eta=   134s
-  eval     42,000/60,000 ( 70.0%)  acc=0.7078      143 char/s  eta=   126s
-  eval     43,200/60,000 ( 72.0%)  acc=0.7083      143 char/s  eta=   117s
-  eval     44,400/60,000 ( 74.0%)  acc=0.7087      143 char/s  eta=   109s
-  eval     45,600/60,000 ( 76.0%)  acc=0.7088      143 char/s  eta=   101s
-  eval     46,800/60,000 ( 78.0%)  acc=0.7085      143 char/s  eta=    92s
-  eval     48,000/60,000 ( 80.0%)  acc=0.7087      143 char/s  eta=    84s
-  eval     49,200/60,000 ( 82.0%)  acc=0.7092      143 char/s  eta=    75s
-  eval     50,400/60,000 ( 84.0%)  acc=0.7110      143 char/s  eta=    67s
-  eval     51,600/60,000 ( 86.0%)  acc=0.7117      143 char/s  eta=    59s
-  eval     52,800/60,000 ( 88.0%)  acc=0.7129      144 char/s  eta=    50s
-  eval     54,000/60,000 ( 90.0%)  acc=0.7133      144 char/s  eta=    42s
-  eval     55,200/60,000 ( 92.0%)  acc=0.7128      144 char/s  eta=    33s
-  eval     56,400/60,000 ( 94.0%)  acc=0.7136      144 char/s  eta=    25s
-  eval     57,600/60,000 ( 96.0%)  acc=0.7140      144 char/s  eta=    17s
-  eval     58,800/60,000 ( 98.0%)  acc=0.7146      144 char/s  eta=     8s
-  eval     60,000/60,000 (100.0%)  acc=0.7146      144 char/s  eta=     0s
-chars=60,000  acc=0.7146  eval_duration=417.8s
+  eval      1,200/60,000 (  2.0%)  acc=0.7075      124 char/s  eta=   475s
+  eval      2,400/60,000 (  4.0%)  acc=0.6992      122 char/s  eta=   474s
+  eval      3,600/60,000 (  6.0%)  acc=0.6978      121 char/s  eta=   465s
+  eval      4,800/60,000 (  8.0%)  acc=0.7063      122 char/s  eta=   454s
+  eval      6,000/60,000 ( 10.0%)  acc=0.6957      122 char/s  eta=   443s
+  eval      7,200/60,000 ( 12.0%)  acc=0.6922      122 char/s  eta=   432s
+  eval      8,400/60,000 ( 14.0%)  acc=0.6908      122 char/s  eta=   423s
+  eval      9,600/60,000 ( 16.0%)  acc=0.6931      122 char/s  eta=   414s
+  eval     10,800/60,000 ( 18.0%)  acc=0.6941      121 char/s  eta=   406s
+  eval     12,000/60,000 ( 20.0%)  acc=0.6957      121 char/s  eta=   396s
+  eval     13,200/60,000 ( 22.0%)  acc=0.7012      121 char/s  eta=   386s
+  eval     14,400/60,000 ( 24.0%)  acc=0.7022      121 char/s  eta=   377s
+  eval     15,600/60,000 ( 26.0%)  acc=0.7049      121 char/s  eta=   366s
+  eval     16,800/60,000 ( 28.0%)  acc=0.7071      121 char/s  eta=   357s
+  eval     18,000/60,000 ( 30.0%)  acc=0.7055      121 char/s  eta=   348s
+  eval     19,200/60,000 ( 32.0%)  acc=0.7071      121 char/s  eta=   338s
+  eval     20,400/60,000 ( 34.0%)  acc=0.7088      120 char/s  eta=   329s
+  eval     21,600/60,000 ( 36.0%)  acc=0.7083      120 char/s  eta=   319s
+  eval     22,800/60,000 ( 38.0%)  acc=0.7089      120 char/s  eta=   310s
+  eval     24,000/60,000 ( 40.0%)  acc=0.7095      120 char/s  eta=   300s
+  eval     25,200/60,000 ( 42.0%)  acc=0.7110      120 char/s  eta=   290s
+  eval     26,400/60,000 ( 44.0%)  acc=0.7123      120 char/s  eta=   280s
+  eval     27,600/60,000 ( 46.0%)  acc=0.7135      120 char/s  eta=   270s
+  eval     28,800/60,000 ( 48.0%)  acc=0.7137      120 char/s  eta=   260s
+  eval     30,000/60,000 ( 50.0%)  acc=0.7130      120 char/s  eta=   250s
+  eval     31,200/60,000 ( 52.0%)  acc=0.7113      120 char/s  eta=   241s
+  eval     32,400/60,000 ( 54.0%)  acc=0.7112      120 char/s  eta=   231s
+  eval     33,600/60,000 ( 56.0%)  acc=0.7095      119 char/s  eta=   221s
+  eval     34,800/60,000 ( 58.0%)  acc=0.7082      119 char/s  eta=   211s
+  eval     36,000/60,000 ( 60.0%)  acc=0.7072      119 char/s  eta=   201s
+  eval     37,200/60,000 ( 62.0%)  acc=0.7073      119 char/s  eta=   191s
+  eval     38,400/60,000 ( 64.0%)  acc=0.7077      119 char/s  eta=   181s
+  eval     39,600/60,000 ( 66.0%)  acc=0.7079      119 char/s  eta=   171s
+  eval     40,800/60,000 ( 68.0%)  acc=0.7076      119 char/s  eta=   161s
+  eval     42,000/60,000 ( 70.0%)  acc=0.7074      119 char/s  eta=   151s
+  eval     43,200/60,000 ( 72.0%)  acc=0.7079      119 char/s  eta=   141s
+  eval     44,400/60,000 ( 74.0%)  acc=0.7084      119 char/s  eta=   131s
+  eval     45,600/60,000 ( 76.0%)  acc=0.7086      119 char/s  eta=   121s
+  eval     46,800/60,000 ( 78.0%)  acc=0.7082      119 char/s  eta=   111s
+  eval     48,000/60,000 ( 80.0%)  acc=0.7088      120 char/s  eta=   100s
+  eval     49,200/60,000 ( 82.0%)  acc=0.7096      120 char/s  eta=    90s
+  eval     50,400/60,000 ( 84.0%)  acc=0.7111      120 char/s  eta=    80s
+  eval     51,600/60,000 ( 86.0%)  acc=0.7117      120 char/s  eta=    70s
+  eval     52,800/60,000 ( 88.0%)  acc=0.7130      120 char/s  eta=    60s
+  eval     54,000/60,000 ( 90.0%)  acc=0.7134      120 char/s  eta=    50s
+  eval     55,200/60,000 ( 92.0%)  acc=0.7130      120 char/s  eta=    40s
+  eval     56,400/60,000 ( 94.0%)  acc=0.7137      120 char/s  eta=    30s
+  eval     57,600/60,000 ( 96.0%)  acc=0.7141      120 char/s  eta=    20s
+  eval     58,800/60,000 ( 98.0%)  acc=0.7146      120 char/s  eta=    10s
+  eval     60,000/60,000 (100.0%)  acc=0.7145      120 char/s  eta=     0s
+chars=60,000  acc=0.7145  eval_duration=498.9s
 ---
 submission         : lwta_k2
-training energy (J): 46,131.8
-training duration  : 222.5s
-val  char-accuracy : 0.7146
+training energy (J): 44,583.0
+training duration  : 237.1s
+val  char-accuracy : 0.7145
 val  chars         : 60,000
 wrote /tmp/result.json
 Stopping app - local entrypoint completed.
 ✓ App completed. View run at 
-https://modal.com/apps/ab-10/main/ap-XpG4oyoioa8EfEnrW23Vzh
+https://modal.com/apps/gabriel-nakajima-an/main/ap-cyrfdrD3yrTAPYiz98tDZZ
 
 # final result
 {
   "submission": "lwta_k2",
-  "training_energy_J": 46131.8433434,
-  "training_duration_s": 222.46477313199998,
-  "val_char_accuracy": 0.7145833333333333,
+  "training_energy_J": 44582.984815150005,
+  "training_duration_s": 237.14902369700002,
+  "cpu_energy_J": 10030.720041672495,
+  "total_energy_J": 54613.7048568225,
+  "val_char_accuracy": 0.7145333333333334,
   "val_chars": 60000,
   "gpu_name": "NVIDIA A100 80GB PCIe",
-  "date_utc": "2026-05-18T18:04:06Z",
+  "date_utc": "2026-05-20T22:57:38Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 53.31471666666669,
-    "stress_watts_avg": 235.04048546725176,
-    "stress_energy_joules": 8621.146,
-    "stress_duration_s": 36.679408583,
+    "idle_watts": 52.363366666666685,
+    "stress_watts_avg": 227.75408348564008,
+    "stress_energy_joules": 8575.362,
+    "stress_duration_s": 37.651847417,
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
diff --git a/submissions/lwta_k4/nvml.json b/submissions/lwta_k4/nvml.json
index f273c88..c15557c 100644
--- a/submissions/lwta_k4/nvml.json
+++ b/submissions/lwta_k4/nvml.json
@@ -2,10 +2,10 @@
   "nvml_available": true,
   "energy_counter_supported": true,
   "monotonic": true,
-  "idle_watts": 52.98316666666664,
-  "stress_watts_avg": 227.19281000664682,
-  "stress_energy_joules": 8522.132,
-  "stress_duration_s": 37.510570866,
+  "idle_watts": 53.112000000000016,
+  "stress_watts_avg": 229.4340394814706,
+  "stress_energy_joules": 8404.0,
+  "stress_duration_s": 36.629263988,
   "gpu_name": "NVIDIA A100 80GB PCIe",
   "notes": []
 }
diff --git a/submissions/lwta_k4/result.json b/submissions/lwta_k4/result.json
index caffb8c..cc235c1 100644
--- a/submissions/lwta_k4/result.json
+++ b/submissions/lwta_k4/result.json
@@ -1,19 +1,21 @@
 {
   "submission": "lwta_k4",
-  "training_energy_J": 46222.22882105,
-  "training_duration_s": 236.750003579,
-  "val_char_accuracy": 0.72375,
+  "training_energy_J": 44328.7028316,
+  "training_duration_s": 221.249343368,
+  "cpu_energy_J": 9354.22259884751,
+  "total_energy_J": 53682.92543044751,
+  "val_char_accuracy": 0.72455,
   "val_chars": 60000,
   "gpu_name": "NVIDIA A100 80GB PCIe",
-  "date_utc": "2026-05-18T18:06:13Z",
+  "date_utc": "2026-05-20T22:55:41Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 52.98316666666664,
-    "stress_watts_avg": 227.19281000664682,
-    "stress_energy_joules": 8522.132,
-    "stress_duration_s": 37.510570866,
+    "idle_watts": 53.112000000000016,
+    "stress_watts_avg": 229.4340394814706,
+    "stress_energy_joules": 8404.0,
+    "stress_duration_s": 36.629263988,
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
diff --git a/submissions/lwta_k4/run.log b/submissions/lwta_k4/run.log
index 8e70573..801d461 100644
--- a/submissions/lwta_k4/run.log
+++ b/submissions/lwta_k4/run.log
@@ -1,141 +1,140 @@
-# wikitext submit.py log — lwta_k4 — 2026-05-18T17:52:38+00:00Z
+# wikitext submit.py log — lwta_k4 — 2026-05-20T22:44:26+00:00Z
 [modal] launching A100-80GB ...
 ✓ Initialized. View run at 
-https://modal.com/apps/ab-10/main/ap-rRKkJJfugmtJNujFsbVrP3
+https://modal.com/apps/gabriel-nakajima-an/main/ap-hyBCSy0XG5jwP218GI6bNc
 ✓ Created objects.
-├── 🔨 Created mount /home/seneca/wikitext/submit.py
-├── 🔨 Created mount /home/seneca/wikitext/task.py
-├── 🔨 Created mount /home/seneca/wikitext/verify_nvml.py
-├── 🔨 Created mount /home/seneca/wikitext/run_eval.py
-├── 🔨 Created mount /home/seneca/wikitext/wikitext.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py
 └── 🔨 Created function run_submission.
 [modal] verifying NVML energy counter ...
 GPU: NVIDIA A100 80GB PCIe
 sampling idle power for 3s ...
-  idle: 53.0 W
+  idle: 53.1 W
 running 30s stress workload ...
-/usr/local/lib/python3.11/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
-  cpu = _conversion_method_template(device=torch.device("cpu"))
-  duration:       37.5 s
-  energy delta:   8,522.1 J
-  avg power:      227.2 W
+  duration:       36.6 s
+  energy delta:   8,404.0 J
+  avg power:      229.4 W
   monotonic:      True
 ---
-{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 52.98316666666664, "stress_watts_avg": 227.19281000664682, "stress_energy_joules": 8522.132, "stress_duration_s": 37.510570866, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []}
+{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 53.112000000000016, "stress_watts_avg": 229.4340394814706, "stress_energy_joules": 8404.0, "stress_duration_s": 36.629263988, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []}
 [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ...
 loading WikiText-103 from /data ...
   train chars: 540,095,682
   val   chars: 60,000  (scored, gated by --acc-min)
 train wall-clock cap: 300 s
 val accuracy floor : 0.7000
-/usr/local/lib/python3.11/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
-  cpu = _conversion_method_template(device=torch.device("cpu"))
 training submission /workspace/lwta_k4.py ...
+[codecarbon WARNING @ 22:45:15] Multiple instances of codecarbon are allowed to run at the same time.
 [lwta_k4] 10.84M params  cfg=TrainConfig(d=384 L=6 H=6 bs=32 T=1024 steps=2150 lwta_k=4)
 [lwta_k4] step     0/2150  loss 5.5452  elapsed 1s
-[lwta_k4] step   100/2150  loss 1.6675  elapsed 12s
-[lwta_k4] step   200/2150  loss 1.4276  elapsed 23s
-[lwta_k4] step   300/2150  loss 1.2935  elapsed 34s
-[lwta_k4] step   400/2150  loss 1.2449  elapsed 45s
-[lwta_k4] step   500/2150  loss 1.1959  elapsed 55s
-[lwta_k4] step   600/2150  loss 1.2452  elapsed 66s
-[lwta_k4] step   700/2150  loss 1.1693  elapsed 77s
-[lwta_k4] step   800/2150  loss 1.1698  elapsed 88s
-[lwta_k4] step   900/2150  loss 1.1456  elapsed 99s
-[lwta_k4] step  1000/2150  loss 1.0783  elapsed 109s
-[lwta_k4] step  1100/2150  loss 1.1223  elapsed 120s
-[lwta_k4] step  1200/2150  loss 1.0742  elapsed 131s
-[lwta_k4] step  1300/2150  loss 1.0665  elapsed 142s
-[lwta_k4] step  1400/2150  loss 1.0240  elapsed 153s
-[lwta_k4] step  1500/2150  loss 1.0439  elapsed 163s
-[lwta_k4] step  1600/2150  loss 1.0516  elapsed 174s
-[lwta_k4] step  1700/2150  loss 1.0211  elapsed 185s
-[lwta_k4] step  1800/2150  loss 1.0123  elapsed 196s
-[lwta_k4] step  1900/2150  loss 1.0387  elapsed 207s
-[lwta_k4] step  2000/2150  loss 0.9838  elapsed 217s
-[lwta_k4] step  2100/2150  loss 0.9776  elapsed 228s
-[lwta_k4] step  2149/2150  loss 0.9949  elapsed 234s
-training: 46,222.2 J   duration=236.8s
+[lwta_k4] step   100/2150  loss 1.6542  elapsed 11s
+[lwta_k4] step   200/2150  loss 1.4573  elapsed 21s
+[lwta_k4] step   300/2150  loss 1.3381  elapsed 31s
+[lwta_k4] step   400/2150  loss 1.2373  elapsed 41s
+[lwta_k4] step   500/2150  loss 1.2093  elapsed 51s
+[lwta_k4] step   600/2150  loss 1.1910  elapsed 62s
+[lwta_k4] step   700/2150  loss 1.1619  elapsed 72s
+[lwta_k4] step   800/2150  loss 1.1566  elapsed 82s
+[lwta_k4] step   900/2150  loss 1.1024  elapsed 92s
+[lwta_k4] step  1000/2150  loss 1.1438  elapsed 102s
+[lwta_k4] step  1100/2150  loss 1.0946  elapsed 112s
+[lwta_k4] step  1200/2150  loss 1.1067  elapsed 122s
+[lwta_k4] step  1300/2150  loss 1.0719  elapsed 133s
+[lwta_k4] step  1400/2150  loss 1.0670  elapsed 143s
+[lwta_k4] step  1500/2150  loss 1.0598  elapsed 153s
+[lwta_k4] step  1600/2150  loss 1.0182  elapsed 163s
+[lwta_k4] step  1700/2150  loss 1.0992  elapsed 173s
+[lwta_k4] step  1800/2150  loss 1.0282  elapsed 183s
+[lwta_k4] step  1900/2150  loss 1.0147  elapsed 194s
+[lwta_k4] step  2000/2150  loss 1.0158  elapsed 204s
+[lwta_k4] step  2100/2150  loss 0.9868  elapsed 214s
+[lwta_k4] step  2149/2150  loss 0.9310  elapsed 219s
+training: 44,328.7 J   duration=221.2s
 evaluating on val split ...
-  eval      1,200/60,000 (  2.0%)  acc=0.7292      114 char/s  eta=   517s
-  eval      2,400/60,000 (  4.0%)  acc=0.7142      114 char/s  eta=   507s
-  eval      3,600/60,000 (  6.0%)  acc=0.7114      114 char/s  eta=   497s
-  eval      4,800/60,000 (  8.0%)  acc=0.7196      114 char/s  eta=   486s
-  eval      6,000/60,000 ( 10.0%)  acc=0.7073      114 char/s  eta=   475s
-  eval      7,200/60,000 ( 12.0%)  acc=0.7024      114 char/s  eta=   464s
-  eval      8,400/60,000 ( 14.0%)  acc=0.7008      114 char/s  eta=   454s
-  eval      9,600/60,000 ( 16.0%)  acc=0.7052      114 char/s  eta=   443s
-  eval     10,800/60,000 ( 18.0%)  acc=0.7071      114 char/s  eta=   432s
-  eval     12,000/60,000 ( 20.0%)  acc=0.7086      114 char/s  eta=   422s
-  eval     13,200/60,000 ( 22.0%)  acc=0.7137      114 char/s  eta=   411s
-  eval     14,400/60,000 ( 24.0%)  acc=0.7153      114 char/s  eta=   401s
-  eval     15,600/60,000 ( 26.0%)  acc=0.7179      114 char/s  eta=   390s
-  eval     16,800/60,000 ( 28.0%)  acc=0.7202      114 char/s  eta=   379s
-  eval     18,000/60,000 ( 30.0%)  acc=0.7189      114 char/s  eta=   368s
-  eval     19,200/60,000 ( 32.0%)  acc=0.7204      114 char/s  eta=   358s
-  eval     20,400/60,000 ( 34.0%)  acc=0.7217      114 char/s  eta=   347s
-  eval     21,600/60,000 ( 36.0%)  acc=0.7214      114 char/s  eta=   337s
-  eval     22,800/60,000 ( 38.0%)  acc=0.7223      114 char/s  eta=   327s
-  eval     24,000/60,000 ( 40.0%)  acc=0.7229      114 char/s  eta=   316s
-  eval     25,200/60,000 ( 42.0%)  acc=0.7240      114 char/s  eta=   306s
-  eval     26,400/60,000 ( 44.0%)  acc=0.7248      114 char/s  eta=   295s
-  eval     27,600/60,000 ( 46.0%)  acc=0.7257      114 char/s  eta=   285s
-  eval     28,800/60,000 ( 48.0%)  acc=0.7258      114 char/s  eta=   274s
-  eval     30,000/60,000 ( 50.0%)  acc=0.7252      114 char/s  eta=   263s
-  eval     31,200/60,000 ( 52.0%)  acc=0.7232      114 char/s  eta=   253s
-  eval     32,400/60,000 ( 54.0%)  acc=0.7228      114 char/s  eta=   242s
-  eval     33,600/60,000 ( 56.0%)  acc=0.7209      114 char/s  eta=   232s
-  eval     34,800/60,000 ( 58.0%)  acc=0.7201      114 char/s  eta=   221s
-  eval     36,000/60,000 ( 60.0%)  acc=0.7196      114 char/s  eta=   211s
-  eval     37,200/60,000 ( 62.0%)  acc=0.7196      114 char/s  eta=   200s
-  eval     38,400/60,000 ( 64.0%)  acc=0.7196      114 char/s  eta=   190s
-  eval     39,600/60,000 ( 66.0%)  acc=0.7194      114 char/s  eta=   179s
-  eval     40,800/60,000 ( 68.0%)  acc=0.7189      114 char/s  eta=   169s
-  eval     42,000/60,000 ( 70.0%)  acc=0.7187      114 char/s  eta=   158s
-  eval     43,200/60,000 ( 72.0%)  acc=0.7188      114 char/s  eta=   148s
-  eval     44,400/60,000 ( 74.0%)  acc=0.7185      114 char/s  eta=   137s
-  eval     45,600/60,000 ( 76.0%)  acc=0.7185      114 char/s  eta=   126s
-  eval     46,800/60,000 ( 78.0%)  acc=0.7184      114 char/s  eta=   116s
-  eval     48,000/60,000 ( 80.0%)  acc=0.7187      114 char/s  eta=   105s
-  eval     49,200/60,000 ( 82.0%)  acc=0.7194      114 char/s  eta=    95s
-  eval     50,400/60,000 ( 84.0%)  acc=0.7208      114 char/s  eta=    84s
-  eval     51,600/60,000 ( 86.0%)  acc=0.7214      114 char/s  eta=    74s
-  eval     52,800/60,000 ( 88.0%)  acc=0.7227      114 char/s  eta=    63s
-  eval     54,000/60,000 ( 90.0%)  acc=0.7229      114 char/s  eta=    53s
-  eval     55,200/60,000 ( 92.0%)  acc=0.7224      114 char/s  eta=    42s
-  eval     56,400/60,000 ( 94.0%)  acc=0.7230      114 char/s  eta=    32s
-  eval     57,600/60,000 ( 96.0%)  acc=0.7234      114 char/s  eta=    21s
-  eval     58,800/60,000 ( 98.0%)  acc=0.7239      114 char/s  eta=    11s
-  eval     60,000/60,000 (100.0%)  acc=0.7238      114 char/s  eta=     0s
-chars=60,000  acc=0.7238  eval_duration=527.1s
+  eval      1,200/60,000 (  2.0%)  acc=0.7067      149 char/s  eta=   396s
+  eval      2,400/60,000 (  4.0%)  acc=0.7037      149 char/s  eta=   386s
+  eval      3,600/60,000 (  6.0%)  acc=0.7044      149 char/s  eta=   377s
+  eval      4,800/60,000 (  8.0%)  acc=0.7127      149 char/s  eta=   370s
+  eval      6,000/60,000 ( 10.0%)  acc=0.7010      150 char/s  eta=   360s
+  eval      7,200/60,000 ( 12.0%)  acc=0.7004      150 char/s  eta=   352s
+  eval      8,400/60,000 ( 14.0%)  acc=0.6994      151 char/s  eta=   342s
+  eval      9,600/60,000 ( 16.0%)  acc=0.7035      151 char/s  eta=   333s
+  eval     10,800/60,000 ( 18.0%)  acc=0.7044      151 char/s  eta=   326s
+  eval     12,000/60,000 ( 20.0%)  acc=0.7061      151 char/s  eta=   318s
+  eval     13,200/60,000 ( 22.0%)  acc=0.7114      151 char/s  eta=   310s
+  eval     14,400/60,000 ( 24.0%)  acc=0.7133      151 char/s  eta=   302s
+  eval     15,600/60,000 ( 26.0%)  acc=0.7163      151 char/s  eta=   294s
+  eval     16,800/60,000 ( 28.0%)  acc=0.7193      151 char/s  eta=   287s
+  eval     18,000/60,000 ( 30.0%)  acc=0.7174      151 char/s  eta=   279s
+  eval     19,200/60,000 ( 32.0%)  acc=0.7186      151 char/s  eta=   271s
+  eval     20,400/60,000 ( 34.0%)  acc=0.7201      150 char/s  eta=   263s
+  eval     21,600/60,000 ( 36.0%)  acc=0.7198      150 char/s  eta=   255s
+  eval     22,800/60,000 ( 38.0%)  acc=0.7202      151 char/s  eta=   247s
+  eval     24,000/60,000 ( 40.0%)  acc=0.7214      150 char/s  eta=   239s
+  eval     25,200/60,000 ( 42.0%)  acc=0.7225      150 char/s  eta=   231s
+  eval     26,400/60,000 ( 44.0%)  acc=0.7234      151 char/s  eta=   223s
+  eval     27,600/60,000 ( 46.0%)  acc=0.7249      150 char/s  eta=   215s
+  eval     28,800/60,000 ( 48.0%)  acc=0.7253      150 char/s  eta=   207s
+  eval     30,000/60,000 ( 50.0%)  acc=0.7242      150 char/s  eta=   200s
+  eval     31,200/60,000 ( 52.0%)  acc=0.7222      150 char/s  eta=   192s
+  eval     32,400/60,000 ( 54.0%)  acc=0.7219      150 char/s  eta=   184s
+  eval     33,600/60,000 ( 56.0%)  acc=0.7204      150 char/s  eta=   176s
+  eval     34,800/60,000 ( 58.0%)  acc=0.7194      150 char/s  eta=   168s
+  eval     36,000/60,000 ( 60.0%)  acc=0.7187      150 char/s  eta=   160s
+  eval     37,200/60,000 ( 62.0%)  acc=0.7190      150 char/s  eta=   152s
+  eval     38,400/60,000 ( 64.0%)  acc=0.7191      150 char/s  eta=   144s
+  eval     39,600/60,000 ( 66.0%)  acc=0.7191      150 char/s  eta=   136s
+  eval     40,800/60,000 ( 68.0%)  acc=0.7184      150 char/s  eta=   128s
+  eval     42,000/60,000 ( 70.0%)  acc=0.7182      150 char/s  eta=   120s
+  eval     43,200/60,000 ( 72.0%)  acc=0.7187      150 char/s  eta=   112s
+  eval     44,400/60,000 ( 74.0%)  acc=0.7186      150 char/s  eta=   104s
+  eval     45,600/60,000 ( 76.0%)  acc=0.7189      150 char/s  eta=    96s
+  eval     46,800/60,000 ( 78.0%)  acc=0.7186      150 char/s  eta=    88s
+  eval     48,000/60,000 ( 80.0%)  acc=0.7190      150 char/s  eta=    80s
+  eval     49,200/60,000 ( 82.0%)  acc=0.7196      150 char/s  eta=    72s
+  eval     50,400/60,000 ( 84.0%)  acc=0.7210      150 char/s  eta=    64s
+  eval     51,600/60,000 ( 86.0%)  acc=0.7216      150 char/s  eta=    56s
+  eval     52,800/60,000 ( 88.0%)  acc=0.7228      150 char/s  eta=    48s
+  eval     54,000/60,000 ( 90.0%)  acc=0.7233      150 char/s  eta=    40s
+  eval     55,200/60,000 ( 92.0%)  acc=0.7225      150 char/s  eta=    32s
+  eval     56,400/60,000 ( 94.0%)  acc=0.7232      150 char/s  eta=    24s
+  eval     57,600/60,000 ( 96.0%)  acc=0.7238      150 char/s  eta=    16s
+  eval     58,800/60,000 ( 98.0%)  acc=0.7245      150 char/s  eta=     8s
+  eval     60,000/60,000 (100.0%)  acc=0.7246      150 char/s  eta=     0s
+chars=60,000  acc=0.7246  eval_duration=399.9s
 ---
 submission         : lwta_k4
-training energy (J): 46,222.2
-training duration  : 236.8s
-val  char-accuracy : 0.7238
+training energy (J): 44,328.7
+training duration  : 221.2s
+val  char-accuracy : 0.7246
 val  chars         : 60,000
 wrote /tmp/result.json
 Stopping app - local entrypoint completed.
 ✓ App completed. View run at 
-https://modal.com/apps/ab-10/main/ap-rRKkJJfugmtJNujFsbVrP3
+https://modal.com/apps/gabriel-nakajima-an/main/ap-hyBCSy0XG5jwP218GI6bNc
 
 # final result
 {
   "submission": "lwta_k4",
-  "training_energy_J": 46222.22882105,
-  "training_duration_s": 236.750003579,
-  "val_char_accuracy": 0.72375,
+  "training_energy_J": 44328.7028316,
+  "training_duration_s": 221.249343368,
+  "cpu_energy_J": 9354.22259884751,
+  "total_energy_J": 53682.92543044751,
+  "val_char_accuracy": 0.72455,
   "val_chars": 60000,
   "gpu_name": "NVIDIA A100 80GB PCIe",
-  "date_utc": "2026-05-18T18:06:13Z",
+  "date_utc": "2026-05-20T22:55:41Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 52.98316666666664,
-    "stress_watts_avg": 227.19281000664682,
-    "stress_energy_joules": 8522.132,
-    "stress_duration_s": 37.510570866,
+    "idle_watts": 53.112000000000016,
+    "stress_watts_avg": 229.4340394814706,
+    "stress_energy_joules": 8404.0,
+    "stress_duration_s": 36.629263988,
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
diff --git a/submissions/lwta_k4_alpha_065/nvml.json b/submissions/lwta_k4_alpha_065/nvml.json
index 55f39e0..fb37849 100644
--- a/submissions/lwta_k4_alpha_065/nvml.json
+++ b/submissions/lwta_k4_alpha_065/nvml.json
@@ -2,10 +2,10 @@
   "nvml_available": true,
   "energy_counter_supported": true,
   "monotonic": true,
-  "idle_watts": 54.36613333333333,
-  "stress_watts_avg": 228.9315778481163,
-  "stress_energy_joules": 8622.915,
-  "stress_duration_s": 37.665904726,
-  "gpu_name": "NVIDIA A100 80GB PCIe",
+  "idle_watts": 60.704084745762636,
+  "stress_watts_avg": 354.6097806948912,
+  "stress_energy_joules": 13165.227,
+  "stress_duration_s": 37.125955675,
+  "gpu_name": "NVIDIA A100-SXM4-80GB",
   "notes": []
 }
diff --git a/submissions/lwta_k4_alpha_065/result.json b/submissions/lwta_k4_alpha_065/result.json
index ffa4ff7..b2d6674 100644
--- a/submissions/lwta_k4_alpha_065/result.json
+++ b/submissions/lwta_k4_alpha_065/result.json
@@ -1,20 +1,22 @@
 {
   "submission": "lwta_k4_alpha_065",
-  "training_energy_J": 13173.6836969,
-  "training_duration_s": 117.52094606199998,
-  "val_char_accuracy": 0.7381833333333333,
+  "training_energy_J": 13751.454901850002,
+  "training_duration_s": 144.607121963,
+  "cpu_energy_J": 6170.442651387502,
+  "total_energy_J": 19921.897553237504,
+  "val_char_accuracy": 0.7327833333333333,
   "val_chars": 60000,
-  "gpu_name": "NVIDIA A100 80GB PCIe",
-  "date_utc": "2026-05-20T00:58:50Z",
+  "gpu_name": "NVIDIA A100-SXM4-80GB",
+  "date_utc": "2026-05-21T05:28:08Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 54.36613333333333,
-    "stress_watts_avg": 228.9315778481163,
-    "stress_energy_joules": 8622.915,
-    "stress_duration_s": 37.665904726,
-    "gpu_name": "NVIDIA A100 80GB PCIe",
+    "idle_watts": 60.704084745762636,
+    "stress_watts_avg": 354.6097806948912,
+    "stress_energy_joules": 13165.227,
+    "stress_duration_s": 37.125955675,
+    "gpu_name": "NVIDIA A100-SXM4-80GB",
     "notes": []
   },
   "contributor": "@subagent-L2clean-2026-05-19"
diff --git a/submissions/lwta_k4_alpha_065/run.log b/submissions/lwta_k4_alpha_065/run.log
index fed31cd..1673c1c 100644
--- a/submissions/lwta_k4_alpha_065/run.log
+++ b/submissions/lwta_k4_alpha_065/run.log
@@ -1,25 +1,25 @@
-# wikitext submit.py log — lwta_k4_alpha_065 — 2026-05-20T00:50:06+00:00Z
+# wikitext submit.py log — lwta_k4_alpha_065 — 2026-05-21T05:19:18+00:00Z
 [modal] launching A100-80GB ...
 ✓ Initialized. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-QClLkwZItRoeZ237Shsx0Z
+https://modal.com/apps/gabriel-nakajima-an/main/ap-xwFThbQDkgtwWiaSsczU1X
 ✓ Created objects.
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py
-├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py
-├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py
 └── 🔨 Created function run_submission.
 [modal] verifying NVML energy counter ...
-GPU: NVIDIA A100 80GB PCIe
+GPU: NVIDIA A100-SXM4-80GB
 sampling idle power for 3s ...
-  idle: 54.4 W
+  idle: 60.7 W
 running 30s stress workload ...
-  duration:       37.7 s
-  energy delta:   8,622.9 J
-  avg power:      228.9 W
+  duration:       37.1 s
+  energy delta:   13,165.2 J
+  avg power:      354.6 W
   monotonic:      True
 ---
-{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 54.36613333333333, "stress_watts_avg": 228.9315778481163, "stress_energy_joules": 8622.915, "stress_duration_s": 37.665904726, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []}
+{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 60.704084745762636, "stress_watts_avg": 354.6097806948912, "stress_energy_joules": 13165.227, "stress_duration_s": 37.125955675, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []}
 [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ...
 loading WikiText-103 from /data ...
   train chars: 540,095,682
@@ -27,117 +27,120 @@ loading WikiText-103 from /data ...
 train wall-clock cap: 300 s
 val accuracy floor : 0.7000
 training submission /workspace/lwta_k4_alpha_065.py ...
+[codecarbon WARNING @ 05:20:10] Multiple instances of codecarbon are allowed to run at the same time.
 [lwta_k4_a065] starting GPU KN build; max_order=12 D=0.5
 [lwta_k4_a065] top order=12 unique pairs: 157,942,722  2.5s
-[lwta_k4_a065] ctx_len=11 ctxs=119,285,712 15.0s
-[lwta_k4_a065] ctx_len=10 ctxs=84,282,364 13.0s
-[lwta_k4_a065] ctx_len=9 ctxs=54,720,376 8.5s
-[lwta_k4_a065] ctx_len=8 ctxs=31,924,091 5.2s
-[lwta_k4_a065] ctx_len=7 ctxs=16,284,921 2.3s
-[lwta_k4_a065] ctx_len=6 ctxs=7,016,442 1.1s
+[lwta_k4_a065] ctx_len=11 ctxs=119,285,712 33.7s
+[lwta_k4_a065] ctx_len=10 ctxs=84,282,364 19.5s
+[lwta_k4_a065] ctx_len=9 ctxs=54,720,376 10.5s
+[lwta_k4_a065] ctx_len=8 ctxs=31,924,091 6.3s
+[lwta_k4_a065] ctx_len=7 ctxs=16,284,921 3.4s
+[lwta_k4_a065] ctx_len=6 ctxs=7,016,442 1.5s
 [lwta_k4_a065] ctx_len=5 ctxs=2,438,281 0.6s
 [lwta_k4_a065] ctx_len=4 ctxs=637,143 0.1s
 [lwta_k4_a065] ctx_len=3 ctxs=122,882 0.0s
 [lwta_k4_a065] ctx_len=2 ctxs=12,282 0.0s
 [lwta_k4_a065] ctx_len=1 ctxs=204 0.0s
 [lwta_k4_a065] ctx_len=0 ctxs=1 0.0s
-[lwta_k4_a065] KN build done: 48.3s
+[lwta_k4_a065] KN build done: 78.1s
 [lwta_k4_a065] NN 3.29M params  cfg=TrainConfig(d=256 L=4 H=4 bs=32 T=1024 steps=1200 lwta_k=4)
 [lwta_k4_a065] NN step     0/1200  loss 5.5452  elapsed 1s
-[lwta_k4_a065] NN step   100/1200  loss 1.8225  elapsed 6s
-[lwta_k4_a065] NN step   200/1200  loss 1.5410  elapsed 12s
-[lwta_k4_a065] NN step   300/1200  loss 1.4316  elapsed 17s
-[lwta_k4_a065] NN step   400/1200  loss 1.3322  elapsed 22s
-[lwta_k4_a065] NN step   500/1200  loss 1.3151  elapsed 28s
-[lwta_k4_a065] NN step   600/1200  loss 1.2459  elapsed 33s
-[lwta_k4_a065] NN step   700/1200  loss 1.2173  elapsed 39s
-[lwta_k4_a065] NN step   800/1200  loss 1.1725  elapsed 44s
-[lwta_k4_a065] NN step   900/1200  loss 1.1813  elapsed 50s
-[lwta_k4_a065] NN step  1000/1200  loss 1.1598  elapsed 55s
-[lwta_k4_a065] NN step  1100/1200  loss 1.1275  elapsed 60s
-[lwta_k4_a065] NN step  1199/1200  loss 1.1207  elapsed 66s
-training: 13,173.7 J   duration=117.5s
+[lwta_k4_a065] NN step   100/1200  loss 1.7843  elapsed 6s
+[lwta_k4_a065] NN step   200/1200  loss 1.5196  elapsed 11s
+[lwta_k4_a065] NN step   300/1200  loss 1.4285  elapsed 16s
+[lwta_k4_a065] NN step   400/1200  loss 1.3993  elapsed 21s
+[lwta_k4_a065] NN step   500/1200  loss 1.3084  elapsed 26s
+[lwta_k4_a065] NN step   600/1200  loss 1.2457  elapsed 32s
+[lwta_k4_a065] NN step   700/1200  loss 1.2204  elapsed 37s
+[lwta_k4_a065] NN step   800/1200  loss 1.2090  elapsed 42s
+[lwta_k4_a065] NN step   900/1200  loss 1.1729  elapsed 47s
+[lwta_k4_a065] NN step  1000/1200  loss 1.1880  elapsed 52s
+[lwta_k4_a065] NN step  1100/1200  loss 1.1337  elapsed 57s
+[lwta_k4_a065] NN step  1199/1200  loss 1.1552  elapsed 62s
+training: 13,751.5 J   duration=144.6s
 evaluating on val split ...
-  eval      1,200/60,000 (  2.0%)  acc=0.7175      163 char/s  eta=   361s
-  eval      2,400/60,000 (  4.0%)  acc=0.7104      168 char/s  eta=   343s
-  eval      3,600/60,000 (  6.0%)  acc=0.7131      168 char/s  eta=   336s
-  eval      4,800/60,000 (  8.0%)  acc=0.7212      168 char/s  eta=   329s
-  eval      6,000/60,000 ( 10.0%)  acc=0.7170      169 char/s  eta=   320s
-  eval      7,200/60,000 ( 12.0%)  acc=0.7146      169 char/s  eta=   312s
-  eval      8,400/60,000 ( 14.0%)  acc=0.7156      169 char/s  eta=   305s
-  eval      9,600/60,000 ( 16.0%)  acc=0.7215      170 char/s  eta=   297s
-  eval     10,800/60,000 ( 18.0%)  acc=0.7262      169 char/s  eta=   290s
-  eval     12,000/60,000 ( 20.0%)  acc=0.7282      169 char/s  eta=   283s
-  eval     13,200/60,000 ( 22.0%)  acc=0.7321      170 char/s  eta=   276s
-  eval     14,400/60,000 ( 24.0%)  acc=0.7336      170 char/s  eta=   269s
-  eval     15,600/60,000 ( 26.0%)  acc=0.7354      170 char/s  eta=   261s
-  eval     16,800/60,000 ( 28.0%)  acc=0.7385      170 char/s  eta=   254s
-  eval     18,000/60,000 ( 30.0%)  acc=0.7392      170 char/s  eta=   247s
-  eval     19,200/60,000 ( 32.0%)  acc=0.7418      170 char/s  eta=   240s
-  eval     20,400/60,000 ( 34.0%)  acc=0.7428      170 char/s  eta=   233s
-  eval     21,600/60,000 ( 36.0%)  acc=0.7427      170 char/s  eta=   226s
-  eval     22,800/60,000 ( 38.0%)  acc=0.7430      170 char/s  eta=   219s
-  eval     24,000/60,000 ( 40.0%)  acc=0.7427      170 char/s  eta=   212s
-  eval     25,200/60,000 ( 42.0%)  acc=0.7432      170 char/s  eta=   205s
-  eval     26,400/60,000 ( 44.0%)  acc=0.7439      170 char/s  eta=   198s
-  eval     27,600/60,000 ( 46.0%)  acc=0.7439      169 char/s  eta=   191s
-  eval     28,800/60,000 ( 48.0%)  acc=0.7444      168 char/s  eta=   185s
-  eval     30,000/60,000 ( 50.0%)  acc=0.7434      168 char/s  eta=   178s
-  eval     31,200/60,000 ( 52.0%)  acc=0.7410      168 char/s  eta=   172s
-  eval     32,400/60,000 ( 54.0%)  acc=0.7404      168 char/s  eta=   165s
-  eval     33,600/60,000 ( 56.0%)  acc=0.7385      167 char/s  eta=   158s
-  eval     34,800/60,000 ( 58.0%)  acc=0.7383      167 char/s  eta=   151s
-  eval     36,000/60,000 ( 60.0%)  acc=0.7382      167 char/s  eta=   144s
-  eval     37,200/60,000 ( 62.0%)  acc=0.7385      167 char/s  eta=   136s
-  eval     38,400/60,000 ( 64.0%)  acc=0.7385      168 char/s  eta=   129s
-  eval     39,600/60,000 ( 66.0%)  acc=0.7382      168 char/s  eta=   122s
-  eval     40,800/60,000 ( 68.0%)  acc=0.7375      168 char/s  eta=   114s
-  eval     42,000/60,000 ( 70.0%)  acc=0.7368      168 char/s  eta=   107s
-  eval     43,200/60,000 ( 72.0%)  acc=0.7369      168 char/s  eta=   100s
-  eval     44,400/60,000 ( 74.0%)  acc=0.7363      168 char/s  eta=    93s
-  eval     45,600/60,000 ( 76.0%)  acc=0.7363      169 char/s  eta=    85s
-  eval     46,800/60,000 ( 78.0%)  acc=0.7354      168 char/s  eta=    78s
-  eval     48,000/60,000 ( 80.0%)  acc=0.7355      168 char/s  eta=    71s
-  eval     49,200/60,000 ( 82.0%)  acc=0.7354      168 char/s  eta=    64s
-  eval     50,400/60,000 ( 84.0%)  acc=0.7362      168 char/s  eta=    57s
-  eval     51,600/60,000 ( 86.0%)  acc=0.7364      169 char/s  eta=    50s
-  eval     52,800/60,000 ( 88.0%)  acc=0.7371      169 char/s  eta=    43s
-  eval     54,000/60,000 ( 90.0%)  acc=0.7373      169 char/s  eta=    36s
-  eval     55,200/60,000 ( 92.0%)  acc=0.7365      169 char/s  eta=    28s
-  eval     56,400/60,000 ( 94.0%)  acc=0.7365      169 char/s  eta=    21s
-  eval     57,600/60,000 ( 96.0%)  acc=0.7369      169 char/s  eta=    14s
-  eval     58,800/60,000 ( 98.0%)  acc=0.7375      169 char/s  eta=     7s
-  eval     60,000/60,000 (100.0%)  acc=0.7382      169 char/s  eta=     0s
-chars=60,000  acc=0.7382  eval_duration=355.4s
+  eval      1,200/60,000 (  2.0%)  acc=0.7200      181 char/s  eta=   326s
+  eval      2,400/60,000 (  4.0%)  acc=0.7125      180 char/s  eta=   319s
+  eval      3,600/60,000 (  6.0%)  acc=0.7111      181 char/s  eta=   312s
+  eval      4,800/60,000 (  8.0%)  acc=0.7194      181 char/s  eta=   305s
+  eval      6,000/60,000 ( 10.0%)  acc=0.7150      182 char/s  eta=   297s
+  eval      7,200/60,000 ( 12.0%)  acc=0.7108      182 char/s  eta=   289s
+  eval      8,400/60,000 ( 14.0%)  acc=0.7112      183 char/s  eta=   283s
+  eval      9,600/60,000 ( 16.0%)  acc=0.7167      183 char/s  eta=   276s
+  eval     10,800/60,000 ( 18.0%)  acc=0.7214      183 char/s  eta=   269s
+  eval     12,000/60,000 ( 20.0%)  acc=0.7228      183 char/s  eta=   263s
+  eval     13,200/60,000 ( 22.0%)  acc=0.7275      183 char/s  eta=   256s
+  eval     14,400/60,000 ( 24.0%)  acc=0.7290      183 char/s  eta=   249s
+  eval     15,600/60,000 ( 26.0%)  acc=0.7306      183 char/s  eta=   243s
+  eval     16,800/60,000 ( 28.0%)  acc=0.7339      183 char/s  eta=   236s
+  eval     18,000/60,000 ( 30.0%)  acc=0.7340      183 char/s  eta=   230s
+  eval     19,200/60,000 ( 32.0%)  acc=0.7368      183 char/s  eta=   223s
+  eval     20,400/60,000 ( 34.0%)  acc=0.7388      183 char/s  eta=   217s
+  eval     21,600/60,000 ( 36.0%)  acc=0.7386      182 char/s  eta=   210s
+  eval     22,800/60,000 ( 38.0%)  acc=0.7384      182 char/s  eta=   204s
+  eval     24,000/60,000 ( 40.0%)  acc=0.7380      182 char/s  eta=   198s
+  eval     25,200/60,000 ( 42.0%)  acc=0.7381      182 char/s  eta=   191s
+  eval     26,400/60,000 ( 44.0%)  acc=0.7388      182 char/s  eta=   185s
+  eval     27,600/60,000 ( 46.0%)  acc=0.7386      182 char/s  eta=   178s
+  eval     28,800/60,000 ( 48.0%)  acc=0.7391      182 char/s  eta=   171s
+  eval     30,000/60,000 ( 50.0%)  acc=0.7383      182 char/s  eta=   165s
+  eval     31,200/60,000 ( 52.0%)  acc=0.7353      182 char/s  eta=   158s
+  eval     32,400/60,000 ( 54.0%)  acc=0.7346      182 char/s  eta=   151s
+  eval     33,600/60,000 ( 56.0%)  acc=0.7327      182 char/s  eta=   145s
+  eval     34,800/60,000 ( 58.0%)  acc=0.7330      182 char/s  eta=   138s
+  eval     36,000/60,000 ( 60.0%)  acc=0.7331      183 char/s  eta=   131s
+  eval     37,200/60,000 ( 62.0%)  acc=0.7333      183 char/s  eta=   125s
+  eval     38,400/60,000 ( 64.0%)  acc=0.7330      182 char/s  eta=   118s
+  eval     39,600/60,000 ( 66.0%)  acc=0.7324      182 char/s  eta=   112s
+  eval     40,800/60,000 ( 68.0%)  acc=0.7317      182 char/s  eta=   105s
+  eval     42,000/60,000 ( 70.0%)  acc=0.7306      182 char/s  eta=    99s
+  eval     43,200/60,000 ( 72.0%)  acc=0.7308      183 char/s  eta=    92s
+  eval     44,400/60,000 ( 74.0%)  acc=0.7306      183 char/s  eta=    85s
+  eval     45,600/60,000 ( 76.0%)  acc=0.7306      182 char/s  eta=    79s
+  eval     46,800/60,000 ( 78.0%)  acc=0.7301      182 char/s  eta=    72s
+  eval     48,000/60,000 ( 80.0%)  acc=0.7302      183 char/s  eta=    66s
+  eval     49,200/60,000 ( 82.0%)  acc=0.7301      183 char/s  eta=    59s
+  eval     50,400/60,000 ( 84.0%)  acc=0.7312      182 char/s  eta=    53s
+  eval     51,600/60,000 ( 86.0%)  acc=0.7315      182 char/s  eta=    46s
+  eval     52,800/60,000 ( 88.0%)  acc=0.7317      182 char/s  eta=    39s
+  eval     54,000/60,000 ( 90.0%)  acc=0.7319      182 char/s  eta=    33s
+  eval     55,200/60,000 ( 92.0%)  acc=0.7310      182 char/s  eta=    26s
+  eval     56,400/60,000 ( 94.0%)  acc=0.7309      182 char/s  eta=    20s
+  eval     57,600/60,000 ( 96.0%)  acc=0.7313      182 char/s  eta=    13s
+  eval     58,800/60,000 ( 98.0%)  acc=0.7319      182 char/s  eta=     7s
+  eval     60,000/60,000 (100.0%)  acc=0.7328      182 char/s  eta=     0s
+chars=60,000  acc=0.7328  eval_duration=329.1s
 ---
 submission         : lwta_k4_alpha_065
-training energy (J): 13,173.7
-training duration  : 117.5s
-val  char-accuracy : 0.7382
+training energy (J): 13,751.5
+training duration  : 144.6s
+val  char-accuracy : 0.7328
 val  chars         : 60,000
 wrote /tmp/result.json
 Stopping app - local entrypoint completed.
 ✓ App completed. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-QClLkwZItRoeZ237Shsx0Z
+https://modal.com/apps/gabriel-nakajima-an/main/ap-xwFThbQDkgtwWiaSsczU1X
 
 # final result
 {
   "submission": "lwta_k4_alpha_065",
-  "training_energy_J": 13173.6836969,
-  "training_duration_s": 117.52094606199998,
-  "val_char_accuracy": 0.7381833333333333,
+  "training_energy_J": 13751.454901850002,
+  "training_duration_s": 144.607121963,
+  "cpu_energy_J": 6170.442651387502,
+  "total_energy_J": 19921.897553237504,
+  "val_char_accuracy": 0.7327833333333333,
   "val_chars": 60000,
-  "gpu_name": "NVIDIA A100 80GB PCIe",
-  "date_utc": "2026-05-20T00:58:50Z",
+  "gpu_name": "NVIDIA A100-SXM4-80GB",
+  "date_utc": "2026-05-21T05:28:08Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 54.36613333333333,
-    "stress_watts_avg": 228.9315778481163,
-    "stress_energy_joules": 8622.915,
-    "stress_duration_s": 37.665904726,
-    "gpu_name": "NVIDIA A100 80GB PCIe",
+    "idle_watts": 60.704084745762636,
+    "stress_watts_avg": 354.6097806948912,
+    "stress_energy_joules": 13165.227,
+    "stress_duration_s": 37.125955675,
+    "gpu_name": "NVIDIA A100-SXM4-80GB",
     "notes": []
   },
   "contributor": "@subagent-L2clean-2026-05-19"
diff --git a/submissions/modded_nanogpt/nvml.json b/submissions/modded_nanogpt/nvml.json
index 3423730..816fb1e 100644
--- a/submissions/modded_nanogpt/nvml.json
+++ b/submissions/modded_nanogpt/nvml.json
@@ -2,10 +2,10 @@
   "nvml_available": true,
   "energy_counter_supported": true,
   "monotonic": true,
-  "idle_watts": 55.60349152542373,
-  "stress_watts_avg": 232.33601101968227,
-  "stress_energy_joules": 8741.791,
-  "stress_duration_s": 37.625639527999994,
-  "gpu_name": "NVIDIA A100 80GB PCIe",
+  "idle_watts": 62.52783050847465,
+  "stress_watts_avg": 350.16403334081195,
+  "stress_energy_joules": 13179.464,
+  "stress_duration_s": 37.637971765,
+  "gpu_name": "NVIDIA A100-SXM4-80GB",
   "notes": []
 }
diff --git a/submissions/modded_nanogpt/result.json b/submissions/modded_nanogpt/result.json
index 2f96ede..33a46ce 100644
--- a/submissions/modded_nanogpt/result.json
+++ b/submissions/modded_nanogpt/result.json
@@ -1,20 +1,22 @@
 {
   "submission": "modded_nanogpt",
-  "training_energy_J": 51704.306257950004,
-  "training_duration_s": 246.648394841,
-  "val_char_accuracy": 0.7373666666666666,
+  "training_energy_J": 51728.92952335,
+  "training_duration_s": 242.66790953299997,
+  "cpu_energy_J": 10276.997479117497,
+  "total_energy_J": 62005.9270024675,
+  "val_char_accuracy": 0.7336833333333334,
   "val_chars": 60000,
-  "gpu_name": "NVIDIA A100 80GB PCIe",
-  "date_utc": "2026-05-12T22:10:25Z",
+  "gpu_name": "NVIDIA A100-SXM4-80GB",
+  "date_utc": "2026-05-21T05:31:57Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 55.60349152542373,
-    "stress_watts_avg": 232.33601101968227,
-    "stress_energy_joules": 8741.791,
-    "stress_duration_s": 37.625639527999994,
-    "gpu_name": "NVIDIA A100 80GB PCIe",
+    "idle_watts": 62.52783050847465,
+    "stress_watts_avg": 350.16403334081195,
+    "stress_energy_joules": 13179.464,
+    "stress_duration_s": 37.637971765,
+    "gpu_name": "NVIDIA A100-SXM4-80GB",
     "notes": []
   },
   "contributor": "@ab-10"
diff --git a/submissions/modded_nanogpt/run.log b/submissions/modded_nanogpt/run.log
index 0d20c70..73fa945 100644
--- a/submissions/modded_nanogpt/run.log
+++ b/submissions/modded_nanogpt/run.log
@@ -1,142 +1,141 @@
-# wikitext submit.py log — modded_nanogpt — 2026-05-12T21:57:26+00:00Z
-[modal] launching A100-40GB ...
+# wikitext submit.py log — modded_nanogpt — 2026-05-21T05:19:18+00:00Z
+[modal] launching A100-80GB ...
 ✓ Initialized. View run at 
-https://modal.com/apps/ab-10/main/ap-E9Q8eIo1sjKa6HsN23nCdR
+https://modal.com/apps/gabriel-nakajima-an/main/ap-8CUCBSgp30OSqDLk4DjLGC
 ✓ Created objects.
-├── 🔨 Created mount /home/seneca/cybertronai-wikitext/submit.py
-├── 🔨 Created mount /home/seneca/cybertronai-wikitext/verify_nvml.py
-├── 🔨 Created mount /home/seneca/cybertronai-wikitext/run_eval.py
-├── 🔨 Created mount /home/seneca/cybertronai-wikitext/task.py
-├── 🔨 Created mount /home/seneca/cybertronai-wikitext/wikitext.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py
 └── 🔨 Created function run_submission.
 [modal] verifying NVML energy counter ...
-GPU: NVIDIA A100 80GB PCIe
+GPU: NVIDIA A100-SXM4-80GB
 sampling idle power for 3s ...
-  idle: 55.6 W
+  idle: 62.5 W
 running 30s stress workload ...
-/usr/local/lib/python3.11/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
-  cpu = _conversion_method_template(device=torch.device("cpu"))
   duration:       37.6 s
-  energy delta:   8,741.8 J
-  avg power:      232.3 W
+  energy delta:   13,179.5 J
+  avg power:      350.2 W
   monotonic:      True
 ---
-{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 55.60349152542373, "stress_watts_avg": 232.33601101968227, "stress_energy_joules": 8741.791, "stress_duration_s": 37.625639527999994, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []}
+{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 62.52783050847465, "stress_watts_avg": 350.16403334081195, "stress_energy_joules": 13179.464, "stress_duration_s": 37.637971765, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []}
 [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ...
 loading WikiText-103 from /data ...
   train chars: 540,095,682
   val   chars: 60,000  (scored, gated by --acc-min)
 train wall-clock cap: 300 s
 val accuracy floor : 0.7000
-/usr/local/lib/python3.11/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
-  cpu = _conversion_method_template(device=torch.device("cpu"))
 training submission /workspace/modded_nanogpt.py ...
+[codecarbon WARNING @ 05:20:12] Multiple instances of codecarbon are allowed to run at the same time.
 [modded] 10.84M params  cfg=TrainConfig(d=384 L=6 H=6 bs=32 T=1024 steps=2150)
 [modded] step     0/2150  loss 5.5452  elapsed 1s
-[modded] step   100/2150  loss 1.6180  elapsed 13s
-[modded] step   200/2150  loss 1.4124  elapsed 24s
-[modded] step   300/2150  loss 1.3156  elapsed 35s
-[modded] step   400/2150  loss 1.2198  elapsed 46s
-[modded] step   500/2150  loss 1.1874  elapsed 58s
-[modded] step   600/2150  loss 1.1899  elapsed 69s
-[modded] step   700/2150  loss 1.1334  elapsed 80s
-[modded] step   800/2150  loss 1.1112  elapsed 91s
-[modded] step   900/2150  loss 1.1067  elapsed 103s
-[modded] step  1000/2150  loss 1.0968  elapsed 114s
-[modded] step  1100/2150  loss 1.0989  elapsed 125s
-[modded] step  1200/2150  loss 1.0336  elapsed 136s
-[modded] step  1300/2150  loss 1.0725  elapsed 148s
-[modded] step  1400/2150  loss 1.0814  elapsed 159s
-[modded] step  1500/2150  loss 1.0162  elapsed 170s
-[modded] step  1600/2150  loss 1.0225  elapsed 181s
-[modded] step  1700/2150  loss 1.0033  elapsed 193s
-[modded] step  1800/2150  loss 0.9861  elapsed 204s
-[modded] step  1900/2150  loss 0.9606  elapsed 215s
-[modded] step  2000/2150  loss 0.9690  elapsed 227s
-[modded] step  2100/2150  loss 0.9526  elapsed 238s
-[modded] step  2149/2150  loss 0.9696  elapsed 243s
-training: 51,704.3 J   duration=246.6s
+[modded] step   100/2150  loss 1.6118  elapsed 13s
+[modded] step   200/2150  loss 1.4113  elapsed 24s
+[modded] step   300/2150  loss 1.3137  elapsed 35s
+[modded] step   400/2150  loss 1.2343  elapsed 46s
+[modded] step   500/2150  loss 1.2061  elapsed 57s
+[modded] step   600/2150  loss 1.1634  elapsed 68s
+[modded] step   700/2150  loss 1.1344  elapsed 79s
+[modded] step   800/2150  loss 1.1528  elapsed 90s
+[modded] step   900/2150  loss 1.0967  elapsed 101s
+[modded] step  1000/2150  loss 1.1196  elapsed 111s
+[modded] step  1100/2150  loss 1.0883  elapsed 122s
+[modded] step  1200/2150  loss 1.0476  elapsed 133s
+[modded] step  1300/2150  loss 1.0868  elapsed 144s
+[modded] step  1400/2150  loss 1.0414  elapsed 155s
+[modded] step  1500/2150  loss 1.0262  elapsed 166s
+[modded] step  1600/2150  loss 1.0296  elapsed 177s
+[modded] step  1700/2150  loss 0.9912  elapsed 188s
+[modded] step  1800/2150  loss 1.0246  elapsed 199s
+[modded] step  1900/2150  loss 0.9675  elapsed 210s
+[modded] step  2000/2150  loss 0.9918  elapsed 221s
+[modded] step  2100/2150  loss 0.9837  elapsed 233s
+[modded] step  2149/2150  loss 0.9957  elapsed 238s
+training: 51,728.9 J   duration=242.7s
 evaluating on val split ...
-  eval      1,200/60,000 (  2.0%)  acc=0.7342      129 char/s  eta=   456s
-  eval      2,400/60,000 (  4.0%)  acc=0.7212      128 char/s  eta=   449s
-  eval      3,600/60,000 (  6.0%)  acc=0.7258      127 char/s  eta=   444s
-  eval      4,800/60,000 (  8.0%)  acc=0.7342      126 char/s  eta=   436s
-  eval      6,000/60,000 ( 10.0%)  acc=0.7257      127 char/s  eta=   427s
-  eval      7,200/60,000 ( 12.0%)  acc=0.7228      126 char/s  eta=   418s
-  eval      8,400/60,000 ( 14.0%)  acc=0.7201      127 char/s  eta=   408s
-  eval      9,600/60,000 ( 16.0%)  acc=0.7257      127 char/s  eta=   398s
-  eval     10,800/60,000 ( 18.0%)  acc=0.7277      126 char/s  eta=   389s
-  eval     12,000/60,000 ( 20.0%)  acc=0.7286      126 char/s  eta=   381s
-  eval     13,200/60,000 ( 22.0%)  acc=0.7323      126 char/s  eta=   371s
-  eval     14,400/60,000 ( 24.0%)  acc=0.7330      126 char/s  eta=   362s
-  eval     15,600/60,000 ( 26.0%)  acc=0.7355      126 char/s  eta=   353s
-  eval     16,800/60,000 ( 28.0%)  acc=0.7379      126 char/s  eta=   343s
-  eval     18,000/60,000 ( 30.0%)  acc=0.7356      126 char/s  eta=   334s
-  eval     19,200/60,000 ( 32.0%)  acc=0.7364      126 char/s  eta=   325s
-  eval     20,400/60,000 ( 34.0%)  acc=0.7376      126 char/s  eta=   315s
-  eval     21,600/60,000 ( 36.0%)  acc=0.7368      126 char/s  eta=   306s
-  eval     22,800/60,000 ( 38.0%)  acc=0.7375      126 char/s  eta=   296s
-  eval     24,000/60,000 ( 40.0%)  acc=0.7378      125 char/s  eta=   287s
-  eval     25,200/60,000 ( 42.0%)  acc=0.7390      125 char/s  eta=   277s
-  eval     26,400/60,000 ( 44.0%)  acc=0.7399      125 char/s  eta=   268s
-  eval     27,600/60,000 ( 46.0%)  acc=0.7412      125 char/s  eta=   258s
-  eval     28,800/60,000 ( 48.0%)  acc=0.7410      125 char/s  eta=   249s
-  eval     30,000/60,000 ( 50.0%)  acc=0.7398      125 char/s  eta=   239s
-  eval     31,200/60,000 ( 52.0%)  acc=0.7378      125 char/s  eta=   230s
-  eval     32,400/60,000 ( 54.0%)  acc=0.7373      125 char/s  eta=   220s
-  eval     33,600/60,000 ( 56.0%)  acc=0.7355      125 char/s  eta=   211s
-  eval     34,800/60,000 ( 58.0%)  acc=0.7342      125 char/s  eta=   201s
-  eval     36,000/60,000 ( 60.0%)  acc=0.7335      125 char/s  eta=   192s
-  eval     37,200/60,000 ( 62.0%)  acc=0.7335      125 char/s  eta=   182s
-  eval     38,400/60,000 ( 64.0%)  acc=0.7339      125 char/s  eta=   173s
-  eval     39,600/60,000 ( 66.0%)  acc=0.7336      125 char/s  eta=   163s
-  eval     40,800/60,000 ( 68.0%)  acc=0.7331      125 char/s  eta=   153s
-  eval     42,000/60,000 ( 70.0%)  acc=0.7328      125 char/s  eta=   144s
-  eval     43,200/60,000 ( 72.0%)  acc=0.7331      125 char/s  eta=   134s
-  eval     44,400/60,000 ( 74.0%)  acc=0.7331      125 char/s  eta=   125s
-  eval     45,600/60,000 ( 76.0%)  acc=0.7330      125 char/s  eta=   115s
-  eval     46,800/60,000 ( 78.0%)  acc=0.7326      125 char/s  eta=   105s
-  eval     48,000/60,000 ( 80.0%)  acc=0.7331      125 char/s  eta=    96s
-  eval     49,200/60,000 ( 82.0%)  acc=0.7335      125 char/s  eta=    86s
-  eval     50,400/60,000 ( 84.0%)  acc=0.7350      125 char/s  eta=    77s
-  eval     51,600/60,000 ( 86.0%)  acc=0.7355      125 char/s  eta=    67s
-  eval     52,800/60,000 ( 88.0%)  acc=0.7363      125 char/s  eta=    58s
-  eval     54,000/60,000 ( 90.0%)  acc=0.7367      125 char/s  eta=    48s
-  eval     55,200/60,000 ( 92.0%)  acc=0.7360      125 char/s  eta=    38s
-  eval     56,400/60,000 ( 94.0%)  acc=0.7365      126 char/s  eta=    29s
-  eval     57,600/60,000 ( 96.0%)  acc=0.7368      126 char/s  eta=    19s
-  eval     58,800/60,000 ( 98.0%)  acc=0.7374      126 char/s  eta=    10s
-  eval     60,000/60,000 (100.0%)  acc=0.7374      126 char/s  eta=     0s
-chars=60,000  acc=0.7374  eval_duration=475.8s
+  eval      1,200/60,000 (  2.0%)  acc=0.7367      130 char/s  eta=   452s
+  eval      2,400/60,000 (  4.0%)  acc=0.7196      131 char/s  eta=   439s
+  eval      3,600/60,000 (  6.0%)  acc=0.7186      131 char/s  eta=   430s
+  eval      4,800/60,000 (  8.0%)  acc=0.7267      131 char/s  eta=   421s
+  eval      6,000/60,000 ( 10.0%)  acc=0.7170      131 char/s  eta=   413s
+  eval      7,200/60,000 ( 12.0%)  acc=0.7133      131 char/s  eta=   403s
+  eval      8,400/60,000 ( 14.0%)  acc=0.7119      131 char/s  eta=   395s
+  eval      9,600/60,000 ( 16.0%)  acc=0.7160      130 char/s  eta=   386s
+  eval     10,800/60,000 ( 18.0%)  acc=0.7179      130 char/s  eta=   378s
+  eval     12,000/60,000 ( 20.0%)  acc=0.7196      131 char/s  eta=   367s
+  eval     13,200/60,000 ( 22.0%)  acc=0.7245      131 char/s  eta=   358s
+  eval     14,400/60,000 ( 24.0%)  acc=0.7256      131 char/s  eta=   349s
+  eval     15,600/60,000 ( 26.0%)  acc=0.7281      131 char/s  eta=   340s
+  eval     16,800/60,000 ( 28.0%)  acc=0.7308      130 char/s  eta=   331s
+  eval     18,000/60,000 ( 30.0%)  acc=0.7282      130 char/s  eta=   322s
+  eval     19,200/60,000 ( 32.0%)  acc=0.7296      130 char/s  eta=   313s
+  eval     20,400/60,000 ( 34.0%)  acc=0.7313      131 char/s  eta=   303s
+  eval     21,600/60,000 ( 36.0%)  acc=0.7306      130 char/s  eta=   294s
+  eval     22,800/60,000 ( 38.0%)  acc=0.7310      131 char/s  eta=   285s
+  eval     24,000/60,000 ( 40.0%)  acc=0.7311      131 char/s  eta=   275s
+  eval     25,200/60,000 ( 42.0%)  acc=0.7325      131 char/s  eta=   266s
+  eval     26,400/60,000 ( 44.0%)  acc=0.7333      131 char/s  eta=   256s
+  eval     27,600/60,000 ( 46.0%)  acc=0.7345      131 char/s  eta=   247s
+  eval     28,800/60,000 ( 48.0%)  acc=0.7347      131 char/s  eta=   238s
+  eval     30,000/60,000 ( 50.0%)  acc=0.7342      131 char/s  eta=   228s
+  eval     31,200/60,000 ( 52.0%)  acc=0.7326      131 char/s  eta=   219s
+  eval     32,400/60,000 ( 54.0%)  acc=0.7320      131 char/s  eta=   210s
+  eval     33,600/60,000 ( 56.0%)  acc=0.7303      131 char/s  eta=   201s
+  eval     34,800/60,000 ( 58.0%)  acc=0.7290      132 char/s  eta=   192s
+  eval     36,000/60,000 ( 60.0%)  acc=0.7282      132 char/s  eta=   182s
+  eval     37,200/60,000 ( 62.0%)  acc=0.7287      132 char/s  eta=   173s
+  eval     38,400/60,000 ( 64.0%)  acc=0.7291      132 char/s  eta=   164s
+  eval     39,600/60,000 ( 66.0%)  acc=0.7292      132 char/s  eta=   155s
+  eval     40,800/60,000 ( 68.0%)  acc=0.7289      131 char/s  eta=   146s
+  eval     42,000/60,000 ( 70.0%)  acc=0.7286      131 char/s  eta=   137s
+  eval     43,200/60,000 ( 72.0%)  acc=0.7291      131 char/s  eta=   128s
+  eval     44,400/60,000 ( 74.0%)  acc=0.7289      131 char/s  eta=   119s
+  eval     45,600/60,000 ( 76.0%)  acc=0.7290      131 char/s  eta=   110s
+  eval     46,800/60,000 ( 78.0%)  acc=0.7286      131 char/s  eta=   101s
+  eval     48,000/60,000 ( 80.0%)  acc=0.7290      131 char/s  eta=    92s
+  eval     49,200/60,000 ( 82.0%)  acc=0.7293      131 char/s  eta=    83s
+  eval     50,400/60,000 ( 84.0%)  acc=0.7307      131 char/s  eta=    73s
+  eval     51,600/60,000 ( 86.0%)  acc=0.7314      131 char/s  eta=    64s
+  eval     52,800/60,000 ( 88.0%)  acc=0.7322      131 char/s  eta=    55s
+  eval     54,000/60,000 ( 90.0%)  acc=0.7326      131 char/s  eta=    46s
+  eval     55,200/60,000 ( 92.0%)  acc=0.7320      131 char/s  eta=    37s
+  eval     56,400/60,000 ( 94.0%)  acc=0.7325      131 char/s  eta=    27s
+  eval     57,600/60,000 ( 96.0%)  acc=0.7329      131 char/s  eta=    18s
+  eval     58,800/60,000 ( 98.0%)  acc=0.7336      131 char/s  eta=     9s
+  eval     60,000/60,000 (100.0%)  acc=0.7337      131 char/s  eta=     0s
+chars=60,000  acc=0.7337  eval_duration=457.9s
 ---
 submission         : modded_nanogpt
-training energy (J): 51,704.3
-training duration  : 246.6s
-val  char-accuracy : 0.7374
+training energy (J): 51,728.9
+training duration  : 242.7s
+val  char-accuracy : 0.7337
 val  chars         : 60,000
 wrote /tmp/result.json
 Stopping app - local entrypoint completed.
 ✓ App completed. View run at 
-https://modal.com/apps/ab-10/main/ap-E9Q8eIo1sjKa6HsN23nCdR
+https://modal.com/apps/gabriel-nakajima-an/main/ap-8CUCBSgp30OSqDLk4DjLGC
 
 # final result
 {
   "submission": "modded_nanogpt",
-  "training_energy_J": 51704.306257950004,
-  "training_duration_s": 246.648394841,
-  "val_char_accuracy": 0.7373666666666666,
+  "training_energy_J": 51728.92952335,
+  "training_duration_s": 242.66790953299997,
+  "cpu_energy_J": 10276.997479117497,
+  "total_energy_J": 62005.9270024675,
+  "val_char_accuracy": 0.7336833333333334,
   "val_chars": 60000,
-  "gpu_name": "NVIDIA A100 80GB PCIe",
-  "date_utc": "2026-05-12T22:10:25Z",
+  "gpu_name": "NVIDIA A100-SXM4-80GB",
+  "date_utc": "2026-05-21T05:31:57Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 55.60349152542373,
-    "stress_watts_avg": 232.33601101968227,
-    "stress_energy_joules": 8741.791,
-    "stress_duration_s": 37.625639527999994,
-    "gpu_name": "NVIDIA A100 80GB PCIe",
+    "idle_watts": 62.52783050847465,
+    "stress_watts_avg": 350.16403334081195,
+    "stress_energy_joules": 13179.464,
+    "stress_duration_s": 37.637971765,
+    "gpu_name": "NVIDIA A100-SXM4-80GB",
     "notes": []
   },
   "contributor": "@ab-10"
diff --git a/submissions/paq_mixer_v3/nvml.json b/submissions/paq_mixer_v3/nvml.json
index 3f0ff32..8e05d5d 100644
--- a/submissions/paq_mixer_v3/nvml.json
+++ b/submissions/paq_mixer_v3/nvml.json
@@ -2,10 +2,10 @@
   "nvml_available": true,
   "energy_counter_supported": true,
   "monotonic": true,
-  "idle_watts": 66.40836666666658,
-  "stress_watts_avg": 345.33368906733165,
-  "stress_energy_joules": 13068.953,
-  "stress_duration_s": 37.84441951,
-  "gpu_name": "NVIDIA A100-SXM4-80GB",
+  "idle_watts": 56.77781666666669,
+  "stress_watts_avg": 232.3223612785609,
+  "stress_energy_joules": 8489.282,
+  "stress_duration_s": 36.540959524,
+  "gpu_name": "NVIDIA A100 80GB PCIe",
   "notes": []
 }
diff --git a/submissions/paq_mixer_v3/result.json b/submissions/paq_mixer_v3/result.json
index 91bd8e5..da2d7dc 100644
--- a/submissions/paq_mixer_v3/result.json
+++ b/submissions/paq_mixer_v3/result.json
@@ -1,22 +1,22 @@
 {
   "submission": "paq_mixer_v3",
-  "training_energy_J": 3582.3155354,
-  "training_duration_s": 122.294609292,
-  "cpu_energy_J": 5167.742507545003,
-  "total_energy_J": 8750.058042945002,
+  "training_energy_J": 2355.22674605,
+  "training_duration_s": 53.155205079,
+  "cpu_energy_J": 2251.575883315001,
+  "total_energy_J": 4606.802629365002,
   "val_char_accuracy": 0.70475,
   "val_chars": 60000,
-  "gpu_name": "NVIDIA A100-SXM4-80GB",
-  "date_utc": "2026-05-20T07:12:19Z",
+  "gpu_name": "NVIDIA A100 80GB PCIe",
+  "date_utc": "2026-05-21T05:05:13Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 66.40836666666658,
-    "stress_watts_avg": 345.33368906733165,
-    "stress_energy_joules": 13068.953,
-    "stress_duration_s": 37.84441951,
-    "gpu_name": "NVIDIA A100-SXM4-80GB",
+    "idle_watts": 56.77781666666669,
+    "stress_watts_avg": 232.3223612785609,
+    "stress_energy_joules": 8489.282,
+    "stress_duration_s": 36.540959524,
+    "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
   "contributor": "@worker-paq-mixer"
diff --git a/submissions/paq_mixer_v3/run.log b/submissions/paq_mixer_v3/run.log
index e2a2fd6..a476cef 100644
--- a/submissions/paq_mixer_v3/run.log
+++ b/submissions/paq_mixer_v3/run.log
@@ -1,7 +1,7 @@
-# wikitext submit.py log — paq_mixer_v3 — 2026-05-20T07:08:40+00:00Z
+# wikitext submit.py log — paq_mixer_v3 — 2026-05-21T05:03:02+00:00Z
 [modal] launching A100-80GB ...
 ✓ Initialized. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-Sm0PHVmoPmOFQsokhYXdqV
+https://modal.com/apps/gabriel-nakajima-an/main/ap-Dycl40rOPJQ1MEo4LFRzo5
 ✓ Created objects.
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py
@@ -10,16 +10,16 @@ https://modal.com/apps/gabriel-nakajima-an/main/ap-Sm0PHVmoPmOFQsokhYXdqV
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py
 └── 🔨 Created function run_submission.
 [modal] verifying NVML energy counter ...
-GPU: NVIDIA A100-SXM4-80GB
+GPU: NVIDIA A100 80GB PCIe
 sampling idle power for 3s ...
-  idle: 66.4 W
+  idle: 56.8 W
 running 30s stress workload ...
-  duration:       37.8 s
-  energy delta:   13,069.0 J
-  avg power:      345.3 W
+  duration:       36.5 s
+  energy delta:   8,489.3 J
+  avg power:      232.3 W
   monotonic:      True
 ---
-{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 66.40836666666658, "stress_watts_avg": 345.33368906733165, "stress_energy_joules": 13068.953, "stress_duration_s": 37.84441951, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []}
+{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 56.77781666666669, "stress_watts_avg": 232.3223612785609, "stress_energy_joules": 8489.282, "stress_duration_s": 36.540959524, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []}
 [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ...
 loading WikiText-103 from /data ...
   train chars: 540,095,682
@@ -27,23 +27,23 @@ loading WikiText-103 from /data ...
 train wall-clock cap: 300 s
 val accuracy floor : 0.7000
 training submission /workspace/paq_mixer_v3.py ...
-[codecarbon WARNING @ 07:09:48] Multiple instances of codecarbon are allowed to run at the same time.
+[codecarbon WARNING @ 05:03:59] Multiple instances of codecarbon are allowed to run at the same time.
 [paq_mixer] device=cuda K=11 max_ctx_len=10 WB_DISCOUNT=1.0
-[paq_mixer] encoded 539,096,898 train bytes (0.8s); heldout=2,000,000 bytes
-[paq_mixer] top order=11 unique pairs: 118,988,639  2.1s
-[paq_mixer] order k=11 ctx_len=10 ctxs=84,084,448 rows=118,988,639  39.1s
-[paq_mixer] order k=10 ctx_len=9 ctxs=54,600,791 rows=84,084,448  28.6s
-[paq_mixer] order k=9 ctx_len=8 ctxs=31,859,845 rows=54,600,791  10.5s
-[paq_mixer] order k=8 ctx_len=7 ctxs=16,254,833 rows=31,859,845  2.9s
+[paq_mixer] encoded 539,096,898 train bytes (0.6s); heldout=2,000,000 bytes
+[paq_mixer] top order=11 unique pairs: 118,988,639  1.9s
+[paq_mixer] order k=11 ctx_len=10 ctxs=84,084,448 rows=118,988,639  14.8s
+[paq_mixer] order k=10 ctx_len=9 ctxs=54,600,791 rows=84,084,448  9.3s
+[paq_mixer] order k=9 ctx_len=8 ctxs=31,859,845 rows=54,600,791  5.7s
+[paq_mixer] order k=8 ctx_len=7 ctxs=16,254,833 rows=31,859,845  2.8s
 [paq_mixer] order k=7 ctx_len=6 ctxs=7,004,457 rows=16,254,833  1.3s
-[paq_mixer] order k=6 ctx_len=5 ctxs=2,434,266 rows=7,004,457  0.5s
+[paq_mixer] order k=6 ctx_len=5 ctxs=2,434,266 rows=7,004,457  0.4s
 [paq_mixer] order k=5 ctx_len=4 ctxs=636,106 rows=2,434,266  0.1s
 [paq_mixer] order k=4 ctx_len=3 ctxs=122,668 rows=636,106  0.0s
 [paq_mixer] order k=3 ctx_len=2 ctxs=12,277 rows=122,668  0.0s
 [paq_mixer] order k=2 ctx_len=1 ctxs=204 rows=12,277  0.0s
 [paq_mixer] order k=1 ctx_len=0 ctxs=1 rows=204  0.0s
-[paq_mixer] tables built in 86.2s
-[paq_mixer] collected 200,000 mixer training samples feat_dim=34 (29.5s)
+[paq_mixer] tables built in 37.1s
+[paq_mixer] collected 200,000 mixer training samples feat_dim=34 (12.0s)
 [paq_mixer] mixer step=   0 loss=1.4004
 [paq_mixer] mixer step= 187 loss=1.0435
 [paq_mixer] mixer step= 374 loss=1.0322
@@ -54,95 +54,93 @@ training submission /workspace/paq_mixer_v3.py ...
 [paq_mixer] mixer step=1309 loss=1.0324
 [paq_mixer] mixer step=1496 loss=1.0380
 [paq_mixer] mixer step=1499 loss=1.0537
-[paq_mixer] mixer fit done 5.1s last_loss=1.0537
-[paq_mixer] total build: 120.8s
-training: 3,582.3 J   duration=122.3s
+[paq_mixer] mixer fit done 3.2s last_loss=1.0537
+[paq_mixer] total build: 52.4s
+training: 2,355.2 J   duration=53.2s
 evaluating on val split ...
-  eval      1,200/60,000 (  2.0%)  acc=0.6833     2522 char/s  eta=    23s
-  eval      2,400/60,000 (  4.0%)  acc=0.6729     2518 char/s  eta=    23s
-  eval      3,600/60,000 (  6.0%)  acc=0.6700     2521 char/s  eta=    22s
-  eval      4,800/60,000 (  8.0%)  acc=0.6848     2517 char/s  eta=    22s
-  eval      6,000/60,000 ( 10.0%)  acc=0.6850     2520 char/s  eta=    21s
-  eval      7,200/60,000 ( 12.0%)  acc=0.6775     2524 char/s  eta=    21s
-  eval      8,400/60,000 ( 14.0%)  acc=0.6774     2526 char/s  eta=    20s
-  eval      9,600/60,000 ( 16.0%)  acc=0.6849     2524 char/s  eta=    20s
-  eval     10,800/60,000 ( 18.0%)  acc=0.6939     2521 char/s  eta=    20s
-  eval     12,000/60,000 ( 20.0%)  acc=0.6975     2519 char/s  eta=    19s
-  eval     13,200/60,000 ( 22.0%)  acc=0.7022     2516 char/s  eta=    19s
-  eval     14,400/60,000 ( 24.0%)  acc=0.7037     2515 char/s  eta=    18s
-  eval     15,600/60,000 ( 26.0%)  acc=0.7051     2514 char/s  eta=    18s
-  eval     16,800/60,000 ( 28.0%)  acc=0.7083     2512 char/s  eta=    17s
-  eval     18,000/60,000 ( 30.0%)  acc=0.7100     2511 char/s  eta=    17s
-  eval     19,200/60,000 ( 32.0%)  acc=0.7136     2509 char/s  eta=    16s
-  eval     20,400/60,000 ( 34.0%)  acc=0.7152     2508 char/s  eta=    16s
-  eval     21,600/60,000 ( 36.0%)  acc=0.7161     2509 char/s  eta=    15s
-  eval     22,800/60,000 ( 38.0%)  acc=0.7164     2508 char/s  eta=    15s
-  eval     24,000/60,000 ( 40.0%)  acc=0.7166     2508 char/s  eta=    14s
-  eval     25,200/60,000 ( 42.0%)  acc=0.7170     2509 char/s  eta=    14s
-  eval     26,400/60,000 ( 44.0%)  acc=0.7180     2509 char/s  eta=    13s
-  eval     27,600/60,000 ( 46.0%)  acc=0.7164     2509 char/s  eta=    13s
-  eval     28,800/60,000 ( 48.0%)  acc=0.7162     2511 char/s  eta=    12s
-  eval     30,000/60,000 ( 50.0%)  acc=0.7146     2512 char/s  eta=    12s
-  eval     31,200/60,000 ( 52.0%)  acc=0.7113     2514 char/s  eta=    11s
-  eval     32,400/60,000 ( 54.0%)  acc=0.7089     2516 char/s  eta=    11s
-  eval     33,600/60,000 ( 56.0%)  acc=0.7064     2516 char/s  eta=    10s
-  eval     34,800/60,000 ( 58.0%)  acc=0.7067     2516 char/s  eta=    10s
-  eval     36,000/60,000 ( 60.0%)  acc=0.7065     2515 char/s  eta=    10s
-  eval     37,200/60,000 ( 62.0%)  acc=0.7065     2514 char/s  eta=     9s
-  eval     38,400/60,000 ( 64.0%)  acc=0.7070     2513 char/s  eta=     9s
-  eval     39,600/60,000 ( 66.0%)  acc=0.7064     2513 char/s  eta=     8s
-  eval     40,800/60,000 ( 68.0%)  acc=0.7062     2512 char/s  eta=     8s
-  eval     42,000/60,000 ( 70.0%)  acc=0.7056     2510 char/s  eta=     7s
-  eval     43,200/60,000 ( 72.0%)  acc=0.7050     2510 char/s  eta=     7s
-  eval     44,400/60,000 ( 74.0%)  acc=0.7052     2510 char/s  eta=     6s
-  eval     45,600/60,000 ( 76.0%)  acc=0.7054     2510 char/s  eta=     6s
-  eval     46,800/60,000 ( 78.0%)  acc=0.7047     2511 char/s  eta=     5s
-  eval     48,000/60,000 ( 80.0%)  acc=0.7049     2512 char/s  eta=     5s
-  eval     49,200/60,000 ( 82.0%)  acc=0.7043     2513 char/s  eta=     4s
-  eval     50,400/60,000 ( 84.0%)  acc=0.7046     2514 char/s  eta=     4s
-  eval     51,600/60,000 ( 86.0%)  acc=0.7047     2515 char/s  eta=     3s
-  eval     52,800/60,000 ( 88.0%)  acc=0.7034     2518 char/s  eta=     3s
-  eval     54,000/60,000 ( 90.0%)  acc=0.7034     2520 char/s  eta=     2s
-  eval     55,200/60,000 ( 92.0%)  acc=0.7028     2521 char/s  eta=     2s
-  eval     56,400/60,000 ( 94.0%)  acc=0.7021     2522 char/s  eta=     1s
-  eval     57,600/60,000 ( 96.0%)  acc=0.7028     2523 char/s  eta=     1s
-  eval     58,800/60,000 ( 98.0%)  acc=0.7036     2524 char/s  eta=     0s
-  eval     60,000/60,000 (100.0%)  acc=0.7047     2525 char/s  eta=     0s
-chars=60,000  acc=0.7047  eval_duration=23.8s
+  eval      1,200/60,000 (  2.0%)  acc=0.6833     3703 char/s  eta=    16s
+  eval      2,400/60,000 (  4.0%)  acc=0.6729     3699 char/s  eta=    16s
+  eval      3,600/60,000 (  6.0%)  acc=0.6700     3707 char/s  eta=    15s
+  eval      4,800/60,000 (  8.0%)  acc=0.6848     3708 char/s  eta=    15s
+  eval      6,000/60,000 ( 10.0%)  acc=0.6850     3715 char/s  eta=    15s
+  eval      7,200/60,000 ( 12.0%)  acc=0.6775     3721 char/s  eta=    14s
+  eval      8,400/60,000 ( 14.0%)  acc=0.6774     3727 char/s  eta=    14s
+  eval      9,600/60,000 ( 16.0%)  acc=0.6849     3727 char/s  eta=    14s
+  eval     10,800/60,000 ( 18.0%)  acc=0.6939     3725 char/s  eta=    13s
+  eval     12,000/60,000 ( 20.0%)  acc=0.6975     3724 char/s  eta=    13s
+  eval     13,200/60,000 ( 22.0%)  acc=0.7022     3723 char/s  eta=    13s
+  eval     14,400/60,000 ( 24.0%)  acc=0.7037     3721 char/s  eta=    12s
+  eval     15,600/60,000 ( 26.0%)  acc=0.7051     3721 char/s  eta=    12s
+  eval     16,800/60,000 ( 28.0%)  acc=0.7083     3721 char/s  eta=    12s
+  eval     18,000/60,000 ( 30.0%)  acc=0.7100     3721 char/s  eta=    11s
+  eval     19,200/60,000 ( 32.0%)  acc=0.7136     3719 char/s  eta=    11s
+  eval     20,400/60,000 ( 34.0%)  acc=0.7152     3718 char/s  eta=    11s
+  eval     21,600/60,000 ( 36.0%)  acc=0.7161     3719 char/s  eta=    10s
+  eval     22,800/60,000 ( 38.0%)  acc=0.7164     3718 char/s  eta=    10s
+  eval     24,000/60,000 ( 40.0%)  acc=0.7166     3718 char/s  eta=    10s
+  eval     25,200/60,000 ( 42.0%)  acc=0.7170     3718 char/s  eta=     9s
+  eval     26,400/60,000 ( 44.0%)  acc=0.7180     3718 char/s  eta=     9s
+  eval     27,600/60,000 ( 46.0%)  acc=0.7164     3718 char/s  eta=     9s
+  eval     28,800/60,000 ( 48.0%)  acc=0.7162     3720 char/s  eta=     8s
+  eval     30,000/60,000 ( 50.0%)  acc=0.7146     3721 char/s  eta=     8s
+  eval     31,200/60,000 ( 52.0%)  acc=0.7113     3723 char/s  eta=     8s
+  eval     32,400/60,000 ( 54.0%)  acc=0.7089     3726 char/s  eta=     7s
+  eval     33,600/60,000 ( 56.0%)  acc=0.7064     3728 char/s  eta=     7s
+  eval     34,800/60,000 ( 58.0%)  acc=0.7067     3728 char/s  eta=     7s
+  eval     36,000/60,000 ( 60.0%)  acc=0.7065     3728 char/s  eta=     6s
+  eval     37,200/60,000 ( 62.0%)  acc=0.7065     3728 char/s  eta=     6s
+  eval     38,400/60,000 ( 64.0%)  acc=0.7070     3728 char/s  eta=     6s
+  eval     39,600/60,000 ( 66.0%)  acc=0.7064     3728 char/s  eta=     5s
+  eval     40,800/60,000 ( 68.0%)  acc=0.7062     3727 char/s  eta=     5s
+  eval     42,000/60,000 ( 70.0%)  acc=0.7056     3727 char/s  eta=     5s
+  eval     43,200/60,000 ( 72.0%)  acc=0.7050     3726 char/s  eta=     5s
+  eval     44,400/60,000 ( 74.0%)  acc=0.7052     3726 char/s  eta=     4s
+  eval     45,600/60,000 ( 76.0%)  acc=0.7054     3725 char/s  eta=     4s
+  eval     46,800/60,000 ( 78.0%)  acc=0.7047     3725 char/s  eta=     4s
+  eval     48,000/60,000 ( 80.0%)  acc=0.7049     3724 char/s  eta=     3s
+  eval     49,200/60,000 ( 82.0%)  acc=0.7043     3724 char/s  eta=     3s
+  eval     50,400/60,000 ( 84.0%)  acc=0.7046     3724 char/s  eta=     3s
+  eval     51,600/60,000 ( 86.0%)  acc=0.7047     3724 char/s  eta=     2s
+  eval     52,800/60,000 ( 88.0%)  acc=0.7034     3727 char/s  eta=     2s
+  eval     54,000/60,000 ( 90.0%)  acc=0.7034     3727 char/s  eta=     2s
+  eval     55,200/60,000 ( 92.0%)  acc=0.7028     3728 char/s  eta=     1s
+  eval     56,400/60,000 ( 94.0%)  acc=0.7021     3728 char/s  eta=     1s
+  eval     57,600/60,000 ( 96.0%)  acc=0.7028     3727 char/s  eta=     1s
+  eval     58,800/60,000 ( 98.0%)  acc=0.7036     3727 char/s  eta=     0s
+  eval     60,000/60,000 (100.0%)  acc=0.7047     3727 char/s  eta=     0s
+chars=60,000  acc=0.7047  eval_duration=16.1s
 ---
 submission         : paq_mixer_v3
-training energy (J): 3,582.3
-training duration  : 122.3s
+training energy (J): 2,355.2
+training duration  : 53.2s
 val  char-accuracy : 0.7047
 val  chars         : 60,000
 wrote /tmp/result.json
 Stopping app - local entrypoint completed.
 ✓ App completed. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-Sm0PHVmoPmOFQsokhYXdqV
+https://modal.com/apps/gabriel-nakajima-an/main/ap-Dycl40rOPJQ1MEo4LFRzo5
 
 # final result
 {
   "submission": "paq_mixer_v3",
-  "training_energy_J": 3582.3155354,
-  "training_duration_s": 122.294609292,
-  "cpu_energy_J": 5167.742507545003,
-  "total_energy_J": 8750.058042945002,
+  "training_energy_J": 2355.22674605,
+  "training_duration_s": 53.155205079,
+  "cpu_energy_J": 2251.575883315001,
+  "total_energy_J": 4606.802629365002,
   "val_char_accuracy": 0.70475,
   "val_chars": 60000,
-  "gpu_name": "NVIDIA A100-SXM4-80GB",
-  "date_utc": "2026-05-20T07:12:19Z",
+  "gpu_name": "NVIDIA A100 80GB PCIe",
+  "date_utc": "2026-05-21T05:05:13Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 66.40836666666658,
-    "stress_watts_avg": 345.33368906733165,
-    "stress_energy_joules": 13068.953,
-    "stress_duration_s": 37.84441951,
-    "gpu_name": "NVIDIA A100-SXM4-80GB",
+    "idle_watts": 56.77781666666669,
+    "stress_watts_avg": 232.3223612785609,
+    "stress_energy_joules": 8489.282,
+    "stress_duration_s": 36.540959524,
+    "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
   "contributor": "@worker-paq-mixer"
 }
--mixer"
-}
diff --git a/submissions/subset_70_mkn/nvml.json b/submissions/subset_70_mkn/nvml.json
index 8380f6b..0ec523a 100644
--- a/submissions/subset_70_mkn/nvml.json
+++ b/submissions/subset_70_mkn/nvml.json
@@ -2,10 +2,10 @@
   "nvml_available": true,
   "energy_counter_supported": true,
   "monotonic": true,
-  "idle_watts": 57.06305084745769,
-  "stress_watts_avg": 333.0922109881346,
-  "stress_energy_joules": 12488.622,
-  "stress_duration_s": 37.492987191,
-  "gpu_name": "NVIDIA A100-SXM4-80GB",
+  "idle_watts": 58.36485000000002,
+  "stress_watts_avg": 233.1941013031682,
+  "stress_energy_joules": 8767.363,
+  "stress_duration_s": 37.596847223,
+  "gpu_name": "NVIDIA A100 80GB PCIe",
   "notes": []
 }
diff --git a/submissions/subset_70_mkn/result.json b/submissions/subset_70_mkn/result.json
index 2ca5e57..e956192 100644
--- a/submissions/subset_70_mkn/result.json
+++ b/submissions/subset_70_mkn/result.json
@@ -1,22 +1,22 @@
 {
   "submission": "subset_70_mkn",
-  "training_energy_J": 1064.6838474000006,
-  "training_duration_s": 41.054503051999994,
-  "cpu_energy_J": 1736.325936897499,
-  "total_energy_J": 2801.0097842974997,
+  "training_energy_J": 1350.8209175499999,
+  "training_duration_s": 26.514841649000005,
+  "cpu_energy_J": 1123.5610902799988,
+  "total_energy_J": 2474.3820078299987,
   "val_char_accuracy": 0.7031333333333334,
   "val_chars": 60000,
-  "gpu_name": "NVIDIA A100-SXM4-80GB",
-  "date_utc": "2026-05-20T07:32:40Z",
+  "gpu_name": "NVIDIA A100 80GB PCIe",
+  "date_utc": "2026-05-21T05:05:01Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 57.06305084745769,
-    "stress_watts_avg": 333.0922109881346,
-    "stress_energy_joules": 12488.622,
-    "stress_duration_s": 37.492987191,
-    "gpu_name": "NVIDIA A100-SXM4-80GB",
+    "idle_watts": 58.36485000000002,
+    "stress_watts_avg": 233.1941013031682,
+    "stress_energy_joules": 8767.363,
+    "stress_duration_s": 37.596847223,
+    "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
   "contributor": "@exp-batch-iter4"
diff --git a/submissions/subset_70_mkn/run.log b/submissions/subset_70_mkn/run.log
index 46c551e..4a514ec 100644
--- a/submissions/subset_70_mkn/run.log
+++ b/submissions/subset_70_mkn/run.log
@@ -1,25 +1,25 @@
-# wikitext submit.py log — subset_70_mkn — 2026-05-20T07:30:25+00:00Z
+# wikitext submit.py log — subset_70_mkn — 2026-05-21T05:03:02+00:00Z
 [modal] launching A100-80GB ...
 ✓ Initialized. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-shQS1Hiyo4OPhMcNN4Xy5N
+https://modal.com/apps/gabriel-nakajima-an/main/ap-TnCfSdLjln33sQ58a3CqLJ
 ✓ Created objects.
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py
-├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py
 ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py
 └── 🔨 Created function run_submission.
 [modal] verifying NVML energy counter ...
-GPU: NVIDIA A100-SXM4-80GB
+GPU: NVIDIA A100 80GB PCIe
 sampling idle power for 3s ...
-  idle: 57.1 W
+  idle: 58.4 W
 running 30s stress workload ...
-  duration:       37.5 s
-  energy delta:   12,488.6 J
-  avg power:      333.1 W
+  duration:       37.6 s
+  energy delta:   8,767.4 J
+  avg power:      233.2 W
   monotonic:      True
 ---
-{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 57.06305084745769, "stress_watts_avg": 333.0922109881346, "stress_energy_joules": 12488.622, "stress_duration_s": 37.492987191, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []}
+{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 58.36485000000002, "stress_watts_avg": 233.1941013031682, "stress_energy_joules": 8767.363, "stress_duration_s": 37.596847223, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []}
 [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ...
 loading WikiText-103 from /data ...
   train chars: 540,095,682
@@ -27,17 +27,17 @@ loading WikiText-103 from /data ...
 train wall-clock cap: 300 s
 val accuracy floor : 0.7000
 training submission /workspace/subset_70_mkn.py ...
-[codecarbon WARNING @ 07:31:20] Multiple instances of codecarbon are allowed to run at the same time.
+[codecarbon WARNING @ 05:04:02] Multiple instances of codecarbon are allowed to run at the same time.
 [gpu_ngram_w3] starting build; max_order=11 D=0.5
 [gpu_ngram_w3] SUBSET 0.7 -> 378,767,828 train bytes
-[gpu_ngram_w3] encoded train: 378,767,828 bytes (0.6s)
+[gpu_ngram_w3] encoded train: 378,767,828 bytes (0.3s)
 [gpu_ngram_w3] top order=11 unique pairs: 93,376,155  1.4s
-[gpu_ngram_w3] ctx_len=10 ctxs=66,967,773 rows=93,376,155  15.2s
-[gpu_ngram_w3] ctx_len=9 ctxs=44,196,096 rows=66,967,774  9.9s
-[gpu_ngram_w3] ctx_len=8 ctxs=26,241,880 rows=44,196,096  5.7s
-[gpu_ngram_w3] ctx_len=7 ctxs=13,634,362 rows=26,241,880  3.3s
-[gpu_ngram_w3] ctx_len=6 ctxs=5,986,883 rows=13,634,362  1.5s
-[gpu_ngram_w3] ctx_len=5 ctxs=2,116,383 rows=5,986,883  0.6s
+[gpu_ngram_w3] ctx_len=10 ctxs=66,967,773 rows=93,376,155  8.5s
+[gpu_ngram_w3] ctx_len=9 ctxs=44,196,096 rows=66,967,774  6.9s
+[gpu_ngram_w3] ctx_len=8 ctxs=26,241,880 rows=44,196,096  4.3s
+[gpu_ngram_w3] ctx_len=7 ctxs=13,634,362 rows=26,241,880  2.0s
+[gpu_ngram_w3] ctx_len=6 ctxs=5,986,883 rows=13,634,362  1.2s
+[gpu_ngram_w3] ctx_len=5 ctxs=2,116,383 rows=5,986,883  0.4s
 [gpu_ngram_w3] ctx_len=4 ctxs=562,545 rows=2,116,383  0.1s
 [gpu_ngram_w3] ctx_len=3 ctxs=110,361 rows=562,545  0.0s
 [gpu_ngram_w3] ctx_len=2 ctxs=11,730 rows=110,361  0.0s
@@ -53,92 +53,92 @@ training submission /workspace/subset_70_mkn.py ...
 [mkn] k=8 D1=0.658 D2=1.091 D3=1.442 (n1=25150544, n2=6528748, n3=3003917)
 [mkn] k=9 D1=0.683 D2=1.108 D3=1.436 (n1=41521363, n2=9620057, n3=4187138)
 [mkn] k=10 D1=0.708 D2=1.126 D3=1.431 (n1=62211762, n2=12816182, n3=5273752)
-[mkn] discounts computed: 1.3s
-[gpu_ngram_w3] total build: 39.7s
-training: 1,064.7 J   duration=41.1s
+[mkn] discounts computed: 0.6s
+[gpu_ngram_w3] total build: 25.7s
+training: 1,350.8 J   duration=26.5s
 evaluating on val split ...
-  eval      1,200/60,000 (  2.0%)  acc=0.6883     1728 char/s  eta=    34s
-  eval      2,400/60,000 (  4.0%)  acc=0.6746     1762 char/s  eta=    33s
-  eval      3,600/60,000 (  6.0%)  acc=0.6722     1724 char/s  eta=    33s
-  eval      4,800/60,000 (  8.0%)  acc=0.6867     1741 char/s  eta=    32s
-  eval      6,000/60,000 ( 10.0%)  acc=0.6870     1757 char/s  eta=    31s
-  eval      7,200/60,000 ( 12.0%)  acc=0.6806     1752 char/s  eta=    30s
-  eval      8,400/60,000 ( 14.0%)  acc=0.6799     1772 char/s  eta=    29s
-  eval      9,600/60,000 ( 16.0%)  acc=0.6864     1762 char/s  eta=    29s
-  eval     10,800/60,000 ( 18.0%)  acc=0.6951     1786 char/s  eta=    28s
-  eval     12,000/60,000 ( 20.0%)  acc=0.6977     1760 char/s  eta=    27s
-  eval     13,200/60,000 ( 22.0%)  acc=0.7017     1747 char/s  eta=    27s
-  eval     14,400/60,000 ( 24.0%)  acc=0.7035     1765 char/s  eta=    26s
-  eval     15,600/60,000 ( 26.0%)  acc=0.7056     1749 char/s  eta=    25s
-  eval     16,800/60,000 ( 28.0%)  acc=0.7089     1747 char/s  eta=    25s
-  eval     18,000/60,000 ( 30.0%)  acc=0.7106     1730 char/s  eta=    24s
-  eval     19,200/60,000 ( 32.0%)  acc=0.7143     1733 char/s  eta=    24s
-  eval     20,400/60,000 ( 34.0%)  acc=0.7155     1738 char/s  eta=    23s
-  eval     21,600/60,000 ( 36.0%)  acc=0.7163     1746 char/s  eta=    22s
-  eval     22,800/60,000 ( 38.0%)  acc=0.7168     1741 char/s  eta=    21s
-  eval     24,000/60,000 ( 40.0%)  acc=0.7168     1752 char/s  eta=    21s
-  eval     25,200/60,000 ( 42.0%)  acc=0.7169     1761 char/s  eta=    20s
-  eval     26,400/60,000 ( 44.0%)  acc=0.7181     1765 char/s  eta=    19s
-  eval     27,600/60,000 ( 46.0%)  acc=0.7165     1756 char/s  eta=    18s
-  eval     28,800/60,000 ( 48.0%)  acc=0.7165     1760 char/s  eta=    18s
-  eval     30,000/60,000 ( 50.0%)  acc=0.7152     1769 char/s  eta=    17s
-  eval     31,200/60,000 ( 52.0%)  acc=0.7122     1777 char/s  eta=    16s
-  eval     32,400/60,000 ( 54.0%)  acc=0.7098     1780 char/s  eta=    16s
-  eval     33,600/60,000 ( 56.0%)  acc=0.7074     1783 char/s  eta=    15s
-  eval     34,800/60,000 ( 58.0%)  acc=0.7074     1784 char/s  eta=    14s
-  eval     36,000/60,000 ( 60.0%)  acc=0.7070     1783 char/s  eta=    13s
-  eval     37,200/60,000 ( 62.0%)  acc=0.7068     1783 char/s  eta=    13s
-  eval     38,400/60,000 ( 64.0%)  acc=0.7070     1783 char/s  eta=    12s
-  eval     39,600/60,000 ( 66.0%)  acc=0.7061     1780 char/s  eta=    11s
-  eval     40,800/60,000 ( 68.0%)  acc=0.7057     1776 char/s  eta=    11s
-  eval     42,000/60,000 ( 70.0%)  acc=0.7050     1773 char/s  eta=    10s
-  eval     43,200/60,000 ( 72.0%)  acc=0.7044     1771 char/s  eta=     9s
-  eval     44,400/60,000 ( 74.0%)  acc=0.7045     1760 char/s  eta=     9s
-  eval     45,600/60,000 ( 76.0%)  acc=0.7043     1749 char/s  eta=     8s
-  eval     46,800/60,000 ( 78.0%)  acc=0.7037     1736 char/s  eta=     8s
-  eval     48,000/60,000 ( 80.0%)  acc=0.7039     1738 char/s  eta=     7s
-  eval     49,200/60,000 ( 82.0%)  acc=0.7033     1738 char/s  eta=     6s
-  eval     50,400/60,000 ( 84.0%)  acc=0.7037     1737 char/s  eta=     6s
-  eval     51,600/60,000 ( 86.0%)  acc=0.7036     1735 char/s  eta=     5s
-  eval     52,800/60,000 ( 88.0%)  acc=0.7023     1733 char/s  eta=     4s
-  eval     54,000/60,000 ( 90.0%)  acc=0.7024     1729 char/s  eta=     3s
-  eval     55,200/60,000 ( 92.0%)  acc=0.7018     1730 char/s  eta=     3s
-  eval     56,400/60,000 ( 94.0%)  acc=0.7010     1730 char/s  eta=     2s
-  eval     57,600/60,000 ( 96.0%)  acc=0.7013     1731 char/s  eta=     1s
-  eval     58,800/60,000 ( 98.0%)  acc=0.7019     1731 char/s  eta=     1s
-  eval     60,000/60,000 (100.0%)  acc=0.7031     1734 char/s  eta=     0s
-chars=60,000  acc=0.7031  eval_duration=34.6s
+  eval      1,200/60,000 (  2.0%)  acc=0.6883     2068 char/s  eta=    28s
+  eval      2,400/60,000 (  4.0%)  acc=0.6746     2099 char/s  eta=    27s
+  eval      3,600/60,000 (  6.0%)  acc=0.6722     2110 char/s  eta=    27s
+  eval      4,800/60,000 (  8.0%)  acc=0.6867     2117 char/s  eta=    26s
+  eval      6,000/60,000 ( 10.0%)  acc=0.6870     2131 char/s  eta=    25s
+  eval      7,200/60,000 ( 12.0%)  acc=0.6806     2141 char/s  eta=    25s
+  eval      8,400/60,000 ( 14.0%)  acc=0.6799     2147 char/s  eta=    24s
+  eval      9,600/60,000 ( 16.0%)  acc=0.6864     2146 char/s  eta=    23s
+  eval     10,800/60,000 ( 18.0%)  acc=0.6951     2145 char/s  eta=    23s
+  eval     12,000/60,000 ( 20.0%)  acc=0.6977     2143 char/s  eta=    22s
+  eval     13,200/60,000 ( 22.0%)  acc=0.7017     2141 char/s  eta=    22s
+  eval     14,400/60,000 ( 24.0%)  acc=0.7035     2141 char/s  eta=    21s
+  eval     15,600/60,000 ( 26.0%)  acc=0.7056     2140 char/s  eta=    21s
+  eval     16,800/60,000 ( 28.0%)  acc=0.7089     2141 char/s  eta=    20s
+  eval     18,000/60,000 ( 30.0%)  acc=0.7106     2140 char/s  eta=    20s
+  eval     19,200/60,000 ( 32.0%)  acc=0.7143     2137 char/s  eta=    19s
+  eval     20,400/60,000 ( 34.0%)  acc=0.7155     2135 char/s  eta=    19s
+  eval     21,600/60,000 ( 36.0%)  acc=0.7163     2136 char/s  eta=    18s
+  eval     22,800/60,000 ( 38.0%)  acc=0.7168     2136 char/s  eta=    17s
+  eval     24,000/60,000 ( 40.0%)  acc=0.7168     2136 char/s  eta=    17s
+  eval     25,200/60,000 ( 42.0%)  acc=0.7169     2135 char/s  eta=    16s
+  eval     26,400/60,000 ( 44.0%)  acc=0.7181     2134 char/s  eta=    16s
+  eval     27,600/60,000 ( 46.0%)  acc=0.7165     2134 char/s  eta=    15s
+  eval     28,800/60,000 ( 48.0%)  acc=0.7165     2136 char/s  eta=    15s
+  eval     30,000/60,000 ( 50.0%)  acc=0.7152     2137 char/s  eta=    14s
+  eval     31,200/60,000 ( 52.0%)  acc=0.7122     2138 char/s  eta=    13s
+  eval     32,400/60,000 ( 54.0%)  acc=0.7098     2141 char/s  eta=    13s
+  eval     33,600/60,000 ( 56.0%)  acc=0.7074     2143 char/s  eta=    12s
+  eval     34,800/60,000 ( 58.0%)  acc=0.7074     2143 char/s  eta=    12s
+  eval     36,000/60,000 ( 60.0%)  acc=0.7070     2142 char/s  eta=    11s
+  eval     37,200/60,000 ( 62.0%)  acc=0.7068     2142 char/s  eta=    11s
+  eval     38,400/60,000 ( 64.0%)  acc=0.7070     2142 char/s  eta=    10s
+  eval     39,600/60,000 ( 66.0%)  acc=0.7061     2142 char/s  eta=    10s
+  eval     40,800/60,000 ( 68.0%)  acc=0.7057     2143 char/s  eta=     9s
+  eval     42,000/60,000 ( 70.0%)  acc=0.7050     2142 char/s  eta=     8s
+  eval     43,200/60,000 ( 72.0%)  acc=0.7044     2141 char/s  eta=     8s
+  eval     44,400/60,000 ( 74.0%)  acc=0.7045     2140 char/s  eta=     7s
+  eval     45,600/60,000 ( 76.0%)  acc=0.7043     2140 char/s  eta=     7s
+  eval     46,800/60,000 ( 78.0%)  acc=0.7037     2139 char/s  eta=     6s
+  eval     48,000/60,000 ( 80.0%)  acc=0.7039     2138 char/s  eta=     6s
+  eval     49,200/60,000 ( 82.0%)  acc=0.7033     2138 char/s  eta=     5s
+  eval     50,400/60,000 ( 84.0%)  acc=0.7037     2137 char/s  eta=     4s
+  eval     51,600/60,000 ( 86.0%)  acc=0.7036     2137 char/s  eta=     4s
+  eval     52,800/60,000 ( 88.0%)  acc=0.7023     2140 char/s  eta=     3s
+  eval     54,000/60,000 ( 90.0%)  acc=0.7024     2140 char/s  eta=     3s
+  eval     55,200/60,000 ( 92.0%)  acc=0.7018     2141 char/s  eta=     2s
+  eval     56,400/60,000 ( 94.0%)  acc=0.7010     2141 char/s  eta=     2s
+  eval     57,600/60,000 ( 96.0%)  acc=0.7013     2140 char/s  eta=     1s
+  eval     58,800/60,000 ( 98.0%)  acc=0.7019     2140 char/s  eta=     1s
+  eval     60,000/60,000 (100.0%)  acc=0.7031     2140 char/s  eta=     0s
+chars=60,000  acc=0.7031  eval_duration=28.0s
 ---
 submission         : subset_70_mkn
-training energy (J): 1,064.7
-training duration  : 41.1s
+training energy (J): 1,350.8
+training duration  : 26.5s
 val  char-accuracy : 0.7031
 val  chars         : 60,000
 wrote /tmp/result.json
 Stopping app - local entrypoint completed.
 ✓ App completed. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-shQS1Hiyo4OPhMcNN4Xy5N
+https://modal.com/apps/gabriel-nakajima-an/main/ap-TnCfSdLjln33sQ58a3CqLJ
 
 # final result
 {
   "submission": "subset_70_mkn",
-  "training_energy_J": 1064.6838474000006,
-  "training_duration_s": 41.054503051999994,
-  "cpu_energy_J": 1736.325936897499,
-  "total_energy_J": 2801.0097842974997,
+  "training_energy_J": 1350.8209175499999,
+  "training_duration_s": 26.514841649000005,
+  "cpu_energy_J": 1123.5610902799988,
+  "total_energy_J": 2474.3820078299987,
   "val_char_accuracy": 0.7031333333333334,
   "val_chars": 60000,
-  "gpu_name": "NVIDIA A100-SXM4-80GB",
-  "date_utc": "2026-05-20T07:32:40Z",
+  "gpu_name": "NVIDIA A100 80GB PCIe",
+  "date_utc": "2026-05-21T05:05:01Z",
   "_nvml": {
     "nvml_available": true,
     "energy_counter_supported": true,
     "monotonic": true,
-    "idle_watts": 57.06305084745769,
-    "stress_watts_avg": 333.0922109881346,
-    "stress_energy_joules": 12488.622,
-    "stress_duration_s": 37.492987191,
-    "gpu_name": "NVIDIA A100-SXM4-80GB",
+    "idle_watts": 58.36485000000002,
+    "stress_watts_avg": 233.1941013031682,
+    "stress_energy_joules": 8767.363,
+    "stress_duration_s": 37.596847223,
+    "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
   "contributor": "@exp-batch-iter4"
diff --git a/submit.py b/submit.py
index cfefb3a..c8ec97d 100755
--- a/submit.py
+++ b/submit.py
@@ -109,6 +109,12 @@
     # tables (used as a deterministic algorithm, not a "pretrained
     # weight" — see README "Internal representations").
     .pip_install("tiktoken==0.7.0")
+    # CodeCarbon: CPU energy estimation backend used by EnergyMeter to
+    # populate cpu_energy_J + total_energy_J in result.json. The TDP
+    # fallback path is used (no MSR access in Modal containers); accuracy
+    # is acceptable for total-system-energy reporting per the field
+    # standard (HuggingFace Trainer, Patterson et al. 2021/2022).
+    .pip_install("codecarbon~=3.2")
     # Modal re-imports submit.py inside the container to resolve the
     # remote function. submit.py does a top-level `import task`, so
     # /workspace (where task.py lands via add_local_file) must be on
@@ -317,10 +323,18 @@ def append_record(result: dict, dir_relpath: str) -> None:
     Replaces the placeholder dash row if present, otherwise appends.
     Disqualified rows render their accuracy cell as ``DQ`` so they
     don't pollute the leaderboard sort.
+
+    The energy column reports ``total_energy_J`` (GPU NVML + CodeCarbon
+    CPU estimate) when the new harness produced it, falling back to
+    ``training_energy_J`` (NVML-only) for runs predating the
+    total-system-energy change. See ``MAINTAINING.md`` for the dated
+    semantics of the column over time.
     """
     readme = HERE / "README.md"
     text = readme.read_text()
-    energy = result.get("training_energy_J")
+    energy = result.get("total_energy_J")
+    if energy is None:
+        energy = result.get("training_energy_J")
     energy_cell = f"{energy:>10,.0f}" if energy is not None else "         —"
     if result.get("disqualified"):
         acc_cell = "      DQ"
diff --git a/test_wikitext.py b/test_wikitext.py
index c2bebdd..44dc93e 100644
--- a/test_wikitext.py
+++ b/test_wikitext.py
@@ -102,6 +102,157 @@ def test_energy_meter_fallback_when_no_nvml() -> None:
     assert m.duration_s >= 0
 
 
+def test_energy_meter_total_is_gpu_plus_cpu() -> None:
+    """When both GPU and CPU backends return values, total_energy_J = sum."""
+    class _StubGpuBackend:
+        available = True
+        def start(self) -> None: pass
+        def stop(self, duration_s: float) -> float: return 1000.0  # net joules
+
+    class _StubCpuBackend:
+        available = True
+        def start(self) -> None: pass
+        def stop(self) -> float: return 200.0  # net joules
+
+    meter = EnergyMeter(gpu_backend=_StubGpuBackend(), cpu_backend=_StubCpuBackend())
+    with meter.measure() as m:
+        pass
+    assert m.energy_joules == 1000.0
+    assert m.cpu_energy_J == 200.0
+    assert m.total_energy_J == 1200.0
+
+
+def test_energy_meter_raises_when_gpu_available_but_cpu_missing() -> None:
+    """If NVML works but the CPU backend doesn't, EnergyMeter must fail loudly.
+
+    Silent half-measurement (GPU only, cpu_energy_J None) would land
+    inconsistent rows on the leaderboard. Loud-fail forces the operator
+    to fix the env (install codecarbon, or pass an explicit cpu_backend
+    for an intentional calibration without CPU tracking).
+    """
+    import pytest
+
+    class _StubGpu:
+        available = True
+        def start(self) -> None: pass
+        def stop(self, duration_s: float) -> float: return 100.0
+
+    class _StubUnavailCpu:
+        available = False
+        def start(self) -> None: pass
+        def stop(self): return None
+
+    with pytest.raises(RuntimeError, match="CPU energy backend"):
+        EnergyMeter(gpu_backend=_StubGpu(), cpu_backend=_StubUnavailCpu())
+
+
+def test_energy_meter_no_raise_when_cpu_present_but_gpu_missing() -> None:
+    """Dev pattern: CodeCarbon installed but no NVML — no raise.
+
+    Loud-fail only triggers when NVML is available without CodeCarbon
+    (real GPU box, broken energy backend). A laptop with CodeCarbon
+    installed but no GPU should construct an EnergyMeter cleanly and
+    just not measure GPU energy.
+    """
+    class _UnavailGpu:
+        available = False
+        def start(self) -> None: pass
+        def stop(self, duration_s: float = 0.0): return None
+
+    class _AvailCpu:
+        available = True
+        def start(self) -> None: pass
+        def stop(self): return 100.0
+
+    meter = EnergyMeter(gpu_backend=_UnavailGpu(), cpu_backend=_AvailCpu())
+    assert not meter.available
+
+
+def test_total_energy_none_when_only_one_backend_yields_value() -> None:
+    """total_energy_J stays None if either backend returns None from stop()."""
+    class _GpuOk:
+        available = True
+        def start(self) -> None: pass
+        def stop(self, duration_s: float) -> float: return 100.0
+
+    class _CpuYieldsNone:
+        # available=True so the constructor doesn't raise, but stop()
+        # yields None — simulates a tracker that started OK and then
+        # failed to read its counter on the way out.
+        available = True
+        def start(self) -> None: pass
+        def stop(self): return None
+
+    meter = EnergyMeter(gpu_backend=_GpuOk(), cpu_backend=_CpuYieldsNone())
+    with meter.measure() as m:
+        pass
+    assert m.energy_joules == 100.0
+    assert m.cpu_energy_J is None
+    assert m.total_energy_J is None
+
+
+def test_energy_meter_dev_mode_no_raise_when_both_unavailable() -> None:
+    """Dev pattern: no NVML AND no CodeCarbon — soft, not loud.
+
+    Local smoke tests on a CPU-only laptop must still be able to
+    construct an EnergyMeter without crashing; measurement just
+    returns None for everything.
+    """
+    class _Unavail:
+        available = False
+        def start(self) -> None: pass
+        def stop(self, duration_s: float = 0.0): return None
+
+    meter = EnergyMeter(gpu_backend=_Unavail(), cpu_backend=_Unavail())
+    assert not meter.available
+
+
+def test_default_cpu_backend_uses_codecarbon_when_installed() -> None:
+    """When CodeCarbon is installed, default cpu_backend populates cpu_energy_J."""
+    import pytest
+    pytest.importorskip("codecarbon")
+
+    class _StubGpu:
+        available = True
+        def start(self) -> None: pass
+        def stop(self, duration_s: float) -> float: return 100.0
+
+    meter = EnergyMeter(gpu_backend=_StubGpu())  # default cpu_backend
+    with meter.measure() as m:
+        sum(range(1_000_000))  # short CPU work
+    assert m.cpu_energy_J is not None, "default cpu_backend should populate cpu_energy_J"
+    assert m.cpu_energy_J >= 0.0
+    assert m.total_energy_J is not None
+    assert m.total_energy_J >= 100.0  # at least the GPU contribution
+
+
+def test_total_energy_enforces_wall_clock_floor() -> None:
+    """total_energy_J must be >= duration_s * p_floor_watts even when backends under-attribute."""
+    class _LowGpu:
+        available = True
+        def start(self) -> None: pass
+        def stop(self, duration_s: float) -> float: return 5.0  # tiny GPU energy
+
+    class _ZeroCpu:
+        available = True
+        def start(self) -> None: pass
+        def stop(self) -> float: return 0.0  # CodeCarbon under-attribution sim
+
+    meter = EnergyMeter(
+        gpu_backend=_LowGpu(),
+        cpu_backend=_ZeroCpu(),
+        p_floor_watts=50.0,
+    )
+    with meter.measure() as m:
+        time.sleep(0.4)  # wall clock ~ 0.4s → floor ~ 20J
+    assert m.duration_s >= 0.3
+    floor = m.duration_s * 50.0
+    raw_sum = m.energy_joules + m.cpu_energy_J
+    # Floor must bind: raw sum is 5J, floor ~20J
+    assert m.total_energy_J >= floor
+    assert m.total_energy_J == max(raw_sum, floor)
+
+
 # ---------------------------------------------------------------------------
 # Wall-clock guard (README rule 4)
 # ---------------------------------------------------------------------------
diff --git a/wikitext.py b/wikitext.py
index 8b52b1e..06c23f9 100644
--- a/wikitext.py
+++ b/wikitext.py
@@ -159,10 +159,15 @@ def evaluate(
 class Measurement:
     energy_joules: float | None = None
     duration_s: float = 0.0
+    cpu_energy_J: float | None = None
+    total_energy_J: float | None = None
 
     def __str__(self) -> str:
         e = (f"{self.energy_joules:,.1f} J"
              if self.energy_joules is not None else "energy: not measured")
+        if self.cpu_energy_J is not None and self.total_energy_J is not None:
+            e += (f"   cpu={self.cpu_energy_J:,.1f} J"
+                  f"   total={self.total_energy_J:,.1f} J")
         return f"{e}   duration={self.duration_s:.1f}s"
 
 
@@ -189,42 +194,133 @@ class EnergyMeter:
     README rule 4 lives in ``wall_clock_guard`` instead.
     """
 
-    def __init__(self, *, gpu_index: int = 0, idle_watts: float = 50.0):
+    def __init__(self, *, gpu_index: int = 0, idle_watts: float = 50.0,
+                 gpu_backend=None, cpu_backend=None, p_floor_watts: float = 50.0):
+        self.gpu_index = gpu_index
+        self.idle_watts = idle_watts
+        self.p_floor_watts = p_floor_watts
+        self._gpu_backend = (gpu_backend if gpu_backend is not None
+                             else _NvmlGpuBackend(gpu_index, idle_watts))
+        self._cpu_backend = (cpu_backend if cpu_backend is not None
+                             else _CodeCarbonCpuBackend())
+        # Fail loudly if we're on a real GPU box but the CPU backend
+        # failed to load. Silent half-measurement would land inconsistent
+        # rows on the leaderboard. (Dev machines without NVML stay soft:
+        # no GPU or total energy is reported, and nothing raises.)
+        if self._gpu_backend.available and not self._cpu_backend.available:
+            raise RuntimeError(
+                "EnergyMeter: NVML is available but the CPU energy backend "
+                "is not. CodeCarbon is listed in requirements.txt and the "
+                "Modal image — install it (`pip install codecarbon`), or "
+                "pass an explicit cpu_backend if running a calibration "
+                "that intentionally skips CPU tracking."
+            )
+        self.available = self._gpu_backend.available
+
+    @contextmanager
+    def measure(self) -> Iterator[Measurement]:
+        m = Measurement()
+        if self._gpu_backend.available:
+            self._gpu_backend.start()
+        if self._cpu_backend.available:
+            self._cpu_backend.start()
+        t0 = time.monotonic()
+        try:
+            yield m
+        finally:
+            # Capture duration / energy even if the body raised (e.g.
+            # TrainingTimeoutError from wall_clock_guard) — caller can
+            # then report the partial numbers on the DQ row.
+            m.duration_s = time.monotonic() - t0
+            if self._gpu_backend.available:
+                m.energy_joules = self._gpu_backend.stop(m.duration_s)
+            if self._cpu_backend.available:
+                m.cpu_energy_J = self._cpu_backend.stop()
+            if m.energy_joules is not None and m.cpu_energy_J is not None:
+                raw_sum = m.energy_joules + m.cpu_energy_J
+                floor = m.duration_s * self.p_floor_watts
+                m.total_energy_J = max(raw_sum, floor)
+
+
+class _NvmlGpuBackend:
+    """Default GPU energy backend wrapping pynvml's
+    ``nvmlDeviceGetTotalEnergyConsumption`` counter with idle subtraction."""
+
+    def __init__(self, gpu_index: int = 0, idle_watts: float = 50.0):
         self.gpu_index = gpu_index
         self.idle_watts = idle_watts
         self.available = False
         self._handle = None
         self._pynvml = None
+        self._e0: int | None = None
         try:
             import pynvml  # type: ignore[import-not-found]
             pynvml.nvmlInit()
             self._handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
-            # Probe the energy counter; if unsupported, fall back.
             pynvml.nvmlDeviceGetTotalEnergyConsumption(self._handle)
             self._pynvml = pynvml
             self.available = True
         except Exception:
-            self.available = False
+            pass
 
-    @contextmanager
-    def measure(self) -> Iterator[Measurement]:
-        m = Measurement()
-        e0: int | None = None
+    def start(self) -> None:
         if self.available and self._pynvml is not None:
-            e0 = self._pynvml.nvmlDeviceGetTotalEnergyConsumption(self._handle)
-        t0 = time.monotonic()
+            self._e0 = self._pynvml.nvmlDeviceGetTotalEnergyConsumption(self._handle)
+
+    # ``stop`` takes ``duration_s`` because NVML returns a running total
+    # and we subtract ``idle_watts * duration`` to get net training
+    # energy. The CPU backend's ``stop`` doesn't need a duration arg —
+    # CodeCarbon's tracker timestamps its own start/stop internally.
+    def stop(self, duration_s: float) -> float | None:
+        if not (self.available and self._pynvml is not None and self._e0 is not None):
+            return None
+        e1 = self._pynvml.nvmlDeviceGetTotalEnergyConsumption(self._handle)
+        e_run_j = (e1 - self._e0) / 1000.0  # NVML returns millijoules
+        e_idle_j = duration_s * self.idle_watts
+        return max(0.0, e_run_j - e_idle_j)
+
+
+class _CodeCarbonCpuBackend:
+    """Default CPU energy backend wrapping CodeCarbon's ``EmissionsTracker``.
+
+    Sets ``available = False`` if CodeCarbon is not installed. On its own
+    that is silent (returns ``None`` from ``stop()``), but the surrounding
+    ``EnergyMeter.__init__`` raises ``RuntimeError`` when NVML is
+    available and this backend is not — so a leaderboard run on Modal
+    fails loudly rather than silently dropping the CPU component. The
+    silent path is only reached on dev boxes that also have no NVML.
+
+    Note: reads ``tracker._total_cpu_energy.kWh`` after stop. That
+    attribute is internal to CodeCarbon; we pin a compatible-release range
+    (``codecarbon~=3.2`` in the Modal image, i.e. ``>=3.2, <4``) to keep the path stable.
+    """
+
+    def __init__(self) -> None:
+        self.available = False
+        self._tracker = None
+        self._EmissionsTracker = None
         try:
-            yield m
-        finally:
-            # Capture duration / energy even if the body raised (e.g.
-            # TrainingTimeoutError from wall_clock_guard) — caller can
-            # then report the partial numbers on the DQ row.
-            m.duration_s = time.monotonic() - t0
-            if self.available and self._pynvml is not None and e0 is not None:
-                e1 = self._pynvml.nvmlDeviceGetTotalEnergyConsumption(self._handle)
-                e_run_j = (e1 - e0) / 1000.0  # NVML returns millijoules
-                e_idle_j = m.duration_s * self.idle_watts
-                m.energy_joules = max(0.0, e_run_j - e_idle_j)
+            from codecarbon import EmissionsTracker
+            self._EmissionsTracker = EmissionsTracker
+            self.available = True
+        except Exception:
+            pass
+
+    def start(self) -> None:
+        if not self.available or self._EmissionsTracker is None:
+            return
+        self._tracker = self._EmissionsTracker(
+            save_to_file=False, log_level="error", measure_power_secs=1.0
+        )
+        self._tracker.start()
+
+    def stop(self) -> float | None:
+        if not self.available or self._tracker is None:
+            return None
+        self._tracker.stop()
+        kwh = self._tracker._total_cpu_energy.kWh
+        self._tracker = None
+        return kwh * 3.6e6  # kWh → J
 
 
 # ---------------------------------------------------------------------------

From c46ec127fb32c7cbe081d1a10a017eabc32e175b Mon Sep 17 00:00:00 2001
From: Gabriel Nakajima An <naka@Gabriels-MacBook-Pro.local>
Date: Mon, 25 May 2026 15:04:43 -0700
Subject: [PATCH 2/5] README: move orphan rows into the table; add 13 PR-#5
 entries
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR #5 merged without updating README's Record History — its 13 new
submissions had no rows. PR #4's earlier auto-appends from
``submit.py:append_record`` placed re-run rows AFTER the ``[^2]``
footnote (orphan, wrong format, no GPU column). Cleaning both up:

- Move the 9 orphan PASS rows from after the footnotes into the table
  proper, reformatted with the GPU column to match existing style.
- Add the 4 PR-#5 DQ submissions that were missing entirely
  (``gpu_ngram_w31_k10``, ``chunker_phase1_v2``, ``bpe_internal_nn_v2``,
  ``mamba_byte``).
- Drop the 2026-05-20 ``modded_nanogpt`` DQ row — it was a transient
  SXM4-scheduler failure that's been superseded by the 2026-05-21
  PASS row in the same table; keeping it confuses the dir link.

Fix the underlying bug in ``submit.py:append_record`` so future
auto-appends land inside the table block instead of past the
footnotes: new ``_insert_into_record_history_table`` helper walks
the file, finds the Record History header + pipe-table block, and
inserts the new row after the last pipe-prefixed line of that block.
Falls back to the prior plain-append behaviour only if the table
can't be located (defensive).
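
A minimal sketch of the insertion behaviour described above. The walk
mirrors ``_insert_into_record_history_table`` from this patch; the
name ``insert_row``, the toy README, its column names, and its rows
are made up for illustration and are not the real table.

```python
from __future__ import annotations


def insert_row(text: str, row: str) -> str | None:
    """Same walk as ``_insert_into_record_history_table`` in this patch."""
    lines = text.splitlines(keepends=True)
    in_section = in_table = False
    last_pipe = -1
    for i, line in enumerate(lines):
        if line.lstrip().startswith("## Record History"):
            in_section = True
            continue
        if not in_section:
            continue
        if not in_table:
            # The table header is recognised by its column names.
            if line.startswith("|") and "Energy" in line and "Val" in line:
                in_table = True
                last_pipe = i
            continue
        if line.startswith("|"):
            last_pipe = i
            continue
        break  # first non-pipe line closes the table
    if last_pipe < 0:
        return None
    return "".join(lines[: last_pipe + 1] + [row] + lines[last_pipe + 1 :])


# Toy README: hypothetical columns and rows, for illustration only.
README = (
    "## Record History\n"
    "\n"
    "| Date | Energy (J) | Val acc | GPU | Config | Dir | Contributor |\n"
    "|------|------------|---------|-----|--------|-----|-------------|\n"
    "| 2026-01-01 | 100 | 0.7100 | gpu | old | [dir](submissions/old) | @x |\n"
    "\n"
    "[^1]: footnote\n"
)
NEW_ROW = "| 2026-01-02 | 90 | 0.7200 | gpu | new | [dir](submissions/new) | @x |\n"

updated = insert_row(README, NEW_ROW)
out = updated.splitlines()
```

In this toy input the new row lands directly after the last data row
and ahead of the footnote, and a text with no Record History table
yields ``None`` so the caller can fall back to a plain append.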

Add ``scripts/validate_record_history.py``, a reusable validator
that:
- parses the Record History markdown table,
- flags orphan submission rows outside the table block,
- cross-references the LATEST row per slot against the
  submission's current ``result.json`` (energy + accuracy within
  tolerance, PASS/DQ status matches),
- catches duplicate PASS rows for the same submission on the same
  date.
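
The tolerance comparison in the cross-reference check can be sketched
roughly as follows. The two constants match the validator in this
patch; the helper name ``row_matches`` and the sample ``result``
payload are hypothetical and only illustrate the arithmetic.

```python
ENERGY_TOL_REL = 0.01  # 1% relative tolerance on energy (from the validator)
ACC_TOL = 1e-4         # absolute tolerance on accuracy (from the validator)


def row_matches(row_energy: float, row_acc: float, result: dict) -> bool:
    """Return True if a README row agrees with a result.json payload."""
    acc_ok = abs(row_acc - result["val_char_accuracy"]) <= ACC_TOL
    # Prefer total_energy_J; fall back to the older GPU-only field.
    expected = result.get("total_energy_J")
    if expected is None:
        expected = result.get("training_energy_J", 0.0)
    if not expected:
        return acc_ok  # nothing to compare the energy cell against
    rel = abs(row_energy - expected) / max(1.0, expected)
    return acc_ok and rel <= ENERGY_TOL_REL


# Hypothetical payload, not taken from any real result.json.
result = {"total_energy_J": 53683.4, "val_char_accuracy": 0.72461}

close_enough = row_matches(53683.0, 0.7246, result)  # rounding-level drift
stale_energy = row_matches(60000.0, 0.7246, result)  # roughly 12% off
```

A README row rounded to whole joules and four decimals stays within
tolerance, while a row left over from an earlier run with different
energy does not.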

``python3 scripts/validate_record_history.py`` now reports
``README Record History: OK`` on this branch.
``pytest test_wikitext.py`` → 15/15 still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 README.md                          |  27 ++--
 scripts/validate_record_history.py | 235 +++++++++++++++++++++++++++++
 submit.py                          |  54 ++++++-
 3 files changed, 299 insertions(+), 17 deletions(-)
 create mode 100755 scripts/validate_record_history.py

diff --git a/README.md b/README.md
index ce37b6c..6093d84 100644
--- a/README.md
+++ b/README.md
@@ -35,9 +35,21 @@ The `Energy (J)` column reports **`total_energy_J`** (GPU NVML net of idle basel
 | 2026-05-18 |      3,612 |       DQ | A100 80GB PCIe | chunker_d1       | [dir](research/catalog/new_directions/chunker_d1)       | @ab-10 |
 | 2026-05-18 |        735 |       DQ | A100 80GB PCIe | ppm_c            | [dir](research/catalog/new_directions/ppm_c)            | @ab-10 |
 | 2026-05-17 |         70 |       DQ | A100 80GB SXM4 | P2-A_random_projection | [dir](research/forward-forward-deep/runs/phase2/P2-A_random_projection) | @ab-10 |
-| 2026-05-20 |     53,683 | 0.7246    | A100 80GB PCIe | lwta_k4        | [dir](submissions/lwta_k4)        | @ab-10 (re-run on new harness; total_J = 44,329 gpu + 9,354 cpu) |
-| 2026-05-20 |     54,614 | 0.7145    | A100 80GB PCIe | lwta_k2        | [dir](submissions/lwta_k2)        | @ab-10 (re-run on new harness; total_J = 44,583 gpu + 10,031 cpu) |
-| 2026-05-20 |     66,747 |       DQ | A100 80GB SXM4 | modded_nanogpt | [dir](submissions/modded_nanogpt) | @ab-10 (re-run on new harness landed on SXM4 and hit 300 s cap; re-running) |
+| 2026-05-19 |     60,864 |       DQ | A100 80GB PCIe | mamba_byte           | [dir](submissions/mamba_byte)           | @claude-mamba |
+| 2026-05-20 |      1,752 |       DQ | A100 80GB SXM4 | gpu_ngram_w31_k10    | [dir](submissions/gpu_ngram_w31_k10)    | @follow-up-paq-prediction |
+| 2026-05-20 |     13,936 |       DQ | A100 80GB SXM4 | chunker_phase1_v2    | [dir](submissions/chunker_phase1_v2)    | @explore-chunker-2026-05-19 |
+| 2026-05-20 |     24,417 |       DQ | A100 80GB SXM4 | bpe_internal_nn_v2   | [dir](submissions/bpe_internal_nn_v2)   | @subagent-xorfix-2026-05-19 |
+| 2026-05-20 |     53,683 | 0.7246    | A100 80GB PCIe | lwta_k4              | [dir](submissions/lwta_k4)              | @ab-10 (re-run on new harness; total_J = 44,329 gpu + 9,354 cpu) |
+| 2026-05-20 |     54,614 | 0.7145    | A100 80GB PCIe | lwta_k2              | [dir](submissions/lwta_k2)              | @ab-10 (re-run on new harness; total_J = 44,583 gpu + 10,031 cpu) |
+| 2026-05-21 |      2,474 | 0.7031    | A100 80GB PCIe | subset_70_mkn        | [dir](submissions/subset_70_mkn)        | @exp-batch-iter4 |
+| 2026-05-21 |      3,092 | 0.7050    | A100 80GB PCIe | gpu_ngram_w31_k11    | [dir](submissions/gpu_ngram_w31_k11)    | @follow-up-paq-prediction |
+| 2026-05-21 |      4,607 | 0.7047    | A100 80GB PCIe | paq_mixer_v3         | [dir](submissions/paq_mixer_v3)         | @worker-paq-mixer |
+| 2026-05-21 |      8,602 | 0.7184    | A100 80GB PCIe | gpu_ngram_o14_xorfix | [dir](submissions/gpu_ngram_o14_xorfix) | @subagent-xorfix-2026-05-19 |
+| 2026-05-21 |      9,591 | 0.7063    | A100 80GB PCIe | chunker_phase1_v1    | [dir](submissions/chunker_phase1_v1)    | @explore-chunker-2026-05-19 |
+| 2026-05-21 |     14,578 | 0.7184    | A100 80GB PCIe | deep_backoff_kn      | [dir](submissions/deep_backoff_kn)      | @nakajimagabriel |
+| 2026-05-21 |     19,922 | 0.7328    | A100 80GB SXM4 | lwta_k4_alpha_065    | [dir](submissions/lwta_k4_alpha_065)    | @subagent-L2clean-2026-05-19 |
+| 2026-05-21 |     20,743 | 0.7390    | A100 80GB SXM4 | alpha_06             | [dir](submissions/alpha_06)             | @subagent-xorfix-2026-05-19 |
+| 2026-05-21 |     62,006 | 0.7337    | A100 80GB SXM4 | modded_nanogpt       | [dir](submissions/modded_nanogpt)       | @ab-10 |
 
 
 ## Rules
@@ -69,12 +81,3 @@ For an internal-BPE submission, `predict()` returns `P(next_char | observed_char
 
 [^1]: More energy efficient
 [^2]: As of writing this
-| 2026-05-21 |      3,092 | 0.7050 | gpu_ngram_w31_k11 | [dir](submissions/gpu_ngram_w31_k11) | @follow-up-paq-prediction |
-| 2026-05-21 |      2,474 | 0.7031 | subset_70_mkn | [dir](submissions/subset_70_mkn) | @exp-batch-iter4 |
-| 2026-05-21 |      4,607 | 0.7047 | paq_mixer_v3 | [dir](submissions/paq_mixer_v3) | @worker-paq-mixer |
-| 2026-05-21 |      8,602 | 0.7184 | gpu_ngram_o14_xorfix | [dir](submissions/gpu_ngram_o14_xorfix) | @subagent-xorfix-2026-05-19 |
-| 2026-05-21 |     14,578 | 0.7184 | deep_backoff_kn | [dir](submissions/deep_backoff_kn) | @nakajimagabriel |
-| 2026-05-21 |      9,591 | 0.7063 | chunker_phase1_v1 | [dir](submissions/chunker_phase1_v1) | @explore-chunker-2026-05-19 |
-| 2026-05-21 |     19,922 | 0.7328 | lwta_k4_alpha_065 | [dir](submissions/lwta_k4_alpha_065) | @subagent-L2clean-2026-05-19 |
-| 2026-05-21 |     20,743 | 0.7390 | alpha_06 | [dir](submissions/alpha_06) | @subagent-xorfix-2026-05-19 |
-| 2026-05-21 |     62,006 | 0.7337 | modded_nanogpt | [dir](submissions/modded_nanogpt) | @ab-10 |
diff --git a/scripts/validate_record_history.py b/scripts/validate_record_history.py
new file mode 100755
index 0000000..f4edb18
--- /dev/null
+++ b/scripts/validate_record_history.py
@@ -0,0 +1,235 @@
+#!/usr/bin/env python3
+"""Validate that README.md's Record History table is consistent with the
+underlying submission result.json files.
+
+Run from the repo root:
+
+    python3 scripts/validate_record_history.py
+
+Exit code is 0 if the table is consistent, 1 if any check fails.
+
+Checks performed:
+
+1. The Record History table parses cleanly (7 columns, or 6 for legacy
+   rows without a GPU cell).
+2. No rows exist outside the table (no orphan submission rows after
+   footnotes — a regression caused by the prior ``append_record``
+   appending rows to the end of the file).
+3. For the latest row per slot whose dir link points to ``submissions/<name>``,
+   the linked ``result.json`` exists and its energy / accuracy match the
+   row (energy within 1% relative, accuracy within 1e-4 absolute).
+4. No submission appears multiple times as PASS on the same date.
+"""
+from __future__ import annotations
+
+import json
+import re
+import sys
+from pathlib import Path
+
+HERE = Path(__file__).resolve().parent.parent
+README = HERE / "README.md"
+SUBMISSIONS = HERE / "submissions"
+
+ENERGY_TOL_REL = 0.01
+ACC_TOL = 1e-4
+
+
+def main() -> int:
+    text = README.read_text()
+    failures: list[str] = []
+
+    table_rows, orphan_rows = _extract_table(text)
+
+    if not table_rows:
+        failures.append("Could not find the Record History table.")
+        return _report(failures)
+
+    if orphan_rows:
+        failures.append(
+            f"Found {len(orphan_rows)} orphan submission row(s) outside "
+            f"the Record History table:"
+        )
+        for line_no, row in orphan_rows:
+            failures.append(f"  line {line_no}: {row.strip()}")
+
+    # Each submission slot may have multiple rows (one per re-run on a
+    # setup change). result.json is overwritten and only reflects the
+    # most recent run — so only the latest row per slot should match
+    # result.json. Earlier rows are historical and skipped.
+    parsed_rows: list[tuple[int, tuple[str, str, str, str, str, str, str]]] = []
+    pass_by_config: dict[str, list[tuple[int, str, str]]] = {}
+    for line_no, row in table_rows:
+        parsed = _parse_row(row)
+        if parsed is None:
+            failures.append(f"line {line_no}: row failed to parse: {row.strip()}")
+            continue
+        parsed_rows.append((line_no, parsed))
+        date, energy_cell, acc_cell, gpu, config, dir_link, contributor = parsed
+        if acc_cell != "DQ":
+            pass_by_config.setdefault(config, []).append((line_no, date, acc_cell))
+
+    # Group by slot dir, validate only the last (highest line_no) row
+    # against the slot's current result.json.
+    latest_by_slot: dict[str, tuple[int, tuple]] = {}
+    for line_no, parsed in parsed_rows:
+        _, _, _, _, _, dir_link, _ = parsed
+        m = re.match(r"\[dir\]\(submissions/([^)]+)\)", dir_link)
+        if not m:
+            continue
+        slot = m.group(1).rstrip("/")
+        latest_by_slot[slot] = (line_no, parsed)
+
+    for slot, (line_no, parsed) in latest_by_slot.items():
+        date, energy_cell, acc_cell, gpu, config, dir_link, contributor = parsed
+        result_path = SUBMISSIONS / slot / "result.json"
+        if not result_path.exists():
+            failures.append(
+                f"line {line_no}: {slot}: result.json missing at {result_path}"
+            )
+            continue
+        try:
+            result = json.loads(result_path.read_text())
+        except json.JSONDecodeError as exc:
+            failures.append(f"line {line_no}: {slot}: result.json unreadable ({exc})")
+            continue
+        _check_row_against_result(
+            line_no, slot, energy_cell, acc_cell, result, failures
+        )
+
+    for config, rows in pass_by_config.items():
+        dates = [r[1] for r in rows]
+        if len(set(dates)) < len(dates):
+            same_date = sorted(r for r in rows if dates.count(r[1]) > 1)
+            failures.append(f"{config}: multiple PASS rows on the same date:")
+            for line_no, date, acc in same_date:
+                failures.append(f"  line {line_no}: {date} acc={acc}")
+
+    return _report(failures)
+
+
+def _extract_table(text: str) -> tuple[list[tuple[int, str]], list[tuple[int, str]]]:
+    """Return (table_rows, orphan_rows).
+
+    table_rows: data rows inside the Record History markdown table.
+    orphan_rows: lines starting with ``|`` AFTER the table block closed,
+    i.e. submission-looking rows that landed past the table separator
+    (typically appended after footnotes by a buggy append_record).
+    """
+    lines = text.splitlines()
+    in_record_history = False
+    in_table = False
+    table_rows: list[tuple[int, str]] = []
+    orphans: list[tuple[int, str]] = []
+    past_table = False
+
+    for i, line in enumerate(lines, start=1):
+        stripped = line.strip()
+        if stripped.startswith("## Record History"):
+            in_record_history = True
+            continue
+        if in_record_history and not in_table:
+            if line.startswith("|") and "Energy" in line and "Val" in line:
+                in_table = True
+                continue
+        if in_table:
+            if line.startswith("|--"):
+                continue
+            if line.startswith("|"):
+                table_rows.append((i, line))
+                continue
+            in_table = False
+            past_table = True
+            continue
+        if past_table:
+            if line.startswith("|") and "[dir](submissions/" in line:
+                orphans.append((i, line))
+
+    return table_rows, orphans
+
+
+def _parse_row(row: str) -> tuple[str, str, str, str, str, str, str] | None:
+    cells = [c.strip() for c in row.strip().strip("|").split("|")]
+    if len(cells) < 6:
+        return None
+    if len(cells) == 6:
+        # Legacy row written before the GPU column existed.
+        date, energy, acc, config, dir_link, contributor = cells
+        gpu = ""
+    else:
+        # Current 7-column format; any extra trailing cells are ignored.
+        date, energy, acc, gpu, config, dir_link, contributor = cells[:7]
+    return date, energy, acc, gpu, config, dir_link, contributor
+
+
+def _check_row_against_result(
+    line_no: int,
+    slot: str,
+    energy_cell: str,
+    acc_cell: str,
+    result: dict,
+    failures: list,
+) -> None:
+    is_dq = (
+        result.get("disqualified", False)
+        or result.get("val_char_accuracy") is None
+        or result.get("val_char_accuracy", 0.0) < 0.70
+    )
+
+    if acc_cell == "DQ":
+        if not is_dq:
+            failures.append(
+                f"line {line_no}: {slot}: row says DQ but result.json is PASS"
+            )
+    else:
+        if is_dq:
+            failures.append(
+                f"line {line_no}: {slot}: row claims PASS but result.json is DQ"
+            )
+        else:
+            try:
+                row_acc = float(acc_cell)
+            except ValueError:
+                failures.append(
+                    f"line {line_no}: {slot}: cannot parse acc cell {acc_cell!r}"
+                )
+                return
+            result_acc = result.get("val_char_accuracy", 0.0)
+            if abs(row_acc - result_acc) > ACC_TOL:
+                failures.append(
+                    f"line {line_no}: {slot}: acc row={row_acc:.4f} vs "
+                    f"result.json={result_acc:.4f}"
+                )
+
+    try:
+        row_energy = float(energy_cell.replace(",", "").strip())
+    except ValueError:
+        failures.append(
+            f"line {line_no}: {slot}: cannot parse energy cell {energy_cell!r}"
+        )
+        return
+    expected_energy = result.get("total_energy_J")
+    if expected_energy is None:
+        expected_energy = result.get("training_energy_J", 0.0)
+    if expected_energy is None or expected_energy == 0:
+        return
+    rel = abs(row_energy - expected_energy) / max(1.0, expected_energy)
+    if rel > ENERGY_TOL_REL:
+        failures.append(
+            f"line {line_no}: {slot}: energy row={row_energy:,.0f} vs "
+            f"result.json={expected_energy:,.0f} (rel diff {rel:.2%})"
+        )
+
+
+def _report(failures: list[str]) -> int:
+    if not failures:
+        print("README Record History: OK")
+        return 0
+    print(f"README Record History: {len(failures)} issue(s) found:")
+    for f in failures:
+        print(f"  {f}")
+    return 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/submit.py b/submit.py
index c8ec97d..4e53a7f 100755
--- a/submit.py
+++ b/submit.py
@@ -320,7 +320,13 @@ def save_nvml_artifact(
 def append_record(result: dict, dir_relpath: str) -> None:
     """Append one row to the Record History table in README.md.
 
-    Replaces the placeholder dash row if present, otherwise appends.
+    Inserts the row at the end of the Record History markdown table
+    block (after the last data row, before the blank line that closes
+    the table). Earlier versions appended to the end of the file, which
+    landed rows past the footnotes and broke the table — this version
+    keeps them inside the table.
+
+    Replaces the placeholder dash row if present, otherwise inserts.
     Disqualified rows render their accuracy cell as ``DQ`` so they
     don't pollute the leaderboard sort.
 
@@ -351,10 +357,48 @@ def append_record(result: dict, dir_relpath: str) -> None:
     )
     placeholder = "| —    |          — |        — | —      | —          | —           |\n"
     if placeholder in text:
-        text = text.replace(placeholder, row, 1)
-    else:
-        text = text.rstrip() + "\n" + row
-    readme.write_text(text)
+        readme.write_text(text.replace(placeholder, row, 1))
+        return
+
+    new_text = _insert_into_record_history_table(text, row)
+    if new_text is None:
+        # Table not found — fall back to plain append. Better than crashing.
+        new_text = text.rstrip() + "\n" + row
+    readme.write_text(new_text)
+
+
+def _insert_into_record_history_table(text: str, row: str) -> str | None:
+    """Return ``text`` with ``row`` inserted at the end of the Record
+    History markdown table block. Returns ``None`` if no table was found.
+
+    The table is identified by a ``## Record History`` heading followed
+    by a markdown pipe-table header. The new row is inserted after the
+    last consecutive pipe-prefixed line of the table.
+    """
+    lines = text.splitlines(keepends=True)
+    in_record_history = False
+    in_table = False
+    last_pipe_line = -1
+    for i, line in enumerate(lines):
+        if line.lstrip().startswith("## Record History"):
+            in_record_history = True
+            continue
+        if not in_record_history:
+            continue
+        if not in_table:
+            if line.startswith("|") and "Energy" in line and "Val" in line:
+                in_table = True
+                last_pipe_line = i
+            continue
+        # In the table: every pipe-line counts as the running tail.
+        if line.startswith("|"):
+            last_pipe_line = i
+            continue
+        # First non-pipe line closes the table.
+        break
+    if last_pipe_line < 0:
+        return None
+    return "".join(lines[: last_pipe_line + 1] + [row] + lines[last_pipe_line + 1 :])
 
 
 class _Tee(io.TextIOBase):

From 62e5f07c2a855df41821ed315db04986d81d3d63 Mon Sep 17 00:00:00 2001
From: Armin Stepanyan <12305910+ab-10@users.noreply.github.com>
Date: Fri, 22 May 2026 03:41:20 +0000
Subject: [PATCH 3/5] Add CPU energy usage to total

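A minimal sketch of the three output shapes produced by
``_fmt_training_energy``, using the GPU and CPU figures from the
2026-05-20 ``lwta_k4`` README row. The ``Measurement`` class here is a
stand-in that carries only the fields the formatter reads; it is an
assumption for illustration, not the real dataclass.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Measurement:  # stand-in: only the fields the formatter reads
    energy_joules: float | None = None  # GPU energy, net of idle baseline
    cpu_energy_J: float | None = None
    total_energy_J: float | None = None


def fmt_training_energy(m: Measurement) -> str:
    # Full breakdown only when all three numbers are present.
    if (m.total_energy_J is not None
            and m.energy_joules is not None
            and m.cpu_energy_J is not None):
        return (f"{m.total_energy_J:,.1f} "
                f"({m.energy_joules:,.1f} GPU + {m.cpu_energy_J:,.1f} CPU)")
    # GPU-only fallback keeps the pre-patch output format.
    if m.energy_joules is not None:
        return f"{m.energy_joules:,.1f}"
    return "NOT MEASURED"


both = fmt_training_energy(Measurement(44329.0, 9354.0, 53683.0))
gpu_only = fmt_training_energy(Measurement(energy_joules=44329.0))
nothing = fmt_training_energy(Measurement())
print(both)      # 53,683.0 (44,329.0 GPU + 9,354.0 CPU)
print(gpu_only)  # 44,329.0
print(nothing)   # NOT MEASURED
```

Routing all three print sites through one formatter keeps the
at-kill, below-floor, and normal paths in the same format.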
---
 run_eval.py | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/run_eval.py b/run_eval.py
index c9e5806..881eb3c 100644
--- a/run_eval.py
+++ b/run_eval.py
@@ -119,7 +119,7 @@ def main() -> None:
         if m is not None:
             print(f"training duration  : {m.duration_s:.1f}s")
             if m.energy_joules is not None:
-                print(f"training energy (J): {m.energy_joules:,.1f}  (at kill)")
+                print(f"training energy (J): {_fmt_training_energy(m)}  (at kill)")
         if args.results_json is not None:
             payload = {
                 "submission": submission_name,
@@ -155,7 +155,7 @@ def main() -> None:
               f"below floor {args.acc_min:.4f}")
         print(f"submission         : {submission_name}")
         if m.energy_joules is not None:
-            print(f"training energy (J): {m.energy_joules:,.1f}")
+            print(f"training energy (J): {_fmt_training_energy(m)}")
         print(f"training duration  : {m.duration_s:.1f}s")
         if args.results_json is not None:
             payload = {
@@ -178,10 +178,7 @@ def main() -> None:
 
     print("---")
     print(f"submission         : {submission_name}")
-    if m.energy_joules is not None:
-        print(f"training energy (J): {m.energy_joules:,.1f}")
-    else:
-        print("training energy (J): NOT MEASURED")
+    print(f"training energy (J): {_fmt_training_energy(m)}")
     print(f"training duration  : {m.duration_s:.1f}s")
     print(f"val  char-accuracy : {val_result.accuracy:.4f}")
     print(f"val  chars         : {val_result.n_chars:,}")
@@ -217,5 +214,16 @@ def _utc_now() -> str:
             .replace(microsecond=0).isoformat().replace("+00:00", "Z"))
 
 
+def _fmt_training_energy(m) -> str:
+    if (m.total_energy_J is not None
+            and m.energy_joules is not None
+            and m.cpu_energy_J is not None):
+        return (f"{m.total_energy_J:,.1f} "
+                f"({m.energy_joules:,.1f} GPU + {m.cpu_energy_J:,.1f} CPU)")
+    if m.energy_joules is not None:
+        return f"{m.energy_joules:,.1f}"
+    return "NOT MEASURED"
+
+
 if __name__ == "__main__":
     main()

From 2bb068d0c8269ad72a2c9207a97dab703af34428 Mon Sep 17 00:00:00 2001
From: Gabriel Nakajima An <naka@Gabriels-MacBook-Pro.local>
Date: Mon, 25 May 2026 15:24:13 -0700
Subject: [PATCH 4/5] Normalize hallucinated contributor handles to @gabrielnan
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The submission ``__author__`` fields under
``submissions/{adamw_lr3e3_wd0_long, alpha_06, bpe_internal_nn_v2,
chunker_phase1_v1, chunker_phase1_v2, deep_backoff_kn,
gpu_ngram_o14_xorfix, gpu_ngram_w31_k10, gpu_ngram_w31_k11,
lwta_k4_alpha_065, mamba_byte, paq_mixer_v3, subset_70_mkn}/`` were
written by AI subagents during development and contained
subagent-style identifiers (``@subagent-xorfix-2026-05-19``,
``@exp-batch-iter4``, ``@follow-up-paq-prediction``,
``@worker-paq-mixer``, ``@explore-chunker-2026-05-19``,
``@subagent-L2clean-2026-05-19``, ``@claude-mamba``,
``@explore-reopen-adamw``) — none of which are real GitHub usernames.

``@nakajimagabriel`` was also invented (GitHub returns 404 for that
login); the actual contributor account is ``@gabrielnan``.

Replaced both classes of hallucinated handles with ``@gabrielnan`` in:
- each affected ``submission.py``'s ``__author__`` line,
- each affected ``result.json``'s ``contributor`` field,
- the matching rows in ``README.md``'s Record History table.

``scripts/validate_record_history.py`` still reports OK.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 README.md                                     |  24 +--
 submissions/adamw_lr3e3_wd0_long/result.json  |   2 +-
 .../adamw_lr3e3_wd0_long/submission.py        |   2 +-
 submissions/alpha_06/result.json              |   2 +-
 submissions/alpha_06/result.sxm4.json         |  21 +++
 submissions/alpha_06/run.sxm4.log             | 144 ++++++++++++++++++
 submissions/alpha_06/submission.py            |   2 +-
 submissions/bpe_internal_nn_v2/result.json    |   2 +-
 submissions/bpe_internal_nn_v2/submission.py  |   2 +-
 submissions/chunker_phase1_v1/result.json     |   2 +-
 submissions/chunker_phase1_v1/submission.py   |   2 +-
 submissions/chunker_phase1_v2/result.json     |   2 +-
 submissions/chunker_phase1_v2/submission.py   |   2 +-
 submissions/deep_backoff_kn/result.json       |   2 +-
 submissions/deep_backoff_kn/submission.py     |   2 +-
 submissions/gpu_ngram_o14_xorfix/result.json  |   2 +-
 .../gpu_ngram_o14_xorfix/submission.py        |   2 +-
 submissions/gpu_ngram_w31_k10/result.json     |   2 +-
 submissions/gpu_ngram_w31_k10/submission.py   |   2 +-
 submissions/gpu_ngram_w31_k11/result.json     |   2 +-
 submissions/gpu_ngram_w31_k11/submission.py   |   2 +-
 submissions/lwta_k4_alpha_065/result.json     |   2 +-
 submissions/lwta_k4_alpha_065/submission.py   |   2 +-
 submissions/mamba_byte/result.json            |   2 +-
 submissions/mamba_byte/submission.py          |   2 +-
 submissions/paq_mixer_v3/result.json          |   2 +-
 submissions/paq_mixer_v3/submission.py        |   2 +-
 submissions/subset_70_mkn/result.json         |   2 +-
 submissions/subset_70_mkn/submission.py       |   2 +-
 29 files changed, 203 insertions(+), 38 deletions(-)
 create mode 100644 submissions/alpha_06/result.sxm4.json
 create mode 100644 submissions/alpha_06/run.sxm4.log

diff --git a/README.md b/README.md
index 6093d84..87eb1e3 100644
--- a/README.md
+++ b/README.md
@@ -35,20 +35,20 @@ The `Energy (J)` column reports **`total_energy_J`** (GPU NVML net of idle basel
 | 2026-05-18 |      3,612 |       DQ | A100 80GB PCIe | chunker_d1       | [dir](research/catalog/new_directions/chunker_d1)       | @ab-10 |
 | 2026-05-18 |        735 |       DQ | A100 80GB PCIe | ppm_c            | [dir](research/catalog/new_directions/ppm_c)            | @ab-10 |
 | 2026-05-17 |         70 |       DQ | A100 80GB SXM4 | P2-A_random_projection | [dir](research/forward-forward-deep/runs/phase2/P2-A_random_projection) | @ab-10 |
-| 2026-05-19 |     60,864 |       DQ | A100 80GB PCIe | mamba_byte           | [dir](submissions/mamba_byte)           | @claude-mamba |
-| 2026-05-20 |      1,752 |       DQ | A100 80GB SXM4 | gpu_ngram_w31_k10    | [dir](submissions/gpu_ngram_w31_k10)    | @follow-up-paq-prediction |
-| 2026-05-20 |     13,936 |       DQ | A100 80GB SXM4 | chunker_phase1_v2    | [dir](submissions/chunker_phase1_v2)    | @explore-chunker-2026-05-19 |
-| 2026-05-20 |     24,417 |       DQ | A100 80GB SXM4 | bpe_internal_nn_v2   | [dir](submissions/bpe_internal_nn_v2)   | @subagent-xorfix-2026-05-19 |
+| 2026-05-19 |     60,864 |       DQ | A100 80GB PCIe | mamba_byte           | [dir](submissions/mamba_byte)           | @gabrielnan |
+| 2026-05-20 |      1,752 |       DQ | A100 80GB SXM4 | gpu_ngram_w31_k10    | [dir](submissions/gpu_ngram_w31_k10)    | @gabrielnan |
+| 2026-05-20 |     13,936 |       DQ | A100 80GB SXM4 | chunker_phase1_v2    | [dir](submissions/chunker_phase1_v2)    | @gabrielnan |
+| 2026-05-20 |     24,417 |       DQ | A100 80GB SXM4 | bpe_internal_nn_v2   | [dir](submissions/bpe_internal_nn_v2)   | @gabrielnan |
 | 2026-05-20 |     53,683 | 0.7246    | A100 80GB PCIe | lwta_k4              | [dir](submissions/lwta_k4)              | @ab-10 (re-run on new harness; total_J = 44,329 gpu + 9,354 cpu) |
 | 2026-05-20 |     54,614 | 0.7145    | A100 80GB PCIe | lwta_k2              | [dir](submissions/lwta_k2)              | @ab-10 (re-run on new harness; total_J = 44,583 gpu + 10,031 cpu) |
-| 2026-05-21 |      2,474 | 0.7031    | A100 80GB PCIe | subset_70_mkn        | [dir](submissions/subset_70_mkn)        | @exp-batch-iter4 |
-| 2026-05-21 |      3,092 | 0.7050    | A100 80GB PCIe | gpu_ngram_w31_k11    | [dir](submissions/gpu_ngram_w31_k11)    | @follow-up-paq-prediction |
-| 2026-05-21 |      4,607 | 0.7047    | A100 80GB PCIe | paq_mixer_v3         | [dir](submissions/paq_mixer_v3)         | @worker-paq-mixer |
-| 2026-05-21 |      8,602 | 0.7184    | A100 80GB PCIe | gpu_ngram_o14_xorfix | [dir](submissions/gpu_ngram_o14_xorfix) | @subagent-xorfix-2026-05-19 |
-| 2026-05-21 |      9,591 | 0.7063    | A100 80GB PCIe | chunker_phase1_v1    | [dir](submissions/chunker_phase1_v1)    | @explore-chunker-2026-05-19 |
-| 2026-05-21 |     14,578 | 0.7184    | A100 80GB PCIe | deep_backoff_kn      | [dir](submissions/deep_backoff_kn)      | @nakajimagabriel |
-| 2026-05-21 |     19,922 | 0.7328    | A100 80GB SXM4 | lwta_k4_alpha_065    | [dir](submissions/lwta_k4_alpha_065)    | @subagent-L2clean-2026-05-19 |
-| 2026-05-21 |     20,743 | 0.7390    | A100 80GB SXM4 | alpha_06             | [dir](submissions/alpha_06)             | @subagent-xorfix-2026-05-19 |
+| 2026-05-21 |      2,474 | 0.7031    | A100 80GB PCIe | subset_70_mkn        | [dir](submissions/subset_70_mkn)        | @gabrielnan |
+| 2026-05-21 |      3,092 | 0.7050    | A100 80GB PCIe | gpu_ngram_w31_k11    | [dir](submissions/gpu_ngram_w31_k11)    | @gabrielnan |
+| 2026-05-21 |      4,607 | 0.7047    | A100 80GB PCIe | paq_mixer_v3         | [dir](submissions/paq_mixer_v3)         | @gabrielnan |
+| 2026-05-21 |      8,602 | 0.7184    | A100 80GB PCIe | gpu_ngram_o14_xorfix | [dir](submissions/gpu_ngram_o14_xorfix) | @gabrielnan |
+| 2026-05-21 |      9,591 | 0.7063    | A100 80GB PCIe | chunker_phase1_v1    | [dir](submissions/chunker_phase1_v1)    | @gabrielnan |
+| 2026-05-21 |     14,578 | 0.7184    | A100 80GB PCIe | deep_backoff_kn      | [dir](submissions/deep_backoff_kn)      | @gabrielnan |
+| 2026-05-21 |     19,922 | 0.7328    | A100 80GB SXM4 | lwta_k4_alpha_065    | [dir](submissions/lwta_k4_alpha_065)    | @gabrielnan |
+| 2026-05-21 |     20,743 | 0.7390    | A100 80GB SXM4 | alpha_06             | [dir](submissions/alpha_06)             | @gabrielnan |
 | 2026-05-21 |     62,006 | 0.7337    | A100 80GB SXM4 | modded_nanogpt       | [dir](submissions/modded_nanogpt)       | @ab-10 |
 
 
diff --git a/submissions/adamw_lr3e3_wd0_long/result.json b/submissions/adamw_lr3e3_wd0_long/result.json
index bc31931..8e42955 100644
--- a/submissions/adamw_lr3e3_wd0_long/result.json
+++ b/submissions/adamw_lr3e3_wd0_long/result.json
@@ -17,5 +17,5 @@
     "gpu_name": "NVIDIA A100-SXM4-80GB",
     "notes": []
   },
-  "contributor": "@explore-reopen-adamw"
+  "contributor": "@gabrielnan"
 }
diff --git a/submissions/adamw_lr3e3_wd0_long/submission.py b/submissions/adamw_lr3e3_wd0_long/submission.py
index a922fbf..5aaa6fd 100644
--- a/submissions/adamw_lr3e3_wd0_long/submission.py
+++ b/submissions/adamw_lr3e3_wd0_long/submission.py
@@ -26,7 +26,7 @@
 """
 from __future__ import annotations
 
-__author__ = "@explore-reopen-adamw"
+__author__ = "@gabrielnan"
 
 import math
 import os
diff --git a/submissions/alpha_06/result.json b/submissions/alpha_06/result.json
index b96b4ca..ac1a778 100644
--- a/submissions/alpha_06/result.json
+++ b/submissions/alpha_06/result.json
@@ -19,5 +19,5 @@
     "gpu_name": "NVIDIA A100-SXM4-80GB",
     "notes": []
   },
-  "contributor": "@subagent-xorfix-2026-05-19"
+  "contributor": "@gabrielnan"
 }
diff --git a/submissions/alpha_06/result.sxm4.json b/submissions/alpha_06/result.sxm4.json
new file mode 100644
index 0000000..96475b1
--- /dev/null
+++ b/submissions/alpha_06/result.sxm4.json
@@ -0,0 +1,21 @@
+{
+  "submission": "alpha_06",
+  "training_energy_J": 14047.8704136,
+  "training_duration_s": 159.464391728,
+  "val_char_accuracy": 0.7437,
+  "val_chars": 60000,
+  "gpu_name": "NVIDIA A100-SXM4-80GB",
+  "date_utc": "2026-05-20T01:10:10Z",
+  "_nvml": {
+    "nvml_available": true,
+    "energy_counter_supported": true,
+    "monotonic": true,
+    "idle_watts": 61.72905000000001,
+    "stress_watts_avg": 336.50938625277473,
+    "stress_energy_joules": 12708.794,
+    "stress_duration_s": 37.766536445,
+    "gpu_name": "NVIDIA A100-SXM4-80GB",
+    "notes": []
+  },
+  "contributor": "@subagent-xorfix-2026-05-19"
+}
diff --git a/submissions/alpha_06/run.sxm4.log b/submissions/alpha_06/run.sxm4.log
new file mode 100644
index 0000000..da9fb6f
--- /dev/null
+++ b/submissions/alpha_06/run.sxm4.log
@@ -0,0 +1,144 @@
+# wikitext submit.py log — alpha_06 — 2026-05-20T00:58:54+00:00Z
+[modal] launching A100-80GB ...
+✓ Initialized. View run at 
+https://modal.com/apps/gabriel-nakajima-an/main/ap-4rmVWzqcQj1VatkwS8Da4X
+✓ Created objects.
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py
+└── 🔨 Created function run_submission.
+[modal] verifying NVML energy counter ...
+GPU: NVIDIA A100-SXM4-80GB
+sampling idle power for 3s ...
+  idle: 61.7 W
+running 30s stress workload ...
+  duration:       37.8 s
+  energy delta:   12,708.8 J
+  avg power:      336.5 W
+  monotonic:      True
+---
+{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 61.72905000000001, "stress_watts_avg": 336.50938625277473, "stress_energy_joules": 12708.794, "stress_duration_s": 37.766536445, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []}
+[modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ...
+loading WikiText-103 from /data ...
+  train chars: 540,095,682
+  val   chars: 60,000  (scored, gated by --acc-min)
+train wall-clock cap: 300 s
+val accuracy floor : 0.7000
+training submission /workspace/alpha_06.py ...
+[clean_w31] starting GPU KN build; max_order=12 D=0.5
+[clean_w31] top order=12 unique pairs: 157,942,722  2.6s
+[clean_w31] ctx_len=11 ctxs=119,285,712 30.5s
+[clean_w31] ctx_len=10 ctxs=84,282,364 20.3s
+[clean_w31] ctx_len=9 ctxs=54,720,376 13.2s
+[clean_w31] ctx_len=8 ctxs=31,924,091 7.9s
+[clean_w31] ctx_len=7 ctxs=16,284,921 4.2s
+[clean_w31] ctx_len=6 ctxs=7,016,442 1.9s
+[clean_w31] ctx_len=5 ctxs=2,438,281 0.7s
+[clean_w31] ctx_len=4 ctxs=637,143 0.2s
+[clean_w31] ctx_len=3 ctxs=122,882 0.0s
+[clean_w31] ctx_len=2 ctxs=12,282 0.0s
+[clean_w31] ctx_len=1 ctxs=204 0.0s
+[clean_w31] ctx_len=0 ctxs=1 0.0s
+[clean_w31] KN build done: 81.6s
+[clean_w31] NN 3.29M params  cfg=TrainConfig(d=256 L=4 H=4 bs=32 T=1024 steps=1200)
+[clean_w31] NN step     0/1200  loss 5.5452  elapsed 1s
+[clean_w31] NN step   100/1200  loss 1.7587  elapsed 7s
+[clean_w31] NN step   200/1200  loss 1.4674  elapsed 13s
+[clean_w31] NN step   300/1200  loss 1.3990  elapsed 19s
+[clean_w31] NN step   400/1200  loss 1.3359  elapsed 25s
+[clean_w31] NN step   500/1200  loss 1.2644  elapsed 32s
+[clean_w31] NN step   600/1200  loss 1.2352  elapsed 38s
+[clean_w31] NN step   700/1200  loss 1.1895  elapsed 44s
+[clean_w31] NN step   800/1200  loss 1.1475  elapsed 50s
+[clean_w31] NN step   900/1200  loss 1.1349  elapsed 56s
+[clean_w31] NN step  1000/1200  loss 1.1164  elapsed 62s
+[clean_w31] NN step  1100/1200  loss 1.1330  elapsed 68s
+[clean_w31] NN step  1199/1200  loss 1.1061  elapsed 74s
+training: 14,047.9 J   duration=159.5s
+evaluating on val split ...
+  eval      1,200/60,000 (  2.0%)  acc=0.7333      127 char/s  eta=   464s
+  eval      2,400/60,000 (  4.0%)  acc=0.7221      126 char/s  eta=   456s
+  eval      3,600/60,000 (  6.0%)  acc=0.7222      126 char/s  eta=   447s
+  eval      4,800/60,000 (  8.0%)  acc=0.7298      126 char/s  eta=   437s
+  eval      6,000/60,000 ( 10.0%)  acc=0.7277      127 char/s  eta=   426s
+  eval      7,200/60,000 ( 12.0%)  acc=0.7235      127 char/s  eta=   415s
+  eval      8,400/60,000 ( 14.0%)  acc=0.7229      128 char/s  eta=   404s
+  eval      9,600/60,000 ( 16.0%)  acc=0.7286      128 char/s  eta=   394s
+  eval     10,800/60,000 ( 18.0%)  acc=0.7342      128 char/s  eta=   384s
+  eval     12,000/60,000 ( 20.0%)  acc=0.7347      128 char/s  eta=   374s
+  eval     13,200/60,000 ( 22.0%)  acc=0.7391      128 char/s  eta=   364s
+  eval     14,400/60,000 ( 24.0%)  acc=0.7410      129 char/s  eta=   355s
+  eval     15,600/60,000 ( 26.0%)  acc=0.7424      129 char/s  eta=   345s
+  eval     16,800/60,000 ( 28.0%)  acc=0.7456      129 char/s  eta=   336s
+  eval     18,000/60,000 ( 30.0%)  acc=0.7466      129 char/s  eta=   326s
+  eval     19,200/60,000 ( 32.0%)  acc=0.7496      129 char/s  eta=   317s
+  eval     20,400/60,000 ( 34.0%)  acc=0.7513      129 char/s  eta=   307s
+  eval     21,600/60,000 ( 36.0%)  acc=0.7513      129 char/s  eta=   298s
+  eval     22,800/60,000 ( 38.0%)  acc=0.7514      129 char/s  eta=   288s
+  eval     24,000/60,000 ( 40.0%)  acc=0.7513      129 char/s  eta=   279s
+  eval     25,200/60,000 ( 42.0%)  acc=0.7514      129 char/s  eta=   270s
+  eval     26,400/60,000 ( 44.0%)  acc=0.7524      129 char/s  eta=   260s
+  eval     27,600/60,000 ( 46.0%)  acc=0.7518      129 char/s  eta=   251s
+  eval     28,800/60,000 ( 48.0%)  acc=0.7523      129 char/s  eta=   242s
+  eval     30,000/60,000 ( 50.0%)  acc=0.7518      129 char/s  eta=   232s
+  eval     31,200/60,000 ( 52.0%)  acc=0.7493      129 char/s  eta=   223s
+  eval     32,400/60,000 ( 54.0%)  acc=0.7480      129 char/s  eta=   214s
+  eval     33,600/60,000 ( 56.0%)  acc=0.7460      129 char/s  eta=   204s
+  eval     34,800/60,000 ( 58.0%)  acc=0.7462      129 char/s  eta=   195s
+  eval     36,000/60,000 ( 60.0%)  acc=0.7464      129 char/s  eta=   186s
+  eval     37,200/60,000 ( 62.0%)  acc=0.7463      129 char/s  eta=   176s
+  eval     38,400/60,000 ( 64.0%)  acc=0.7460      129 char/s  eta=   167s
+  eval     39,600/60,000 ( 66.0%)  acc=0.7457      129 char/s  eta=   158s
+  eval     40,800/60,000 ( 68.0%)  acc=0.7448      129 char/s  eta=   148s
+  eval     42,000/60,000 ( 70.0%)  acc=0.7439      130 char/s  eta=   139s
+  eval     43,200/60,000 ( 72.0%)  acc=0.7438      129 char/s  eta=   130s
+  eval     44,400/60,000 ( 74.0%)  acc=0.7434      129 char/s  eta=   120s
+  eval     45,600/60,000 ( 76.0%)  acc=0.7432      129 char/s  eta=   111s
+  eval     46,800/60,000 ( 78.0%)  acc=0.7425      130 char/s  eta=   102s
+  eval     48,000/60,000 ( 80.0%)  acc=0.7425      130 char/s  eta=    93s
+  eval     49,200/60,000 ( 82.0%)  acc=0.7423      130 char/s  eta=    83s
+  eval     50,400/60,000 ( 84.0%)  acc=0.7430      130 char/s  eta=    74s
+  eval     51,600/60,000 ( 86.0%)  acc=0.7432      130 char/s  eta=    65s
+  eval     52,800/60,000 ( 88.0%)  acc=0.7428      130 char/s  eta=    56s
+  eval     54,000/60,000 ( 90.0%)  acc=0.7429      130 char/s  eta=    46s
+  eval     55,200/60,000 ( 92.0%)  acc=0.7419      130 char/s  eta=    37s
+  eval     56,400/60,000 ( 94.0%)  acc=0.7420      130 char/s  eta=    28s
+  eval     57,600/60,000 ( 96.0%)  acc=0.7423      130 char/s  eta=    19s
+  eval     58,800/60,000 ( 98.0%)  acc=0.7429      130 char/s  eta=     9s
+  eval     60,000/60,000 (100.0%)  acc=0.7437      130 char/s  eta=     0s
+chars=60,000  acc=0.7437  eval_duration=462.8s
+---
+submission         : alpha_06
+training energy (J): 14,047.9
+training duration  : 159.5s
+val  char-accuracy : 0.7437
+val  chars         : 60,000
+wrote /tmp/result.json
+Stopping app - local entrypoint completed.
+✓ App completed. View run at 
+https://modal.com/apps/gabriel-nakajima-an/main/ap-4rmVWzqcQj1VatkwS8Da4X
+
+# final result
+{
+  "submission": "alpha_06",
+  "training_energy_J": 14047.8704136,
+  "training_duration_s": 159.464391728,
+  "val_char_accuracy": 0.7437,
+  "val_chars": 60000,
+  "gpu_name": "NVIDIA A100-SXM4-80GB",
+  "date_utc": "2026-05-20T01:10:10Z",
+  "_nvml": {
+    "nvml_available": true,
+    "energy_counter_supported": true,
+    "monotonic": true,
+    "idle_watts": 61.72905000000001,
+    "stress_watts_avg": 336.50938625277473,
+    "stress_energy_joules": 12708.794,
+    "stress_duration_s": 37.766536445,
+    "gpu_name": "NVIDIA A100-SXM4-80GB",
+    "notes": []
+  },
+  "contributor": "@subagent-xorfix-2026-05-19"
+}
diff --git a/submissions/alpha_06/submission.py b/submissions/alpha_06/submission.py
index 7bc034f..ad88845 100644
--- a/submissions/alpha_06/submission.py
+++ b/submissions/alpha_06/submission.py
@@ -8,7 +8,7 @@
 """
 from __future__ import annotations
 
-__author__ = "@subagent-xorfix-2026-05-19"
+__author__ = "@gabrielnan"
 
 import os
 import time
diff --git a/submissions/bpe_internal_nn_v2/result.json b/submissions/bpe_internal_nn_v2/result.json
index 260f72f..8a28017 100644
--- a/submissions/bpe_internal_nn_v2/result.json
+++ b/submissions/bpe_internal_nn_v2/result.json
@@ -20,5 +20,5 @@
     "gpu_name": "NVIDIA A100-SXM4-80GB",
     "notes": []
   },
-  "contributor": "@subagent-xorfix-2026-05-19"
+  "contributor": "@gabrielnan"
 }
diff --git a/submissions/bpe_internal_nn_v2/submission.py b/submissions/bpe_internal_nn_v2/submission.py
index ad60520..1fe4680 100644
--- a/submissions/bpe_internal_nn_v2/submission.py
+++ b/submissions/bpe_internal_nn_v2/submission.py
@@ -19,7 +19,7 @@
 """
 from __future__ import annotations
 
-__author__ = "@subagent-xorfix-2026-05-19"
+__author__ = "@gabrielnan"
 
 import concurrent.futures
 import os
diff --git a/submissions/chunker_phase1_v1/result.json b/submissions/chunker_phase1_v1/result.json
index d95dc4f..4eb30f9 100644
--- a/submissions/chunker_phase1_v1/result.json
+++ b/submissions/chunker_phase1_v1/result.json
@@ -19,5 +19,5 @@
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
-  "contributor": "@explore-chunker-2026-05-19"
+  "contributor": "@gabrielnan"
 }
diff --git a/submissions/chunker_phase1_v1/submission.py b/submissions/chunker_phase1_v1/submission.py
index e57482d..11f94eb 100644
--- a/submissions/chunker_phase1_v1/submission.py
+++ b/submissions/chunker_phase1_v1/submission.py
@@ -37,7 +37,7 @@
 """
 from __future__ import annotations
 
-__author__ = "@explore-chunker-2026-05-19"
+__author__ = "@gabrielnan"
 
 import os
 import time
diff --git a/submissions/chunker_phase1_v2/result.json b/submissions/chunker_phase1_v2/result.json
index 4111ea3..a8cfd09 100644
--- a/submissions/chunker_phase1_v2/result.json
+++ b/submissions/chunker_phase1_v2/result.json
@@ -20,5 +20,5 @@
     "gpu_name": "NVIDIA A100-SXM4-80GB",
     "notes": []
   },
-  "contributor": "@explore-chunker-2026-05-19"
+  "contributor": "@gabrielnan"
 }
diff --git a/submissions/chunker_phase1_v2/submission.py b/submissions/chunker_phase1_v2/submission.py
index a142cff..42d380a 100644
--- a/submissions/chunker_phase1_v2/submission.py
+++ b/submissions/chunker_phase1_v2/submission.py
@@ -45,7 +45,7 @@
 """
 from __future__ import annotations
 
-__author__ = "@explore-chunker-2026-05-19"
+__author__ = "@gabrielnan"
 
 import os
 import time
diff --git a/submissions/deep_backoff_kn/result.json b/submissions/deep_backoff_kn/result.json
index 934a739..2ddfa31 100644
--- a/submissions/deep_backoff_kn/result.json
+++ b/submissions/deep_backoff_kn/result.json
@@ -19,5 +19,5 @@
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
-  "contributor": "@nakajimagabriel"
+  "contributor": "@gabrielnan"
 }
diff --git a/submissions/deep_backoff_kn/submission.py b/submissions/deep_backoff_kn/submission.py
index 0e41b74..1097562 100644
--- a/submissions/deep_backoff_kn/submission.py
+++ b/submissions/deep_backoff_kn/submission.py
@@ -28,7 +28,7 @@
 """
 from __future__ import annotations
 
-__author__ = "@nakajimagabriel"
+__author__ = "@gabrielnan"
 
 import multiprocessing
 import os
diff --git a/submissions/gpu_ngram_o14_xorfix/result.json b/submissions/gpu_ngram_o14_xorfix/result.json
index 2b79e7d..5e32485 100644
--- a/submissions/gpu_ngram_o14_xorfix/result.json
+++ b/submissions/gpu_ngram_o14_xorfix/result.json
@@ -19,5 +19,5 @@
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
-  "contributor": "@subagent-xorfix-2026-05-19"
+  "contributor": "@gabrielnan"
 }
diff --git a/submissions/gpu_ngram_o14_xorfix/submission.py b/submissions/gpu_ngram_o14_xorfix/submission.py
index a3ed390..b1fec29 100644
--- a/submissions/gpu_ngram_o14_xorfix/submission.py
+++ b/submissions/gpu_ngram_o14_xorfix/submission.py
@@ -43,7 +43,7 @@ class is identical) but the GLOBAL order of distinct (hi, lo) keys is
 """
 from __future__ import annotations
 
-__author__ = "@subagent-xorfix-2026-05-19"
+__author__ = "@gabrielnan"
 
 import os
 import time
diff --git a/submissions/gpu_ngram_w31_k10/result.json b/submissions/gpu_ngram_w31_k10/result.json
index c4de566..f946b23 100644
--- a/submissions/gpu_ngram_w31_k10/result.json
+++ b/submissions/gpu_ngram_w31_k10/result.json
@@ -22,5 +22,5 @@
     "gpu_name": "NVIDIA A100-SXM4-80GB",
     "notes": []
   },
-  "contributor": "@follow-up-paq-prediction"
+  "contributor": "@gabrielnan"
 }
diff --git a/submissions/gpu_ngram_w31_k10/submission.py b/submissions/gpu_ngram_w31_k10/submission.py
index eee7390..eb9d7c6 100644
--- a/submissions/gpu_ngram_w31_k10/submission.py
+++ b/submissions/gpu_ngram_w31_k10/submission.py
@@ -27,7 +27,7 @@
 """
 from __future__ import annotations
 
-__author__ = "@follow-up-paq-prediction"
+__author__ = "@gabrielnan"
 
 import os
 import time
diff --git a/submissions/gpu_ngram_w31_k11/result.json b/submissions/gpu_ngram_w31_k11/result.json
index 3d6a2df..60452b1 100644
--- a/submissions/gpu_ngram_w31_k11/result.json
+++ b/submissions/gpu_ngram_w31_k11/result.json
@@ -19,5 +19,5 @@
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
-  "contributor": "@follow-up-paq-prediction"
+  "contributor": "@gabrielnan"
 }
diff --git a/submissions/gpu_ngram_w31_k11/submission.py b/submissions/gpu_ngram_w31_k11/submission.py
index 9e0a8a2..bfbfdec 100644
--- a/submissions/gpu_ngram_w31_k11/submission.py
+++ b/submissions/gpu_ngram_w31_k11/submission.py
@@ -27,7 +27,7 @@
 """
 from __future__ import annotations
 
-__author__ = "@follow-up-paq-prediction"
+__author__ = "@gabrielnan"
 
 import os
 import time
diff --git a/submissions/lwta_k4_alpha_065/result.json b/submissions/lwta_k4_alpha_065/result.json
index b2d6674..167d108 100644
--- a/submissions/lwta_k4_alpha_065/result.json
+++ b/submissions/lwta_k4_alpha_065/result.json
@@ -19,5 +19,5 @@
     "gpu_name": "NVIDIA A100-SXM4-80GB",
     "notes": []
   },
-  "contributor": "@subagent-L2clean-2026-05-19"
+  "contributor": "@gabrielnan"
 }
diff --git a/submissions/lwta_k4_alpha_065/submission.py b/submissions/lwta_k4_alpha_065/submission.py
index bae85ab..a91ec6d 100644
--- a/submissions/lwta_k4_alpha_065/submission.py
+++ b/submissions/lwta_k4_alpha_065/submission.py
@@ -22,7 +22,7 @@
 """
 from __future__ import annotations
 
-__author__ = "@subagent-L2clean-2026-05-19"
+__author__ = "@gabrielnan"
 
 import os
 import time
diff --git a/submissions/mamba_byte/result.json b/submissions/mamba_byte/result.json
index 60fe461..7a4c60f 100644
--- a/submissions/mamba_byte/result.json
+++ b/submissions/mamba_byte/result.json
@@ -18,5 +18,5 @@
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
-  "contributor": "@claude-mamba"
+  "contributor": "@gabrielnan"
 }
diff --git a/submissions/mamba_byte/submission.py b/submissions/mamba_byte/submission.py
index d891094..46a4cf7 100644
--- a/submissions/mamba_byte/submission.py
+++ b/submissions/mamba_byte/submission.py
@@ -56,7 +56,7 @@
 """
 from __future__ import annotations
 
-__author__ = "@claude-mamba"
+__author__ = "@gabrielnan"
 
 import math
 import os
diff --git a/submissions/paq_mixer_v3/result.json b/submissions/paq_mixer_v3/result.json
index da2d7dc..f9b455c 100644
--- a/submissions/paq_mixer_v3/result.json
+++ b/submissions/paq_mixer_v3/result.json
@@ -19,5 +19,5 @@
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
-  "contributor": "@worker-paq-mixer"
+  "contributor": "@gabrielnan"
 }
diff --git a/submissions/paq_mixer_v3/submission.py b/submissions/paq_mixer_v3/submission.py
index 0fb380b..fda5d1d 100644
--- a/submissions/paq_mixer_v3/submission.py
+++ b/submissions/paq_mixer_v3/submission.py
@@ -35,7 +35,7 @@
 """
 from __future__ import annotations
 
-__author__ = "@worker-paq-mixer"
+__author__ = "@gabrielnan"
 
 import os
 import time
diff --git a/submissions/subset_70_mkn/result.json b/submissions/subset_70_mkn/result.json
index e956192..1d80233 100644
--- a/submissions/subset_70_mkn/result.json
+++ b/submissions/subset_70_mkn/result.json
@@ -19,5 +19,5 @@
     "gpu_name": "NVIDIA A100 80GB PCIe",
     "notes": []
   },
-  "contributor": "@exp-batch-iter4"
+  "contributor": "@gabrielnan"
 }
diff --git a/submissions/subset_70_mkn/submission.py b/submissions/subset_70_mkn/submission.py
index 340faf8..b6cd0a7 100644
--- a/submissions/subset_70_mkn/submission.py
+++ b/submissions/subset_70_mkn/submission.py
@@ -27,7 +27,7 @@
 """
 from __future__ import annotations
 
-__author__ = "@exp-batch-iter4"
+__author__ = "@gabrielnan"
 
 import os
 import time

From 0054c31c6455005b00c9f7cc80151cd06cb64e1e Mon Sep 17 00:00:00 2001
From: Gabriel Nakajima An <naka@Gabriels-MacBook-Pro.local>
Date: Mon, 25 May 2026 15:24:24 -0700
Subject: [PATCH 5/5] Drop historical .sxm4.json/.log snapshots accidentally
 added

result.sxm4.json and run.sxm4.log under submissions/alpha_06/ were
snapshots from an earlier re-run attempt (they preserved the SXM4 result
before re-launching for a PCIe run). They aren't part of the canonical
submission artifact set and don't belong on the leaderboard branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 submissions/alpha_06/result.sxm4.json |  21 ----
 submissions/alpha_06/run.sxm4.log     | 144 --------------------------
 2 files changed, 165 deletions(-)
 delete mode 100644 submissions/alpha_06/result.sxm4.json
 delete mode 100644 submissions/alpha_06/run.sxm4.log

diff --git a/submissions/alpha_06/result.sxm4.json b/submissions/alpha_06/result.sxm4.json
deleted file mode 100644
index 96475b1..0000000
--- a/submissions/alpha_06/result.sxm4.json
+++ /dev/null
@@ -1,21 +0,0 @@
-{
-  "submission": "alpha_06",
-  "training_energy_J": 14047.8704136,
-  "training_duration_s": 159.464391728,
-  "val_char_accuracy": 0.7437,
-  "val_chars": 60000,
-  "gpu_name": "NVIDIA A100-SXM4-80GB",
-  "date_utc": "2026-05-20T01:10:10Z",
-  "_nvml": {
-    "nvml_available": true,
-    "energy_counter_supported": true,
-    "monotonic": true,
-    "idle_watts": 61.72905000000001,
-    "stress_watts_avg": 336.50938625277473,
-    "stress_energy_joules": 12708.794,
-    "stress_duration_s": 37.766536445,
-    "gpu_name": "NVIDIA A100-SXM4-80GB",
-    "notes": []
-  },
-  "contributor": "@subagent-xorfix-2026-05-19"
-}
diff --git a/submissions/alpha_06/run.sxm4.log b/submissions/alpha_06/run.sxm4.log
deleted file mode 100644
index da9fb6f..0000000
--- a/submissions/alpha_06/run.sxm4.log
+++ /dev/null
@@ -1,144 +0,0 @@
-# wikitext submit.py log — alpha_06 — 2026-05-20T00:58:54+00:00Z
-[modal] launching A100-80GB ...
-✓ Initialized. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-4rmVWzqcQj1VatkwS8Da4X
-✓ Created objects.
-├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py
-├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py
-├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py
-├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py
-├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py
-└── 🔨 Created function run_submission.
-[modal] verifying NVML energy counter ...
-GPU: NVIDIA A100-SXM4-80GB
-sampling idle power for 3s ...
-  idle: 61.7 W
-running 30s stress workload ...
-  duration:       37.8 s
-  energy delta:   12,708.8 J
-  avg power:      336.5 W
-  monotonic:      True
----
-{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 61.72905000000001, "stress_watts_avg": 336.50938625277473, "stress_energy_joules": 12708.794, "stress_duration_s": 37.766536445, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []}
-[modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ...
-loading WikiText-103 from /data ...
-  train chars: 540,095,682
-  val   chars: 60,000  (scored, gated by --acc-min)
-train wall-clock cap: 300 s
-val accuracy floor : 0.7000
-training submission /workspace/alpha_06.py ...
-[clean_w31] starting GPU KN build; max_order=12 D=0.5
-[clean_w31] top order=12 unique pairs: 157,942,722  2.6s
-[clean_w31] ctx_len=11 ctxs=119,285,712 30.5s
-[clean_w31] ctx_len=10 ctxs=84,282,364 20.3s
-[clean_w31] ctx_len=9 ctxs=54,720,376 13.2s
-[clean_w31] ctx_len=8 ctxs=31,924,091 7.9s
-[clean_w31] ctx_len=7 ctxs=16,284,921 4.2s
-[clean_w31] ctx_len=6 ctxs=7,016,442 1.9s
-[clean_w31] ctx_len=5 ctxs=2,438,281 0.7s
-[clean_w31] ctx_len=4 ctxs=637,143 0.2s
-[clean_w31] ctx_len=3 ctxs=122,882 0.0s
-[clean_w31] ctx_len=2 ctxs=12,282 0.0s
-[clean_w31] ctx_len=1 ctxs=204 0.0s
-[clean_w31] ctx_len=0 ctxs=1 0.0s
-[clean_w31] KN build done: 81.6s
-[clean_w31] NN 3.29M params  cfg=TrainConfig(d=256 L=4 H=4 bs=32 T=1024 steps=1200)
-[clean_w31] NN step     0/1200  loss 5.5452  elapsed 1s
-[clean_w31] NN step   100/1200  loss 1.7587  elapsed 7s
-[clean_w31] NN step   200/1200  loss 1.4674  elapsed 13s
-[clean_w31] NN step   300/1200  loss 1.3990  elapsed 19s
-[clean_w31] NN step   400/1200  loss 1.3359  elapsed 25s
-[clean_w31] NN step   500/1200  loss 1.2644  elapsed 32s
-[clean_w31] NN step   600/1200  loss 1.2352  elapsed 38s
-[clean_w31] NN step   700/1200  loss 1.1895  elapsed 44s
-[clean_w31] NN step   800/1200  loss 1.1475  elapsed 50s
-[clean_w31] NN step   900/1200  loss 1.1349  elapsed 56s
-[clean_w31] NN step  1000/1200  loss 1.1164  elapsed 62s
-[clean_w31] NN step  1100/1200  loss 1.1330  elapsed 68s
-[clean_w31] NN step  1199/1200  loss 1.1061  elapsed 74s
-training: 14,047.9 J   duration=159.5s
-evaluating on val split ...
-  eval      1,200/60,000 (  2.0%)  acc=0.7333      127 char/s  eta=   464s
-  eval      2,400/60,000 (  4.0%)  acc=0.7221      126 char/s  eta=   456s
-  eval      3,600/60,000 (  6.0%)  acc=0.7222      126 char/s  eta=   447s
-  eval      4,800/60,000 (  8.0%)  acc=0.7298      126 char/s  eta=   437s
-  eval      6,000/60,000 ( 10.0%)  acc=0.7277      127 char/s  eta=   426s
-  eval      7,200/60,000 ( 12.0%)  acc=0.7235      127 char/s  eta=   415s
-  eval      8,400/60,000 ( 14.0%)  acc=0.7229      128 char/s  eta=   404s
-  eval      9,600/60,000 ( 16.0%)  acc=0.7286      128 char/s  eta=   394s
-  eval     10,800/60,000 ( 18.0%)  acc=0.7342      128 char/s  eta=   384s
-  eval     12,000/60,000 ( 20.0%)  acc=0.7347      128 char/s  eta=   374s
-  eval     13,200/60,000 ( 22.0%)  acc=0.7391      128 char/s  eta=   364s
-  eval     14,400/60,000 ( 24.0%)  acc=0.7410      129 char/s  eta=   355s
-  eval     15,600/60,000 ( 26.0%)  acc=0.7424      129 char/s  eta=   345s
-  eval     16,800/60,000 ( 28.0%)  acc=0.7456      129 char/s  eta=   336s
-  eval     18,000/60,000 ( 30.0%)  acc=0.7466      129 char/s  eta=   326s
-  eval     19,200/60,000 ( 32.0%)  acc=0.7496      129 char/s  eta=   317s
-  eval     20,400/60,000 ( 34.0%)  acc=0.7513      129 char/s  eta=   307s
-  eval     21,600/60,000 ( 36.0%)  acc=0.7513      129 char/s  eta=   298s
-  eval     22,800/60,000 ( 38.0%)  acc=0.7514      129 char/s  eta=   288s
-  eval     24,000/60,000 ( 40.0%)  acc=0.7513      129 char/s  eta=   279s
-  eval     25,200/60,000 ( 42.0%)  acc=0.7514      129 char/s  eta=   270s
-  eval     26,400/60,000 ( 44.0%)  acc=0.7524      129 char/s  eta=   260s
-  eval     27,600/60,000 ( 46.0%)  acc=0.7518      129 char/s  eta=   251s
-  eval     28,800/60,000 ( 48.0%)  acc=0.7523      129 char/s  eta=   242s
-  eval     30,000/60,000 ( 50.0%)  acc=0.7518      129 char/s  eta=   232s
-  eval     31,200/60,000 ( 52.0%)  acc=0.7493      129 char/s  eta=   223s
-  eval     32,400/60,000 ( 54.0%)  acc=0.7480      129 char/s  eta=   214s
-  eval     33,600/60,000 ( 56.0%)  acc=0.7460      129 char/s  eta=   204s
-  eval     34,800/60,000 ( 58.0%)  acc=0.7462      129 char/s  eta=   195s
-  eval     36,000/60,000 ( 60.0%)  acc=0.7464      129 char/s  eta=   186s
-  eval     37,200/60,000 ( 62.0%)  acc=0.7463      129 char/s  eta=   176s
-  eval     38,400/60,000 ( 64.0%)  acc=0.7460      129 char/s  eta=   167s
-  eval     39,600/60,000 ( 66.0%)  acc=0.7457      129 char/s  eta=   158s
-  eval     40,800/60,000 ( 68.0%)  acc=0.7448      129 char/s  eta=   148s
-  eval     42,000/60,000 ( 70.0%)  acc=0.7439      130 char/s  eta=   139s
-  eval     43,200/60,000 ( 72.0%)  acc=0.7438      129 char/s  eta=   130s
-  eval     44,400/60,000 ( 74.0%)  acc=0.7434      129 char/s  eta=   120s
-  eval     45,600/60,000 ( 76.0%)  acc=0.7432      129 char/s  eta=   111s
-  eval     46,800/60,000 ( 78.0%)  acc=0.7425      130 char/s  eta=   102s
-  eval     48,000/60,000 ( 80.0%)  acc=0.7425      130 char/s  eta=    93s
-  eval     49,200/60,000 ( 82.0%)  acc=0.7423      130 char/s  eta=    83s
-  eval     50,400/60,000 ( 84.0%)  acc=0.7430      130 char/s  eta=    74s
-  eval     51,600/60,000 ( 86.0%)  acc=0.7432      130 char/s  eta=    65s
-  eval     52,800/60,000 ( 88.0%)  acc=0.7428      130 char/s  eta=    56s
-  eval     54,000/60,000 ( 90.0%)  acc=0.7429      130 char/s  eta=    46s
-  eval     55,200/60,000 ( 92.0%)  acc=0.7419      130 char/s  eta=    37s
-  eval     56,400/60,000 ( 94.0%)  acc=0.7420      130 char/s  eta=    28s
-  eval     57,600/60,000 ( 96.0%)  acc=0.7423      130 char/s  eta=    19s
-  eval     58,800/60,000 ( 98.0%)  acc=0.7429      130 char/s  eta=     9s
-  eval     60,000/60,000 (100.0%)  acc=0.7437      130 char/s  eta=     0s
-chars=60,000  acc=0.7437  eval_duration=462.8s
----
-submission         : alpha_06
-training energy (J): 14,047.9
-training duration  : 159.5s
-val  char-accuracy : 0.7437
-val  chars         : 60,000
-wrote /tmp/result.json
-Stopping app - local entrypoint completed.
-✓ App completed. View run at 
-https://modal.com/apps/gabriel-nakajima-an/main/ap-4rmVWzqcQj1VatkwS8Da4X
-
-# final result
-{
-  "submission": "alpha_06",
-  "training_energy_J": 14047.8704136,
-  "training_duration_s": 159.464391728,
-  "val_char_accuracy": 0.7437,
-  "val_chars": 60000,
-  "gpu_name": "NVIDIA A100-SXM4-80GB",
-  "date_utc": "2026-05-20T01:10:10Z",
-  "_nvml": {
-    "nvml_available": true,
-    "energy_counter_supported": true,
-    "monotonic": true,
-    "idle_watts": 61.72905000000001,
-    "stress_watts_avg": 336.50938625277473,
-    "stress_energy_joules": 12708.794,
-    "stress_duration_s": 37.766536445,
-    "gpu_name": "NVIDIA A100-SXM4-80GB",
-    "notes": []
-  },
-  "contributor": "@subagent-xorfix-2026-05-19"
-}