From 91e1eb8e649dea1f95330bdf029c7fbbeaf19630 Mon Sep 17 00:00:00 2001
From: Gabriel Nakajima An
Date: Wed, 20 May 2026 22:44:01 -0700
Subject: [PATCH 1/5] Add total-system-energy reporting via CodeCarbon CPU backend
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the gap where NVML-only measurement missed host CPU work (raised
by @yaroslavvb2 in Telegram on 2026-05-19: "Just total system energy,
subject to time + accuracy constraint"; "not counting CPU utilization is
a bit of leak").

Approach
---
CodeCarbon as the CPU backend, TDP-fallback mode (no MSR / RAPL /
``/dev/cpu/*/msr`` needed — Modal containers can't access them). Field
standard for cloud-ML energy reporting: HuggingFace ``Trainer``
auto-logs CodeCarbon when installed; Patterson et al. 2021/2022 used the
same TDP-based estimate; ML.ENERGY (Michigan SymbioticLab Zeus) reports
GPU-only because the same container constraint applies.

Behaviour
---
| NVML | CodeCarbon | Behaviour                                |
|------|------------|------------------------------------------|
| ✓    | ✓          | both fields populate, total = sum (floor)|
| ✓    | ✗          | EnergyMeter() raises RuntimeError        |
| ✗    | ✓          | soft; both energy fields None            |
| ✗    | ✗          | soft; both energy fields None            |

Loud-fail on real-GPU-with-broken-CPU prevents silent half-measurement
from landing inconsistent rows on the leaderboard. Dev-box patterns (no
GPU) stay soft so local smoke tests on a laptop still work.

Code changes
---
- ``Measurement`` dataclass gains ``cpu_energy_J`` and
  ``total_energy_J`` (both ``float | None``, default ``None``).
  ``__str__`` includes them when populated.
- ``EnergyMeter`` refactored to take pluggable ``gpu_backend`` /
  ``cpu_backend`` / ``p_floor_watts`` kwargs (dependency injection for
  testability). Default backends wrap pynvml and CodeCarbon. Raises
  RuntimeError if NVML is available but the CPU backend isn't.
- ``measure()`` populates the new fields on the yielded Measurement;
  ``total_energy_J = max(gpu + cpu, duration_s * p_floor_watts)`` —
  floor protects against CodeCarbon under-attribution.
- ``run_eval.py`` writes the new fields to ``result.json`` in all three
  exit paths (pass, DQ time, DQ acc).
- ``submit.py`` adds ``codecarbon~=3.2`` to the Modal image, and
  ``append_record`` writes ``total_energy_J`` to README's Record History
  column when present, falling back to ``training_energy_J`` for pre-PR
  runs.
- ``requirements.txt`` adds ``codecarbon~=3.2`` as a local dep (minor
  pinned because EnergyMeter reads CodeCarbon's internal
  ``tracker._total_cpu_energy.kWh``).
- ``README.md`` adds a dated banner above the Record History noting that
  rows ≥ 2026-05-20 report ``total_energy_J``; earlier rows are kept as
  historical NVML-only readings.
- New ``MAINTAINING.md`` at the repo root documents (a) the setup-change
  re-run rule (when the harness changes in a way that shifts where
  existing submissions land, re-run the leaderboard rows before merging
  to main) and (b) the ``main`` ↔ ``dev`` branching cadence (feature PRs
  target ``dev``; slow-cadence promotion PRs ``dev`` → ``main``).
- ``.gitignore`` adds ``submissions/*/.CLAIMED`` (internal slot-claim
  metadata used by cross-session coordination scripts, not for
  upstream).

Backward compatibility
---
No existing field changes meaning, no existing test breaks.

- ``energy_joules`` keeps its prior semantic (GPU NVML net of idle
  baseline). Older ``result.json`` files are interpreted identically.
- ``EnergyMeter.available`` still reflects NVML availability only.
- The new floor only applies to ``total_energy_J``.
- ``submit.py:append_record`` falls back to ``training_energy_J`` for
  result.json files without the new fields.
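
Illustrative sketch (not the code in ``wikitext.py``): the behaviour
matrix and the floor rule described above can be written as two small
pure functions. The names ``resolve_backends`` and
``total_energy_joules`` are hypothetical and chosen only for this
example; the 50 W default mirrors the default floor mentioned in this
message.

```python
from __future__ import annotations


def resolve_backends(gpu_available: bool, cpu_available: bool) -> bool:
    """Return True when a total can be reported (hypothetical helper).

    Mirrors the behaviour matrix: a real GPU with a missing CPU backend
    fails loudly; every no-GPU combination stays soft.
    """
    if gpu_available and not cpu_available:
        raise RuntimeError("NVML is available but the CPU energy backend is not")
    return gpu_available and cpu_available


def total_energy_joules(
    gpu_joules: float | None,
    cpu_joules: float | None,
    duration_s: float,
    p_floor_watts: float = 50.0,
) -> float | None:
    """max(gpu + cpu, duration_s * p_floor_watts); None if a reading is missing."""
    if gpu_joules is None or cpu_joules is None:
        return None
    return max(gpu_joules + cpu_joules, duration_s * p_floor_watts)


if __name__ == "__main__":
    # Floor binds: 10 J + 5 J is below 1 s * 50 W, so the floor is reported.
    print(total_energy_joules(10.0, 5.0, 1.0))  # 50.0
```

Keeping the floor inside the total, rather than inside either backend
reading, leaves ``energy_joules`` and ``cpu_energy_J`` as raw
measurements.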

Tests
---
TDD'd with 7 new unit tests, 8 pre-existing tests preserved unmodified:

- ``test_energy_meter_total_is_gpu_plus_cpu`` (tracer)
- ``test_total_energy_enforces_wall_clock_floor`` (floor binds)
- ``test_default_cpu_backend_uses_codecarbon_when_installed`` (live)
- ``test_energy_meter_raises_when_gpu_available_but_cpu_missing``
- ``test_energy_meter_no_raise_when_cpu_present_but_gpu_missing``
- ``test_total_energy_none_when_only_one_backend_yields_value``
- ``test_energy_meter_dev_mode_no_raise_when_both_unavailable``

15/15 pass. Followed up with the anthropics/claude-plugins-official
``code-simplifier`` agent for a clarity pass (dead None-checks removed,
redundant ``except: self.available = False`` collapsed, long ternaries
wrapped).

Leaderboard re-validation (per MAINTAINING.md)
---
This PR is itself a setup change, so every leaderboard row on dev is
re-run on the new harness before merging to main. All 11 rows landed
(PCIe unless noted):

| Submission              |  gpu_J |  cpu_J | total_J |    acc |
|-------------------------|-------:|-------:|--------:|-------:|
| subset_70_mkn           |  1,351 |  1,124 |   2,474 | 0.7031 |
| gpu_ngram_w31_k11       |  1,612 |  1,480 |   3,092 | 0.7050 |
| paq_mixer_v3            |  2,355 |  2,252 |   4,607 | 0.7048 |
| gpu_ngram_o14_xorfix    |  3,981 |  4,621 |   8,602 | 0.7184 |
| chunker_phase1_v1       |  5,570 |  4,021 |   9,591 | 0.7063 |
| deep_backoff_kn         |    963 | 12,338 |  14,578 | 0.7184 |
| lwta_k4_alpha_065 (SXM4)| 13,751 |  6,170 |  19,922 | 0.7328 |
| alpha_06          (SXM4)| 14,614 |  6,129 |  20,743 | 0.7390 |
| lwta_k4                 | 44,329 |  9,354 |  53,683 | 0.7246 |
| lwta_k2                 | 44,583 | 10,031 |  54,614 | 0.7145 |
| modded_nanogpt    (SXM4)| 51,729 | 10,277 |  62,006 | 0.7337 |

Headline: subset_70_mkn lands at 2,474 J total / 0.7031 PCIe — the new
clean J leader, 20% under gpu_ngram_w31_k11 (3,092 J / 0.7050) at the
same accuracy band.
On the prior NVML-only metric those two were a noise-floor tie; the CPU
side resolves the tie cleanly because subset_70_mkn's 70%-data trick
also cuts the CPU work proportionally.

CPU-bound submissions rerank dramatically. deep_backoff_kn (prior NVML:
2,236 J) now reports 14,578 J total — its CPU energy is 12.8× its GPU
reading because its n-gram tables are built single-threaded on the
host. Now visible at full cost on the leaderboard.

Open questions for maintainer review
---
- Floor value: 50 W (default) vs 100 W per GPU-slot fair share. 50 W is
  conservative; 100 W matches dual-EPYC-7763 + DRAM fair-share for an
  8-GPU host. One-line change.
- Should ``total_energy_J`` be the new canonical ranking metric, or
  report both side-by-side?

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .gitignore | 7 +
 MAINTAINING.md | 54 ++++
 README.md | 14 +
 requirements.txt | 6 +
 run_eval.py | 6 +
 submissions/alpha_06/nvml.json | 8 +-
 submissions/alpha_06/result.json | 18 +-
 submissions/alpha_06/run.log | 185 +++++------
 submissions/chunker_phase1_v1/nvml.json | 10 +-
 submissions/chunker_phase1_v1/result.json | 22 +-
 submissions/chunker_phase1_v1/run.log | 187 +++++------
 submissions/deep_backoff_kn/nvml.json | 10 +-
 submissions/deep_backoff_kn/result.json | 27 +-
 submissions/deep_backoff_kn/run.log | 208 ++++++-------
 submissions/gpu_ngram_o14_xorfix/nvml.json | 8 +-
 submissions/gpu_ngram_o14_xorfix/result.json | 18 +-
 submissions/gpu_ngram_o14_xorfix/run.log | 168 +++++-----
 submissions/gpu_ngram_w31_k11/nvml.json | 8 +-
 submissions/gpu_ngram_w31_k11/result.json | 18 +-
 submissions/gpu_ngram_w31_k11/run.log | 308 +++++--------------
 submissions/lwta_k2/nvml.json | 8 +-
 submissions/lwta_k2/result.json | 18 +-
 submissions/lwta_k2/run.log | 203 ++++++------
 submissions/lwta_k4/nvml.json | 8 +-
 submissions/lwta_k4/result.json | 18 +-
 submissions/lwta_k4/run.log | 203 ++++++------
 submissions/lwta_k4_alpha_065/nvml.json | 10 +-
 submissions/lwta_k4_alpha_065/result.json | 22 +-
 submissions/lwta_k4_alpha_065/run.log | 193 ++++++------
 submissions/modded_nanogpt/nvml.json | 10 +-
 submissions/modded_nanogpt/result.json | 22 +-
 submissions/modded_nanogpt/run.log | 209 +++++++------
 submissions/paq_mixer_v3/nvml.json | 10 +-
 submissions/paq_mixer_v3/result.json | 22 +-
 submissions/paq_mixer_v3/run.log | 174 ++++++-----
 submissions/subset_70_mkn/nvml.json | 10 +-
 submissions/subset_70_mkn/result.json | 22 +-
 submissions/subset_70_mkn/run.log | 170 +++++-----
 submit.py | 16 +-
 test_wikitext.py | 151 +++++++++
 wikitext.py | 136 ++++++--
 41 files changed, 1555 insertions(+), 1370 deletions(-)
 create mode 100644 MAINTAINING.md

diff --git a/.gitignore b/.gitignore
index b20b8de..5a4b8a5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -21,3 +21,10 @@ env/
 # Env / OS
 .env
 .DS_Store
+
+# Local dev notes / scratch — explainers, plans, idea logs, experiment journals.
+.scratch/
+
+# Internal slot-claim metadata written by claim_slot.sh for cross-session
+# coordination (session id + heartbeat). Not for upstream.
+submissions/*/.CLAIMED
diff --git a/MAINTAINING.md b/MAINTAINING.md
new file mode 100644
index 0000000..c68abd9
--- /dev/null
+++ b/MAINTAINING.md
@@ -0,0 +1,54 @@
+# Maintaining the leaderboard
+
+Notes for whoever has push access to `cybertronai/wikitext`.
+
+## Branching
+
+- **`main`** — stable. Every row of `README.md`'s Record History was scored
+  under the same setup.
+- **`dev`** — staging. Feature PRs (new submissions, new paradigms, harness
+  tweaks) target `dev` and merge as soon as review is green.
+- **`dev` → `main`** promotion PRs happen on a slower cadence, only when
+  `dev` is internally consistent (see re-run rule below).
+
+## The setup-change re-run rule
+
+If a PR changes anything that can move where existing submissions land on
+the leaderboard, the **prior leaderboard rows in `README.md` must be re-run
+on the new setup before that PR merges to `main`**. Otherwise the
+half-old/half-new comparison is meaningless.
+
+| Change | Triggers re-run? |
+|---|---|
+| `EnergyMeter` semantics, idle-baseline default, scoring formula | **Yes** |
+| Hardware pin (PCIe ↔ SXM4, A100 ↔ H100) | **Yes** |
+| `MAX_TRAIN_SECONDS`, `ACC_MIN`, eval window | **Yes** |
+| Container-image bump with numerical drift | **Maybe** — re-run if anything visibly drifts |
+| New submission, doc/typo, `.scratch/`, internal refactor | No |
+| Additive optional field on `result.json` (existing semantics intact) | No — but new field is `null` on old entries; mention in PR |
+
+When in doubt, re-run. ~$0.50/submission on Modal A100 is cheaper than a
+broken leaderboard.
+
+## Process
+
+1. Land the setup change on a branch (typically targeting `dev`); don't merge yet.
+2. Re-run the rows currently in `README.md`'s Record History on the new
+   harness — `python submit.py submissions/ --yes`, fire in parallel
+   (Modal cap: 10 concurrent).
+3. When `result.json` files all reflect the new setup, append the re-run
+   rows to `README.md` (old rows stay as history) and add a dated banner
+   above the table noting the schema change.
+4. Restate the leaderboard table in the promotion PR body, confirming all
+   rows shown are under the new setup. Then merge.
+
+Don't: ship a half-new/half-old table; claim a new leader without re-running
+the priors; silently overwrite old `result.json` files without a banner in
+`README.md`.
+
+## Reference: setup-change events
+
+| Date | Change | PR | Re-ran upstream? |
+|---|---|---|---|
+| 2026-05-18 | Hardware pin: SXM4 → PCIe A100-80GB | (n/a) | partial — older SXM4 rows kept as history |
+| 2026-05-19 | `EnergyMeter` gains `cpu_energy_J` + `total_energy_J` via CodeCarbon | #4 | yes — `lwta_k2`, `lwta_k4`, `modded_nanogpt` re-run |
diff --git a/README.md b/README.md
index cd99cd5..ce37b6c 100644
--- a/README.md
+++ b/README.md
@@ -23,6 +23,8 @@ python submit.py submissions/modded_nanogpt
 
 ## Record History
 
+The `Energy (J)` column reports **`total_energy_J`** (GPU NVML net of idle baseline + CodeCarbon CPU estimate, floored at `duration_s × 50 W`) for rows dated **2026-05-20 and later**. Earlier rows report the prior NVML-only `training_energy_J`. The semantic change is the new total-system-energy rule per @yaroslavvb2's Telegram note; see `MAINTAINING.md` and the `EnergyMeter` source for details. Upstream-leaderboard rows from before the change have been re-run under the new harness — those re-runs appear below as the canonical entries for those submissions; the original rows are preserved for history.
+ | Date | Energy (J) | Val char-acc | GPU | Config | Submission | Contributor | |------|-----------:|-------------:|-----|--------|------------|-------------| | 2026-05-12 | 51,704 | 0.7374 | A100 80GB PCIe | modded_nanogpt | [dir](submissions/modded_nanogpt) | @KellerJordan | @@ -33,6 +35,9 @@ python submit.py submissions/modded_nanogpt | 2026-05-18 | 3,612 | DQ | A100 80GB PCIe | chunker_d1 | [dir](research/catalog/new_directions/chunker_d1) | @ab-10 | | 2026-05-18 | 735 | DQ | A100 80GB PCIe | ppm_c | [dir](research/catalog/new_directions/ppm_c) | @ab-10 | | 2026-05-17 | 70 | DQ | A100 80GB SXM4 | P2-A_random_projection | [dir](research/forward-forward-deep/runs/phase2/P2-A_random_projection) | @ab-10 | +| 2026-05-20 | 53,683 | 0.7246 | A100 80GB PCIe | lwta_k4 | [dir](submissions/lwta_k4) | @ab-10 (re-run on new harness; total_J = 44,329 gpu + 9,354 cpu) | +| 2026-05-20 | 54,614 | 0.7145 | A100 80GB PCIe | lwta_k2 | [dir](submissions/lwta_k2) | @ab-10 (re-run on new harness; total_J = 44,583 gpu + 10,031 cpu) | +| 2026-05-20 | 66,747 | DQ | A100 80GB SXM4 | modded_nanogpt | [dir](submissions/modded_nanogpt) | @ab-10 (re-run on new harness landed on SXM4 and hit 300 s cap; re-running) | ## Rules @@ -64,3 +69,12 @@ For an internal-BPE submission, `predict()` returns `P(next_char | observed_char [^1]: More energy efficient [^2]: As of writing this +| 2026-05-21 | 3,092 | 0.7050 | gpu_ngram_w31_k11 | [dir](submissions/gpu_ngram_w31_k11) | @follow-up-paq-prediction | +| 2026-05-21 | 2,474 | 0.7031 | subset_70_mkn | [dir](submissions/subset_70_mkn) | @exp-batch-iter4 | +| 2026-05-21 | 4,607 | 0.7047 | paq_mixer_v3 | [dir](submissions/paq_mixer_v3) | @worker-paq-mixer | +| 2026-05-21 | 8,602 | 0.7184 | gpu_ngram_o14_xorfix | [dir](submissions/gpu_ngram_o14_xorfix) | @subagent-xorfix-2026-05-19 | +| 2026-05-21 | 14,578 | 0.7184 | deep_backoff_kn | [dir](submissions/deep_backoff_kn) | @nakajimagabriel | +| 2026-05-21 | 9,591 | 0.7063 | chunker_phase1_v1 | 
[dir](submissions/chunker_phase1_v1) | @explore-chunker-2026-05-19 | +| 2026-05-21 | 19,922 | 0.7328 | lwta_k4_alpha_065 | [dir](submissions/lwta_k4_alpha_065) | @subagent-L2clean-2026-05-19 | +| 2026-05-21 | 20,743 | 0.7390 | alpha_06 | [dir](submissions/alpha_06) | @subagent-xorfix-2026-05-19 | +| 2026-05-21 | 62,006 | 0.7337 | modded_nanogpt | [dir](submissions/modded_nanogpt) | @ab-10 | diff --git a/requirements.txt b/requirements.txt index 0eef3e7..827b6f6 100644 --- a/requirements.txt +++ b/requirements.txt @@ -7,3 +7,9 @@ modal>=0.66 # Optional: tests run with stdlib if pytest is missing, but `pytest # test_wikitext.py` gives nicer output. pytest +# CodeCarbon: CPU energy estimation backend for EnergyMeter's +# total_energy_J field. EnergyMeter reads ``tracker._total_cpu_energy`` +# after stop, which is internal to CodeCarbon — pin a minor range to +# keep that path stable. Required on the leaderboard (raises if NVML is +# available and this isn't); optional on dev boxes without a GPU. 
+codecarbon~=3.2 diff --git a/run_eval.py b/run_eval.py index 270332e..c9e5806 100644 --- a/run_eval.py +++ b/run_eval.py @@ -128,6 +128,8 @@ def main() -> None: "max_train_seconds": args.max_train_seconds, "training_energy_J": m.energy_joules if m is not None else None, "training_duration_s": m.duration_s if m is not None else None, + "cpu_energy_J": m.cpu_energy_J if m is not None else None, + "total_energy_J": m.total_energy_J if m is not None else None, "gpu_name": _gpu_name(), "date_utc": _utc_now(), } @@ -165,6 +167,8 @@ def main() -> None: "val_chars": val_result.n_chars, "training_energy_J": m.energy_joules, "training_duration_s": m.duration_s, + "cpu_energy_J": m.cpu_energy_J, + "total_energy_J": m.total_energy_J, "gpu_name": _gpu_name(), "date_utc": _utc_now(), } @@ -187,6 +191,8 @@ def main() -> None: "submission": submission_name, "training_energy_J": m.energy_joules, "training_duration_s": m.duration_s, + "cpu_energy_J": m.cpu_energy_J, + "total_energy_J": m.total_energy_J, "val_char_accuracy": val_result.accuracy, "val_chars": val_result.n_chars, "gpu_name": _gpu_name(), diff --git a/submissions/alpha_06/nvml.json b/submissions/alpha_06/nvml.json index 61cc54d..cb06cb5 100644 --- a/submissions/alpha_06/nvml.json +++ b/submissions/alpha_06/nvml.json @@ -2,10 +2,10 @@ "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 65.36740677966102, - "stress_watts_avg": 352.24618801540856, - "stress_energy_joules": 13133.406, - "stress_duration_s": 37.284735639000004, + "idle_watts": 63.85837288135594, + "stress_watts_avg": 352.31625628183394, + "stress_energy_joules": 13073.966, + "stress_duration_s": 37.108608436, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": [] } diff --git a/submissions/alpha_06/result.json b/submissions/alpha_06/result.json index 5e95fea..b96b4ca 100644 --- a/submissions/alpha_06/result.json +++ b/submissions/alpha_06/result.json @@ -1,19 +1,21 @@ { "submission": "alpha_06", - "training_energy_J": 
14731.7458852, - "training_duration_s": 140.096942296, - "val_char_accuracy": 0.7405, + "training_energy_J": 14613.997913750001, + "training_duration_s": 144.859441725, + "cpu_energy_J": 6129.255896584997, + "total_energy_J": 20743.253810334998, + "val_char_accuracy": 0.7390333333333333, "val_chars": 60000, "gpu_name": "NVIDIA A100-SXM4-80GB", - "date_utc": "2026-05-20T01:55:05Z", + "date_utc": "2026-05-21T05:29:17Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 65.36740677966102, - "stress_watts_avg": 352.24618801540856, - "stress_energy_joules": 13133.406, - "stress_duration_s": 37.284735639000004, + "idle_watts": 63.85837288135594, + "stress_watts_avg": 352.31625628183394, + "stress_energy_joules": 13073.966, + "stress_duration_s": 37.108608436, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": [] }, diff --git a/submissions/alpha_06/run.log b/submissions/alpha_06/run.log index 12d4a02..ecb9a9d 100644 --- a/submissions/alpha_06/run.log +++ b/submissions/alpha_06/run.log @@ -1,25 +1,25 @@ -# wikitext submit.py log — alpha_06 — 2026-05-20T01:45:46+00:00Z +# wikitext submit.py log — alpha_06 — 2026-05-21T05:19:18+00:00Z [modal] launching A100-80GB ... ✓ Initialized. View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-Msdp1r91xRTCvRAxIaShM8 +https://modal.com/apps/gabriel-nakajima-an/main/ap-UUVKwxlYo8DV3G5hw1WK3q ✓ Created objects. ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py -├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py └── 🔨 Created function run_submission. [modal] verifying NVML energy counter ... GPU: NVIDIA A100-SXM4-80GB sampling idle power for 3s ... 
- idle: 65.4 W + idle: 63.9 W running 30s stress workload ... - duration: 37.3 s - energy delta: 13,133.4 J - avg power: 352.2 W + duration: 37.1 s + energy delta: 13,074.0 J + avg power: 352.3 W monotonic: True --- -{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 65.36740677966102, "stress_watts_avg": 352.24618801540856, "stress_energy_joules": 13133.406, "stress_duration_s": 37.284735639000004, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []} +{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 63.85837288135594, "stress_watts_avg": 352.31625628183394, "stress_energy_joules": 13073.966, "stress_duration_s": 37.108608436, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []} [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ... loading WikiText-103 from /data ... train chars: 540,095,682 @@ -27,116 +27,119 @@ loading WikiText-103 from /data ... train wall-clock cap: 300 s val accuracy floor : 0.7000 training submission /workspace/alpha_06.py ... +[codecarbon WARNING @ 05:20:11] Multiple instances of codecarbon are allowed to run at the same time. 
[clean_w31] starting GPU KN build; max_order=12 D=0.5 [clean_w31] top order=12 unique pairs: 157,942,722 2.5s -[clean_w31] ctx_len=11 ctxs=119,285,712 24.2s -[clean_w31] ctx_len=10 ctxs=84,282,364 17.4s -[clean_w31] ctx_len=9 ctxs=54,720,376 11.1s -[clean_w31] ctx_len=8 ctxs=31,924,091 6.6s -[clean_w31] ctx_len=7 ctxs=16,284,921 3.5s -[clean_w31] ctx_len=6 ctxs=7,016,442 1.7s +[clean_w31] ctx_len=11 ctxs=119,285,712 25.5s +[clean_w31] ctx_len=10 ctxs=84,282,364 17.6s +[clean_w31] ctx_len=9 ctxs=54,720,376 12.0s +[clean_w31] ctx_len=8 ctxs=31,924,091 7.1s +[clean_w31] ctx_len=7 ctxs=16,284,921 3.6s +[clean_w31] ctx_len=6 ctxs=7,016,442 1.6s [clean_w31] ctx_len=5 ctxs=2,438,281 0.6s [clean_w31] ctx_len=4 ctxs=637,143 0.1s [clean_w31] ctx_len=3 ctxs=122,882 0.0s [clean_w31] ctx_len=2 ctxs=12,282 0.0s [clean_w31] ctx_len=1 ctxs=204 0.0s [clean_w31] ctx_len=0 ctxs=1 0.0s -[clean_w31] KN build done: 67.8s +[clean_w31] KN build done: 70.7s [clean_w31] NN 3.29M params cfg=TrainConfig(d=256 L=4 H=4 bs=32 T=1024 steps=1200) [clean_w31] NN step 0/1200 loss 5.5452 elapsed 1s -[clean_w31] NN step 100/1200 loss 1.8056 elapsed 7s -[clean_w31] NN step 200/1200 loss 1.4371 elapsed 12s -[clean_w31] NN step 300/1200 loss 1.4222 elapsed 18s -[clean_w31] NN step 400/1200 loss 1.3516 elapsed 24s -[clean_w31] NN step 500/1200 loss 1.2951 elapsed 29s -[clean_w31] NN step 600/1200 loss 1.2552 elapsed 35s -[clean_w31] NN step 700/1200 loss 1.2157 elapsed 41s -[clean_w31] NN step 800/1200 loss 1.1424 elapsed 46s -[clean_w31] NN step 900/1200 loss 1.1424 elapsed 52s -[clean_w31] NN step 1000/1200 loss 1.1414 elapsed 58s -[clean_w31] NN step 1100/1200 loss 1.1226 elapsed 63s -[clean_w31] NN step 1199/1200 loss 1.1011 elapsed 69s -training: 14,731.7 J duration=140.1s +[clean_w31] NN step 100/1200 loss 1.7487 elapsed 7s +[clean_w31] NN step 200/1200 loss 1.4373 elapsed 12s +[clean_w31] NN step 300/1200 loss 1.3942 elapsed 18s +[clean_w31] NN step 400/1200 loss 1.3113 elapsed 24s +[clean_w31] NN 
step 500/1200 loss 1.3140 elapsed 30s +[clean_w31] NN step 600/1200 loss 1.2792 elapsed 36s +[clean_w31] NN step 700/1200 loss 1.2438 elapsed 41s +[clean_w31] NN step 800/1200 loss 1.1484 elapsed 47s +[clean_w31] NN step 900/1200 loss 1.1466 elapsed 53s +[clean_w31] NN step 1000/1200 loss 1.1834 elapsed 59s +[clean_w31] NN step 1100/1200 loss 1.1053 elapsed 65s +[clean_w31] NN step 1199/1200 loss 1.1100 elapsed 71s +training: 14,614.0 J duration=144.9s evaluating on val split ... - eval 1,200/60,000 ( 2.0%) acc=0.7300 164 char/s eta= 359s - eval 2,400/60,000 ( 4.0%) acc=0.7167 166 char/s eta= 347s - eval 3,600/60,000 ( 6.0%) acc=0.7167 166 char/s eta= 339s - eval 4,800/60,000 ( 8.0%) acc=0.7260 165 char/s eta= 336s - eval 6,000/60,000 ( 10.0%) acc=0.7230 156 char/s eta= 346s - eval 7,200/60,000 ( 12.0%) acc=0.7190 158 char/s eta= 334s - eval 8,400/60,000 ( 14.0%) acc=0.7189 159 char/s eta= 325s - eval 9,600/60,000 ( 16.0%) acc=0.7250 160 char/s eta= 316s - eval 10,800/60,000 ( 18.0%) acc=0.7304 160 char/s eta= 307s - eval 12,000/60,000 ( 20.0%) acc=0.7304 161 char/s eta= 298s - eval 13,200/60,000 ( 22.0%) acc=0.7347 161 char/s eta= 290s - eval 14,400/60,000 ( 24.0%) acc=0.7361 162 char/s eta= 282s - eval 15,600/60,000 ( 26.0%) acc=0.7383 162 char/s eta= 274s - eval 16,800/60,000 ( 28.0%) acc=0.7412 162 char/s eta= 266s - eval 18,000/60,000 ( 30.0%) acc=0.7422 163 char/s eta= 258s - eval 19,200/60,000 ( 32.0%) acc=0.7455 163 char/s eta= 250s - eval 20,400/60,000 ( 34.0%) acc=0.7473 163 char/s eta= 243s - eval 21,600/60,000 ( 36.0%) acc=0.7475 163 char/s eta= 235s - eval 22,800/60,000 ( 38.0%) acc=0.7479 163 char/s eta= 228s - eval 24,000/60,000 ( 40.0%) acc=0.7473 163 char/s eta= 220s - eval 25,200/60,000 ( 42.0%) acc=0.7475 164 char/s eta= 213s - eval 26,400/60,000 ( 44.0%) acc=0.7485 164 char/s eta= 205s - eval 27,600/60,000 ( 46.0%) acc=0.7479 164 char/s eta= 198s - eval 28,800/60,000 ( 48.0%) acc=0.7487 164 char/s eta= 190s - eval 30,000/60,000 ( 50.0%) 
acc=0.7482 164 char/s eta= 183s - eval 31,200/60,000 ( 52.0%) acc=0.7457 164 char/s eta= 176s - eval 32,400/60,000 ( 54.0%) acc=0.7447 164 char/s eta= 169s - eval 33,600/60,000 ( 56.0%) acc=0.7423 162 char/s eta= 163s - eval 34,800/60,000 ( 58.0%) acc=0.7427 161 char/s eta= 156s - eval 36,000/60,000 ( 60.0%) acc=0.7429 161 char/s eta= 149s - eval 37,200/60,000 ( 62.0%) acc=0.7428 161 char/s eta= 141s - eval 38,400/60,000 ( 64.0%) acc=0.7429 161 char/s eta= 134s - eval 39,600/60,000 ( 66.0%) acc=0.7424 161 char/s eta= 126s - eval 40,800/60,000 ( 68.0%) acc=0.7417 162 char/s eta= 119s - eval 42,000/60,000 ( 70.0%) acc=0.7409 162 char/s eta= 111s - eval 43,200/60,000 ( 72.0%) acc=0.7410 162 char/s eta= 104s - eval 44,400/60,000 ( 74.0%) acc=0.7407 162 char/s eta= 96s - eval 45,600/60,000 ( 76.0%) acc=0.7405 162 char/s eta= 89s - eval 46,800/60,000 ( 78.0%) acc=0.7397 162 char/s eta= 81s - eval 48,000/60,000 ( 80.0%) acc=0.7398 162 char/s eta= 74s - eval 49,200/60,000 ( 82.0%) acc=0.7395 163 char/s eta= 66s - eval 50,400/60,000 ( 84.0%) acc=0.7402 163 char/s eta= 59s - eval 51,600/60,000 ( 86.0%) acc=0.7403 163 char/s eta= 52s - eval 52,800/60,000 ( 88.0%) acc=0.7398 163 char/s eta= 44s - eval 54,000/60,000 ( 90.0%) acc=0.7397 163 char/s eta= 37s - eval 55,200/60,000 ( 92.0%) acc=0.7389 163 char/s eta= 29s - eval 56,400/60,000 ( 94.0%) acc=0.7387 163 char/s eta= 22s - eval 57,600/60,000 ( 96.0%) acc=0.7390 163 char/s eta= 15s - eval 58,800/60,000 ( 98.0%) acc=0.7397 163 char/s eta= 7s - eval 60,000/60,000 (100.0%) acc=0.7405 163 char/s eta= 0s -chars=60,000 acc=0.7405 eval_duration=368.2s + eval 1,200/60,000 ( 2.0%) acc=0.7258 160 char/s eta= 367s + eval 2,400/60,000 ( 4.0%) acc=0.7146 161 char/s eta= 358s + eval 3,600/60,000 ( 6.0%) acc=0.7156 161 char/s eta= 350s + eval 4,800/60,000 ( 8.0%) acc=0.7248 160 char/s eta= 344s + eval 6,000/60,000 ( 10.0%) acc=0.7212 160 char/s eta= 337s + eval 7,200/60,000 ( 12.0%) acc=0.7168 159 char/s eta= 332s + eval 8,400/60,000 ( 
14.0%) acc=0.7161 159 char/s eta= 325s + eval 9,600/60,000 ( 16.0%) acc=0.7234 158 char/s eta= 318s + eval 10,800/60,000 ( 18.0%) acc=0.7300 158 char/s eta= 311s + eval 12,000/60,000 ( 20.0%) acc=0.7307 158 char/s eta= 304s + eval 13,200/60,000 ( 22.0%) acc=0.7347 155 char/s eta= 302s + eval 14,400/60,000 ( 24.0%) acc=0.7371 153 char/s eta= 298s + eval 15,600/60,000 ( 26.0%) acc=0.7385 153 char/s eta= 290s + eval 16,800/60,000 ( 28.0%) acc=0.7408 151 char/s eta= 286s + eval 18,000/60,000 ( 30.0%) acc=0.7421 151 char/s eta= 278s + eval 19,200/60,000 ( 32.0%) acc=0.7450 152 char/s eta= 269s + eval 20,400/60,000 ( 34.0%) acc=0.7468 152 char/s eta= 260s + eval 21,600/60,000 ( 36.0%) acc=0.7468 152 char/s eta= 252s + eval 22,800/60,000 ( 38.0%) acc=0.7466 153 char/s eta= 243s + eval 24,000/60,000 ( 40.0%) acc=0.7462 153 char/s eta= 235s + eval 25,200/60,000 ( 42.0%) acc=0.7463 154 char/s eta= 226s + eval 26,400/60,000 ( 44.0%) acc=0.7470 154 char/s eta= 218s + eval 27,600/60,000 ( 46.0%) acc=0.7465 155 char/s eta= 210s + eval 28,800/60,000 ( 48.0%) acc=0.7469 155 char/s eta= 202s + eval 30,000/60,000 ( 50.0%) acc=0.7458 153 char/s eta= 196s + eval 31,200/60,000 ( 52.0%) acc=0.7433 153 char/s eta= 189s + eval 32,400/60,000 ( 54.0%) acc=0.7421 151 char/s eta= 183s + eval 33,600/60,000 ( 56.0%) acc=0.7401 149 char/s eta= 177s + eval 34,800/60,000 ( 58.0%) acc=0.7405 147 char/s eta= 172s + eval 36,000/60,000 ( 60.0%) acc=0.7405 147 char/s eta= 163s + eval 37,200/60,000 ( 62.0%) acc=0.7408 147 char/s eta= 155s + eval 38,400/60,000 ( 64.0%) acc=0.7405 148 char/s eta= 146s + eval 39,600/60,000 ( 66.0%) acc=0.7404 148 char/s eta= 138s + eval 40,800/60,000 ( 68.0%) acc=0.7398 148 char/s eta= 130s + eval 42,000/60,000 ( 70.0%) acc=0.7391 148 char/s eta= 121s + eval 43,200/60,000 ( 72.0%) acc=0.7390 148 char/s eta= 113s + eval 44,400/60,000 ( 74.0%) acc=0.7387 149 char/s eta= 105s + eval 45,600/60,000 ( 76.0%) acc=0.7386 149 char/s eta= 97s + eval 46,800/60,000 ( 78.0%) acc=0.7379 
149 char/s eta= 89s + eval 48,000/60,000 ( 80.0%) acc=0.7381 149 char/s eta= 80s + eval 49,200/60,000 ( 82.0%) acc=0.7378 149 char/s eta= 72s + eval 50,400/60,000 ( 84.0%) acc=0.7388 150 char/s eta= 64s + eval 51,600/60,000 ( 86.0%) acc=0.7390 150 char/s eta= 56s + eval 52,800/60,000 ( 88.0%) acc=0.7381 150 char/s eta= 48s + eval 54,000/60,000 ( 90.0%) acc=0.7381 150 char/s eta= 40s + eval 55,200/60,000 ( 92.0%) acc=0.7372 150 char/s eta= 32s + eval 56,400/60,000 ( 94.0%) acc=0.7371 150 char/s eta= 24s + eval 57,600/60,000 ( 96.0%) acc=0.7375 150 char/s eta= 16s + eval 58,800/60,000 ( 98.0%) acc=0.7383 151 char/s eta= 8s + eval 60,000/60,000 (100.0%) acc=0.7390 151 char/s eta= 0s +chars=60,000 acc=0.7390 eval_duration=397.6s --- submission : alpha_06 -training energy (J): 14,731.7 -training duration : 140.1s -val char-accuracy : 0.7405 +training energy (J): 14,614.0 +training duration : 144.9s +val char-accuracy : 0.7390 val chars : 60,000 wrote /tmp/result.json Stopping app - local entrypoint completed. ✓ App completed. 
View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-Msdp1r91xRTCvRAxIaShM8 +https://modal.com/apps/gabriel-nakajima-an/main/ap-UUVKwxlYo8DV3G5hw1WK3q # final result { "submission": "alpha_06", - "training_energy_J": 14731.7458852, - "training_duration_s": 140.096942296, - "val_char_accuracy": 0.7405, + "training_energy_J": 14613.997913750001, + "training_duration_s": 144.859441725, + "cpu_energy_J": 6129.255896584997, + "total_energy_J": 20743.253810334998, + "val_char_accuracy": 0.7390333333333333, "val_chars": 60000, "gpu_name": "NVIDIA A100-SXM4-80GB", - "date_utc": "2026-05-20T01:55:05Z", + "date_utc": "2026-05-21T05:29:17Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 65.36740677966102, - "stress_watts_avg": 352.24618801540856, - "stress_energy_joules": 13133.406, - "stress_duration_s": 37.284735639000004, + "idle_watts": 63.85837288135594, + "stress_watts_avg": 352.31625628183394, + "stress_energy_joules": 13073.966, + "stress_duration_s": 37.108608436, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": [] }, diff --git a/submissions/chunker_phase1_v1/nvml.json b/submissions/chunker_phase1_v1/nvml.json index 08b4a9f..ed2d072 100644 --- a/submissions/chunker_phase1_v1/nvml.json +++ b/submissions/chunker_phase1_v1/nvml.json @@ -2,10 +2,10 @@ "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 62.56199999999998, - "stress_watts_avg": 330.37581396177535, - "stress_energy_joules": 12495.641, - "stress_duration_s": 37.822505377, - "gpu_name": "NVIDIA A100-SXM4-80GB", + "idle_watts": 52.40838333333333, + "stress_watts_avg": 227.61565140099268, + "stress_energy_joules": 8488.717, + "stress_duration_s": 37.29408302, + "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] } diff --git a/submissions/chunker_phase1_v1/result.json b/submissions/chunker_phase1_v1/result.json index d6d16a9..d95dc4f 100644 --- a/submissions/chunker_phase1_v1/result.json +++ 
b/submissions/chunker_phase1_v1/result.json @@ -1,20 +1,22 @@ { "submission": "chunker_phase1_v1", - "training_energy_J": 5917.810853299999, - "training_duration_s": 98.94530293400001, - "val_char_accuracy": 0.7057333333333333, + "training_energy_J": 5569.715063649999, + "training_duration_s": 95.017418727, + "cpu_energy_J": 4020.9563191850025, + "total_energy_J": 9590.671382835002, + "val_char_accuracy": 0.7063, "val_chars": 60000, - "gpu_name": "NVIDIA A100-SXM4-80GB", - "date_utc": "2026-05-20T02:02:50Z", + "gpu_name": "NVIDIA A100 80GB PCIe", + "date_utc": "2026-05-21T05:27:51Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 62.56199999999998, - "stress_watts_avg": 330.37581396177535, - "stress_energy_joules": 12495.641, - "stress_duration_s": 37.822505377, - "gpu_name": "NVIDIA A100-SXM4-80GB", + "idle_watts": 52.40838333333333, + "stress_watts_avg": 227.61565140099268, + "stress_energy_joules": 8488.717, + "stress_duration_s": 37.29408302, + "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, "contributor": "@explore-chunker-2026-05-19" diff --git a/submissions/chunker_phase1_v1/run.log b/submissions/chunker_phase1_v1/run.log index da1fdf5..e20d274 100644 --- a/submissions/chunker_phase1_v1/run.log +++ b/submissions/chunker_phase1_v1/run.log @@ -1,7 +1,7 @@ -# wikitext submit.py log — chunker_phase1_v1 — 2026-05-20T01:54:03+00:00Z +# wikitext submit.py log — chunker_phase1_v1 — 2026-05-21T05:19:18+00:00Z [modal] launching A100-80GB ... ✓ Initialized. View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-cKJi20KhEnnNVSFIPgeSmH +https://modal.com/apps/gabriel-nakajima-an/main/ap-FdmB8quO8669PzaHLSgb5f ✓ Created objects. 
├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py @@ -10,16 +10,16 @@ https://modal.com/apps/gabriel-nakajima-an/main/ap-cKJi20KhEnnNVSFIPgeSmH ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py └── 🔨 Created function run_submission. [modal] verifying NVML energy counter ... -GPU: NVIDIA A100-SXM4-80GB +GPU: NVIDIA A100 80GB PCIe sampling idle power for 3s ... - idle: 62.6 W + idle: 52.4 W running 30s stress workload ... - duration: 37.8 s - energy delta: 12,495.6 J - avg power: 330.4 W + duration: 37.3 s + energy delta: 8,488.7 J + avg power: 227.6 W monotonic: True --- -{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 62.56199999999998, "stress_watts_avg": 330.37581396177535, "stress_energy_joules": 12495.641, "stress_duration_s": 37.822505377, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []} +{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 52.40838333333333, "stress_watts_avg": 227.61565140099268, "stress_energy_joules": 8488.717, "stress_duration_s": 37.29408302, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []} [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ... loading WikiText-103 from /data ... train chars: 540,095,682 @@ -27,116 +27,119 @@ loading WikiText-103 from /data ... train wall-clock cap: 300 s val accuracy floor : 0.7000 training submission /workspace/chunker_phase1_v1.py ... +[codecarbon WARNING @ 05:20:11] Multiple instances of codecarbon are allowed to run at the same time. 
[chunker] starting GPU KN build; max_order=12 D=0.5 -[chunker] top order=12 unique pairs: 157,942,722 2.6s -[chunker] ctx_len=11 ctxs=119,285,712 17.4s -[chunker] ctx_len=10 ctxs=84,282,364 12.4s -[chunker] ctx_len=9 ctxs=54,720,376 7.7s -[chunker] ctx_len=8 ctxs=31,924,091 4.6s -[chunker] ctx_len=7 ctxs=16,284,921 2.5s -[chunker] ctx_len=6 ctxs=7,016,442 1.2s -[chunker] ctx_len=5 ctxs=2,438,281 0.5s +[chunker] top order=12 unique pairs: 157,942,722 2.7s +[chunker] ctx_len=11 ctxs=119,285,712 15.6s +[chunker] ctx_len=10 ctxs=84,282,364 12.9s +[chunker] ctx_len=9 ctxs=54,720,376 8.7s +[chunker] ctx_len=8 ctxs=31,924,091 5.3s +[chunker] ctx_len=7 ctxs=16,284,921 2.4s +[chunker] ctx_len=6 ctxs=7,016,442 1.1s +[chunker] ctx_len=5 ctxs=2,438,281 0.6s [chunker] ctx_len=4 ctxs=637,143 0.1s [chunker] ctx_len=3 ctxs=122,882 0.0s [chunker] ctx_len=2 ctxs=12,282 0.0s [chunker] ctx_len=1 ctxs=204 0.0s [chunker] ctx_len=0 ctxs=1 0.0s -[chunker] KN build done: 49.0s +[chunker] KN build done: 49.4s [chunker] computing surprise mask (tau=0.3) ... 
[chunker] surprise pass k_ctx=4 done -[chunker] surprise computed in 2.6s: p_s = 0.4351 (235,445,737/541,096,898) +[chunker] surprise computed in 2.7s: p_s = 0.4351 (235,445,737/541,096,898) [chunker] H model: 1.88M params, surprise positions: 235,445,737/541,096,898 (43.5%) [chunker] H step 0/800 loss 5.5452 elapsed 1s -[chunker] H step 100/800 loss 2.7588 elapsed 6s -[chunker] H step 200/800 loss 2.6080 elapsed 12s -[chunker] H step 300/800 loss 2.4467 elapsed 17s -[chunker] H step 400/800 loss 2.3904 elapsed 22s -[chunker] H step 500/800 loss 2.3457 elapsed 28s -[chunker] H step 600/800 loss 2.3157 elapsed 33s -[chunker] H step 700/800 loss 2.2688 elapsed 38s -[chunker] H step 799/800 loss 2.2480 elapsed 44s -training: 5,917.8 J duration=98.9s +[chunker] H step 100/800 loss 2.7771 elapsed 6s +[chunker] H step 200/800 loss 2.5867 elapsed 11s +[chunker] H step 300/800 loss 2.4754 elapsed 15s +[chunker] H step 400/800 loss 2.4454 elapsed 20s +[chunker] H step 500/800 loss 2.3815 elapsed 25s +[chunker] H step 600/800 loss 2.3375 elapsed 30s +[chunker] H step 700/800 loss 2.3123 elapsed 35s +[chunker] H step 799/800 loss 2.2933 elapsed 40s +training: 5,569.7 J duration=95.0s evaluating on val split ... 
- eval 1,200/60,000 ( 2.0%) acc=0.6925 159 char/s eta= 370s - eval 2,400/60,000 ( 4.0%) acc=0.6767 160 char/s eta= 361s - eval 3,600/60,000 ( 6.0%) acc=0.6753 160 char/s eta= 353s - eval 4,800/60,000 ( 8.0%) acc=0.6887 160 char/s eta= 345s - eval 6,000/60,000 ( 10.0%) acc=0.6885 160 char/s eta= 338s - eval 7,200/60,000 ( 12.0%) acc=0.6831 160 char/s eta= 331s - eval 8,400/60,000 ( 14.0%) acc=0.6821 160 char/s eta= 323s - eval 9,600/60,000 ( 16.0%) acc=0.6892 160 char/s eta= 316s - eval 10,800/60,000 ( 18.0%) acc=0.6975 160 char/s eta= 308s - eval 12,000/60,000 ( 20.0%) acc=0.6993 160 char/s eta= 301s - eval 13,200/60,000 ( 22.0%) acc=0.7031 160 char/s eta= 293s - eval 14,400/60,000 ( 24.0%) acc=0.7050 160 char/s eta= 286s - eval 15,600/60,000 ( 26.0%) acc=0.7069 160 char/s eta= 278s - eval 16,800/60,000 ( 28.0%) acc=0.7104 160 char/s eta= 271s - eval 18,000/60,000 ( 30.0%) acc=0.7131 160 char/s eta= 263s - eval 19,200/60,000 ( 32.0%) acc=0.7177 160 char/s eta= 255s - eval 20,400/60,000 ( 34.0%) acc=0.7195 160 char/s eta= 248s - eval 21,600/60,000 ( 36.0%) acc=0.7201 160 char/s eta= 240s - eval 22,800/60,000 ( 38.0%) acc=0.7203 160 char/s eta= 233s - eval 24,000/60,000 ( 40.0%) acc=0.7200 160 char/s eta= 225s - eval 25,200/60,000 ( 42.0%) acc=0.7206 160 char/s eta= 218s - eval 26,400/60,000 ( 44.0%) acc=0.7215 160 char/s eta= 210s - eval 27,600/60,000 ( 46.0%) acc=0.7197 160 char/s eta= 203s - eval 28,800/60,000 ( 48.0%) acc=0.7196 160 char/s eta= 195s - eval 30,000/60,000 ( 50.0%) acc=0.7183 160 char/s eta= 188s - eval 31,200/60,000 ( 52.0%) acc=0.7150 160 char/s eta= 180s - eval 32,400/60,000 ( 54.0%) acc=0.7129 160 char/s eta= 173s - eval 33,600/60,000 ( 56.0%) acc=0.7103 160 char/s eta= 165s - eval 34,800/60,000 ( 58.0%) acc=0.7107 160 char/s eta= 158s - eval 36,000/60,000 ( 60.0%) acc=0.7107 160 char/s eta= 150s - eval 37,200/60,000 ( 62.0%) acc=0.7109 159 char/s eta= 143s - eval 38,400/60,000 ( 64.0%) acc=0.7111 159 char/s eta= 136s - eval 39,600/60,000 ( 
66.0%) acc=0.7101 159 char/s eta= 128s - eval 40,800/60,000 ( 68.0%) acc=0.7096 159 char/s eta= 121s - eval 42,000/60,000 ( 70.0%) acc=0.7085 159 char/s eta= 113s - eval 43,200/60,000 ( 72.0%) acc=0.7078 159 char/s eta= 106s - eval 44,400/60,000 ( 74.0%) acc=0.7078 159 char/s eta= 98s - eval 45,600/60,000 ( 76.0%) acc=0.7075 159 char/s eta= 91s - eval 46,800/60,000 ( 78.0%) acc=0.7068 159 char/s eta= 83s - eval 48,000/60,000 ( 80.0%) acc=0.7066 159 char/s eta= 75s - eval 49,200/60,000 ( 82.0%) acc=0.7058 159 char/s eta= 68s - eval 50,400/60,000 ( 84.0%) acc=0.7060 159 char/s eta= 60s - eval 51,600/60,000 ( 86.0%) acc=0.7060 159 char/s eta= 53s - eval 52,800/60,000 ( 88.0%) acc=0.7046 159 char/s eta= 45s - eval 54,000/60,000 ( 90.0%) acc=0.7045 159 char/s eta= 38s - eval 55,200/60,000 ( 92.0%) acc=0.7040 159 char/s eta= 30s - eval 56,400/60,000 ( 94.0%) acc=0.7034 159 char/s eta= 23s - eval 57,600/60,000 ( 96.0%) acc=0.7038 159 char/s eta= 15s - eval 58,800/60,000 ( 98.0%) acc=0.7044 159 char/s eta= 8s - eval 60,000/60,000 (100.0%) acc=0.7057 159 char/s eta= 0s -chars=60,000 acc=0.7057 eval_duration=376.7s + eval 1,200/60,000 ( 2.0%) acc=0.6908 172 char/s eta= 342s + eval 2,400/60,000 ( 4.0%) acc=0.6767 172 char/s eta= 335s + eval 3,600/60,000 ( 6.0%) acc=0.6742 167 char/s eta= 338s + eval 4,800/60,000 ( 8.0%) acc=0.6879 166 char/s eta= 333s + eval 6,000/60,000 ( 10.0%) acc=0.6875 165 char/s eta= 327s + eval 7,200/60,000 ( 12.0%) acc=0.6833 165 char/s eta= 319s + eval 8,400/60,000 ( 14.0%) acc=0.6813 166 char/s eta= 312s + eval 9,600/60,000 ( 16.0%) acc=0.6887 166 char/s eta= 304s + eval 10,800/60,000 ( 18.0%) acc=0.6974 166 char/s eta= 296s + eval 12,000/60,000 ( 20.0%) acc=0.6994 166 char/s eta= 289s + eval 13,200/60,000 ( 22.0%) acc=0.7030 166 char/s eta= 282s + eval 14,400/60,000 ( 24.0%) acc=0.7051 166 char/s eta= 275s + eval 15,600/60,000 ( 26.0%) acc=0.7069 166 char/s eta= 267s + eval 16,800/60,000 ( 28.0%) acc=0.7100 166 char/s eta= 260s + eval 18,000/60,000 
( 30.0%) acc=0.7128 166 char/s eta= 253s + eval 19,200/60,000 ( 32.0%) acc=0.7178 166 char/s eta= 246s + eval 20,400/60,000 ( 34.0%) acc=0.7195 166 char/s eta= 239s + eval 21,600/60,000 ( 36.0%) acc=0.7203 166 char/s eta= 231s + eval 22,800/60,000 ( 38.0%) acc=0.7206 166 char/s eta= 224s + eval 24,000/60,000 ( 40.0%) acc=0.7207 166 char/s eta= 217s + eval 25,200/60,000 ( 42.0%) acc=0.7212 166 char/s eta= 210s + eval 26,400/60,000 ( 44.0%) acc=0.7223 166 char/s eta= 203s + eval 27,600/60,000 ( 46.0%) acc=0.7203 166 char/s eta= 195s + eval 28,800/60,000 ( 48.0%) acc=0.7201 166 char/s eta= 188s + eval 30,000/60,000 ( 50.0%) acc=0.7186 166 char/s eta= 181s + eval 31,200/60,000 ( 52.0%) acc=0.7152 166 char/s eta= 174s + eval 32,400/60,000 ( 54.0%) acc=0.7130 166 char/s eta= 166s + eval 33,600/60,000 ( 56.0%) acc=0.7103 166 char/s eta= 159s + eval 34,800/60,000 ( 58.0%) acc=0.7110 166 char/s eta= 152s + eval 36,000/60,000 ( 60.0%) acc=0.7110 166 char/s eta= 145s + eval 37,200/60,000 ( 62.0%) acc=0.7113 166 char/s eta= 137s + eval 38,400/60,000 ( 64.0%) acc=0.7115 166 char/s eta= 130s + eval 39,600/60,000 ( 66.0%) acc=0.7106 166 char/s eta= 123s + eval 40,800/60,000 ( 68.0%) acc=0.7101 166 char/s eta= 115s + eval 42,000/60,000 ( 70.0%) acc=0.7090 166 char/s eta= 108s + eval 43,200/60,000 ( 72.0%) acc=0.7082 166 char/s eta= 101s + eval 44,400/60,000 ( 74.0%) acc=0.7082 167 char/s eta= 94s + eval 45,600/60,000 ( 76.0%) acc=0.7081 166 char/s eta= 87s + eval 46,800/60,000 ( 78.0%) acc=0.7075 166 char/s eta= 79s + eval 48,000/60,000 ( 80.0%) acc=0.7072 166 char/s eta= 72s + eval 49,200/60,000 ( 82.0%) acc=0.7065 167 char/s eta= 65s + eval 50,400/60,000 ( 84.0%) acc=0.7068 167 char/s eta= 58s + eval 51,600/60,000 ( 86.0%) acc=0.7067 167 char/s eta= 50s + eval 52,800/60,000 ( 88.0%) acc=0.7051 166 char/s eta= 43s + eval 54,000/60,000 ( 90.0%) acc=0.7049 166 char/s eta= 36s + eval 55,200/60,000 ( 92.0%) acc=0.7045 166 char/s eta= 29s + eval 56,400/60,000 ( 94.0%) acc=0.7039 166 
char/s eta= 22s + eval 57,600/60,000 ( 96.0%) acc=0.7044 166 char/s eta= 14s + eval 58,800/60,000 ( 98.0%) acc=0.7051 166 char/s eta= 7s + eval 60,000/60,000 (100.0%) acc=0.7063 166 char/s eta= 0s +chars=60,000 acc=0.7063 eval_duration=360.4s --- submission : chunker_phase1_v1 -training energy (J): 5,917.8 -training duration : 98.9s -val char-accuracy : 0.7057 +training energy (J): 5,569.7 +training duration : 95.0s +val char-accuracy : 0.7063 val chars : 60,000 wrote /tmp/result.json Stopping app - local entrypoint completed. ✓ App completed. View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-cKJi20KhEnnNVSFIPgeSmH +https://modal.com/apps/gabriel-nakajima-an/main/ap-FdmB8quO8669PzaHLSgb5f # final result { "submission": "chunker_phase1_v1", - "training_energy_J": 5917.810853299999, - "training_duration_s": 98.94530293400001, - "val_char_accuracy": 0.7057333333333333, + "training_energy_J": 5569.715063649999, + "training_duration_s": 95.017418727, + "cpu_energy_J": 4020.9563191850025, + "total_energy_J": 9590.671382835002, + "val_char_accuracy": 0.7063, "val_chars": 60000, - "gpu_name": "NVIDIA A100-SXM4-80GB", - "date_utc": "2026-05-20T02:02:50Z", + "gpu_name": "NVIDIA A100 80GB PCIe", + "date_utc": "2026-05-21T05:27:51Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 62.56199999999998, - "stress_watts_avg": 330.37581396177535, - "stress_energy_joules": 12495.641, - "stress_duration_s": 37.822505377, - "gpu_name": "NVIDIA A100-SXM4-80GB", + "idle_watts": 52.40838333333333, + "stress_watts_avg": 227.61565140099268, + "stress_energy_joules": 8488.717, + "stress_duration_s": 37.29408302, + "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, "contributor": "@explore-chunker-2026-05-19" diff --git a/submissions/deep_backoff_kn/nvml.json b/submissions/deep_backoff_kn/nvml.json index d56941e..a93be86 100644 --- a/submissions/deep_backoff_kn/nvml.json +++ b/submissions/deep_backoff_kn/nvml.json @@ -2,10 
+2,10 @@ "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 63.14000000000002, - "stress_watts_avg": 339.34362349669493, - "stress_energy_joules": 12477.219, - "stress_duration_s": 36.768685592, - "gpu_name": "NVIDIA A100-SXM4-80GB", + "idle_watts": 52.31990000000003, + "stress_watts_avg": 231.05724561444927, + "stress_energy_joules": 8643.325, + "stress_duration_s": 37.407721091, + "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] } diff --git a/submissions/deep_backoff_kn/result.json b/submissions/deep_backoff_kn/result.json index 68c5005..934a739 100644 --- a/submissions/deep_backoff_kn/result.json +++ b/submissions/deep_backoff_kn/result.json @@ -1,23 +1,22 @@ { "submission": "deep_backoff_kn", - "disqualified": true, - "reason": "train_time_exceeded", - "max_train_seconds": 300.0, - "training_energy_J": 4789.383014900002, - "training_duration_s": 300.091439702, - "cpu_energy_J": 12692.014912755005, - "total_energy_J": 17481.39792765501, - "gpu_name": "NVIDIA A100-SXM4-80GB", - "date_utc": "2026-05-20T07:15:05Z", + "training_energy_J": 962.9188647999999, + "training_duration_s": 291.568102704, + "cpu_energy_J": 12338.053218035013, + "total_energy_J": 14578.4051352, + "val_char_accuracy": 0.7184166666666667, + "val_chars": 60000, + "gpu_name": "NVIDIA A100 80GB PCIe", + "date_utc": "2026-05-21T05:09:09Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 63.14000000000002, - "stress_watts_avg": 339.34362349669493, - "stress_energy_joules": 12477.219, - "stress_duration_s": 36.768685592, - "gpu_name": "NVIDIA A100-SXM4-80GB", + "idle_watts": 52.31990000000003, + "stress_watts_avg": 231.05724561444927, + "stress_energy_joules": 8643.325, + "stress_duration_s": 37.407721091, + "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, "contributor": "@nakajimagabriel" diff --git a/submissions/deep_backoff_kn/run.log b/submissions/deep_backoff_kn/run.log index 5bd4715..5b64b23 
100644 --- a/submissions/deep_backoff_kn/run.log +++ b/submissions/deep_backoff_kn/run.log @@ -1,25 +1,25 @@ -# wikitext submit.py log — deep_backoff_kn — 2026-05-20T07:08:43+00:00Z +# wikitext submit.py log — deep_backoff_kn — 2026-05-21T05:03:02+00:00Z [modal] launching A100-80GB ... ✓ Initialized. View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-45b2NtjIL0LErZ1xaqrUeX +https://modal.com/apps/gabriel-nakajima-an/main/ap-1cZpQht7xa0YYz3oXegD83 ✓ Created objects. ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py -├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py └── 🔨 Created function run_submission. [modal] verifying NVML energy counter ... -GPU: NVIDIA A100-SXM4-80GB +GPU: NVIDIA A100 80GB PCIe sampling idle power for 3s ... - idle: 63.1 W + idle: 52.3 W running 30s stress workload ... - duration: 36.8 s - energy delta: 12,477.2 J - avg power: 339.3 W + duration: 37.4 s + energy delta: 8,643.3 J + avg power: 231.1 W monotonic: True --- -{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 63.14000000000002, "stress_watts_avg": 339.34362349669493, "stress_energy_joules": 12477.219, "stress_duration_s": 36.768685592, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []} +{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 52.31990000000003, "stress_watts_avg": 231.05724561444927, "stress_energy_joules": 8643.325, "stress_duration_s": 37.407721091, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []} [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ... loading WikiText-103 from /data ... train chars: 540,095,682 @@ -27,129 +27,109 @@ loading WikiText-103 from /data ... 
train wall-clock cap: 300 s val accuracy floor : 0.7000 training submission /workspace/deep_backoff_kn.py ... -[codecarbon WARNING @ 07:10:00] Multiple instances of codecarbon are allowed to run at the same time. +[codecarbon WARNING @ 05:03:59] Multiple instances of codecarbon are allowed to run at the same time. [deep-backoff-kn] starting build; max_ctx_len=13 D=0.5 -[deep-backoff-kn] encoded train: 541,096,898 bytes (0.7s)[[deep-backoff-kn] np.unique k=14: 238,387,519 pairs 113.0s (n_workers=auto) -[deep-backoff-kn] order=14 ctx_len=13 ctxs=198,300,622 rows=238,387,519 18.2s -[deep-backoff-kn] order=13 ctx_len=12 ctxs=157,942,721 rows=198,300,621 6045.7 MB 49.7s -[deep-backoff-kn] order=12 ctx_len=11 ctxs=119,285,711 rows=157,942,720 4487.6 MB 39.6s -[deep-backoff-kn] order=11 ctx_len=10 ctxs= 84,282,363 rows=119,285,710 3124.9 MB 29.6s -[deep-backoff-kn] order=10 ctx_len= 9 ctxs= 54,720,376 rows= 84,282,363 2008.3 MB 21.5s -[deep-backoff-kn] order= 9 ctx_len= 8 ctxs= 31,924,091 rows= 54,720,376 1167.5 MB 14.5s -[deep-backoff-kn] order= 8 ctx_len= 7 ctxs= 16,284,921 rows= 31,924,091 599.3 MB 9.0s ---- -DISQUALIFIED: training wall-clock budget exceeded (300.0 s) -submission : deep_backoff_kn -training duration : 300.1s -training energy (J): 4,789.4 (at kill) -wrote /tmp/result.json -Stopping app - local entrypoint completed. -✓ App completed. 
View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-45b2NtjIL0LErZ1xaqrUeX - -# final result -{ - "submission": "deep_backoff_kn", - "disqualified": true, - "reason": "train_time_exceeded", - "max_train_seconds": 300.0, - "training_energy_J": 4789.383014900002, - "training_duration_s": 300.091439702, - "cpu_energy_J": 12692.014912755005, - "total_energy_J": 17481.39792765501, - "gpu_name": "NVIDIA A100-SXM4-80GB", - "date_utc": "2026-05-20T07:15:05Z", - "_nvml": { - "nvml_available": true, - "energy_counter_supported": true, - "monotonic": true, - "idle_watts": 63.14000000000002, - "stress_watts_avg": 339.34362349669493, - "stress_energy_joules": 12477.219, - "stress_duration_s": 36.768685592, - "gpu_name": "NVIDIA A100-SXM4-80GB", - "notes": [] - }, - "contributor": "@nakajimagabriel" -} -0.6973 4389 char/s eta= 13s - eval 6,000/60,000 ( 10.0%) acc=0.6990 4443 char/s eta= 12s - eval 7,200/60,000 ( 12.0%) acc=0.6917 4487 char/s eta= 12s - eval 8,400/60,000 ( 14.0%) acc=0.6920 4521 char/s eta= 11s - eval 9,600/60,000 ( 16.0%) acc=0.6997 4527 char/s eta= 11s - eval 10,800/60,000 ( 18.0%) acc=0.7088 4532 char/s eta= 11s - eval 12,000/60,000 ( 20.0%) acc=0.7113 4538 char/s eta= 11s - eval 13,200/60,000 ( 22.0%) acc=0.7142 4539 char/s eta= 10s - eval 14,400/60,000 ( 24.0%) acc=0.7164 4545 char/s eta= 10s - eval 15,600/60,000 ( 26.0%) acc=0.7179 4548 char/s eta= 10s - eval 16,800/60,000 ( 28.0%) acc=0.7220 4552 char/s eta= 9s - eval 18,000/60,000 ( 30.0%) acc=0.7261 4554 char/s eta= 9s - eval 19,200/60,000 ( 32.0%) acc=0.7314 4551 char/s eta= 9s - eval 20,400/60,000 ( 34.0%) acc=0.7333 4554 char/s eta= 9s - eval 21,600/60,000 ( 36.0%) acc=0.7343 4561 char/s eta= 8s - eval 22,800/60,000 ( 38.0%) acc=0.7341 4563 char/s eta= 8s - eval 24,000/60,000 ( 40.0%) acc=0.7338 4566 char/s eta= 8s - eval 25,200/60,000 ( 42.0%) acc=0.7341 4567 char/s eta= 8s - eval 26,400/60,000 ( 44.0%) acc=0.7352 4568 char/s eta= 7s - eval 27,600/60,000 ( 46.0%) acc=0.7333 4572 char/s 
eta= 7s - eval 28,800/60,000 ( 48.0%) acc=0.7338 4577 char/s eta= 7s - eval 30,000/60,000 ( 50.0%) acc=0.7327 4582 char/s eta= 7s - eval 31,200/60,000 ( 52.0%) acc=0.7294 4589 char/s eta= 6s - eval 32,400/60,000 ( 54.0%) acc=0.7267 4596 char/s eta= 6s - eval 33,600/60,000 ( 56.0%) acc=0.7242 4602 char/s eta= 6s - eval 34,800/60,000 ( 58.0%) acc=0.7250 4604 char/s eta= 5s - eval 36,000/60,000 ( 60.0%) acc=0.7259 4604 char/s eta= 5s - eval 37,200/60,000 ( 62.0%) acc=0.7258 4604 char/s eta= 5s - eval 38,400/60,000 ( 64.0%) acc=0.7253 4603 char/s eta= 5s - eval 39,600/60,000 ( 66.0%) acc=0.7237 4605 char/s eta= 4s - eval 40,800/60,000 ( 68.0%) acc=0.7231 4606 char/s eta= 4s - eval 42,000/60,000 ( 70.0%) acc=0.7220 4606 char/s eta= 4s - eval 43,200/60,000 ( 72.0%) acc=0.7212 4607 char/s eta= 4s - eval 44,400/60,000 ( 74.0%) acc=0.7211 4605 char/s eta= 3s - eval 45,600/60,000 ( 76.0%) acc=0.7207 4604 char/s eta= 3s - eval 46,800/60,000 ( 78.0%) acc=0.7200 4604 char/s eta= 3s - eval 48,000/60,000 ( 80.0%) acc=0.7195 4603 char/s eta= 3s - eval 49,200/60,000 ( 82.0%) acc=0.7187 4603 char/s eta= 2s - eval 50,400/60,000 ( 84.0%) acc=0.7190 4603 char/s eta= 2s - eval 51,600/60,000 ( 86.0%) acc=0.7192 4604 char/s eta= 2s - eval 52,800/60,000 ( 88.0%) acc=0.7179 4612 char/s eta= 2s - eval 54,000/60,000 ( 90.0%) acc=0.7177 4613 char/s eta= 1s - eval 55,200/60,000 ( 92.0%) acc=0.7168 4614 char/s eta= 1s - eval 56,400/60,000 ( 94.0%) acc=0.7157 4616 char/s eta= 1s - eval 57,600/60,000 ( 96.0%) acc=0.7160 4616 char/s eta= 1s - eval 58,800/60,000 ( 98.0%) acc=0.7166 4616 char/s eta= 0s - eval 60,000/60,000 (100.0%) acc=0.7184 4615 char/s eta= 0s -chars=60,000 acc=0.7184 eval_duration=13.0s +[deep-backoff-kn] encoded train: 541,096,898 bytes (0.7s) +[deep-backoff-kn] np.unique k=14: 238,387,519 pairs 132.2s (n_workers=auto) +[deep-backoff-kn] order=14 ctx_len=13 ctxs=198,300,622 rows=238,387,519 13.7s +[deep-backoff-kn] order=13 ctx_len=12 ctxs=157,942,721 rows=198,300,621 6045.7 MB 
40.4s +[deep-backoff-kn] order=12 ctx_len=11 ctxs=119,285,711 rows=157,942,720 4487.6 MB 32.6s +[deep-backoff-kn] order=11 ctx_len=10 ctxs= 84,282,363 rows=119,285,710 3124.9 MB 25.4s +[deep-backoff-kn] order=10 ctx_len= 9 ctxs= 54,720,376 rows= 84,282,363 2008.3 MB 18.6s +[deep-backoff-kn] order= 9 ctx_len= 8 ctxs= 31,924,091 rows= 54,720,376 1167.5 MB 12.7s +[deep-backoff-kn] order= 8 ctx_len= 7 ctxs= 16,284,921 rows= 31,924,091 599.3 MB 7.8s +[deep-backoff-kn] order= 7 ctx_len= 6 ctxs= 7,016,442 rows= 16,284,921 263.9 MB 4.3s +[deep-backoff-kn] order= 6 ctx_len= 5 ctxs= 2,438,281 rows= 7,016,442 96.0 MB 2.1s +[deep-backoff-kn] order= 5 ctx_len= 4 ctxs= 637,143 rows= 2,438,281 27.5 MB 0.8s +[deep-backoff-kn] order= 4 ctx_len= 3 ctxs= 122,882 rows= 637,143 6.0 MB 0.3s +[deep-backoff-kn] order= 3 ctx_len= 2 ctxs= 12,282 rows= 122,882 0.9 MB 0.1s +[deep-backoff-kn] order= 2 ctx_len= 1 ctxs= 204 rows= 12,282 0.1 MB 0.0s +[deep-backoff-kn] order= 1 ctx_len= 0 ctxs= 1 rows= 204 0.0 MB 0.0s +[deep-backoff-kn] continuation base: entropy=5.083 nats +[deep-backoff-kn] total build: 291.6s +training: 962.9 J duration=291.6s +evaluating on val split ... 
+ eval 1,200/60,000 ( 2.0%) acc=0.7058 3616 char/s eta= 16s + eval 2,400/60,000 ( 4.0%) acc=0.6846 3861 char/s eta= 15s + eval 3,600/60,000 ( 6.0%) acc=0.6842 3973 char/s eta= 14s + eval 4,800/60,000 ( 8.0%) acc=0.6973 4017 char/s eta= 14s + eval 6,000/60,000 ( 10.0%) acc=0.6990 4069 char/s eta= 13s + eval 7,200/60,000 ( 12.0%) acc=0.6917 4142 char/s eta= 13s + eval 8,400/60,000 ( 14.0%) acc=0.6920 4201 char/s eta= 12s + eval 9,600/60,000 ( 16.0%) acc=0.6997 4215 char/s eta= 12s + eval 10,800/60,000 ( 18.0%) acc=0.7088 4255 char/s eta= 12s + eval 12,000/60,000 ( 20.0%) acc=0.7113 4251 char/s eta= 11s + eval 13,200/60,000 ( 22.0%) acc=0.7142 4248 char/s eta= 11s + eval 14,400/60,000 ( 24.0%) acc=0.7164 4260 char/s eta= 11s + eval 15,600/60,000 ( 26.0%) acc=0.7179 4288 char/s eta= 10s + eval 16,800/60,000 ( 28.0%) acc=0.7220 4297 char/s eta= 10s + eval 18,000/60,000 ( 30.0%) acc=0.7261 4304 char/s eta= 10s + eval 19,200/60,000 ( 32.0%) acc=0.7314 4304 char/s eta= 9s + eval 20,400/60,000 ( 34.0%) acc=0.7333 4311 char/s eta= 9s + eval 21,600/60,000 ( 36.0%) acc=0.7343 4318 char/s eta= 9s + eval 22,800/60,000 ( 38.0%) acc=0.7341 4319 char/s eta= 9s + eval 24,000/60,000 ( 40.0%) acc=0.7338 4321 char/s eta= 8s + eval 25,200/60,000 ( 42.0%) acc=0.7341 4327 char/s eta= 8s + eval 26,400/60,000 ( 44.0%) acc=0.7352 4341 char/s eta= 8s + eval 27,600/60,000 ( 46.0%) acc=0.7333 4360 char/s eta= 7s + eval 28,800/60,000 ( 48.0%) acc=0.7338 4364 char/s eta= 7s + eval 30,000/60,000 ( 50.0%) acc=0.7327 4365 char/s eta= 7s + eval 31,200/60,000 ( 52.0%) acc=0.7294 4370 char/s eta= 7s + eval 32,400/60,000 ( 54.0%) acc=0.7267 4374 char/s eta= 6s + eval 33,600/60,000 ( 56.0%) acc=0.7242 4382 char/s eta= 6s + eval 34,800/60,000 ( 58.0%) acc=0.7250 4386 char/s eta= 6s + eval 36,000/60,000 ( 60.0%) acc=0.7259 4388 char/s eta= 5s + eval 37,200/60,000 ( 62.0%) acc=0.7258 4393 char/s eta= 5s + eval 38,400/60,000 ( 64.0%) acc=0.7253 4396 char/s eta= 5s + eval 39,600/60,000 ( 66.0%) acc=0.7237 
4394 char/s eta= 5s + eval 40,800/60,000 ( 68.0%) acc=0.7231 4395 char/s eta= 4s + eval 42,000/60,000 ( 70.0%) acc=0.7220 4395 char/s eta= 4s + eval 43,200/60,000 ( 72.0%) acc=0.7212 4398 char/s eta= 4s + eval 44,400/60,000 ( 74.0%) acc=0.7211 4398 char/s eta= 4s + eval 45,600/60,000 ( 76.0%) acc=0.7207 4400 char/s eta= 3s + eval 46,800/60,000 ( 78.0%) acc=0.7200 4397 char/s eta= 3s + eval 48,000/60,000 ( 80.0%) acc=0.7195 4392 char/s eta= 3s + eval 49,200/60,000 ( 82.0%) acc=0.7187 4392 char/s eta= 2s + eval 50,400/60,000 ( 84.0%) acc=0.7190 4392 char/s eta= 2s + eval 51,600/60,000 ( 86.0%) acc=0.7192 4391 char/s eta= 2s + eval 52,800/60,000 ( 88.0%) acc=0.7179 4401 char/s eta= 2s + eval 54,000/60,000 ( 90.0%) acc=0.7177 4405 char/s eta= 1s + eval 55,200/60,000 ( 92.0%) acc=0.7168 4411 char/s eta= 1s + eval 56,400/60,000 ( 94.0%) acc=0.7157 4411 char/s eta= 1s + eval 57,600/60,000 ( 96.0%) acc=0.7160 4407 char/s eta= 1s + eval 58,800/60,000 ( 98.0%) acc=0.7166 4403 char/s eta= 0s + eval 60,000/60,000 (100.0%) acc=0.7184 4405 char/s eta= 0s +chars=60,000 acc=0.7184 eval_duration=13.6s --- submission : deep_backoff_kn -training energy (J): 2,172.0 -training duration : 245.5s +training energy (J): 962.9 +training duration : 291.6s val char-accuracy : 0.7184 val chars : 60,000 wrote /tmp/result.json -Stopping app - local entrypoint completed. +Stopping app - local client disconnected. Use `modal run --detach` to keep apps running even if your local client disconnects. ✓ App completed. 
View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-wEz27zjOURQzDmbGKNMPqb +https://modal.com/apps/gabriel-nakajima-an/main/ap-1cZpQht7xa0YYz3oXegD83 # final result { "submission": "deep_backoff_kn", - "training_energy_J": 2172.0416936, - "training_duration_s": 245.475966128, - "cpu_energy_J": 10385.495287457501, - "total_energy_J": 12557.536981057501, + "training_energy_J": 962.9188647999999, + "training_duration_s": 291.568102704, + "cpu_energy_J": 12338.053218035013, + "total_energy_J": 14578.4051352, "val_char_accuracy": 0.7184166666666667, "val_chars": 60000, "gpu_name": "NVIDIA A100 80GB PCIe", - "date_utc": "2026-05-20T07:13:17Z", + "date_utc": "2026-05-21T05:09:09Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 57.467733333333335, - "stress_watts_avg": 236.89272714598408, - "stress_energy_joules": 8693.929, - "stress_duration_s": 36.699856111, + "idle_watts": 52.31990000000003, + "stress_watts_avg": 231.05724561444927, + "stress_energy_joules": 8643.325, + "stress_duration_s": 37.407721091, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, diff --git a/submissions/gpu_ngram_o14_xorfix/nvml.json b/submissions/gpu_ngram_o14_xorfix/nvml.json index ef2d82e..decba91 100644 --- a/submissions/gpu_ngram_o14_xorfix/nvml.json +++ b/submissions/gpu_ngram_o14_xorfix/nvml.json @@ -2,10 +2,10 @@ "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 53.04595000000001, - "stress_watts_avg": 222.8521356537582, - "stress_energy_joules": 8477.795, - "stress_duration_s": 38.042242562000006, + "idle_watts": 55.68680000000003, + "stress_watts_avg": 236.68642874335322, + "stress_energy_joules": 8655.791, + "stress_duration_s": 36.570711071, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] } diff --git a/submissions/gpu_ngram_o14_xorfix/result.json b/submissions/gpu_ngram_o14_xorfix/result.json index 96d8384..2b79e7d 100644 --- 
a/submissions/gpu_ngram_o14_xorfix/result.json +++ b/submissions/gpu_ngram_o14_xorfix/result.json @@ -1,21 +1,21 @@ { "submission": "gpu_ngram_o14_xorfix", - "training_energy_J": 3441.0376875, - "training_duration_s": 97.64232625, - "cpu_energy_J": 4134.604408382503, - "total_energy_J": 7575.642095882503, + "training_energy_J": 3981.1039870000004, + "training_duration_s": 109.11744026, + "cpu_energy_J": 4621.207360232503, + "total_energy_J": 8602.311347232502, "val_char_accuracy": 0.7184166666666667, "val_chars": 60000, "gpu_name": "NVIDIA A100 80GB PCIe", - "date_utc": "2026-05-20T07:11:46Z", + "date_utc": "2026-05-21T05:06:04Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 53.04595000000001, - "stress_watts_avg": 222.8521356537582, - "stress_energy_joules": 8477.795, - "stress_duration_s": 38.042242562000006, + "idle_watts": 55.68680000000003, + "stress_watts_avg": 236.68642874335322, + "stress_energy_joules": 8655.791, + "stress_duration_s": 36.570711071, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, diff --git a/submissions/gpu_ngram_o14_xorfix/run.log b/submissions/gpu_ngram_o14_xorfix/run.log index ee3d8b8..997ca3c 100644 --- a/submissions/gpu_ngram_o14_xorfix/run.log +++ b/submissions/gpu_ngram_o14_xorfix/run.log @@ -1,7 +1,7 @@ -# wikitext submit.py log — gpu_ngram_o14_xorfix — 2026-05-20T07:08:39+00:00Z +# wikitext submit.py log — gpu_ngram_o14_xorfix — 2026-05-21T05:03:02+00:00Z [modal] launching A100-80GB ... ✓ Initialized. View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-i8ghm6z5tQ198XlJNnHqte +https://modal.com/apps/gabriel-nakajima-an/main/ap-Hgn8xfV74JBawHiLMba2fr ✓ Created objects. ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py @@ -12,14 +12,14 @@ https://modal.com/apps/gabriel-nakajima-an/main/ap-i8ghm6z5tQ198XlJNnHqte [modal] verifying NVML energy counter ... 
GPU: NVIDIA A100 80GB PCIe sampling idle power for 3s ... - idle: 53.0 W + idle: 55.7 W running 30s stress workload ... - duration: 38.0 s - energy delta: 8,477.8 J - avg power: 222.9 W + duration: 36.6 s + energy delta: 8,655.8 J + avg power: 236.7 W monotonic: True --- -{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 53.04595000000001, "stress_watts_avg": 222.8521356537582, "stress_energy_joules": 8477.795, "stress_duration_s": 38.042242562000006, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []} +{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 55.68680000000003, "stress_watts_avg": 236.68642874335322, "stress_energy_joules": 8655.791, "stress_duration_s": 36.570711071, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []} [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ... loading WikiText-103 from /data ... train chars: 540,095,682 @@ -27,112 +27,110 @@ loading WikiText-103 from /data ... train wall-clock cap: 300 s val accuracy floor : 0.7000 training submission /workspace/gpu_ngram_o14_xorfix.py ... -[codecarbon WARNING @ 07:09:50] Multiple instances of codecarbon are allowed to run at the same time. +[codecarbon WARNING @ 05:03:58] Multiple instances of codecarbon are allowed to run at the same time. 
[gpu_ngram_o14_xorfix] starting build; max_order=14 D=0.5 -[gpu_ngram_o14_xorfix] encoded train: 541,096,898 bytes (0.4s) -[gpu_ngram_o14_xorfix] top order=14 unique pairs: 238,387,519 3.5s -[gpu_ngram_o14_xorfix] ctx_len=13 ctxs=198,300,622 rows=238,387,519 26.6s -[gpu_ngram_o14_xorfix] ctx_len=12 ctxs=157,942,721 rows=198,300,621 20.2s -[gpu_ngram_o14_xorfix] ctx_len=11 ctxs=119,285,711 rows=157,942,720 18.2s -[gpu_ngram_o14_xorfix] ctx_len=10 ctxs=84,282,363 rows=119,285,710 11.0s -[gpu_ngram_o14_xorfix] ctx_len=9 ctxs=54,720,376 rows=84,282,363 7.3s -[gpu_ngram_o14_xorfix] ctx_len=8 ctxs=31,924,091 rows=54,720,376 4.5s -[gpu_ngram_o14_xorfix] ctx_len=7 ctxs=16,284,921 rows=31,924,091 3.0s -[gpu_ngram_o14_xorfix] ctx_len=6 ctxs=7,016,442 rows=16,284,921 1.4s +[gpu_ngram_o14_xorfix] encoded train: 541,096,898 bytes (0.3s) +[gpu_ngram_o14_xorfix] top order=14 unique pairs: 238,387,519 3.3s +[gpu_ngram_o14_xorfix] ctx_len=13 ctxs=198,300,622 rows=238,387,519 32.0s +[gpu_ngram_o14_xorfix] ctx_len=12 ctxs=157,942,721 rows=198,300,621 24.4s +[gpu_ngram_o14_xorfix] ctx_len=11 ctxs=119,285,711 rows=157,942,720 18.0s +[gpu_ngram_o14_xorfix] ctx_len=10 ctxs=84,282,363 rows=119,285,710 12.6s +[gpu_ngram_o14_xorfix] ctx_len=9 ctxs=54,720,376 rows=84,282,363 8.4s +[gpu_ngram_o14_xorfix] ctx_len=8 ctxs=31,924,091 rows=54,720,376 4.9s +[gpu_ngram_o14_xorfix] ctx_len=7 ctxs=16,284,921 rows=31,924,091 2.6s +[gpu_ngram_o14_xorfix] ctx_len=6 ctxs=7,016,442 rows=16,284,921 1.2s [gpu_ngram_o14_xorfix] ctx_len=5 ctxs=2,438,281 rows=7,016,442 0.5s [gpu_ngram_o14_xorfix] ctx_len=4 ctxs=637,143 rows=2,438,281 0.1s [gpu_ngram_o14_xorfix] ctx_len=3 ctxs=122,882 rows=637,143 0.0s [gpu_ngram_o14_xorfix] ctx_len=2 ctxs=12,282 rows=122,882 0.0s [gpu_ngram_o14_xorfix] ctx_len=1 ctxs=204 rows=12,282 0.0s [gpu_ngram_o14_xorfix] ctx_len=0 ctxs=1 rows=204 0.0s -[gpu_ngram_o14_xorfix] total build: 96.9s -training: 3,441.0 J duration=97.6s +[gpu_ngram_o14_xorfix] total build: 108.4s +training: 
3,981.1 J duration=109.1s evaluating on val split ... - eval 1,200/60,000 ( 2.0%) acc=0.7058 3412 char/s eta= 17s - eval 2,400/60,000 ( 4.0%) acc=0.6846 3685 char/s eta= 16s - eval 3,600/60,000 ( 6.0%) acc=0.6842 3807 char/s eta= 15s - eval 4,800/60,000 ( 8.0%) acc=0.6973 3873 char/s eta= 14s - eval 6,000/60,000 ( 10.0%) acc=0.6990 3943 char/s eta= 14s - eval 7,200/60,000 ( 12.0%) acc=0.6917 3994 char/s eta= 13s - eval 8,400/60,000 ( 14.0%) acc=0.6920 4064 char/s eta= 13s - eval 9,600/60,000 ( 16.0%) acc=0.6997 4071 char/s eta= 12s - eval 10,800/60,000 ( 18.0%) acc=0.7088 4079 char/s eta= 12s - eval 12,000/60,000 ( 20.0%) acc=0.7113 4086 char/s eta= 12s - eval 13,200/60,000 ( 22.0%) acc=0.7142 4117 char/s eta= 11s - eval 14,400/60,000 ( 24.0%) acc=0.7164 4122 char/s eta= 11s - eval 15,600/60,000 ( 26.0%) acc=0.7179 4123 char/s eta= 11s - eval 16,800/60,000 ( 28.0%) acc=0.7220 4127 char/s eta= 10s - eval 18,000/60,000 ( 30.0%) acc=0.7261 4122 char/s eta= 10s - eval 19,200/60,000 ( 32.0%) acc=0.7314 4118 char/s eta= 10s - eval 20,400/60,000 ( 34.0%) acc=0.7333 4119 char/s eta= 10s - eval 21,600/60,000 ( 36.0%) acc=0.7343 4122 char/s eta= 9s - eval 22,800/60,000 ( 38.0%) acc=0.7341 4135 char/s eta= 9s - eval 24,000/60,000 ( 40.0%) acc=0.7338 4136 char/s eta= 9s - eval 25,200/60,000 ( 42.0%) acc=0.7341 4138 char/s eta= 8s - eval 26,400/60,000 ( 44.0%) acc=0.7352 4139 char/s eta= 8s - eval 27,600/60,000 ( 46.0%) acc=0.7333 4141 char/s eta= 8s - eval 28,800/60,000 ( 48.0%) acc=0.7338 4145 char/s eta= 8s - eval 30,000/60,000 ( 50.0%) acc=0.7327 4155 char/s eta= 7s - eval 31,200/60,000 ( 52.0%) acc=0.7294 4162 char/s eta= 7s - eval 32,400/60,000 ( 54.0%) acc=0.7267 4183 char/s eta= 7s - eval 33,600/60,000 ( 56.0%) acc=0.7242 4186 char/s eta= 6s - eval 34,800/60,000 ( 58.0%) acc=0.7250 4186 char/s eta= 6s - eval 36,000/60,000 ( 60.0%) acc=0.7259 4185 char/s eta= 6s - eval 37,200/60,000 ( 62.0%) acc=0.7258 4185 char/s eta= 5s - eval 38,400/60,000 ( 64.0%) acc=0.7253 4198 
char/s eta= 5s - eval 39,600/60,000 ( 66.0%) acc=0.7237 4199 char/s eta= 5s - eval 40,800/60,000 ( 68.0%) acc=0.7231 4197 char/s eta= 5s - eval 42,000/60,000 ( 70.0%) acc=0.7220 4197 char/s eta= 4s - eval 43,200/60,000 ( 72.0%) acc=0.7212 4197 char/s eta= 4s - eval 44,400/60,000 ( 74.0%) acc=0.7211 4209 char/s eta= 4s - eval 45,600/60,000 ( 76.0%) acc=0.7207 4206 char/s eta= 3s - eval 46,800/60,000 ( 78.0%) acc=0.7200 4203 char/s eta= 3s - eval 48,000/60,000 ( 80.0%) acc=0.7195 4200 char/s eta= 3s - eval 49,200/60,000 ( 82.0%) acc=0.7187 4199 char/s eta= 3s - eval 50,400/60,000 ( 84.0%) acc=0.7190 4198 char/s eta= 2s - eval 51,600/60,000 ( 86.0%) acc=0.7192 4198 char/s eta= 2s - eval 52,800/60,000 ( 88.0%) acc=0.7179 4206 char/s eta= 2s - eval 54,000/60,000 ( 90.0%) acc=0.7177 4206 char/s eta= 1s - eval 55,200/60,000 ( 92.0%) acc=0.7168 4207 char/s eta= 1s - eval 56,400/60,000 ( 94.0%) acc=0.7157 4208 char/s eta= 1s - eval 57,600/60,000 ( 96.0%) acc=0.7160 4217 char/s eta= 1s - eval 58,800/60,000 ( 98.0%) acc=0.7166 4215 char/s eta= 0s - eval 60,000/60,000 (100.0%) acc=0.7184 4213 char/s eta= 0s -chars=60,000 acc=0.7184 eval_duration=14.2s + eval 1,200/60,000 ( 2.0%) acc=0.7058 4071 char/s eta= 14s + eval 2,400/60,000 ( 4.0%) acc=0.6846 4390 char/s eta= 13s + eval 3,600/60,000 ( 6.0%) acc=0.6842 4530 char/s eta= 12s + eval 4,800/60,000 ( 8.0%) acc=0.6973 4595 char/s eta= 12s + eval 6,000/60,000 ( 10.0%) acc=0.6990 4664 char/s eta= 12s + eval 7,200/60,000 ( 12.0%) acc=0.6917 4716 char/s eta= 11s + eval 8,400/60,000 ( 14.0%) acc=0.6920 4755 char/s eta= 11s + eval 9,600/60,000 ( 16.0%) acc=0.6997 4769 char/s eta= 11s + eval 10,800/60,000 ( 18.0%) acc=0.7088 4778 char/s eta= 10s + eval 12,000/60,000 ( 20.0%) acc=0.7113 4789 char/s eta= 10s + eval 13,200/60,000 ( 22.0%) acc=0.7142 4795 char/s eta= 10s + eval 14,400/60,000 ( 24.0%) acc=0.7164 4803 char/s eta= 9s + eval 15,600/60,000 ( 26.0%) acc=0.7179 4807 char/s eta= 9s + eval 16,800/60,000 ( 28.0%) acc=0.7220 4813 
char/s eta= 9s + eval 18,000/60,000 ( 30.0%) acc=0.7261 4815 char/s eta= 9s + eval 19,200/60,000 ( 32.0%) acc=0.7314 4812 char/s eta= 8s + eval 20,400/60,000 ( 34.0%) acc=0.7333 4815 char/s eta= 8s + eval 21,600/60,000 ( 36.0%) acc=0.7343 4822 char/s eta= 8s + eval 22,800/60,000 ( 38.0%) acc=0.7341 4824 char/s eta= 8s + eval 24,000/60,000 ( 40.0%) acc=0.7338 4827 char/s eta= 7s + eval 25,200/60,000 ( 42.0%) acc=0.7341 4826 char/s eta= 7s + eval 26,400/60,000 ( 44.0%) acc=0.7352 4826 char/s eta= 7s + eval 27,600/60,000 ( 46.0%) acc=0.7333 4830 char/s eta= 7s + eval 28,800/60,000 ( 48.0%) acc=0.7338 4836 char/s eta= 6s + eval 30,000/60,000 ( 50.0%) acc=0.7327 4841 char/s eta= 6s + eval 31,200/60,000 ( 52.0%) acc=0.7294 4848 char/s eta= 6s + eval 32,400/60,000 ( 54.0%) acc=0.7267 4856 char/s eta= 6s + eval 33,600/60,000 ( 56.0%) acc=0.7242 4863 char/s eta= 5s + eval 34,800/60,000 ( 58.0%) acc=0.7250 4864 char/s eta= 5s + eval 36,000/60,000 ( 60.0%) acc=0.7259 4865 char/s eta= 5s + eval 37,200/60,000 ( 62.0%) acc=0.7258 4864 char/s eta= 5s + eval 38,400/60,000 ( 64.0%) acc=0.7253 4864 char/s eta= 4s + eval 39,600/60,000 ( 66.0%) acc=0.7237 4868 char/s eta= 4s + eval 40,800/60,000 ( 68.0%) acc=0.7231 4869 char/s eta= 4s + eval 42,000/60,000 ( 70.0%) acc=0.7220 4870 char/s eta= 4s + eval 43,200/60,000 ( 72.0%) acc=0.7212 4872 char/s eta= 3s + eval 44,400/60,000 ( 74.0%) acc=0.7211 4870 char/s eta= 3s + eval 45,600/60,000 ( 76.0%) acc=0.7207 4870 char/s eta= 3s + eval 46,800/60,000 ( 78.0%) acc=0.7200 4868 char/s eta= 3s + eval 48,000/60,000 ( 80.0%) acc=0.7195 4868 char/s eta= 2s + eval 49,200/60,000 ( 82.0%) acc=0.7187 4868 char/s eta= 2s + eval 50,400/60,000 ( 84.0%) acc=0.7190 4868 char/s eta= 2s + eval 51,600/60,000 ( 86.0%) acc=0.7192 4868 char/s eta= 2s + eval 52,800/60,000 ( 88.0%) acc=0.7179 4876 char/s eta= 1s + eval 54,000/60,000 ( 90.0%) acc=0.7177 4876 char/s eta= 1s + eval 55,200/60,000 ( 92.0%) acc=0.7168 4878 char/s eta= 1s + eval 56,400/60,000 ( 94.0%) 
acc=0.7157 4880 char/s eta= 1s + eval 57,600/60,000 ( 96.0%) acc=0.7160 4880 char/s eta= 0s + eval 58,800/60,000 ( 98.0%) acc=0.7166 4879 char/s eta= 0s + eval 60,000/60,000 (100.0%) acc=0.7184 4879 char/s eta= 0s +chars=60,000 acc=0.7184 eval_duration=12.3s --- submission : gpu_ngram_o14_xorfix -training energy (J): 3,441.0 -training duration : 97.6s +training energy (J): 3,981.1 +training duration : 109.1s val char-accuracy : 0.7184 val chars : 60,000 wrote /tmp/result.json Stopping app - local entrypoint completed. ✓ App completed. View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-i8ghm6z5tQ198XlJNnHqte +https://modal.com/apps/gabriel-nakajima-an/main/ap-Hgn8xfV74JBawHiLMba2fr # final result { "submission": "gpu_ngram_o14_xorfix", - "training_energy_J": 3441.0376875, - "training_duration_s": 97.64232625, - "cpu_energy_J": 4134.604408382503, - "total_energy_J": 7575.642095882503, + "training_energy_J": 3981.1039870000004, + "training_duration_s": 109.11744026, + "cpu_energy_J": 4621.207360232503, + "total_energy_J": 8602.311347232502, "val_char_accuracy": 0.7184166666666667, "val_chars": 60000, "gpu_name": "NVIDIA A100 80GB PCIe", - "date_utc": "2026-05-20T07:11:46Z", + "date_utc": "2026-05-21T05:06:04Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 53.04595000000001, - "stress_watts_avg": 222.8521356537582, - "stress_energy_joules": 8477.795, - "stress_duration_s": 38.042242562000006, + "idle_watts": 55.68680000000003, + "stress_watts_avg": 236.68642874335322, + "stress_energy_joules": 8655.791, + "stress_duration_s": 36.570711071, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, "contributor": "@subagent-xorfix-2026-05-19" } -ix-2026-05-19" -} diff --git a/submissions/gpu_ngram_w31_k11/nvml.json b/submissions/gpu_ngram_w31_k11/nvml.json index a8d215a..f9eae32 100644 --- a/submissions/gpu_ngram_w31_k11/nvml.json +++ b/submissions/gpu_ngram_w31_k11/nvml.json @@ -2,10 +2,10 @@ 
"nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 52.723183333333296, - "stress_watts_avg": 226.95174796885868, - "stress_energy_joules": 8335.027, - "stress_duration_s": 36.72598724, + "idle_watts": 55.01793220338981, + "stress_watts_avg": 230.13301547418834, + "stress_energy_joules": 8413.972, + "stress_duration_s": 36.561342503000006, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] } diff --git a/submissions/gpu_ngram_w31_k11/result.json b/submissions/gpu_ngram_w31_k11/result.json index 54dd7c0..3d6a2df 100644 --- a/submissions/gpu_ngram_w31_k11/result.json +++ b/submissions/gpu_ngram_w31_k11/result.json @@ -1,21 +1,21 @@ { "submission": "gpu_ngram_w31_k11", - "training_energy_J": 1332.8045820499997, - "training_duration_s": 33.551668359000004, - "cpu_energy_J": 1420.9300898524978, - "total_energy_J": 2753.734671902497, + "training_energy_J": 1612.2052069500003, + "training_duration_s": 34.944975860999996, + "cpu_energy_J": 1479.7634497275012, + "total_energy_J": 3091.9686566775017, "val_char_accuracy": 0.7050333333333333, "val_chars": 60000, "gpu_name": "NVIDIA A100 80GB PCIe", - "date_utc": "2026-05-20T07:07:33Z", + "date_utc": "2026-05-21T05:04:49Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 52.723183333333296, - "stress_watts_avg": 226.95174796885868, - "stress_energy_joules": 8335.027, - "stress_duration_s": 36.72598724, + "idle_watts": 55.01793220338981, + "stress_watts_avg": 230.13301547418834, + "stress_energy_joules": 8413.972, + "stress_duration_s": 36.561342503000006, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, diff --git a/submissions/gpu_ngram_w31_k11/run.log b/submissions/gpu_ngram_w31_k11/run.log index cabeef8..c1cc8fe 100644 --- a/submissions/gpu_ngram_w31_k11/run.log +++ b/submissions/gpu_ngram_w31_k11/run.log @@ -1,163 +1,7 @@ -# wikitext submit.py log — gpu_ngram_w31_k11 — 2026-05-20T07:05:37+00:00Z +# wikitext submit.py log — 
gpu_ngram_w31_k11 — 2026-05-21T05:03:02+00:00Z [modal] launching A100-80GB ... ✓ Initialized. View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-Xr2U1qCw3wvtCqAVizWeyd -Building image im-HqRgnUnflxE8oQRhywMp4D - -=> Step 0: FROM base - -=> Step 1: RUN python -m pip install codecarbon -Looking in indexes: http://pypi-mirror.modal.local:5555/simple -Collecting codecarbon - Downloading http://pypi-mirror.modal.local:5555/simple/codecarbon/codecarbon-3.2.7-py3-none-any.whl.metadata (9.7 kB) -Collecting arrow (from codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/arrow/arrow-1.4.0-py3-none-any.whl.metadata (7.7 kB) -Collecting authlib>=1.2.1 (from codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/authlib/authlib-1.7.2-py2.py3-none-any.whl.metadata (10 kB) -Collecting click (from codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/click/click-8.4.0-py3-none-any.whl.metadata (2.6 kB) -Collecting pandas (from codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/pandas/pandas-3.0.3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (79 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 79.5/79.5 kB 240.5 MB/s eta 0:00:00 -Collecting prometheus_client (from codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/prometheus-client/prometheus_client-0.25.0-py3-none-any.whl.metadata (2.1 kB) -Collecting psutil>=6.0.0 (from codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/psutil/psutil-7.2.2-cp36-abi3-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl.metadata (22 kB) -Collecting py-cpuinfo (from codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/py-cpuinfo/py_cpuinfo-9.0.0-py3-none-any.whl.metadata (794 bytes) -Collecting pydantic (from codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/pydantic/pydantic-2.13.4-py3-none-any.whl.metadata (109 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 109.4/109.4 
kB 233.4 MB/s eta 0:00:00 -Requirement already satisfied: nvidia-ml-py in /usr/local/lib/python3.11/site-packages (from codecarbon) (12.560.30) -Collecting rapidfuzz (from codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/rapidfuzz/rapidfuzz-3.14.5-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (12 kB) -Requirement already satisfied: requests in /usr/local/lib/python3.11/site-packages (from codecarbon) (2.34.2) -Collecting questionary (from codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/questionary/questionary-2.1.1-py3-none-any.whl.metadata (5.4 kB) -Collecting rich (from codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/rich/rich-15.0.0-py3-none-any.whl.metadata (18 kB) -Collecting typer (from codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/typer/typer-0.25.1-py3-none-any.whl.metadata (15 kB) -Collecting pycountry (from codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/pycountry/pycountry-26.2.16-py3-none-any.whl.metadata (12 kB) -Collecting cryptography (from authlib>=1.2.1->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/cryptography/cryptography-48.0.0-cp311-abi3-manylinux_2_34_x86_64.whl.metadata (4.3 kB) -Collecting joserfc>=1.6.0 (from authlib>=1.2.1->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/joserfc/joserfc-1.6.5-py3-none-any.whl.metadata (3.2 kB) -Collecting python-dateutil>=2.7.0 (from arrow->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/python-dateutil/python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB) -Collecting tzdata (from arrow->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/tzdata/tzdata-2026.2-py2.py3-none-any.whl.metadata (1.4 kB) -Requirement already satisfied: numpy>=1.26.0 in /usr/local/lib/python3.11/site-packages (from pandas->codecarbon) (2.1.3) -Collecting annotated-types>=0.6.0 (from pydantic->codecarbon) - 
Downloading http://pypi-mirror.modal.local:5555/simple/annotated-types/annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB) -Collecting pydantic-core==2.46.4 (from pydantic->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/pydantic-core/pydantic_core-2.46.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB) -Requirement already satisfied: typing-extensions>=4.14.1 in /usr/local/lib/python3.11/site-packages (from pydantic->codecarbon) (4.15.0) -Collecting typing-inspection>=0.4.2 (from pydantic->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/typing-inspection/typing_inspection-0.4.2-py3-none-any.whl.metadata (2.6 kB) -Collecting prompt_toolkit<4.0,>=2.0 (from questionary->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/prompt-toolkit/prompt_toolkit-3.0.52-py3-none-any.whl.metadata (6.4 kB) -Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.11/site-packages (from requests->codecarbon) (3.4.7) -Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/site-packages (from requests->codecarbon) (3.15) -Requirement already satisfied: urllib3<3,>=1.26 in /usr/local/lib/python3.11/site-packages (from requests->codecarbon) (2.7.0) -Requirement already satisfied: certifi>=2023.5.7 in /usr/local/lib/python3.11/site-packages (from requests->codecarbon) (2026.4.22) -Collecting markdown-it-py>=2.2.0 (from rich->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/markdown-it-py/markdown_it_py-4.2.0-py3-none-any.whl.metadata (7.4 kB) -Collecting pygments<3.0.0,>=2.13.0 (from rich->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/pygments/pygments-2.20.0-py3-none-any.whl.metadata (2.5 kB) -Collecting shellingham>=1.3.0 (from typer->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/shellingham/shellingham-1.5.4-py2.py3-none-any.whl.metadata (3.5 kB) -Collecting annotated-doc>=0.0.2 (from 
typer->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/annotated-doc/annotated_doc-0.0.4-py3-none-any.whl.metadata (6.6 kB) -Collecting cffi>=2.0.0 (from cryptography->authlib>=1.2.1->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/cffi/cffi-2.0.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.6 kB) -Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/mdurl/mdurl-0.1.2-py3-none-any.whl.metadata (1.6 kB) -Collecting wcwidth (from prompt_toolkit<4.0,>=2.0->questionary->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/wcwidth/wcwidth-0.7.0-py3-none-any.whl.metadata (36 kB) -Collecting six>=1.5 (from python-dateutil>=2.7.0->arrow->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/six/six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB) -Collecting pycparser (from cffi>=2.0.0->cryptography->authlib>=1.2.1->codecarbon) - Downloading http://pypi-mirror.modal.local:5555/simple/pycparser/pycparser-3.0-py3-none-any.whl.metadata (8.2 kB) -Downloading http://pypi-mirror.modal.local:5555/simple/codecarbon/codecarbon-3.2.7-py3-none-any.whl (380 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 380.5/380.5 kB 109.2 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/authlib/authlib-1.7.2-py2.py3-none-any.whl (259 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 259.5/259.5 kB 276.8 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/psutil/psutil-7.2.2-cp36-abi3-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl (155 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 155.6/155.6 kB 266.9 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/arrow/arrow-1.4.0-py3-none-any.whl (68 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 68.8/68.8 kB 235.6 MB/s eta 0:00:00 -Downloading 
http://pypi-mirror.modal.local:5555/simple/click/click-8.4.0-py3-none-any.whl (116 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.1/116.1 kB 77.3 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/pandas/pandas-3.0.3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (11.3 MB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 259.3 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/prometheus-client/prometheus_client-0.25.0-py3-none-any.whl (64 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.2/64.2 kB 237.0 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/py-cpuinfo/py_cpuinfo-9.0.0-py3-none-any.whl (22 kB) -Downloading http://pypi-mirror.modal.local:5555/simple/pycountry/pycountry-26.2.16-py3-none-any.whl (8.0 MB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.0/8.0 MB 251.2 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/pydantic/pydantic-2.13.4-py3-none-any.whl (472 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 472.3/472.3 kB 283.6 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/pydantic-core/pydantic_core-2.46.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 229.6 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/questionary/questionary-2.1.1-py3-none-any.whl (36 kB) -Downloading http://pypi-mirror.modal.local:5555/simple/rapidfuzz/rapidfuzz-3.14.5-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (3.2 MB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 249.4 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/rich/rich-15.0.0-py3-none-any.whl (310 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 310.7/310.7 kB 254.2 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/typer/typer-0.25.1-py3-none-any.whl (58 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.4/58.4 kB 235.9 MB/s 
eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/annotated-doc/annotated_doc-0.0.4-py3-none-any.whl (5.3 kB) -Downloading http://pypi-mirror.modal.local:5555/simple/annotated-types/annotated_types-0.7.0-py3-none-any.whl (13 kB) -Downloading http://pypi-mirror.modal.local:5555/simple/joserfc/joserfc-1.6.5-py3-none-any.whl (70 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 70.5/70.5 kB 242.0 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/cryptography/cryptography-48.0.0-cp311-abi3-manylinux_2_34_x86_64.whl (4.7 MB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.7/4.7 MB 234.6 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/markdown-it-py/markdown_it_py-4.2.0-py3-none-any.whl (91 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 91.7/91.7 kB 260.7 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/prompt-toolkit/prompt_toolkit-3.0.52-py3-none-any.whl (391 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 391.4/391.4 kB 168.2 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/pygments/pygments-2.20.0-py3-none-any.whl (1.2 MB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 259.4 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/python-dateutil/python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 229.9/229.9 kB 265.0 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/shellingham/shellingham-1.5.4-py2.py3-none-any.whl (9.8 kB) -Downloading http://pypi-mirror.modal.local:5555/simple/typing-inspection/typing_inspection-0.4.2-py3-none-any.whl (14 kB) -Downloading http://pypi-mirror.modal.local:5555/simple/tzdata/tzdata-2026.2-py2.py3-none-any.whl (349 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 349.3/349.3 kB 256.1 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/cffi/cffi-2.0.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (215 kB) - 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 215.6/215.6 kB 271.1 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/mdurl/mdurl-0.1.2-py3-none-any.whl (10.0 kB) -Downloading http://pypi-mirror.modal.local:5555/simple/six/six-1.17.0-py2.py3-none-any.whl (11 kB) -Downloading http://pypi-mirror.modal.local:5555/simple/wcwidth/wcwidth-0.7.0-py3-none-any.whl (110 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 110.8/110.8 kB 252.0 MB/s eta 0:00:00 -Downloading http://pypi-mirror.modal.local:5555/simple/pycparser/pycparser-3.0-py3-none-any.whl (48 kB) - ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.2/48.2 kB 235.8 MB/s eta 0:00:00 -Installing collected packages: py-cpuinfo, wcwidth, tzdata, typing-inspection, six, shellingham, rapidfuzz, pygments, pydantic-core, pycparser, pycountry, psutil, prometheus_client, mdurl, click, annotated-types, annotated-doc, python-dateutil, pydantic, prompt_toolkit, markdown-it-py, cffi, rich, questionary, pandas, cryptography, arrow, typer, joserfc, authlib, codecarbon -Successfully installed annotated-doc-0.0.4 annotated-types-0.7.0 arrow-1.4.0 authlib-1.7.2 cffi-2.0.0 click-8.4.0 codecarbon-3.2.7 cryptography-48.0.0 joserfc-1.6.5 markdown-it-py-4.2.0 mdurl-0.1.2 pandas-3.0.3 prometheus_client-0.25.0 prompt_toolkit-3.0.52 psutil-7.2.2 py-cpuinfo-9.0.0 pycountry-26.2.16 pycparser-3.0 pydantic-2.13.4 pydantic-core-2.46.4 pygments-2.20.0 python-dateutil-2.9.0.post0 questionary-2.1.1 rapidfuzz-3.14.5 rich-15.0.0 shellingham-1.5.4 six-1.17.0 typer-0.25.1 typing-inspection-0.4.2 tzdata-2026.2 wcwidth-0.7.0 - -[notice] A new release of pip is available: 24.0 -> 26.1.1 -[notice] To update, run: pip install --upgrade pip -Saving image... -Image saved, took 1.14s - -Built image im-HqRgnUnflxE8oQRhywMp4D in 14.22s - - -Building image im-BnlecuknJA8QM6WpMCGVmT - -=> Step 0: FROM base - -=> Step 1: ENV PYTHONPATH=/workspace - -=> Step 2: ENV PYTHONUNBUFFERED=1 -Saving image... 
-Image saved, took 602.25ms - -Built image im-BnlecuknJA8QM6WpMCGVmT in 3.23s - - +https://modal.com/apps/gabriel-nakajima-an/main/ap-vsSxbVNSQV79ZMlqcuvDGB ✓ Created objects. ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py @@ -168,14 +12,14 @@ Built image im-BnlecuknJA8QM6WpMCGVmT in 3.23s [modal] verifying NVML energy counter ... GPU: NVIDIA A100 80GB PCIe sampling idle power for 3s ... - idle: 52.7 W + idle: 55.0 W running 30s stress workload ... - duration: 36.7 s - energy delta: 8,335.0 J - avg power: 227.0 W + duration: 36.6 s + energy delta: 8,414.0 J + avg power: 230.1 W monotonic: True --- -{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 52.723183333333296, "stress_watts_avg": 226.95174796885868, "stress_energy_joules": 8335.027, "stress_duration_s": 36.72598724, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []} +{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 55.01793220338981, "stress_watts_avg": 230.13301547418834, "stress_energy_joules": 8413.972, "stress_duration_s": 36.561342503000006, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []} [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ... loading WikiText-103 from /data ... train chars: 540,095,682 @@ -183,13 +27,13 @@ loading WikiText-103 from /data ... train wall-clock cap: 300 s val accuracy floor : 0.7000 training submission /workspace/gpu_ngram_w31_k11.py ... -[codecarbon WARNING @ 07:06:46] Multiple instances of codecarbon are allowed to run at the same time. +[codecarbon WARNING @ 05:04:00] Multiple instances of codecarbon are allowed to run at the same time. 
[gpu_ngram_w3] starting build; max_order=11 D=0.5 [gpu_ngram_w3] encoded train: 541,096,898 bytes (0.4s) [gpu_ngram_w3] top order=11 unique pairs: 119,285,712 2.0s -[gpu_ngram_w3] ctx_len=10 ctxs=84,282,364 rows=119,285,712 12.8s -[gpu_ngram_w3] ctx_len=9 ctxs=54,720,376 rows=84,282,364 8.3s -[gpu_ngram_w3] ctx_len=8 ctxs=31,924,091 rows=54,720,376 4.9s +[gpu_ngram_w3] ctx_len=10 ctxs=84,282,364 rows=119,285,712 13.2s +[gpu_ngram_w3] ctx_len=9 ctxs=54,720,376 rows=84,282,364 9.2s +[gpu_ngram_w3] ctx_len=8 ctxs=31,924,091 rows=54,720,376 5.0s [gpu_ngram_w3] ctx_len=7 ctxs=16,284,921 rows=31,924,091 2.6s [gpu_ngram_w3] ctx_len=6 ctxs=7,016,442 rows=16,284,921 1.2s [gpu_ngram_w3] ctx_len=5 ctxs=2,438,281 rows=7,016,442 0.5s @@ -198,90 +42,90 @@ training submission /workspace/gpu_ngram_w31_k11.py ... [gpu_ngram_w3] ctx_len=2 ctxs=12,282 rows=122,882 0.0s [gpu_ngram_w3] ctx_len=1 ctxs=204 rows=12,282 0.0s [gpu_ngram_w3] ctx_len=0 ctxs=1 rows=204 0.0s -[gpu_ngram_w3] total build: 32.8s -training: 1,332.8 J duration=33.6s +[gpu_ngram_w3] total build: 34.2s +training: 1,612.2 J duration=34.9s evaluating on val split ... 
- eval 1,200/60,000 ( 2.0%) acc=0.6967 5689 char/s eta= 10s - eval 2,400/60,000 ( 4.0%) acc=0.6792 5867 char/s eta= 10s - eval 3,600/60,000 ( 6.0%) acc=0.6767 5955 char/s eta= 9s - eval 4,800/60,000 ( 8.0%) acc=0.6894 5997 char/s eta= 9s - eval 6,000/60,000 ( 10.0%) acc=0.6917 6031 char/s eta= 9s - eval 7,200/60,000 ( 12.0%) acc=0.6846 6057 char/s eta= 9s - eval 8,400/60,000 ( 14.0%) acc=0.6844 6077 char/s eta= 8s - eval 9,600/60,000 ( 16.0%) acc=0.6914 6081 char/s eta= 8s - eval 10,800/60,000 ( 18.0%) acc=0.7002 6079 char/s eta= 8s - eval 12,000/60,000 ( 20.0%) acc=0.7020 6085 char/s eta= 8s - eval 13,200/60,000 ( 22.0%) acc=0.7056 6085 char/s eta= 8s - eval 14,400/60,000 ( 24.0%) acc=0.7074 6091 char/s eta= 7s - eval 15,600/60,000 ( 26.0%) acc=0.7091 6094 char/s eta= 7s - eval 16,800/60,000 ( 28.0%) acc=0.7121 6099 char/s eta= 7s - eval 18,000/60,000 ( 30.0%) acc=0.7139 6102 char/s eta= 7s - eval 19,200/60,000 ( 32.0%) acc=0.7176 6101 char/s eta= 7s - eval 20,400/60,000 ( 34.0%) acc=0.7186 6101 char/s eta= 6s - eval 21,600/60,000 ( 36.0%) acc=0.7197 6105 char/s eta= 6s - eval 22,800/60,000 ( 38.0%) acc=0.7198 6105 char/s eta= 6s - eval 24,000/60,000 ( 40.0%) acc=0.7198 6108 char/s eta= 6s - eval 25,200/60,000 ( 42.0%) acc=0.7202 6109 char/s eta= 6s - eval 26,400/60,000 ( 44.0%) acc=0.7210 6111 char/s eta= 5s - eval 27,600/60,000 ( 46.0%) acc=0.7189 6114 char/s eta= 5s - eval 28,800/60,000 ( 48.0%) acc=0.7189 6120 char/s eta= 5s - eval 30,000/60,000 ( 50.0%) acc=0.7174 6125 char/s eta= 5s - eval 31,200/60,000 ( 52.0%) acc=0.7144 6131 char/s eta= 5s - eval 32,400/60,000 ( 54.0%) acc=0.7120 6138 char/s eta= 4s - eval 33,600/60,000 ( 56.0%) acc=0.7096 6144 char/s eta= 4s - eval 34,800/60,000 ( 58.0%) acc=0.7098 6146 char/s eta= 4s - eval 36,000/60,000 ( 60.0%) acc=0.7096 6146 char/s eta= 4s - eval 37,200/60,000 ( 62.0%) acc=0.7095 6146 char/s eta= 4s - eval 38,400/60,000 ( 64.0%) acc=0.7096 6145 char/s eta= 4s - eval 39,600/60,000 ( 66.0%) acc=0.7086 6147 char/s eta= 
3s - eval 40,800/60,000 ( 68.0%) acc=0.7083 6148 char/s eta= 3s - eval 42,000/60,000 ( 70.0%) acc=0.7075 6148 char/s eta= 3s - eval 43,200/60,000 ( 72.0%) acc=0.7068 6148 char/s eta= 3s - eval 44,400/60,000 ( 74.0%) acc=0.7067 6148 char/s eta= 3s - eval 45,600/60,000 ( 76.0%) acc=0.7068 6148 char/s eta= 2s - eval 46,800/60,000 ( 78.0%) acc=0.7061 6148 char/s eta= 2s - eval 48,000/60,000 ( 80.0%) acc=0.7062 6148 char/s eta= 2s - eval 49,200/60,000 ( 82.0%) acc=0.7055 6149 char/s eta= 2s - eval 50,400/60,000 ( 84.0%) acc=0.7058 6149 char/s eta= 2s - eval 51,600/60,000 ( 86.0%) acc=0.7058 6150 char/s eta= 1s - eval 52,800/60,000 ( 88.0%) acc=0.7046 6157 char/s eta= 1s - eval 54,000/60,000 ( 90.0%) acc=0.7045 6157 char/s eta= 1s - eval 55,200/60,000 ( 92.0%) acc=0.7038 6159 char/s eta= 1s - eval 56,400/60,000 ( 94.0%) acc=0.7029 6160 char/s eta= 1s - eval 57,600/60,000 ( 96.0%) acc=0.7034 6160 char/s eta= 0s - eval 58,800/60,000 ( 98.0%) acc=0.7040 6160 char/s eta= 0s - eval 60,000/60,000 (100.0%) acc=0.7050 6161 char/s eta= 0s -chars=60,000 acc=0.7050 eval_duration=9.7s + eval 1,200/60,000 ( 2.0%) acc=0.6967 5720 char/s eta= 10s + eval 2,400/60,000 ( 4.0%) acc=0.6792 5938 char/s eta= 10s + eval 3,600/60,000 ( 6.0%) acc=0.6767 6036 char/s eta= 9s + eval 4,800/60,000 ( 8.0%) acc=0.6894 6078 char/s eta= 9s + eval 6,000/60,000 ( 10.0%) acc=0.6917 6119 char/s eta= 9s + eval 7,200/60,000 ( 12.0%) acc=0.6846 6153 char/s eta= 9s + eval 8,400/60,000 ( 14.0%) acc=0.6844 6179 char/s eta= 8s + eval 9,600/60,000 ( 16.0%) acc=0.6914 6187 char/s eta= 8s + eval 10,800/60,000 ( 18.0%) acc=0.7002 6194 char/s eta= 8s + eval 12,000/60,000 ( 20.0%) acc=0.7020 6199 char/s eta= 8s + eval 13,200/60,000 ( 22.0%) acc=0.7056 6200 char/s eta= 8s + eval 14,400/60,000 ( 24.0%) acc=0.7074 6204 char/s eta= 7s + eval 15,600/60,000 ( 26.0%) acc=0.7091 6204 char/s eta= 7s + eval 16,800/60,000 ( 28.0%) acc=0.7121 6206 char/s eta= 7s + eval 18,000/60,000 ( 30.0%) acc=0.7139 6207 char/s eta= 7s + eval 
19,200/60,000 ( 32.0%) acc=0.7176 6206 char/s eta= 7s + eval 20,400/60,000 ( 34.0%) acc=0.7186 6207 char/s eta= 6s + eval 21,600/60,000 ( 36.0%) acc=0.7197 6212 char/s eta= 6s + eval 22,800/60,000 ( 38.0%) acc=0.7198 6213 char/s eta= 6s + eval 24,000/60,000 ( 40.0%) acc=0.7198 6216 char/s eta= 6s + eval 25,200/60,000 ( 42.0%) acc=0.7202 6218 char/s eta= 6s + eval 26,400/60,000 ( 44.0%) acc=0.7210 6220 char/s eta= 5s + eval 27,600/60,000 ( 46.0%) acc=0.7189 6222 char/s eta= 5s + eval 28,800/60,000 ( 48.0%) acc=0.7189 6228 char/s eta= 5s + eval 30,000/60,000 ( 50.0%) acc=0.7174 6230 char/s eta= 5s + eval 31,200/60,000 ( 52.0%) acc=0.7144 6235 char/s eta= 5s + eval 32,400/60,000 ( 54.0%) acc=0.7120 6239 char/s eta= 4s + eval 33,600/60,000 ( 56.0%) acc=0.7096 6243 char/s eta= 4s + eval 34,800/60,000 ( 58.0%) acc=0.7098 6244 char/s eta= 4s + eval 36,000/60,000 ( 60.0%) acc=0.7096 6244 char/s eta= 4s + eval 37,200/60,000 ( 62.0%) acc=0.7095 6244 char/s eta= 4s + eval 38,400/60,000 ( 64.0%) acc=0.7096 6242 char/s eta= 3s + eval 39,600/60,000 ( 66.0%) acc=0.7086 6243 char/s eta= 3s + eval 40,800/60,000 ( 68.0%) acc=0.7083 6243 char/s eta= 3s + eval 42,000/60,000 ( 70.0%) acc=0.7075 6242 char/s eta= 3s + eval 43,200/60,000 ( 72.0%) acc=0.7068 6242 char/s eta= 3s + eval 44,400/60,000 ( 74.0%) acc=0.7067 6240 char/s eta= 2s + eval 45,600/60,000 ( 76.0%) acc=0.7068 6240 char/s eta= 2s + eval 46,800/60,000 ( 78.0%) acc=0.7061 6239 char/s eta= 2s + eval 48,000/60,000 ( 80.0%) acc=0.7062 6237 char/s eta= 2s + eval 49,200/60,000 ( 82.0%) acc=0.7055 6236 char/s eta= 2s + eval 50,400/60,000 ( 84.0%) acc=0.7058 6236 char/s eta= 2s + eval 51,600/60,000 ( 86.0%) acc=0.7058 6237 char/s eta= 1s + eval 52,800/60,000 ( 88.0%) acc=0.7046 6243 char/s eta= 1s + eval 54,000/60,000 ( 90.0%) acc=0.7045 6244 char/s eta= 1s + eval 55,200/60,000 ( 92.0%) acc=0.7038 6245 char/s eta= 1s + eval 56,400/60,000 ( 94.0%) acc=0.7029 6246 char/s eta= 1s + eval 57,600/60,000 ( 96.0%) acc=0.7034 6246 char/s 
eta= 0s + eval 58,800/60,000 ( 98.0%) acc=0.7040 6247 char/s eta= 0s + eval 60,000/60,000 (100.0%) acc=0.7050 6247 char/s eta= 0s +chars=60,000 acc=0.7050 eval_duration=9.6s --- submission : gpu_ngram_w31_k11 -training energy (J): 1,332.8 -training duration : 33.6s +training energy (J): 1,612.2 +training duration : 34.9s val char-accuracy : 0.7050 val chars : 60,000 wrote /tmp/result.json Stopping app - local entrypoint completed. ✓ App completed. View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-Xr2U1qCw3wvtCqAVizWeyd +https://modal.com/apps/gabriel-nakajima-an/main/ap-vsSxbVNSQV79ZMlqcuvDGB # final result { "submission": "gpu_ngram_w31_k11", - "training_energy_J": 1332.8045820499997, - "training_duration_s": 33.551668359000004, - "cpu_energy_J": 1420.9300898524978, - "total_energy_J": 2753.734671902497, + "training_energy_J": 1612.2052069500003, + "training_duration_s": 34.944975860999996, + "cpu_energy_J": 1479.7634497275012, + "total_energy_J": 3091.9686566775017, "val_char_accuracy": 0.7050333333333333, "val_chars": 60000, "gpu_name": "NVIDIA A100 80GB PCIe", - "date_utc": "2026-05-20T07:07:33Z", + "date_utc": "2026-05-21T05:04:49Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 52.723183333333296, - "stress_watts_avg": 226.95174796885868, - "stress_energy_joules": 8335.027, - "stress_duration_s": 36.72598724, + "idle_watts": 55.01793220338981, + "stress_watts_avg": 230.13301547418834, + "stress_energy_joules": 8413.972, + "stress_duration_s": 36.561342503000006, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, diff --git a/submissions/lwta_k2/nvml.json b/submissions/lwta_k2/nvml.json index d2be759..c341a02 100644 --- a/submissions/lwta_k2/nvml.json +++ b/submissions/lwta_k2/nvml.json @@ -2,10 +2,10 @@ "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 53.31471666666669, - "stress_watts_avg": 235.04048546725176, - "stress_energy_joules": 
8621.146, - "stress_duration_s": 36.679408583, + "idle_watts": 52.363366666666685, + "stress_watts_avg": 227.75408348564008, + "stress_energy_joules": 8575.362, + "stress_duration_s": 37.651847417, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] } diff --git a/submissions/lwta_k2/result.json b/submissions/lwta_k2/result.json index e675e37..6a5ae25 100644 --- a/submissions/lwta_k2/result.json +++ b/submissions/lwta_k2/result.json @@ -1,19 +1,21 @@ { "submission": "lwta_k2", - "training_energy_J": 46131.8433434, - "training_duration_s": 222.46477313199998, - "val_char_accuracy": 0.7145833333333333, + "training_energy_J": 44582.984815150005, + "training_duration_s": 237.14902369700002, + "cpu_energy_J": 10030.720041672495, + "total_energy_J": 54613.7048568225, + "val_char_accuracy": 0.7145333333333334, "val_chars": 60000, "gpu_name": "NVIDIA A100 80GB PCIe", - "date_utc": "2026-05-18T18:04:06Z", + "date_utc": "2026-05-20T22:57:38Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 53.31471666666669, - "stress_watts_avg": 235.04048546725176, - "stress_energy_joules": 8621.146, - "stress_duration_s": 36.679408583, + "idle_watts": 52.363366666666685, + "stress_watts_avg": 227.75408348564008, + "stress_energy_joules": 8575.362, + "stress_duration_s": 37.651847417, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, diff --git a/submissions/lwta_k2/run.log b/submissions/lwta_k2/run.log index f7bfcac..d4788c8 100644 --- a/submissions/lwta_k2/run.log +++ b/submissions/lwta_k2/run.log @@ -1,141 +1,140 @@ -# wikitext submit.py log — lwta_k2 — 2026-05-18T17:52:37+00:00Z +# wikitext submit.py log — lwta_k2 — 2026-05-20T22:44:26+00:00Z [modal] launching A100-80GB ... ✓ Initialized. View run at -https://modal.com/apps/ab-10/main/ap-XpG4oyoioa8EfEnrW23Vzh +https://modal.com/apps/gabriel-nakajima-an/main/ap-cyrfdrD3yrTAPYiz98tDZZ ✓ Created objects. 
-├── 🔨 Created mount /home/seneca/wikitext/submit.py -├── 🔨 Created mount /home/seneca/wikitext/task.py -├── 🔨 Created mount /home/seneca/wikitext/verify_nvml.py -├── 🔨 Created mount /home/seneca/wikitext/run_eval.py -├── 🔨 Created mount /home/seneca/wikitext/wikitext.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py └── 🔨 Created function run_submission. [modal] verifying NVML energy counter ... GPU: NVIDIA A100 80GB PCIe sampling idle power for 3s ... - idle: 53.3 W + idle: 52.4 W running 30s stress workload ... -/usr/local/lib/python3.11/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.) - cpu = _conversion_method_template(device=torch.device("cpu")) - duration: 36.7 s - energy delta: 8,621.1 J - avg power: 235.0 W + duration: 37.7 s + energy delta: 8,575.4 J + avg power: 227.8 W monotonic: True --- -{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 53.31471666666669, "stress_watts_avg": 235.04048546725176, "stress_energy_joules": 8621.146, "stress_duration_s": 36.679408583, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []} +{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 52.363366666666685, "stress_watts_avg": 227.75408348564008, "stress_energy_joules": 8575.362, "stress_duration_s": 37.651847417, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []} [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ... loading WikiText-103 from /data ... 
train chars: 540,095,682 val chars: 60,000 (scored, gated by --acc-min) train wall-clock cap: 300 s val accuracy floor : 0.7000 -/usr/local/lib/python3.11/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.) - cpu = _conversion_method_template(device=torch.device("cpu")) training submission /workspace/lwta_k2.py ... +[codecarbon WARNING @ 22:45:17] Multiple instances of codecarbon are allowed to run at the same time. [lwta_k2] 10.84M params cfg=TrainConfig(d=384 L=6 H=6 bs=32 T=1024 steps=2150 lwta_k=2) [lwta_k2] step 0/2150 loss 5.5452 elapsed 1s -[lwta_k2] step 100/2150 loss 1.6783 elapsed 11s -[lwta_k2] step 200/2150 loss 1.4527 elapsed 21s -[lwta_k2] step 300/2150 loss 1.4018 elapsed 31s -[lwta_k2] step 400/2150 loss 1.2794 elapsed 41s -[lwta_k2] step 500/2150 loss 1.2397 elapsed 51s -[lwta_k2] step 600/2150 loss 1.1819 elapsed 61s -[lwta_k2] step 700/2150 loss 1.1676 elapsed 71s -[lwta_k2] step 800/2150 loss 1.1465 elapsed 82s -[lwta_k2] step 900/2150 loss 1.1261 elapsed 92s -[lwta_k2] step 1000/2150 loss 1.1568 elapsed 102s -[lwta_k2] step 1100/2150 loss 1.0974 elapsed 112s -[lwta_k2] step 1200/2150 loss 1.1161 elapsed 123s -[lwta_k2] step 1300/2150 loss 1.0829 elapsed 133s -[lwta_k2] step 1400/2150 loss 1.0580 elapsed 143s -[lwta_k2] step 1500/2150 loss 1.0709 elapsed 154s -[lwta_k2] step 1600/2150 loss 1.0651 elapsed 164s -[lwta_k2] step 1700/2150 loss 1.0906 elapsed 174s -[lwta_k2] step 1800/2150 loss 1.0662 elapsed 184s -[lwta_k2] step 1900/2150 loss 1.0359 elapsed 195s -[lwta_k2] step 2000/2150 loss 0.9915 elapsed 205s -[lwta_k2] step 2100/2150 loss 1.0221 elapsed 215s -[lwta_k2] step 2149/2150 loss 1.0155 elapsed 220s -training: 46,131.8 J duration=222.5s +[lwta_k2] step 100/2150 loss 1.6789 elapsed 12s +[lwta_k2] step 200/2150 loss 1.4378 elapsed 23s +[lwta_k2] step 300/2150 loss 1.3555 elapsed 34s +[lwta_k2] step 
400/2150 loss 1.2876 elapsed 45s +[lwta_k2] step 500/2150 loss 1.2187 elapsed 55s +[lwta_k2] step 600/2150 loss 1.1854 elapsed 66s +[lwta_k2] step 700/2150 loss 1.1995 elapsed 77s +[lwta_k2] step 800/2150 loss 1.1438 elapsed 88s +[lwta_k2] step 900/2150 loss 1.1942 elapsed 99s +[lwta_k2] step 1000/2150 loss 1.1058 elapsed 110s +[lwta_k2] step 1100/2150 loss 1.1240 elapsed 120s +[lwta_k2] step 1200/2150 loss 1.0758 elapsed 131s +[lwta_k2] step 1300/2150 loss 1.0643 elapsed 142s +[lwta_k2] step 1400/2150 loss 1.0712 elapsed 153s +[lwta_k2] step 1500/2150 loss 1.0610 elapsed 164s +[lwta_k2] step 1600/2150 loss 1.0715 elapsed 175s +[lwta_k2] step 1700/2150 loss 0.9912 elapsed 185s +[lwta_k2] step 1800/2150 loss 1.0097 elapsed 196s +[lwta_k2] step 1900/2150 loss 1.0175 elapsed 207s +[lwta_k2] step 2000/2150 loss 1.0429 elapsed 218s +[lwta_k2] step 2100/2150 loss 1.0159 elapsed 228s +[lwta_k2] step 2149/2150 loss 1.0382 elapsed 234s +training: 44,583.0 J duration=237.1s evaluating on val split ... 
- eval 1,200/60,000 ( 2.0%) acc=0.7125 142 char/s eta= 413s - eval 2,400/60,000 ( 4.0%) acc=0.7004 142 char/s eta= 405s - eval 3,600/60,000 ( 6.0%) acc=0.6981 143 char/s eta= 395s - eval 4,800/60,000 ( 8.0%) acc=0.7075 143 char/s eta= 385s - eval 6,000/60,000 ( 10.0%) acc=0.6957 143 char/s eta= 377s - eval 7,200/60,000 ( 12.0%) acc=0.6915 143 char/s eta= 368s - eval 8,400/60,000 ( 14.0%) acc=0.6918 143 char/s eta= 360s - eval 9,600/60,000 ( 16.0%) acc=0.6957 143 char/s eta= 352s - eval 10,800/60,000 ( 18.0%) acc=0.6971 143 char/s eta= 344s - eval 12,000/60,000 ( 20.0%) acc=0.7002 143 char/s eta= 335s - eval 13,200/60,000 ( 22.0%) acc=0.7043 143 char/s eta= 327s - eval 14,400/60,000 ( 24.0%) acc=0.7053 143 char/s eta= 319s - eval 15,600/60,000 ( 26.0%) acc=0.7074 143 char/s eta= 310s - eval 16,800/60,000 ( 28.0%) acc=0.7101 143 char/s eta= 302s - eval 18,000/60,000 ( 30.0%) acc=0.7083 143 char/s eta= 294s - eval 19,200/60,000 ( 32.0%) acc=0.7095 143 char/s eta= 285s - eval 20,400/60,000 ( 34.0%) acc=0.7108 143 char/s eta= 277s - eval 21,600/60,000 ( 36.0%) acc=0.7105 143 char/s eta= 268s - eval 22,800/60,000 ( 38.0%) acc=0.7105 143 char/s eta= 260s - eval 24,000/60,000 ( 40.0%) acc=0.7109 143 char/s eta= 251s - eval 25,200/60,000 ( 42.0%) acc=0.7118 143 char/s eta= 243s - eval 26,400/60,000 ( 44.0%) acc=0.7130 143 char/s eta= 235s - eval 27,600/60,000 ( 46.0%) acc=0.7142 143 char/s eta= 226s - eval 28,800/60,000 ( 48.0%) acc=0.7142 143 char/s eta= 218s - eval 30,000/60,000 ( 50.0%) acc=0.7132 143 char/s eta= 210s - eval 31,200/60,000 ( 52.0%) acc=0.7117 143 char/s eta= 201s - eval 32,400/60,000 ( 54.0%) acc=0.7111 143 char/s eta= 193s - eval 33,600/60,000 ( 56.0%) acc=0.7092 143 char/s eta= 184s - eval 34,800/60,000 ( 58.0%) acc=0.7080 143 char/s eta= 176s - eval 36,000/60,000 ( 60.0%) acc=0.7071 143 char/s eta= 168s - eval 37,200/60,000 ( 62.0%) acc=0.7074 143 char/s eta= 159s - eval 38,400/60,000 ( 64.0%) acc=0.7079 143 char/s eta= 151s - eval 39,600/60,000 ( 
66.0%) acc=0.7082 143 char/s eta= 143s - eval 40,800/60,000 ( 68.0%) acc=0.7075 143 char/s eta= 134s - eval 42,000/60,000 ( 70.0%) acc=0.7078 143 char/s eta= 126s - eval 43,200/60,000 ( 72.0%) acc=0.7083 143 char/s eta= 117s - eval 44,400/60,000 ( 74.0%) acc=0.7087 143 char/s eta= 109s - eval 45,600/60,000 ( 76.0%) acc=0.7088 143 char/s eta= 101s - eval 46,800/60,000 ( 78.0%) acc=0.7085 143 char/s eta= 92s - eval 48,000/60,000 ( 80.0%) acc=0.7087 143 char/s eta= 84s - eval 49,200/60,000 ( 82.0%) acc=0.7092 143 char/s eta= 75s - eval 50,400/60,000 ( 84.0%) acc=0.7110 143 char/s eta= 67s - eval 51,600/60,000 ( 86.0%) acc=0.7117 143 char/s eta= 59s - eval 52,800/60,000 ( 88.0%) acc=0.7129 144 char/s eta= 50s - eval 54,000/60,000 ( 90.0%) acc=0.7133 144 char/s eta= 42s - eval 55,200/60,000 ( 92.0%) acc=0.7128 144 char/s eta= 33s - eval 56,400/60,000 ( 94.0%) acc=0.7136 144 char/s eta= 25s - eval 57,600/60,000 ( 96.0%) acc=0.7140 144 char/s eta= 17s - eval 58,800/60,000 ( 98.0%) acc=0.7146 144 char/s eta= 8s - eval 60,000/60,000 (100.0%) acc=0.7146 144 char/s eta= 0s -chars=60,000 acc=0.7146 eval_duration=417.8s + eval 1,200/60,000 ( 2.0%) acc=0.7075 124 char/s eta= 475s + eval 2,400/60,000 ( 4.0%) acc=0.6992 122 char/s eta= 474s + eval 3,600/60,000 ( 6.0%) acc=0.6978 121 char/s eta= 465s + eval 4,800/60,000 ( 8.0%) acc=0.7063 122 char/s eta= 454s + eval 6,000/60,000 ( 10.0%) acc=0.6957 122 char/s eta= 443s + eval 7,200/60,000 ( 12.0%) acc=0.6922 122 char/s eta= 432s + eval 8,400/60,000 ( 14.0%) acc=0.6908 122 char/s eta= 423s + eval 9,600/60,000 ( 16.0%) acc=0.6931 122 char/s eta= 414s + eval 10,800/60,000 ( 18.0%) acc=0.6941 121 char/s eta= 406s + eval 12,000/60,000 ( 20.0%) acc=0.6957 121 char/s eta= 396s + eval 13,200/60,000 ( 22.0%) acc=0.7012 121 char/s eta= 386s + eval 14,400/60,000 ( 24.0%) acc=0.7022 121 char/s eta= 377s + eval 15,600/60,000 ( 26.0%) acc=0.7049 121 char/s eta= 366s + eval 16,800/60,000 ( 28.0%) acc=0.7071 121 char/s eta= 357s + eval 
18,000/60,000 ( 30.0%) acc=0.7055 121 char/s eta= 348s + eval 19,200/60,000 ( 32.0%) acc=0.7071 121 char/s eta= 338s + eval 20,400/60,000 ( 34.0%) acc=0.7088 120 char/s eta= 329s + eval 21,600/60,000 ( 36.0%) acc=0.7083 120 char/s eta= 319s + eval 22,800/60,000 ( 38.0%) acc=0.7089 120 char/s eta= 310s + eval 24,000/60,000 ( 40.0%) acc=0.7095 120 char/s eta= 300s + eval 25,200/60,000 ( 42.0%) acc=0.7110 120 char/s eta= 290s + eval 26,400/60,000 ( 44.0%) acc=0.7123 120 char/s eta= 280s + eval 27,600/60,000 ( 46.0%) acc=0.7135 120 char/s eta= 270s + eval 28,800/60,000 ( 48.0%) acc=0.7137 120 char/s eta= 260s + eval 30,000/60,000 ( 50.0%) acc=0.7130 120 char/s eta= 250s + eval 31,200/60,000 ( 52.0%) acc=0.7113 120 char/s eta= 241s + eval 32,400/60,000 ( 54.0%) acc=0.7112 120 char/s eta= 231s + eval 33,600/60,000 ( 56.0%) acc=0.7095 119 char/s eta= 221s + eval 34,800/60,000 ( 58.0%) acc=0.7082 119 char/s eta= 211s + eval 36,000/60,000 ( 60.0%) acc=0.7072 119 char/s eta= 201s + eval 37,200/60,000 ( 62.0%) acc=0.7073 119 char/s eta= 191s + eval 38,400/60,000 ( 64.0%) acc=0.7077 119 char/s eta= 181s + eval 39,600/60,000 ( 66.0%) acc=0.7079 119 char/s eta= 171s + eval 40,800/60,000 ( 68.0%) acc=0.7076 119 char/s eta= 161s + eval 42,000/60,000 ( 70.0%) acc=0.7074 119 char/s eta= 151s + eval 43,200/60,000 ( 72.0%) acc=0.7079 119 char/s eta= 141s + eval 44,400/60,000 ( 74.0%) acc=0.7084 119 char/s eta= 131s + eval 45,600/60,000 ( 76.0%) acc=0.7086 119 char/s eta= 121s + eval 46,800/60,000 ( 78.0%) acc=0.7082 119 char/s eta= 111s + eval 48,000/60,000 ( 80.0%) acc=0.7088 120 char/s eta= 100s + eval 49,200/60,000 ( 82.0%) acc=0.7096 120 char/s eta= 90s + eval 50,400/60,000 ( 84.0%) acc=0.7111 120 char/s eta= 80s + eval 51,600/60,000 ( 86.0%) acc=0.7117 120 char/s eta= 70s + eval 52,800/60,000 ( 88.0%) acc=0.7130 120 char/s eta= 60s + eval 54,000/60,000 ( 90.0%) acc=0.7134 120 char/s eta= 50s + eval 55,200/60,000 ( 92.0%) acc=0.7130 120 char/s eta= 40s + eval 56,400/60,000 ( 
94.0%) acc=0.7137 120 char/s eta= 30s + eval 57,600/60,000 ( 96.0%) acc=0.7141 120 char/s eta= 20s + eval 58,800/60,000 ( 98.0%) acc=0.7146 120 char/s eta= 10s + eval 60,000/60,000 (100.0%) acc=0.7145 120 char/s eta= 0s +chars=60,000 acc=0.7145 eval_duration=498.9s --- submission : lwta_k2 -training energy (J): 46,131.8 -training duration : 222.5s -val char-accuracy : 0.7146 +training energy (J): 44,583.0 +training duration : 237.1s +val char-accuracy : 0.7145 val chars : 60,000 wrote /tmp/result.json Stopping app - local entrypoint completed. ✓ App completed. View run at -https://modal.com/apps/ab-10/main/ap-XpG4oyoioa8EfEnrW23Vzh +https://modal.com/apps/gabriel-nakajima-an/main/ap-cyrfdrD3yrTAPYiz98tDZZ # final result { "submission": "lwta_k2", - "training_energy_J": 46131.8433434, - "training_duration_s": 222.46477313199998, - "val_char_accuracy": 0.7145833333333333, + "training_energy_J": 44582.984815150005, + "training_duration_s": 237.14902369700002, + "cpu_energy_J": 10030.720041672495, + "total_energy_J": 54613.7048568225, + "val_char_accuracy": 0.7145333333333334, "val_chars": 60000, "gpu_name": "NVIDIA A100 80GB PCIe", - "date_utc": "2026-05-18T18:04:06Z", + "date_utc": "2026-05-20T22:57:38Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 53.31471666666669, - "stress_watts_avg": 235.04048546725176, - "stress_energy_joules": 8621.146, - "stress_duration_s": 36.679408583, + "idle_watts": 52.363366666666685, + "stress_watts_avg": 227.75408348564008, + "stress_energy_joules": 8575.362, + "stress_duration_s": 37.651847417, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, diff --git a/submissions/lwta_k4/nvml.json b/submissions/lwta_k4/nvml.json index f273c88..c15557c 100644 --- a/submissions/lwta_k4/nvml.json +++ b/submissions/lwta_k4/nvml.json @@ -2,10 +2,10 @@ "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 52.98316666666664, - "stress_watts_avg": 
227.19281000664682, - "stress_energy_joules": 8522.132, - "stress_duration_s": 37.510570866, + "idle_watts": 53.112000000000016, + "stress_watts_avg": 229.4340394814706, + "stress_energy_joules": 8404.0, + "stress_duration_s": 36.629263988, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] } diff --git a/submissions/lwta_k4/result.json b/submissions/lwta_k4/result.json index caffb8c..cc235c1 100644 --- a/submissions/lwta_k4/result.json +++ b/submissions/lwta_k4/result.json @@ -1,19 +1,21 @@ { "submission": "lwta_k4", - "training_energy_J": 46222.22882105, - "training_duration_s": 236.750003579, - "val_char_accuracy": 0.72375, + "training_energy_J": 44328.7028316, + "training_duration_s": 221.249343368, + "cpu_energy_J": 9354.22259884751, + "total_energy_J": 53682.92543044751, + "val_char_accuracy": 0.72455, "val_chars": 60000, "gpu_name": "NVIDIA A100 80GB PCIe", - "date_utc": "2026-05-18T18:06:13Z", + "date_utc": "2026-05-20T22:55:41Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 52.98316666666664, - "stress_watts_avg": 227.19281000664682, - "stress_energy_joules": 8522.132, - "stress_duration_s": 37.510570866, + "idle_watts": 53.112000000000016, + "stress_watts_avg": 229.4340394814706, + "stress_energy_joules": 8404.0, + "stress_duration_s": 36.629263988, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, diff --git a/submissions/lwta_k4/run.log b/submissions/lwta_k4/run.log index 8e70573..801d461 100644 --- a/submissions/lwta_k4/run.log +++ b/submissions/lwta_k4/run.log @@ -1,141 +1,140 @@ -# wikitext submit.py log — lwta_k4 — 2026-05-18T17:52:38+00:00Z +# wikitext submit.py log — lwta_k4 — 2026-05-20T22:44:26+00:00Z [modal] launching A100-80GB ... ✓ Initialized. View run at -https://modal.com/apps/ab-10/main/ap-rRKkJJfugmtJNujFsbVrP3 +https://modal.com/apps/gabriel-nakajima-an/main/ap-hyBCSy0XG5jwP218GI6bNc ✓ Created objects. 
-├── 🔨 Created mount /home/seneca/wikitext/submit.py -├── 🔨 Created mount /home/seneca/wikitext/task.py -├── 🔨 Created mount /home/seneca/wikitext/verify_nvml.py -├── 🔨 Created mount /home/seneca/wikitext/run_eval.py -├── 🔨 Created mount /home/seneca/wikitext/wikitext.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py └── 🔨 Created function run_submission. [modal] verifying NVML energy counter ... GPU: NVIDIA A100 80GB PCIe sampling idle power for 3s ... - idle: 53.0 W + idle: 53.1 W running 30s stress workload ... -/usr/local/lib/python3.11/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.) - cpu = _conversion_method_template(device=torch.device("cpu")) - duration: 37.5 s - energy delta: 8,522.1 J - avg power: 227.2 W + duration: 36.6 s + energy delta: 8,404.0 J + avg power: 229.4 W monotonic: True --- -{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 52.98316666666664, "stress_watts_avg": 227.19281000664682, "stress_energy_joules": 8522.132, "stress_duration_s": 37.510570866, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []} +{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 53.112000000000016, "stress_watts_avg": 229.4340394814706, "stress_energy_joules": 8404.0, "stress_duration_s": 36.629263988, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []} [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ... loading WikiText-103 from /data ... 
train chars: 540,095,682 val chars: 60,000 (scored, gated by --acc-min) train wall-clock cap: 300 s val accuracy floor : 0.7000 -/usr/local/lib/python3.11/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.) - cpu = _conversion_method_template(device=torch.device("cpu")) training submission /workspace/lwta_k4.py ... +[codecarbon WARNING @ 22:45:15] Multiple instances of codecarbon are allowed to run at the same time. [lwta_k4] 10.84M params cfg=TrainConfig(d=384 L=6 H=6 bs=32 T=1024 steps=2150 lwta_k=4) [lwta_k4] step 0/2150 loss 5.5452 elapsed 1s -[lwta_k4] step 100/2150 loss 1.6675 elapsed 12s -[lwta_k4] step 200/2150 loss 1.4276 elapsed 23s -[lwta_k4] step 300/2150 loss 1.2935 elapsed 34s -[lwta_k4] step 400/2150 loss 1.2449 elapsed 45s -[lwta_k4] step 500/2150 loss 1.1959 elapsed 55s -[lwta_k4] step 600/2150 loss 1.2452 elapsed 66s -[lwta_k4] step 700/2150 loss 1.1693 elapsed 77s -[lwta_k4] step 800/2150 loss 1.1698 elapsed 88s -[lwta_k4] step 900/2150 loss 1.1456 elapsed 99s -[lwta_k4] step 1000/2150 loss 1.0783 elapsed 109s -[lwta_k4] step 1100/2150 loss 1.1223 elapsed 120s -[lwta_k4] step 1200/2150 loss 1.0742 elapsed 131s -[lwta_k4] step 1300/2150 loss 1.0665 elapsed 142s -[lwta_k4] step 1400/2150 loss 1.0240 elapsed 153s -[lwta_k4] step 1500/2150 loss 1.0439 elapsed 163s -[lwta_k4] step 1600/2150 loss 1.0516 elapsed 174s -[lwta_k4] step 1700/2150 loss 1.0211 elapsed 185s -[lwta_k4] step 1800/2150 loss 1.0123 elapsed 196s -[lwta_k4] step 1900/2150 loss 1.0387 elapsed 207s -[lwta_k4] step 2000/2150 loss 0.9838 elapsed 217s -[lwta_k4] step 2100/2150 loss 0.9776 elapsed 228s -[lwta_k4] step 2149/2150 loss 0.9949 elapsed 234s -training: 46,222.2 J duration=236.8s +[lwta_k4] step 100/2150 loss 1.6542 elapsed 11s +[lwta_k4] step 200/2150 loss 1.4573 elapsed 21s +[lwta_k4] step 300/2150 loss 1.3381 elapsed 31s +[lwta_k4] step 
400/2150 loss 1.2373 elapsed 41s +[lwta_k4] step 500/2150 loss 1.2093 elapsed 51s +[lwta_k4] step 600/2150 loss 1.1910 elapsed 62s +[lwta_k4] step 700/2150 loss 1.1619 elapsed 72s +[lwta_k4] step 800/2150 loss 1.1566 elapsed 82s +[lwta_k4] step 900/2150 loss 1.1024 elapsed 92s +[lwta_k4] step 1000/2150 loss 1.1438 elapsed 102s +[lwta_k4] step 1100/2150 loss 1.0946 elapsed 112s +[lwta_k4] step 1200/2150 loss 1.1067 elapsed 122s +[lwta_k4] step 1300/2150 loss 1.0719 elapsed 133s +[lwta_k4] step 1400/2150 loss 1.0670 elapsed 143s +[lwta_k4] step 1500/2150 loss 1.0598 elapsed 153s +[lwta_k4] step 1600/2150 loss 1.0182 elapsed 163s +[lwta_k4] step 1700/2150 loss 1.0992 elapsed 173s +[lwta_k4] step 1800/2150 loss 1.0282 elapsed 183s +[lwta_k4] step 1900/2150 loss 1.0147 elapsed 194s +[lwta_k4] step 2000/2150 loss 1.0158 elapsed 204s +[lwta_k4] step 2100/2150 loss 0.9868 elapsed 214s +[lwta_k4] step 2149/2150 loss 0.9310 elapsed 219s +training: 44,328.7 J duration=221.2s evaluating on val split ... 
- eval 1,200/60,000 ( 2.0%) acc=0.7292 114 char/s eta= 517s - eval 2,400/60,000 ( 4.0%) acc=0.7142 114 char/s eta= 507s - eval 3,600/60,000 ( 6.0%) acc=0.7114 114 char/s eta= 497s - eval 4,800/60,000 ( 8.0%) acc=0.7196 114 char/s eta= 486s - eval 6,000/60,000 ( 10.0%) acc=0.7073 114 char/s eta= 475s - eval 7,200/60,000 ( 12.0%) acc=0.7024 114 char/s eta= 464s - eval 8,400/60,000 ( 14.0%) acc=0.7008 114 char/s eta= 454s - eval 9,600/60,000 ( 16.0%) acc=0.7052 114 char/s eta= 443s - eval 10,800/60,000 ( 18.0%) acc=0.7071 114 char/s eta= 432s - eval 12,000/60,000 ( 20.0%) acc=0.7086 114 char/s eta= 422s - eval 13,200/60,000 ( 22.0%) acc=0.7137 114 char/s eta= 411s - eval 14,400/60,000 ( 24.0%) acc=0.7153 114 char/s eta= 401s - eval 15,600/60,000 ( 26.0%) acc=0.7179 114 char/s eta= 390s - eval 16,800/60,000 ( 28.0%) acc=0.7202 114 char/s eta= 379s - eval 18,000/60,000 ( 30.0%) acc=0.7189 114 char/s eta= 368s - eval 19,200/60,000 ( 32.0%) acc=0.7204 114 char/s eta= 358s - eval 20,400/60,000 ( 34.0%) acc=0.7217 114 char/s eta= 347s - eval 21,600/60,000 ( 36.0%) acc=0.7214 114 char/s eta= 337s - eval 22,800/60,000 ( 38.0%) acc=0.7223 114 char/s eta= 327s - eval 24,000/60,000 ( 40.0%) acc=0.7229 114 char/s eta= 316s - eval 25,200/60,000 ( 42.0%) acc=0.7240 114 char/s eta= 306s - eval 26,400/60,000 ( 44.0%) acc=0.7248 114 char/s eta= 295s - eval 27,600/60,000 ( 46.0%) acc=0.7257 114 char/s eta= 285s - eval 28,800/60,000 ( 48.0%) acc=0.7258 114 char/s eta= 274s - eval 30,000/60,000 ( 50.0%) acc=0.7252 114 char/s eta= 263s - eval 31,200/60,000 ( 52.0%) acc=0.7232 114 char/s eta= 253s - eval 32,400/60,000 ( 54.0%) acc=0.7228 114 char/s eta= 242s - eval 33,600/60,000 ( 56.0%) acc=0.7209 114 char/s eta= 232s - eval 34,800/60,000 ( 58.0%) acc=0.7201 114 char/s eta= 221s - eval 36,000/60,000 ( 60.0%) acc=0.7196 114 char/s eta= 211s - eval 37,200/60,000 ( 62.0%) acc=0.7196 114 char/s eta= 200s - eval 38,400/60,000 ( 64.0%) acc=0.7196 114 char/s eta= 190s - eval 39,600/60,000 ( 
66.0%) acc=0.7194 114 char/s eta= 179s - eval 40,800/60,000 ( 68.0%) acc=0.7189 114 char/s eta= 169s - eval 42,000/60,000 ( 70.0%) acc=0.7187 114 char/s eta= 158s - eval 43,200/60,000 ( 72.0%) acc=0.7188 114 char/s eta= 148s - eval 44,400/60,000 ( 74.0%) acc=0.7185 114 char/s eta= 137s - eval 45,600/60,000 ( 76.0%) acc=0.7185 114 char/s eta= 126s - eval 46,800/60,000 ( 78.0%) acc=0.7184 114 char/s eta= 116s - eval 48,000/60,000 ( 80.0%) acc=0.7187 114 char/s eta= 105s - eval 49,200/60,000 ( 82.0%) acc=0.7194 114 char/s eta= 95s - eval 50,400/60,000 ( 84.0%) acc=0.7208 114 char/s eta= 84s - eval 51,600/60,000 ( 86.0%) acc=0.7214 114 char/s eta= 74s - eval 52,800/60,000 ( 88.0%) acc=0.7227 114 char/s eta= 63s - eval 54,000/60,000 ( 90.0%) acc=0.7229 114 char/s eta= 53s - eval 55,200/60,000 ( 92.0%) acc=0.7224 114 char/s eta= 42s - eval 56,400/60,000 ( 94.0%) acc=0.7230 114 char/s eta= 32s - eval 57,600/60,000 ( 96.0%) acc=0.7234 114 char/s eta= 21s - eval 58,800/60,000 ( 98.0%) acc=0.7239 114 char/s eta= 11s - eval 60,000/60,000 (100.0%) acc=0.7238 114 char/s eta= 0s -chars=60,000 acc=0.7238 eval_duration=527.1s + eval 1,200/60,000 ( 2.0%) acc=0.7067 149 char/s eta= 396s + eval 2,400/60,000 ( 4.0%) acc=0.7037 149 char/s eta= 386s + eval 3,600/60,000 ( 6.0%) acc=0.7044 149 char/s eta= 377s + eval 4,800/60,000 ( 8.0%) acc=0.7127 149 char/s eta= 370s + eval 6,000/60,000 ( 10.0%) acc=0.7010 150 char/s eta= 360s + eval 7,200/60,000 ( 12.0%) acc=0.7004 150 char/s eta= 352s + eval 8,400/60,000 ( 14.0%) acc=0.6994 151 char/s eta= 342s + eval 9,600/60,000 ( 16.0%) acc=0.7035 151 char/s eta= 333s + eval 10,800/60,000 ( 18.0%) acc=0.7044 151 char/s eta= 326s + eval 12,000/60,000 ( 20.0%) acc=0.7061 151 char/s eta= 318s + eval 13,200/60,000 ( 22.0%) acc=0.7114 151 char/s eta= 310s + eval 14,400/60,000 ( 24.0%) acc=0.7133 151 char/s eta= 302s + eval 15,600/60,000 ( 26.0%) acc=0.7163 151 char/s eta= 294s + eval 16,800/60,000 ( 28.0%) acc=0.7193 151 char/s eta= 287s + eval 
18,000/60,000 ( 30.0%) acc=0.7174 151 char/s eta= 279s + eval 19,200/60,000 ( 32.0%) acc=0.7186 151 char/s eta= 271s + eval 20,400/60,000 ( 34.0%) acc=0.7201 150 char/s eta= 263s + eval 21,600/60,000 ( 36.0%) acc=0.7198 150 char/s eta= 255s + eval 22,800/60,000 ( 38.0%) acc=0.7202 151 char/s eta= 247s + eval 24,000/60,000 ( 40.0%) acc=0.7214 150 char/s eta= 239s + eval 25,200/60,000 ( 42.0%) acc=0.7225 150 char/s eta= 231s + eval 26,400/60,000 ( 44.0%) acc=0.7234 151 char/s eta= 223s + eval 27,600/60,000 ( 46.0%) acc=0.7249 150 char/s eta= 215s + eval 28,800/60,000 ( 48.0%) acc=0.7253 150 char/s eta= 207s + eval 30,000/60,000 ( 50.0%) acc=0.7242 150 char/s eta= 200s + eval 31,200/60,000 ( 52.0%) acc=0.7222 150 char/s eta= 192s + eval 32,400/60,000 ( 54.0%) acc=0.7219 150 char/s eta= 184s + eval 33,600/60,000 ( 56.0%) acc=0.7204 150 char/s eta= 176s + eval 34,800/60,000 ( 58.0%) acc=0.7194 150 char/s eta= 168s + eval 36,000/60,000 ( 60.0%) acc=0.7187 150 char/s eta= 160s + eval 37,200/60,000 ( 62.0%) acc=0.7190 150 char/s eta= 152s + eval 38,400/60,000 ( 64.0%) acc=0.7191 150 char/s eta= 144s + eval 39,600/60,000 ( 66.0%) acc=0.7191 150 char/s eta= 136s + eval 40,800/60,000 ( 68.0%) acc=0.7184 150 char/s eta= 128s + eval 42,000/60,000 ( 70.0%) acc=0.7182 150 char/s eta= 120s + eval 43,200/60,000 ( 72.0%) acc=0.7187 150 char/s eta= 112s + eval 44,400/60,000 ( 74.0%) acc=0.7186 150 char/s eta= 104s + eval 45,600/60,000 ( 76.0%) acc=0.7189 150 char/s eta= 96s + eval 46,800/60,000 ( 78.0%) acc=0.7186 150 char/s eta= 88s + eval 48,000/60,000 ( 80.0%) acc=0.7190 150 char/s eta= 80s + eval 49,200/60,000 ( 82.0%) acc=0.7196 150 char/s eta= 72s + eval 50,400/60,000 ( 84.0%) acc=0.7210 150 char/s eta= 64s + eval 51,600/60,000 ( 86.0%) acc=0.7216 150 char/s eta= 56s + eval 52,800/60,000 ( 88.0%) acc=0.7228 150 char/s eta= 48s + eval 54,000/60,000 ( 90.0%) acc=0.7233 150 char/s eta= 40s + eval 55,200/60,000 ( 92.0%) acc=0.7225 150 char/s eta= 32s + eval 56,400/60,000 ( 94.0%) 
acc=0.7232 150 char/s eta= 24s + eval 57,600/60,000 ( 96.0%) acc=0.7238 150 char/s eta= 16s + eval 58,800/60,000 ( 98.0%) acc=0.7245 150 char/s eta= 8s + eval 60,000/60,000 (100.0%) acc=0.7246 150 char/s eta= 0s +chars=60,000 acc=0.7246 eval_duration=399.9s --- submission : lwta_k4 -training energy (J): 46,222.2 -training duration : 236.8s -val char-accuracy : 0.7238 +training energy (J): 44,328.7 +training duration : 221.2s +val char-accuracy : 0.7246 val chars : 60,000 wrote /tmp/result.json Stopping app - local entrypoint completed. ✓ App completed. View run at -https://modal.com/apps/ab-10/main/ap-rRKkJJfugmtJNujFsbVrP3 +https://modal.com/apps/gabriel-nakajima-an/main/ap-hyBCSy0XG5jwP218GI6bNc # final result { "submission": "lwta_k4", - "training_energy_J": 46222.22882105, - "training_duration_s": 236.750003579, - "val_char_accuracy": 0.72375, + "training_energy_J": 44328.7028316, + "training_duration_s": 221.249343368, + "cpu_energy_J": 9354.22259884751, + "total_energy_J": 53682.92543044751, + "val_char_accuracy": 0.72455, "val_chars": 60000, "gpu_name": "NVIDIA A100 80GB PCIe", - "date_utc": "2026-05-18T18:06:13Z", + "date_utc": "2026-05-20T22:55:41Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 52.98316666666664, - "stress_watts_avg": 227.19281000664682, - "stress_energy_joules": 8522.132, - "stress_duration_s": 37.510570866, + "idle_watts": 53.112000000000016, + "stress_watts_avg": 229.4340394814706, + "stress_energy_joules": 8404.0, + "stress_duration_s": 36.629263988, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, diff --git a/submissions/lwta_k4_alpha_065/nvml.json b/submissions/lwta_k4_alpha_065/nvml.json index 55f39e0..fb37849 100644 --- a/submissions/lwta_k4_alpha_065/nvml.json +++ b/submissions/lwta_k4_alpha_065/nvml.json @@ -2,10 +2,10 @@ "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 54.36613333333333, - "stress_watts_avg": 
228.9315778481163, - "stress_energy_joules": 8622.915, - "stress_duration_s": 37.665904726, - "gpu_name": "NVIDIA A100 80GB PCIe", + "idle_watts": 60.704084745762636, + "stress_watts_avg": 354.6097806948912, + "stress_energy_joules": 13165.227, + "stress_duration_s": 37.125955675, + "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": [] } diff --git a/submissions/lwta_k4_alpha_065/result.json b/submissions/lwta_k4_alpha_065/result.json index ffa4ff7..b2d6674 100644 --- a/submissions/lwta_k4_alpha_065/result.json +++ b/submissions/lwta_k4_alpha_065/result.json @@ -1,20 +1,22 @@ { "submission": "lwta_k4_alpha_065", - "training_energy_J": 13173.6836969, - "training_duration_s": 117.52094606199998, - "val_char_accuracy": 0.7381833333333333, + "training_energy_J": 13751.454901850002, + "training_duration_s": 144.607121963, + "cpu_energy_J": 6170.442651387502, + "total_energy_J": 19921.897553237504, + "val_char_accuracy": 0.7327833333333333, "val_chars": 60000, - "gpu_name": "NVIDIA A100 80GB PCIe", - "date_utc": "2026-05-20T00:58:50Z", + "gpu_name": "NVIDIA A100-SXM4-80GB", + "date_utc": "2026-05-21T05:28:08Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 54.36613333333333, - "stress_watts_avg": 228.9315778481163, - "stress_energy_joules": 8622.915, - "stress_duration_s": 37.665904726, - "gpu_name": "NVIDIA A100 80GB PCIe", + "idle_watts": 60.704084745762636, + "stress_watts_avg": 354.6097806948912, + "stress_energy_joules": 13165.227, + "stress_duration_s": 37.125955675, + "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": [] }, "contributor": "@subagent-L2clean-2026-05-19" diff --git a/submissions/lwta_k4_alpha_065/run.log b/submissions/lwta_k4_alpha_065/run.log index fed31cd..1673c1c 100644 --- a/submissions/lwta_k4_alpha_065/run.log +++ b/submissions/lwta_k4_alpha_065/run.log @@ -1,25 +1,25 @@ -# wikitext submit.py log — lwta_k4_alpha_065 — 2026-05-20T00:50:06+00:00Z +# wikitext submit.py log — lwta_k4_alpha_065 — 
2026-05-21T05:19:18+00:00Z [modal] launching A100-80GB ... ✓ Initialized. View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-QClLkwZItRoeZ237Shsx0Z +https://modal.com/apps/gabriel-nakajima-an/main/ap-xwFThbQDkgtwWiaSsczU1X ✓ Created objects. ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py -├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py -├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py └── 🔨 Created function run_submission. [modal] verifying NVML energy counter ... -GPU: NVIDIA A100 80GB PCIe +GPU: NVIDIA A100-SXM4-80GB sampling idle power for 3s ... - idle: 54.4 W + idle: 60.7 W running 30s stress workload ... - duration: 37.7 s - energy delta: 8,622.9 J - avg power: 228.9 W + duration: 37.1 s + energy delta: 13,165.2 J + avg power: 354.6 W monotonic: True --- -{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 54.36613333333333, "stress_watts_avg": 228.9315778481163, "stress_energy_joules": 8622.915, "stress_duration_s": 37.665904726, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []} +{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 60.704084745762636, "stress_watts_avg": 354.6097806948912, "stress_energy_joules": 13165.227, "stress_duration_s": 37.125955675, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []} [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ... loading WikiText-103 from /data ... train chars: 540,095,682 @@ -27,117 +27,120 @@ loading WikiText-103 from /data ... train wall-clock cap: 300 s val accuracy floor : 0.7000 training submission /workspace/lwta_k4_alpha_065.py ... 
+[codecarbon WARNING @ 05:20:10] Multiple instances of codecarbon are allowed to run at the same time. [lwta_k4_a065] starting GPU KN build; max_order=12 D=0.5 [lwta_k4_a065] top order=12 unique pairs: 157,942,722 2.5s -[lwta_k4_a065] ctx_len=11 ctxs=119,285,712 15.0s -[lwta_k4_a065] ctx_len=10 ctxs=84,282,364 13.0s -[lwta_k4_a065] ctx_len=9 ctxs=54,720,376 8.5s -[lwta_k4_a065] ctx_len=8 ctxs=31,924,091 5.2s -[lwta_k4_a065] ctx_len=7 ctxs=16,284,921 2.3s -[lwta_k4_a065] ctx_len=6 ctxs=7,016,442 1.1s +[lwta_k4_a065] ctx_len=11 ctxs=119,285,712 33.7s +[lwta_k4_a065] ctx_len=10 ctxs=84,282,364 19.5s +[lwta_k4_a065] ctx_len=9 ctxs=54,720,376 10.5s +[lwta_k4_a065] ctx_len=8 ctxs=31,924,091 6.3s +[lwta_k4_a065] ctx_len=7 ctxs=16,284,921 3.4s +[lwta_k4_a065] ctx_len=6 ctxs=7,016,442 1.5s [lwta_k4_a065] ctx_len=5 ctxs=2,438,281 0.6s [lwta_k4_a065] ctx_len=4 ctxs=637,143 0.1s [lwta_k4_a065] ctx_len=3 ctxs=122,882 0.0s [lwta_k4_a065] ctx_len=2 ctxs=12,282 0.0s [lwta_k4_a065] ctx_len=1 ctxs=204 0.0s [lwta_k4_a065] ctx_len=0 ctxs=1 0.0s -[lwta_k4_a065] KN build done: 48.3s +[lwta_k4_a065] KN build done: 78.1s [lwta_k4_a065] NN 3.29M params cfg=TrainConfig(d=256 L=4 H=4 bs=32 T=1024 steps=1200 lwta_k=4) [lwta_k4_a065] NN step 0/1200 loss 5.5452 elapsed 1s -[lwta_k4_a065] NN step 100/1200 loss 1.8225 elapsed 6s -[lwta_k4_a065] NN step 200/1200 loss 1.5410 elapsed 12s -[lwta_k4_a065] NN step 300/1200 loss 1.4316 elapsed 17s -[lwta_k4_a065] NN step 400/1200 loss 1.3322 elapsed 22s -[lwta_k4_a065] NN step 500/1200 loss 1.3151 elapsed 28s -[lwta_k4_a065] NN step 600/1200 loss 1.2459 elapsed 33s -[lwta_k4_a065] NN step 700/1200 loss 1.2173 elapsed 39s -[lwta_k4_a065] NN step 800/1200 loss 1.1725 elapsed 44s -[lwta_k4_a065] NN step 900/1200 loss 1.1813 elapsed 50s -[lwta_k4_a065] NN step 1000/1200 loss 1.1598 elapsed 55s -[lwta_k4_a065] NN step 1100/1200 loss 1.1275 elapsed 60s -[lwta_k4_a065] NN step 1199/1200 loss 1.1207 elapsed 66s -training: 13,173.7 J duration=117.5s 
+[lwta_k4_a065] NN step 100/1200 loss 1.7843 elapsed 6s +[lwta_k4_a065] NN step 200/1200 loss 1.5196 elapsed 11s +[lwta_k4_a065] NN step 300/1200 loss 1.4285 elapsed 16s +[lwta_k4_a065] NN step 400/1200 loss 1.3993 elapsed 21s +[lwta_k4_a065] NN step 500/1200 loss 1.3084 elapsed 26s +[lwta_k4_a065] NN step 600/1200 loss 1.2457 elapsed 32s +[lwta_k4_a065] NN step 700/1200 loss 1.2204 elapsed 37s +[lwta_k4_a065] NN step 800/1200 loss 1.2090 elapsed 42s +[lwta_k4_a065] NN step 900/1200 loss 1.1729 elapsed 47s +[lwta_k4_a065] NN step 1000/1200 loss 1.1880 elapsed 52s +[lwta_k4_a065] NN step 1100/1200 loss 1.1337 elapsed 57s +[lwta_k4_a065] NN step 1199/1200 loss 1.1552 elapsed 62s +training: 13,751.5 J duration=144.6s evaluating on val split ... - eval 1,200/60,000 ( 2.0%) acc=0.7175 163 char/s eta= 361s - eval 2,400/60,000 ( 4.0%) acc=0.7104 168 char/s eta= 343s - eval 3,600/60,000 ( 6.0%) acc=0.7131 168 char/s eta= 336s - eval 4,800/60,000 ( 8.0%) acc=0.7212 168 char/s eta= 329s - eval 6,000/60,000 ( 10.0%) acc=0.7170 169 char/s eta= 320s - eval 7,200/60,000 ( 12.0%) acc=0.7146 169 char/s eta= 312s - eval 8,400/60,000 ( 14.0%) acc=0.7156 169 char/s eta= 305s - eval 9,600/60,000 ( 16.0%) acc=0.7215 170 char/s eta= 297s - eval 10,800/60,000 ( 18.0%) acc=0.7262 169 char/s eta= 290s - eval 12,000/60,000 ( 20.0%) acc=0.7282 169 char/s eta= 283s - eval 13,200/60,000 ( 22.0%) acc=0.7321 170 char/s eta= 276s - eval 14,400/60,000 ( 24.0%) acc=0.7336 170 char/s eta= 269s - eval 15,600/60,000 ( 26.0%) acc=0.7354 170 char/s eta= 261s - eval 16,800/60,000 ( 28.0%) acc=0.7385 170 char/s eta= 254s - eval 18,000/60,000 ( 30.0%) acc=0.7392 170 char/s eta= 247s - eval 19,200/60,000 ( 32.0%) acc=0.7418 170 char/s eta= 240s - eval 20,400/60,000 ( 34.0%) acc=0.7428 170 char/s eta= 233s - eval 21,600/60,000 ( 36.0%) acc=0.7427 170 char/s eta= 226s - eval 22,800/60,000 ( 38.0%) acc=0.7430 170 char/s eta= 219s - eval 24,000/60,000 ( 40.0%) acc=0.7427 170 char/s eta= 212s - eval 
25,200/60,000 ( 42.0%) acc=0.7432 170 char/s eta= 205s - eval 26,400/60,000 ( 44.0%) acc=0.7439 170 char/s eta= 198s - eval 27,600/60,000 ( 46.0%) acc=0.7439 169 char/s eta= 191s - eval 28,800/60,000 ( 48.0%) acc=0.7444 168 char/s eta= 185s - eval 30,000/60,000 ( 50.0%) acc=0.7434 168 char/s eta= 178s - eval 31,200/60,000 ( 52.0%) acc=0.7410 168 char/s eta= 172s - eval 32,400/60,000 ( 54.0%) acc=0.7404 168 char/s eta= 165s - eval 33,600/60,000 ( 56.0%) acc=0.7385 167 char/s eta= 158s - eval 34,800/60,000 ( 58.0%) acc=0.7383 167 char/s eta= 151s - eval 36,000/60,000 ( 60.0%) acc=0.7382 167 char/s eta= 144s - eval 37,200/60,000 ( 62.0%) acc=0.7385 167 char/s eta= 136s - eval 38,400/60,000 ( 64.0%) acc=0.7385 168 char/s eta= 129s - eval 39,600/60,000 ( 66.0%) acc=0.7382 168 char/s eta= 122s - eval 40,800/60,000 ( 68.0%) acc=0.7375 168 char/s eta= 114s - eval 42,000/60,000 ( 70.0%) acc=0.7368 168 char/s eta= 107s - eval 43,200/60,000 ( 72.0%) acc=0.7369 168 char/s eta= 100s - eval 44,400/60,000 ( 74.0%) acc=0.7363 168 char/s eta= 93s - eval 45,600/60,000 ( 76.0%) acc=0.7363 169 char/s eta= 85s - eval 46,800/60,000 ( 78.0%) acc=0.7354 168 char/s eta= 78s - eval 48,000/60,000 ( 80.0%) acc=0.7355 168 char/s eta= 71s - eval 49,200/60,000 ( 82.0%) acc=0.7354 168 char/s eta= 64s - eval 50,400/60,000 ( 84.0%) acc=0.7362 168 char/s eta= 57s - eval 51,600/60,000 ( 86.0%) acc=0.7364 169 char/s eta= 50s - eval 52,800/60,000 ( 88.0%) acc=0.7371 169 char/s eta= 43s - eval 54,000/60,000 ( 90.0%) acc=0.7373 169 char/s eta= 36s - eval 55,200/60,000 ( 92.0%) acc=0.7365 169 char/s eta= 28s - eval 56,400/60,000 ( 94.0%) acc=0.7365 169 char/s eta= 21s - eval 57,600/60,000 ( 96.0%) acc=0.7369 169 char/s eta= 14s - eval 58,800/60,000 ( 98.0%) acc=0.7375 169 char/s eta= 7s - eval 60,000/60,000 (100.0%) acc=0.7382 169 char/s eta= 0s -chars=60,000 acc=0.7382 eval_duration=355.4s + eval 1,200/60,000 ( 2.0%) acc=0.7200 181 char/s eta= 326s + eval 2,400/60,000 ( 4.0%) acc=0.7125 180 char/s eta= 
319s + eval 3,600/60,000 ( 6.0%) acc=0.7111 181 char/s eta= 312s + eval 4,800/60,000 ( 8.0%) acc=0.7194 181 char/s eta= 305s + eval 6,000/60,000 ( 10.0%) acc=0.7150 182 char/s eta= 297s + eval 7,200/60,000 ( 12.0%) acc=0.7108 182 char/s eta= 289s + eval 8,400/60,000 ( 14.0%) acc=0.7112 183 char/s eta= 283s + eval 9,600/60,000 ( 16.0%) acc=0.7167 183 char/s eta= 276s + eval 10,800/60,000 ( 18.0%) acc=0.7214 183 char/s eta= 269s + eval 12,000/60,000 ( 20.0%) acc=0.7228 183 char/s eta= 263s + eval 13,200/60,000 ( 22.0%) acc=0.7275 183 char/s eta= 256s + eval 14,400/60,000 ( 24.0%) acc=0.7290 183 char/s eta= 249s + eval 15,600/60,000 ( 26.0%) acc=0.7306 183 char/s eta= 243s + eval 16,800/60,000 ( 28.0%) acc=0.7339 183 char/s eta= 236s + eval 18,000/60,000 ( 30.0%) acc=0.7340 183 char/s eta= 230s + eval 19,200/60,000 ( 32.0%) acc=0.7368 183 char/s eta= 223s + eval 20,400/60,000 ( 34.0%) acc=0.7388 183 char/s eta= 217s + eval 21,600/60,000 ( 36.0%) acc=0.7386 182 char/s eta= 210s + eval 22,800/60,000 ( 38.0%) acc=0.7384 182 char/s eta= 204s + eval 24,000/60,000 ( 40.0%) acc=0.7380 182 char/s eta= 198s + eval 25,200/60,000 ( 42.0%) acc=0.7381 182 char/s eta= 191s + eval 26,400/60,000 ( 44.0%) acc=0.7388 182 char/s eta= 185s + eval 27,600/60,000 ( 46.0%) acc=0.7386 182 char/s eta= 178s + eval 28,800/60,000 ( 48.0%) acc=0.7391 182 char/s eta= 171s + eval 30,000/60,000 ( 50.0%) acc=0.7383 182 char/s eta= 165s + eval 31,200/60,000 ( 52.0%) acc=0.7353 182 char/s eta= 158s + eval 32,400/60,000 ( 54.0%) acc=0.7346 182 char/s eta= 151s + eval 33,600/60,000 ( 56.0%) acc=0.7327 182 char/s eta= 145s + eval 34,800/60,000 ( 58.0%) acc=0.7330 182 char/s eta= 138s + eval 36,000/60,000 ( 60.0%) acc=0.7331 183 char/s eta= 131s + eval 37,200/60,000 ( 62.0%) acc=0.7333 183 char/s eta= 125s + eval 38,400/60,000 ( 64.0%) acc=0.7330 182 char/s eta= 118s + eval 39,600/60,000 ( 66.0%) acc=0.7324 182 char/s eta= 112s + eval 40,800/60,000 ( 68.0%) acc=0.7317 182 char/s eta= 105s + eval 
42,000/60,000 ( 70.0%) acc=0.7306 182 char/s eta= 99s + eval 43,200/60,000 ( 72.0%) acc=0.7308 183 char/s eta= 92s + eval 44,400/60,000 ( 74.0%) acc=0.7306 183 char/s eta= 85s + eval 45,600/60,000 ( 76.0%) acc=0.7306 182 char/s eta= 79s + eval 46,800/60,000 ( 78.0%) acc=0.7301 182 char/s eta= 72s + eval 48,000/60,000 ( 80.0%) acc=0.7302 183 char/s eta= 66s + eval 49,200/60,000 ( 82.0%) acc=0.7301 183 char/s eta= 59s + eval 50,400/60,000 ( 84.0%) acc=0.7312 182 char/s eta= 53s + eval 51,600/60,000 ( 86.0%) acc=0.7315 182 char/s eta= 46s + eval 52,800/60,000 ( 88.0%) acc=0.7317 182 char/s eta= 39s + eval 54,000/60,000 ( 90.0%) acc=0.7319 182 char/s eta= 33s + eval 55,200/60,000 ( 92.0%) acc=0.7310 182 char/s eta= 26s + eval 56,400/60,000 ( 94.0%) acc=0.7309 182 char/s eta= 20s + eval 57,600/60,000 ( 96.0%) acc=0.7313 182 char/s eta= 13s + eval 58,800/60,000 ( 98.0%) acc=0.7319 182 char/s eta= 7s + eval 60,000/60,000 (100.0%) acc=0.7328 182 char/s eta= 0s +chars=60,000 acc=0.7328 eval_duration=329.1s --- submission : lwta_k4_alpha_065 -training energy (J): 13,173.7 -training duration : 117.5s -val char-accuracy : 0.7382 +training energy (J): 13,751.5 +training duration : 144.6s +val char-accuracy : 0.7328 val chars : 60,000 wrote /tmp/result.json Stopping app - local entrypoint completed. ✓ App completed. 
View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-QClLkwZItRoeZ237Shsx0Z +https://modal.com/apps/gabriel-nakajima-an/main/ap-xwFThbQDkgtwWiaSsczU1X # final result { "submission": "lwta_k4_alpha_065", - "training_energy_J": 13173.6836969, - "training_duration_s": 117.52094606199998, - "val_char_accuracy": 0.7381833333333333, + "training_energy_J": 13751.454901850002, + "training_duration_s": 144.607121963, + "cpu_energy_J": 6170.442651387502, + "total_energy_J": 19921.897553237504, + "val_char_accuracy": 0.7327833333333333, "val_chars": 60000, - "gpu_name": "NVIDIA A100 80GB PCIe", - "date_utc": "2026-05-20T00:58:50Z", + "gpu_name": "NVIDIA A100-SXM4-80GB", + "date_utc": "2026-05-21T05:28:08Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 54.36613333333333, - "stress_watts_avg": 228.9315778481163, - "stress_energy_joules": 8622.915, - "stress_duration_s": 37.665904726, - "gpu_name": "NVIDIA A100 80GB PCIe", + "idle_watts": 60.704084745762636, + "stress_watts_avg": 354.6097806948912, + "stress_energy_joules": 13165.227, + "stress_duration_s": 37.125955675, + "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": [] }, "contributor": "@subagent-L2clean-2026-05-19" diff --git a/submissions/modded_nanogpt/nvml.json b/submissions/modded_nanogpt/nvml.json index 3423730..816fb1e 100644 --- a/submissions/modded_nanogpt/nvml.json +++ b/submissions/modded_nanogpt/nvml.json @@ -2,10 +2,10 @@ "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 55.60349152542373, - "stress_watts_avg": 232.33601101968227, - "stress_energy_joules": 8741.791, - "stress_duration_s": 37.625639527999994, - "gpu_name": "NVIDIA A100 80GB PCIe", + "idle_watts": 62.52783050847465, + "stress_watts_avg": 350.16403334081195, + "stress_energy_joules": 13179.464, + "stress_duration_s": 37.637971765, + "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": [] } diff --git a/submissions/modded_nanogpt/result.json 
b/submissions/modded_nanogpt/result.json index 2f96ede..33a46ce 100644 --- a/submissions/modded_nanogpt/result.json +++ b/submissions/modded_nanogpt/result.json @@ -1,20 +1,22 @@ { "submission": "modded_nanogpt", - "training_energy_J": 51704.306257950004, - "training_duration_s": 246.648394841, - "val_char_accuracy": 0.7373666666666666, + "training_energy_J": 51728.92952335, + "training_duration_s": 242.66790953299997, + "cpu_energy_J": 10276.997479117497, + "total_energy_J": 62005.9270024675, + "val_char_accuracy": 0.7336833333333334, "val_chars": 60000, - "gpu_name": "NVIDIA A100 80GB PCIe", - "date_utc": "2026-05-12T22:10:25Z", + "gpu_name": "NVIDIA A100-SXM4-80GB", + "date_utc": "2026-05-21T05:31:57Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 55.60349152542373, - "stress_watts_avg": 232.33601101968227, - "stress_energy_joules": 8741.791, - "stress_duration_s": 37.625639527999994, - "gpu_name": "NVIDIA A100 80GB PCIe", + "idle_watts": 62.52783050847465, + "stress_watts_avg": 350.16403334081195, + "stress_energy_joules": 13179.464, + "stress_duration_s": 37.637971765, + "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": [] }, "contributor": "@ab-10" diff --git a/submissions/modded_nanogpt/run.log b/submissions/modded_nanogpt/run.log index 0d20c70..73fa945 100644 --- a/submissions/modded_nanogpt/run.log +++ b/submissions/modded_nanogpt/run.log @@ -1,142 +1,141 @@ -# wikitext submit.py log — modded_nanogpt — 2026-05-12T21:57:26+00:00Z -[modal] launching A100-40GB ... +# wikitext submit.py log — modded_nanogpt — 2026-05-21T05:19:18+00:00Z +[modal] launching A100-80GB ... ✓ Initialized. View run at -https://modal.com/apps/ab-10/main/ap-E9Q8eIo1sjKa6HsN23nCdR +https://modal.com/apps/gabriel-nakajima-an/main/ap-8CUCBSgp30OSqDLk4DjLGC ✓ Created objects. 
-├── 🔨 Created mount /home/seneca/cybertronai-wikitext/submit.py -├── 🔨 Created mount /home/seneca/cybertronai-wikitext/verify_nvml.py -├── 🔨 Created mount /home/seneca/cybertronai-wikitext/run_eval.py -├── 🔨 Created mount /home/seneca/cybertronai-wikitext/task.py -├── 🔨 Created mount /home/seneca/cybertronai-wikitext/wikitext.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py └── 🔨 Created function run_submission. [modal] verifying NVML energy counter ... -GPU: NVIDIA A100 80GB PCIe +GPU: NVIDIA A100-SXM4-80GB sampling idle power for 3s ... - idle: 55.6 W + idle: 62.5 W running 30s stress workload ... -/usr/local/lib/python3.11/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.) - cpu = _conversion_method_template(device=torch.device("cpu")) duration: 37.6 s - energy delta: 8,741.8 J - avg power: 232.3 W + energy delta: 13,179.5 J + avg power: 350.2 W monotonic: True --- -{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 55.60349152542373, "stress_watts_avg": 232.33601101968227, "stress_energy_joules": 8741.791, "stress_duration_s": 37.625639527999994, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []} +{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 62.52783050847465, "stress_watts_avg": 350.16403334081195, "stress_energy_joules": 13179.464, "stress_duration_s": 37.637971765, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []} [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ... loading WikiText-103 from /data ... 
train chars: 540,095,682 val chars: 60,000 (scored, gated by --acc-min) train wall-clock cap: 300 s val accuracy floor : 0.7000 -/usr/local/lib/python3.11/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.) - cpu = _conversion_method_template(device=torch.device("cpu")) training submission /workspace/modded_nanogpt.py ... +[codecarbon WARNING @ 05:20:12] Multiple instances of codecarbon are allowed to run at the same time. [modded] 10.84M params cfg=TrainConfig(d=384 L=6 H=6 bs=32 T=1024 steps=2150) [modded] step 0/2150 loss 5.5452 elapsed 1s -[modded] step 100/2150 loss 1.6180 elapsed 13s -[modded] step 200/2150 loss 1.4124 elapsed 24s -[modded] step 300/2150 loss 1.3156 elapsed 35s -[modded] step 400/2150 loss 1.2198 elapsed 46s -[modded] step 500/2150 loss 1.1874 elapsed 58s -[modded] step 600/2150 loss 1.1899 elapsed 69s -[modded] step 700/2150 loss 1.1334 elapsed 80s -[modded] step 800/2150 loss 1.1112 elapsed 91s -[modded] step 900/2150 loss 1.1067 elapsed 103s -[modded] step 1000/2150 loss 1.0968 elapsed 114s -[modded] step 1100/2150 loss 1.0989 elapsed 125s -[modded] step 1200/2150 loss 1.0336 elapsed 136s -[modded] step 1300/2150 loss 1.0725 elapsed 148s -[modded] step 1400/2150 loss 1.0814 elapsed 159s -[modded] step 1500/2150 loss 1.0162 elapsed 170s -[modded] step 1600/2150 loss 1.0225 elapsed 181s -[modded] step 1700/2150 loss 1.0033 elapsed 193s -[modded] step 1800/2150 loss 0.9861 elapsed 204s -[modded] step 1900/2150 loss 0.9606 elapsed 215s -[modded] step 2000/2150 loss 0.9690 elapsed 227s -[modded] step 2100/2150 loss 0.9526 elapsed 238s -[modded] step 2149/2150 loss 0.9696 elapsed 243s -training: 51,704.3 J duration=246.6s +[modded] step 100/2150 loss 1.6118 elapsed 13s +[modded] step 200/2150 loss 1.4113 elapsed 24s +[modded] step 300/2150 loss 1.3137 elapsed 35s +[modded] step 400/2150 loss 1.2343 elapsed 46s 
+[modded] step 500/2150 loss 1.2061 elapsed 57s +[modded] step 600/2150 loss 1.1634 elapsed 68s +[modded] step 700/2150 loss 1.1344 elapsed 79s +[modded] step 800/2150 loss 1.1528 elapsed 90s +[modded] step 900/2150 loss 1.0967 elapsed 101s +[modded] step 1000/2150 loss 1.1196 elapsed 111s +[modded] step 1100/2150 loss 1.0883 elapsed 122s +[modded] step 1200/2150 loss 1.0476 elapsed 133s +[modded] step 1300/2150 loss 1.0868 elapsed 144s +[modded] step 1400/2150 loss 1.0414 elapsed 155s +[modded] step 1500/2150 loss 1.0262 elapsed 166s +[modded] step 1600/2150 loss 1.0296 elapsed 177s +[modded] step 1700/2150 loss 0.9912 elapsed 188s +[modded] step 1800/2150 loss 1.0246 elapsed 199s +[modded] step 1900/2150 loss 0.9675 elapsed 210s +[modded] step 2000/2150 loss 0.9918 elapsed 221s +[modded] step 2100/2150 loss 0.9837 elapsed 233s +[modded] step 2149/2150 loss 0.9957 elapsed 238s +training: 51,728.9 J duration=242.7s evaluating on val split ... - eval 1,200/60,000 ( 2.0%) acc=0.7342 129 char/s eta= 456s - eval 2,400/60,000 ( 4.0%) acc=0.7212 128 char/s eta= 449s - eval 3,600/60,000 ( 6.0%) acc=0.7258 127 char/s eta= 444s - eval 4,800/60,000 ( 8.0%) acc=0.7342 126 char/s eta= 436s - eval 6,000/60,000 ( 10.0%) acc=0.7257 127 char/s eta= 427s - eval 7,200/60,000 ( 12.0%) acc=0.7228 126 char/s eta= 418s - eval 8,400/60,000 ( 14.0%) acc=0.7201 127 char/s eta= 408s - eval 9,600/60,000 ( 16.0%) acc=0.7257 127 char/s eta= 398s - eval 10,800/60,000 ( 18.0%) acc=0.7277 126 char/s eta= 389s - eval 12,000/60,000 ( 20.0%) acc=0.7286 126 char/s eta= 381s - eval 13,200/60,000 ( 22.0%) acc=0.7323 126 char/s eta= 371s - eval 14,400/60,000 ( 24.0%) acc=0.7330 126 char/s eta= 362s - eval 15,600/60,000 ( 26.0%) acc=0.7355 126 char/s eta= 353s - eval 16,800/60,000 ( 28.0%) acc=0.7379 126 char/s eta= 343s - eval 18,000/60,000 ( 30.0%) acc=0.7356 126 char/s eta= 334s - eval 19,200/60,000 ( 32.0%) acc=0.7364 126 char/s eta= 325s - eval 20,400/60,000 ( 34.0%) acc=0.7376 126 char/s eta= 315s 
- eval 21,600/60,000 ( 36.0%) acc=0.7368 126 char/s eta= 306s - eval 22,800/60,000 ( 38.0%) acc=0.7375 126 char/s eta= 296s - eval 24,000/60,000 ( 40.0%) acc=0.7378 125 char/s eta= 287s - eval 25,200/60,000 ( 42.0%) acc=0.7390 125 char/s eta= 277s - eval 26,400/60,000 ( 44.0%) acc=0.7399 125 char/s eta= 268s - eval 27,600/60,000 ( 46.0%) acc=0.7412 125 char/s eta= 258s - eval 28,800/60,000 ( 48.0%) acc=0.7410 125 char/s eta= 249s - eval 30,000/60,000 ( 50.0%) acc=0.7398 125 char/s eta= 239s - eval 31,200/60,000 ( 52.0%) acc=0.7378 125 char/s eta= 230s - eval 32,400/60,000 ( 54.0%) acc=0.7373 125 char/s eta= 220s - eval 33,600/60,000 ( 56.0%) acc=0.7355 125 char/s eta= 211s - eval 34,800/60,000 ( 58.0%) acc=0.7342 125 char/s eta= 201s - eval 36,000/60,000 ( 60.0%) acc=0.7335 125 char/s eta= 192s - eval 37,200/60,000 ( 62.0%) acc=0.7335 125 char/s eta= 182s - eval 38,400/60,000 ( 64.0%) acc=0.7339 125 char/s eta= 173s - eval 39,600/60,000 ( 66.0%) acc=0.7336 125 char/s eta= 163s - eval 40,800/60,000 ( 68.0%) acc=0.7331 125 char/s eta= 153s - eval 42,000/60,000 ( 70.0%) acc=0.7328 125 char/s eta= 144s - eval 43,200/60,000 ( 72.0%) acc=0.7331 125 char/s eta= 134s - eval 44,400/60,000 ( 74.0%) acc=0.7331 125 char/s eta= 125s - eval 45,600/60,000 ( 76.0%) acc=0.7330 125 char/s eta= 115s - eval 46,800/60,000 ( 78.0%) acc=0.7326 125 char/s eta= 105s - eval 48,000/60,000 ( 80.0%) acc=0.7331 125 char/s eta= 96s - eval 49,200/60,000 ( 82.0%) acc=0.7335 125 char/s eta= 86s - eval 50,400/60,000 ( 84.0%) acc=0.7350 125 char/s eta= 77s - eval 51,600/60,000 ( 86.0%) acc=0.7355 125 char/s eta= 67s - eval 52,800/60,000 ( 88.0%) acc=0.7363 125 char/s eta= 58s - eval 54,000/60,000 ( 90.0%) acc=0.7367 125 char/s eta= 48s - eval 55,200/60,000 ( 92.0%) acc=0.7360 125 char/s eta= 38s - eval 56,400/60,000 ( 94.0%) acc=0.7365 126 char/s eta= 29s - eval 57,600/60,000 ( 96.0%) acc=0.7368 126 char/s eta= 19s - eval 58,800/60,000 ( 98.0%) acc=0.7374 126 char/s eta= 10s - eval 60,000/60,000 
(100.0%) acc=0.7374 126 char/s eta= 0s -chars=60,000 acc=0.7374 eval_duration=475.8s + eval 1,200/60,000 ( 2.0%) acc=0.7367 130 char/s eta= 452s + eval 2,400/60,000 ( 4.0%) acc=0.7196 131 char/s eta= 439s + eval 3,600/60,000 ( 6.0%) acc=0.7186 131 char/s eta= 430s + eval 4,800/60,000 ( 8.0%) acc=0.7267 131 char/s eta= 421s + eval 6,000/60,000 ( 10.0%) acc=0.7170 131 char/s eta= 413s + eval 7,200/60,000 ( 12.0%) acc=0.7133 131 char/s eta= 403s + eval 8,400/60,000 ( 14.0%) acc=0.7119 131 char/s eta= 395s + eval 9,600/60,000 ( 16.0%) acc=0.7160 130 char/s eta= 386s + eval 10,800/60,000 ( 18.0%) acc=0.7179 130 char/s eta= 378s + eval 12,000/60,000 ( 20.0%) acc=0.7196 131 char/s eta= 367s + eval 13,200/60,000 ( 22.0%) acc=0.7245 131 char/s eta= 358s + eval 14,400/60,000 ( 24.0%) acc=0.7256 131 char/s eta= 349s + eval 15,600/60,000 ( 26.0%) acc=0.7281 131 char/s eta= 340s + eval 16,800/60,000 ( 28.0%) acc=0.7308 130 char/s eta= 331s + eval 18,000/60,000 ( 30.0%) acc=0.7282 130 char/s eta= 322s + eval 19,200/60,000 ( 32.0%) acc=0.7296 130 char/s eta= 313s + eval 20,400/60,000 ( 34.0%) acc=0.7313 131 char/s eta= 303s + eval 21,600/60,000 ( 36.0%) acc=0.7306 130 char/s eta= 294s + eval 22,800/60,000 ( 38.0%) acc=0.7310 131 char/s eta= 285s + eval 24,000/60,000 ( 40.0%) acc=0.7311 131 char/s eta= 275s + eval 25,200/60,000 ( 42.0%) acc=0.7325 131 char/s eta= 266s + eval 26,400/60,000 ( 44.0%) acc=0.7333 131 char/s eta= 256s + eval 27,600/60,000 ( 46.0%) acc=0.7345 131 char/s eta= 247s + eval 28,800/60,000 ( 48.0%) acc=0.7347 131 char/s eta= 238s + eval 30,000/60,000 ( 50.0%) acc=0.7342 131 char/s eta= 228s + eval 31,200/60,000 ( 52.0%) acc=0.7326 131 char/s eta= 219s + eval 32,400/60,000 ( 54.0%) acc=0.7320 131 char/s eta= 210s + eval 33,600/60,000 ( 56.0%) acc=0.7303 131 char/s eta= 201s + eval 34,800/60,000 ( 58.0%) acc=0.7290 132 char/s eta= 192s + eval 36,000/60,000 ( 60.0%) acc=0.7282 132 char/s eta= 182s + eval 37,200/60,000 ( 62.0%) acc=0.7287 132 char/s eta= 173s + 
eval 38,400/60,000 ( 64.0%) acc=0.7291 132 char/s eta= 164s + eval 39,600/60,000 ( 66.0%) acc=0.7292 132 char/s eta= 155s + eval 40,800/60,000 ( 68.0%) acc=0.7289 131 char/s eta= 146s + eval 42,000/60,000 ( 70.0%) acc=0.7286 131 char/s eta= 137s + eval 43,200/60,000 ( 72.0%) acc=0.7291 131 char/s eta= 128s + eval 44,400/60,000 ( 74.0%) acc=0.7289 131 char/s eta= 119s + eval 45,600/60,000 ( 76.0%) acc=0.7290 131 char/s eta= 110s + eval 46,800/60,000 ( 78.0%) acc=0.7286 131 char/s eta= 101s + eval 48,000/60,000 ( 80.0%) acc=0.7290 131 char/s eta= 92s + eval 49,200/60,000 ( 82.0%) acc=0.7293 131 char/s eta= 83s + eval 50,400/60,000 ( 84.0%) acc=0.7307 131 char/s eta= 73s + eval 51,600/60,000 ( 86.0%) acc=0.7314 131 char/s eta= 64s + eval 52,800/60,000 ( 88.0%) acc=0.7322 131 char/s eta= 55s + eval 54,000/60,000 ( 90.0%) acc=0.7326 131 char/s eta= 46s + eval 55,200/60,000 ( 92.0%) acc=0.7320 131 char/s eta= 37s + eval 56,400/60,000 ( 94.0%) acc=0.7325 131 char/s eta= 27s + eval 57,600/60,000 ( 96.0%) acc=0.7329 131 char/s eta= 18s + eval 58,800/60,000 ( 98.0%) acc=0.7336 131 char/s eta= 9s + eval 60,000/60,000 (100.0%) acc=0.7337 131 char/s eta= 0s +chars=60,000 acc=0.7337 eval_duration=457.9s --- submission : modded_nanogpt -training energy (J): 51,704.3 -training duration : 246.6s -val char-accuracy : 0.7374 +training energy (J): 51,728.9 +training duration : 242.7s +val char-accuracy : 0.7337 val chars : 60,000 wrote /tmp/result.json Stopping app - local entrypoint completed. ✓ App completed. 
View run at -https://modal.com/apps/ab-10/main/ap-E9Q8eIo1sjKa6HsN23nCdR +https://modal.com/apps/gabriel-nakajima-an/main/ap-8CUCBSgp30OSqDLk4DjLGC # final result { "submission": "modded_nanogpt", - "training_energy_J": 51704.306257950004, - "training_duration_s": 246.648394841, - "val_char_accuracy": 0.7373666666666666, + "training_energy_J": 51728.92952335, + "training_duration_s": 242.66790953299997, + "cpu_energy_J": 10276.997479117497, + "total_energy_J": 62005.9270024675, + "val_char_accuracy": 0.7336833333333334, "val_chars": 60000, - "gpu_name": "NVIDIA A100 80GB PCIe", - "date_utc": "2026-05-12T22:10:25Z", + "gpu_name": "NVIDIA A100-SXM4-80GB", + "date_utc": "2026-05-21T05:31:57Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 55.60349152542373, - "stress_watts_avg": 232.33601101968227, - "stress_energy_joules": 8741.791, - "stress_duration_s": 37.625639527999994, - "gpu_name": "NVIDIA A100 80GB PCIe", + "idle_watts": 62.52783050847465, + "stress_watts_avg": 350.16403334081195, + "stress_energy_joules": 13179.464, + "stress_duration_s": 37.637971765, + "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": [] }, "contributor": "@ab-10" diff --git a/submissions/paq_mixer_v3/nvml.json b/submissions/paq_mixer_v3/nvml.json index 3f0ff32..8e05d5d 100644 --- a/submissions/paq_mixer_v3/nvml.json +++ b/submissions/paq_mixer_v3/nvml.json @@ -2,10 +2,10 @@ "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 66.40836666666658, - "stress_watts_avg": 345.33368906733165, - "stress_energy_joules": 13068.953, - "stress_duration_s": 37.84441951, - "gpu_name": "NVIDIA A100-SXM4-80GB", + "idle_watts": 56.77781666666669, + "stress_watts_avg": 232.3223612785609, + "stress_energy_joules": 8489.282, + "stress_duration_s": 36.540959524, + "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] } diff --git a/submissions/paq_mixer_v3/result.json b/submissions/paq_mixer_v3/result.json index 
91bd8e5..da2d7dc 100644 --- a/submissions/paq_mixer_v3/result.json +++ b/submissions/paq_mixer_v3/result.json @@ -1,22 +1,22 @@ { "submission": "paq_mixer_v3", - "training_energy_J": 3582.3155354, - "training_duration_s": 122.294609292, - "cpu_energy_J": 5167.742507545003, - "total_energy_J": 8750.058042945002, + "training_energy_J": 2355.22674605, + "training_duration_s": 53.155205079, + "cpu_energy_J": 2251.575883315001, + "total_energy_J": 4606.802629365002, "val_char_accuracy": 0.70475, "val_chars": 60000, - "gpu_name": "NVIDIA A100-SXM4-80GB", - "date_utc": "2026-05-20T07:12:19Z", + "gpu_name": "NVIDIA A100 80GB PCIe", + "date_utc": "2026-05-21T05:05:13Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 66.40836666666658, - "stress_watts_avg": 345.33368906733165, - "stress_energy_joules": 13068.953, - "stress_duration_s": 37.84441951, - "gpu_name": "NVIDIA A100-SXM4-80GB", + "idle_watts": 56.77781666666669, + "stress_watts_avg": 232.3223612785609, + "stress_energy_joules": 8489.282, + "stress_duration_s": 36.540959524, + "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, "contributor": "@worker-paq-mixer" diff --git a/submissions/paq_mixer_v3/run.log b/submissions/paq_mixer_v3/run.log index e2a2fd6..a476cef 100644 --- a/submissions/paq_mixer_v3/run.log +++ b/submissions/paq_mixer_v3/run.log @@ -1,7 +1,7 @@ -# wikitext submit.py log — paq_mixer_v3 — 2026-05-20T07:08:40+00:00Z +# wikitext submit.py log — paq_mixer_v3 — 2026-05-21T05:03:02+00:00Z [modal] launching A100-80GB ... ✓ Initialized. View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-Sm0PHVmoPmOFQsokhYXdqV +https://modal.com/apps/gabriel-nakajima-an/main/ap-Dycl40rOPJQ1MEo4LFRzo5 ✓ Created objects. 
├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py @@ -10,16 +10,16 @@ https://modal.com/apps/gabriel-nakajima-an/main/ap-Sm0PHVmoPmOFQsokhYXdqV ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py └── 🔨 Created function run_submission. [modal] verifying NVML energy counter ... -GPU: NVIDIA A100-SXM4-80GB +GPU: NVIDIA A100 80GB PCIe sampling idle power for 3s ... - idle: 66.4 W + idle: 56.8 W running 30s stress workload ... - duration: 37.8 s - energy delta: 13,069.0 J - avg power: 345.3 W + duration: 36.5 s + energy delta: 8,489.3 J + avg power: 232.3 W monotonic: True --- -{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 66.40836666666658, "stress_watts_avg": 345.33368906733165, "stress_energy_joules": 13068.953, "stress_duration_s": 37.84441951, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []} +{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 56.77781666666669, "stress_watts_avg": 232.3223612785609, "stress_energy_joules": 8489.282, "stress_duration_s": 36.540959524, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []} [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ... loading WikiText-103 from /data ... train chars: 540,095,682 @@ -27,23 +27,23 @@ loading WikiText-103 from /data ... train wall-clock cap: 300 s val accuracy floor : 0.7000 training submission /workspace/paq_mixer_v3.py ... -[codecarbon WARNING @ 07:09:48] Multiple instances of codecarbon are allowed to run at the same time. +[codecarbon WARNING @ 05:03:59] Multiple instances of codecarbon are allowed to run at the same time. 
[paq_mixer] device=cuda K=11 max_ctx_len=10 WB_DISCOUNT=1.0 -[paq_mixer] encoded 539,096,898 train bytes (0.8s); heldout=2,000,000 bytes -[paq_mixer] top order=11 unique pairs: 118,988,639 2.1s -[paq_mixer] order k=11 ctx_len=10 ctxs=84,084,448 rows=118,988,639 39.1s -[paq_mixer] order k=10 ctx_len=9 ctxs=54,600,791 rows=84,084,448 28.6s -[paq_mixer] order k=9 ctx_len=8 ctxs=31,859,845 rows=54,600,791 10.5s -[paq_mixer] order k=8 ctx_len=7 ctxs=16,254,833 rows=31,859,845 2.9s +[paq_mixer] encoded 539,096,898 train bytes (0.6s); heldout=2,000,000 bytes +[paq_mixer] top order=11 unique pairs: 118,988,639 1.9s +[paq_mixer] order k=11 ctx_len=10 ctxs=84,084,448 rows=118,988,639 14.8s +[paq_mixer] order k=10 ctx_len=9 ctxs=54,600,791 rows=84,084,448 9.3s +[paq_mixer] order k=9 ctx_len=8 ctxs=31,859,845 rows=54,600,791 5.7s +[paq_mixer] order k=8 ctx_len=7 ctxs=16,254,833 rows=31,859,845 2.8s [paq_mixer] order k=7 ctx_len=6 ctxs=7,004,457 rows=16,254,833 1.3s -[paq_mixer] order k=6 ctx_len=5 ctxs=2,434,266 rows=7,004,457 0.5s +[paq_mixer] order k=6 ctx_len=5 ctxs=2,434,266 rows=7,004,457 0.4s [paq_mixer] order k=5 ctx_len=4 ctxs=636,106 rows=2,434,266 0.1s [paq_mixer] order k=4 ctx_len=3 ctxs=122,668 rows=636,106 0.0s [paq_mixer] order k=3 ctx_len=2 ctxs=12,277 rows=122,668 0.0s [paq_mixer] order k=2 ctx_len=1 ctxs=204 rows=12,277 0.0s [paq_mixer] order k=1 ctx_len=0 ctxs=1 rows=204 0.0s -[paq_mixer] tables built in 86.2s -[paq_mixer] collected 200,000 mixer training samples feat_dim=34 (29.5s) +[paq_mixer] tables built in 37.1s +[paq_mixer] collected 200,000 mixer training samples feat_dim=34 (12.0s) [paq_mixer] mixer step= 0 loss=1.4004 [paq_mixer] mixer step= 187 loss=1.0435 [paq_mixer] mixer step= 374 loss=1.0322 @@ -54,95 +54,93 @@ training submission /workspace/paq_mixer_v3.py ... 
[paq_mixer] mixer step=1309 loss=1.0324 [paq_mixer] mixer step=1496 loss=1.0380 [paq_mixer] mixer step=1499 loss=1.0537 -[paq_mixer] mixer fit done 5.1s last_loss=1.0537 -[paq_mixer] total build: 120.8s -training: 3,582.3 J duration=122.3s +[paq_mixer] mixer fit done 3.2s last_loss=1.0537 +[paq_mixer] total build: 52.4s +training: 2,355.2 J duration=53.2s evaluating on val split ... - eval 1,200/60,000 ( 2.0%) acc=0.6833 2522 char/s eta= 23s - eval 2,400/60,000 ( 4.0%) acc=0.6729 2518 char/s eta= 23s - eval 3,600/60,000 ( 6.0%) acc=0.6700 2521 char/s eta= 22s - eval 4,800/60,000 ( 8.0%) acc=0.6848 2517 char/s eta= 22s - eval 6,000/60,000 ( 10.0%) acc=0.6850 2520 char/s eta= 21s - eval 7,200/60,000 ( 12.0%) acc=0.6775 2524 char/s eta= 21s - eval 8,400/60,000 ( 14.0%) acc=0.6774 2526 char/s eta= 20s - eval 9,600/60,000 ( 16.0%) acc=0.6849 2524 char/s eta= 20s - eval 10,800/60,000 ( 18.0%) acc=0.6939 2521 char/s eta= 20s - eval 12,000/60,000 ( 20.0%) acc=0.6975 2519 char/s eta= 19s - eval 13,200/60,000 ( 22.0%) acc=0.7022 2516 char/s eta= 19s - eval 14,400/60,000 ( 24.0%) acc=0.7037 2515 char/s eta= 18s - eval 15,600/60,000 ( 26.0%) acc=0.7051 2514 char/s eta= 18s - eval 16,800/60,000 ( 28.0%) acc=0.7083 2512 char/s eta= 17s - eval 18,000/60,000 ( 30.0%) acc=0.7100 2511 char/s eta= 17s - eval 19,200/60,000 ( 32.0%) acc=0.7136 2509 char/s eta= 16s - eval 20,400/60,000 ( 34.0%) acc=0.7152 2508 char/s eta= 16s - eval 21,600/60,000 ( 36.0%) acc=0.7161 2509 char/s eta= 15s - eval 22,800/60,000 ( 38.0%) acc=0.7164 2508 char/s eta= 15s - eval 24,000/60,000 ( 40.0%) acc=0.7166 2508 char/s eta= 14s - eval 25,200/60,000 ( 42.0%) acc=0.7170 2509 char/s eta= 14s - eval 26,400/60,000 ( 44.0%) acc=0.7180 2509 char/s eta= 13s - eval 27,600/60,000 ( 46.0%) acc=0.7164 2509 char/s eta= 13s - eval 28,800/60,000 ( 48.0%) acc=0.7162 2511 char/s eta= 12s - eval 30,000/60,000 ( 50.0%) acc=0.7146 2512 char/s eta= 12s - eval 31,200/60,000 ( 52.0%) acc=0.7113 2514 char/s eta= 11s - eval 
32,400/60,000 ( 54.0%) acc=0.7089 2516 char/s eta= 11s - eval 33,600/60,000 ( 56.0%) acc=0.7064 2516 char/s eta= 10s - eval 34,800/60,000 ( 58.0%) acc=0.7067 2516 char/s eta= 10s - eval 36,000/60,000 ( 60.0%) acc=0.7065 2515 char/s eta= 10s - eval 37,200/60,000 ( 62.0%) acc=0.7065 2514 char/s eta= 9s - eval 38,400/60,000 ( 64.0%) acc=0.7070 2513 char/s eta= 9s - eval 39,600/60,000 ( 66.0%) acc=0.7064 2513 char/s eta= 8s - eval 40,800/60,000 ( 68.0%) acc=0.7062 2512 char/s eta= 8s - eval 42,000/60,000 ( 70.0%) acc=0.7056 2510 char/s eta= 7s - eval 43,200/60,000 ( 72.0%) acc=0.7050 2510 char/s eta= 7s - eval 44,400/60,000 ( 74.0%) acc=0.7052 2510 char/s eta= 6s - eval 45,600/60,000 ( 76.0%) acc=0.7054 2510 char/s eta= 6s - eval 46,800/60,000 ( 78.0%) acc=0.7047 2511 char/s eta= 5s - eval 48,000/60,000 ( 80.0%) acc=0.7049 2512 char/s eta= 5s - eval 49,200/60,000 ( 82.0%) acc=0.7043 2513 char/s eta= 4s - eval 50,400/60,000 ( 84.0%) acc=0.7046 2514 char/s eta= 4s - eval 51,600/60,000 ( 86.0%) acc=0.7047 2515 char/s eta= 3s - eval 52,800/60,000 ( 88.0%) acc=0.7034 2518 char/s eta= 3s - eval 54,000/60,000 ( 90.0%) acc=0.7034 2520 char/s eta= 2s - eval 55,200/60,000 ( 92.0%) acc=0.7028 2521 char/s eta= 2s - eval 56,400/60,000 ( 94.0%) acc=0.7021 2522 char/s eta= 1s - eval 57,600/60,000 ( 96.0%) acc=0.7028 2523 char/s eta= 1s - eval 58,800/60,000 ( 98.0%) acc=0.7036 2524 char/s eta= 0s - eval 60,000/60,000 (100.0%) acc=0.7047 2525 char/s eta= 0s -chars=60,000 acc=0.7047 eval_duration=23.8s + eval 1,200/60,000 ( 2.0%) acc=0.6833 3703 char/s eta= 16s + eval 2,400/60,000 ( 4.0%) acc=0.6729 3699 char/s eta= 16s + eval 3,600/60,000 ( 6.0%) acc=0.6700 3707 char/s eta= 15s + eval 4,800/60,000 ( 8.0%) acc=0.6848 3708 char/s eta= 15s + eval 6,000/60,000 ( 10.0%) acc=0.6850 3715 char/s eta= 15s + eval 7,200/60,000 ( 12.0%) acc=0.6775 3721 char/s eta= 14s + eval 8,400/60,000 ( 14.0%) acc=0.6774 3727 char/s eta= 14s + eval 9,600/60,000 ( 16.0%) acc=0.6849 3727 char/s eta= 14s + eval 
10,800/60,000 ( 18.0%) acc=0.6939 3725 char/s eta= 13s + eval 12,000/60,000 ( 20.0%) acc=0.6975 3724 char/s eta= 13s + eval 13,200/60,000 ( 22.0%) acc=0.7022 3723 char/s eta= 13s + eval 14,400/60,000 ( 24.0%) acc=0.7037 3721 char/s eta= 12s + eval 15,600/60,000 ( 26.0%) acc=0.7051 3721 char/s eta= 12s + eval 16,800/60,000 ( 28.0%) acc=0.7083 3721 char/s eta= 12s + eval 18,000/60,000 ( 30.0%) acc=0.7100 3721 char/s eta= 11s + eval 19,200/60,000 ( 32.0%) acc=0.7136 3719 char/s eta= 11s + eval 20,400/60,000 ( 34.0%) acc=0.7152 3718 char/s eta= 11s + eval 21,600/60,000 ( 36.0%) acc=0.7161 3719 char/s eta= 10s + eval 22,800/60,000 ( 38.0%) acc=0.7164 3718 char/s eta= 10s + eval 24,000/60,000 ( 40.0%) acc=0.7166 3718 char/s eta= 10s + eval 25,200/60,000 ( 42.0%) acc=0.7170 3718 char/s eta= 9s + eval 26,400/60,000 ( 44.0%) acc=0.7180 3718 char/s eta= 9s + eval 27,600/60,000 ( 46.0%) acc=0.7164 3718 char/s eta= 9s + eval 28,800/60,000 ( 48.0%) acc=0.7162 3720 char/s eta= 8s + eval 30,000/60,000 ( 50.0%) acc=0.7146 3721 char/s eta= 8s + eval 31,200/60,000 ( 52.0%) acc=0.7113 3723 char/s eta= 8s + eval 32,400/60,000 ( 54.0%) acc=0.7089 3726 char/s eta= 7s + eval 33,600/60,000 ( 56.0%) acc=0.7064 3728 char/s eta= 7s + eval 34,800/60,000 ( 58.0%) acc=0.7067 3728 char/s eta= 7s + eval 36,000/60,000 ( 60.0%) acc=0.7065 3728 char/s eta= 6s + eval 37,200/60,000 ( 62.0%) acc=0.7065 3728 char/s eta= 6s + eval 38,400/60,000 ( 64.0%) acc=0.7070 3728 char/s eta= 6s + eval 39,600/60,000 ( 66.0%) acc=0.7064 3728 char/s eta= 5s + eval 40,800/60,000 ( 68.0%) acc=0.7062 3727 char/s eta= 5s + eval 42,000/60,000 ( 70.0%) acc=0.7056 3727 char/s eta= 5s + eval 43,200/60,000 ( 72.0%) acc=0.7050 3726 char/s eta= 5s + eval 44,400/60,000 ( 74.0%) acc=0.7052 3726 char/s eta= 4s + eval 45,600/60,000 ( 76.0%) acc=0.7054 3725 char/s eta= 4s + eval 46,800/60,000 ( 78.0%) acc=0.7047 3725 char/s eta= 4s + eval 48,000/60,000 ( 80.0%) acc=0.7049 3724 char/s eta= 3s + eval 49,200/60,000 ( 82.0%) acc=0.7043 
3724 char/s eta= 3s + eval 50,400/60,000 ( 84.0%) acc=0.7046 3724 char/s eta= 3s + eval 51,600/60,000 ( 86.0%) acc=0.7047 3724 char/s eta= 2s + eval 52,800/60,000 ( 88.0%) acc=0.7034 3727 char/s eta= 2s + eval 54,000/60,000 ( 90.0%) acc=0.7034 3727 char/s eta= 2s + eval 55,200/60,000 ( 92.0%) acc=0.7028 3728 char/s eta= 1s + eval 56,400/60,000 ( 94.0%) acc=0.7021 3728 char/s eta= 1s + eval 57,600/60,000 ( 96.0%) acc=0.7028 3727 char/s eta= 1s + eval 58,800/60,000 ( 98.0%) acc=0.7036 3727 char/s eta= 0s + eval 60,000/60,000 (100.0%) acc=0.7047 3727 char/s eta= 0s +chars=60,000 acc=0.7047 eval_duration=16.1s --- submission : paq_mixer_v3 -training energy (J): 3,582.3 -training duration : 122.3s +training energy (J): 2,355.2 +training duration : 53.2s val char-accuracy : 0.7047 val chars : 60,000 wrote /tmp/result.json Stopping app - local entrypoint completed. ✓ App completed. View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-Sm0PHVmoPmOFQsokhYXdqV +https://modal.com/apps/gabriel-nakajima-an/main/ap-Dycl40rOPJQ1MEo4LFRzo5 # final result { "submission": "paq_mixer_v3", - "training_energy_J": 3582.3155354, - "training_duration_s": 122.294609292, - "cpu_energy_J": 5167.742507545003, - "total_energy_J": 8750.058042945002, + "training_energy_J": 2355.22674605, + "training_duration_s": 53.155205079, + "cpu_energy_J": 2251.575883315001, + "total_energy_J": 4606.802629365002, "val_char_accuracy": 0.70475, "val_chars": 60000, - "gpu_name": "NVIDIA A100-SXM4-80GB", - "date_utc": "2026-05-20T07:12:19Z", + "gpu_name": "NVIDIA A100 80GB PCIe", + "date_utc": "2026-05-21T05:05:13Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 66.40836666666658, - "stress_watts_avg": 345.33368906733165, - "stress_energy_joules": 13068.953, - "stress_duration_s": 37.84441951, - "gpu_name": "NVIDIA A100-SXM4-80GB", + "idle_watts": 56.77781666666669, + "stress_watts_avg": 232.3223612785609, + "stress_energy_joules": 8489.282, + 
"stress_duration_s": 36.540959524, + "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, "contributor": "@worker-paq-mixer" } --mixer" -} diff --git a/submissions/subset_70_mkn/nvml.json b/submissions/subset_70_mkn/nvml.json index 8380f6b..0ec523a 100644 --- a/submissions/subset_70_mkn/nvml.json +++ b/submissions/subset_70_mkn/nvml.json @@ -2,10 +2,10 @@ "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 57.06305084745769, - "stress_watts_avg": 333.0922109881346, - "stress_energy_joules": 12488.622, - "stress_duration_s": 37.492987191, - "gpu_name": "NVIDIA A100-SXM4-80GB", + "idle_watts": 58.36485000000002, + "stress_watts_avg": 233.1941013031682, + "stress_energy_joules": 8767.363, + "stress_duration_s": 37.596847223, + "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] } diff --git a/submissions/subset_70_mkn/result.json b/submissions/subset_70_mkn/result.json index 2ca5e57..e956192 100644 --- a/submissions/subset_70_mkn/result.json +++ b/submissions/subset_70_mkn/result.json @@ -1,22 +1,22 @@ { "submission": "subset_70_mkn", - "training_energy_J": 1064.6838474000006, - "training_duration_s": 41.054503051999994, - "cpu_energy_J": 1736.325936897499, - "total_energy_J": 2801.0097842974997, + "training_energy_J": 1350.8209175499999, + "training_duration_s": 26.514841649000005, + "cpu_energy_J": 1123.5610902799988, + "total_energy_J": 2474.3820078299987, "val_char_accuracy": 0.7031333333333334, "val_chars": 60000, - "gpu_name": "NVIDIA A100-SXM4-80GB", - "date_utc": "2026-05-20T07:32:40Z", + "gpu_name": "NVIDIA A100 80GB PCIe", + "date_utc": "2026-05-21T05:05:01Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 57.06305084745769, - "stress_watts_avg": 333.0922109881346, - "stress_energy_joules": 12488.622, - "stress_duration_s": 37.492987191, - "gpu_name": "NVIDIA A100-SXM4-80GB", + "idle_watts": 58.36485000000002, + "stress_watts_avg": 233.1941013031682, + 
"stress_energy_joules": 8767.363, + "stress_duration_s": 37.596847223, + "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, "contributor": "@exp-batch-iter4" diff --git a/submissions/subset_70_mkn/run.log b/submissions/subset_70_mkn/run.log index 46c551e..4a514ec 100644 --- a/submissions/subset_70_mkn/run.log +++ b/submissions/subset_70_mkn/run.log @@ -1,25 +1,25 @@ -# wikitext submit.py log — subset_70_mkn — 2026-05-20T07:30:25+00:00Z +# wikitext submit.py log — subset_70_mkn — 2026-05-21T05:03:02+00:00Z [modal] launching A100-80GB ... ✓ Initialized. View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-shQS1Hiyo4OPhMcNN4Xy5N +https://modal.com/apps/gabriel-nakajima-an/main/ap-TnCfSdLjln33sQ58a3CqLJ ✓ Created objects. ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py -├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py ├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py └── 🔨 Created function run_submission. [modal] verifying NVML energy counter ... -GPU: NVIDIA A100-SXM4-80GB +GPU: NVIDIA A100 80GB PCIe sampling idle power for 3s ... - idle: 57.1 W + idle: 58.4 W running 30s stress workload ... 
- duration: 37.5 s - energy delta: 12,488.6 J - avg power: 333.1 W + duration: 37.6 s + energy delta: 8,767.4 J + avg power: 233.2 W monotonic: True --- -{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 57.06305084745769, "stress_watts_avg": 333.0922109881346, "stress_energy_joules": 12488.622, "stress_duration_s": 37.492987191, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []} +{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 58.36485000000002, "stress_watts_avg": 233.1941013031682, "stress_energy_joules": 8767.363, "stress_duration_s": 37.596847223, "gpu_name": "NVIDIA A100 80GB PCIe", "notes": []} [modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ... loading WikiText-103 from /data ... train chars: 540,095,682 @@ -27,17 +27,17 @@ loading WikiText-103 from /data ... train wall-clock cap: 300 s val accuracy floor : 0.7000 training submission /workspace/subset_70_mkn.py ... -[codecarbon WARNING @ 07:31:20] Multiple instances of codecarbon are allowed to run at the same time. +[codecarbon WARNING @ 05:04:02] Multiple instances of codecarbon are allowed to run at the same time. 
[gpu_ngram_w3] starting build; max_order=11 D=0.5 [gpu_ngram_w3] SUBSET 0.7 -> 378,767,828 train bytes -[gpu_ngram_w3] encoded train: 378,767,828 bytes (0.6s) +[gpu_ngram_w3] encoded train: 378,767,828 bytes (0.3s) [gpu_ngram_w3] top order=11 unique pairs: 93,376,155 1.4s -[gpu_ngram_w3] ctx_len=10 ctxs=66,967,773 rows=93,376,155 15.2s -[gpu_ngram_w3] ctx_len=9 ctxs=44,196,096 rows=66,967,774 9.9s -[gpu_ngram_w3] ctx_len=8 ctxs=26,241,880 rows=44,196,096 5.7s -[gpu_ngram_w3] ctx_len=7 ctxs=13,634,362 rows=26,241,880 3.3s -[gpu_ngram_w3] ctx_len=6 ctxs=5,986,883 rows=13,634,362 1.5s -[gpu_ngram_w3] ctx_len=5 ctxs=2,116,383 rows=5,986,883 0.6s +[gpu_ngram_w3] ctx_len=10 ctxs=66,967,773 rows=93,376,155 8.5s +[gpu_ngram_w3] ctx_len=9 ctxs=44,196,096 rows=66,967,774 6.9s +[gpu_ngram_w3] ctx_len=8 ctxs=26,241,880 rows=44,196,096 4.3s +[gpu_ngram_w3] ctx_len=7 ctxs=13,634,362 rows=26,241,880 2.0s +[gpu_ngram_w3] ctx_len=6 ctxs=5,986,883 rows=13,634,362 1.2s +[gpu_ngram_w3] ctx_len=5 ctxs=2,116,383 rows=5,986,883 0.4s [gpu_ngram_w3] ctx_len=4 ctxs=562,545 rows=2,116,383 0.1s [gpu_ngram_w3] ctx_len=3 ctxs=110,361 rows=562,545 0.0s [gpu_ngram_w3] ctx_len=2 ctxs=11,730 rows=110,361 0.0s @@ -53,92 +53,92 @@ training submission /workspace/subset_70_mkn.py ... [mkn] k=8 D1=0.658 D2=1.091 D3=1.442 (n1=25150544, n2=6528748, n3=3003917) [mkn] k=9 D1=0.683 D2=1.108 D3=1.436 (n1=41521363, n2=9620057, n3=4187138) [mkn] k=10 D1=0.708 D2=1.126 D3=1.431 (n1=62211762, n2=12816182, n3=5273752) -[mkn] discounts computed: 1.3s -[gpu_ngram_w3] total build: 39.7s -training: 1,064.7 J duration=41.1s +[mkn] discounts computed: 0.6s +[gpu_ngram_w3] total build: 25.7s +training: 1,350.8 J duration=26.5s evaluating on val split ... 
- eval 1,200/60,000 ( 2.0%) acc=0.6883 1728 char/s eta= 34s - eval 2,400/60,000 ( 4.0%) acc=0.6746 1762 char/s eta= 33s - eval 3,600/60,000 ( 6.0%) acc=0.6722 1724 char/s eta= 33s - eval 4,800/60,000 ( 8.0%) acc=0.6867 1741 char/s eta= 32s - eval 6,000/60,000 ( 10.0%) acc=0.6870 1757 char/s eta= 31s - eval 7,200/60,000 ( 12.0%) acc=0.6806 1752 char/s eta= 30s - eval 8,400/60,000 ( 14.0%) acc=0.6799 1772 char/s eta= 29s - eval 9,600/60,000 ( 16.0%) acc=0.6864 1762 char/s eta= 29s - eval 10,800/60,000 ( 18.0%) acc=0.6951 1786 char/s eta= 28s - eval 12,000/60,000 ( 20.0%) acc=0.6977 1760 char/s eta= 27s - eval 13,200/60,000 ( 22.0%) acc=0.7017 1747 char/s eta= 27s - eval 14,400/60,000 ( 24.0%) acc=0.7035 1765 char/s eta= 26s - eval 15,600/60,000 ( 26.0%) acc=0.7056 1749 char/s eta= 25s - eval 16,800/60,000 ( 28.0%) acc=0.7089 1747 char/s eta= 25s - eval 18,000/60,000 ( 30.0%) acc=0.7106 1730 char/s eta= 24s - eval 19,200/60,000 ( 32.0%) acc=0.7143 1733 char/s eta= 24s - eval 20,400/60,000 ( 34.0%) acc=0.7155 1738 char/s eta= 23s - eval 21,600/60,000 ( 36.0%) acc=0.7163 1746 char/s eta= 22s - eval 22,800/60,000 ( 38.0%) acc=0.7168 1741 char/s eta= 21s - eval 24,000/60,000 ( 40.0%) acc=0.7168 1752 char/s eta= 21s - eval 25,200/60,000 ( 42.0%) acc=0.7169 1761 char/s eta= 20s - eval 26,400/60,000 ( 44.0%) acc=0.7181 1765 char/s eta= 19s - eval 27,600/60,000 ( 46.0%) acc=0.7165 1756 char/s eta= 18s - eval 28,800/60,000 ( 48.0%) acc=0.7165 1760 char/s eta= 18s - eval 30,000/60,000 ( 50.0%) acc=0.7152 1769 char/s eta= 17s - eval 31,200/60,000 ( 52.0%) acc=0.7122 1777 char/s eta= 16s - eval 32,400/60,000 ( 54.0%) acc=0.7098 1780 char/s eta= 16s - eval 33,600/60,000 ( 56.0%) acc=0.7074 1783 char/s eta= 15s - eval 34,800/60,000 ( 58.0%) acc=0.7074 1784 char/s eta= 14s - eval 36,000/60,000 ( 60.0%) acc=0.7070 1783 char/s eta= 13s - eval 37,200/60,000 ( 62.0%) acc=0.7068 1783 char/s eta= 13s - eval 38,400/60,000 ( 64.0%) acc=0.7070 1783 char/s eta= 12s - eval 39,600/60,000 ( 
66.0%) acc=0.7061 1780 char/s eta= 11s - eval 40,800/60,000 ( 68.0%) acc=0.7057 1776 char/s eta= 11s - eval 42,000/60,000 ( 70.0%) acc=0.7050 1773 char/s eta= 10s - eval 43,200/60,000 ( 72.0%) acc=0.7044 1771 char/s eta= 9s - eval 44,400/60,000 ( 74.0%) acc=0.7045 1760 char/s eta= 9s - eval 45,600/60,000 ( 76.0%) acc=0.7043 1749 char/s eta= 8s - eval 46,800/60,000 ( 78.0%) acc=0.7037 1736 char/s eta= 8s - eval 48,000/60,000 ( 80.0%) acc=0.7039 1738 char/s eta= 7s - eval 49,200/60,000 ( 82.0%) acc=0.7033 1738 char/s eta= 6s - eval 50,400/60,000 ( 84.0%) acc=0.7037 1737 char/s eta= 6s - eval 51,600/60,000 ( 86.0%) acc=0.7036 1735 char/s eta= 5s - eval 52,800/60,000 ( 88.0%) acc=0.7023 1733 char/s eta= 4s - eval 54,000/60,000 ( 90.0%) acc=0.7024 1729 char/s eta= 3s - eval 55,200/60,000 ( 92.0%) acc=0.7018 1730 char/s eta= 3s - eval 56,400/60,000 ( 94.0%) acc=0.7010 1730 char/s eta= 2s - eval 57,600/60,000 ( 96.0%) acc=0.7013 1731 char/s eta= 1s - eval 58,800/60,000 ( 98.0%) acc=0.7019 1731 char/s eta= 1s - eval 60,000/60,000 (100.0%) acc=0.7031 1734 char/s eta= 0s -chars=60,000 acc=0.7031 eval_duration=34.6s + eval 1,200/60,000 ( 2.0%) acc=0.6883 2068 char/s eta= 28s + eval 2,400/60,000 ( 4.0%) acc=0.6746 2099 char/s eta= 27s + eval 3,600/60,000 ( 6.0%) acc=0.6722 2110 char/s eta= 27s + eval 4,800/60,000 ( 8.0%) acc=0.6867 2117 char/s eta= 26s + eval 6,000/60,000 ( 10.0%) acc=0.6870 2131 char/s eta= 25s + eval 7,200/60,000 ( 12.0%) acc=0.6806 2141 char/s eta= 25s + eval 8,400/60,000 ( 14.0%) acc=0.6799 2147 char/s eta= 24s + eval 9,600/60,000 ( 16.0%) acc=0.6864 2146 char/s eta= 23s + eval 10,800/60,000 ( 18.0%) acc=0.6951 2145 char/s eta= 23s + eval 12,000/60,000 ( 20.0%) acc=0.6977 2143 char/s eta= 22s + eval 13,200/60,000 ( 22.0%) acc=0.7017 2141 char/s eta= 22s + eval 14,400/60,000 ( 24.0%) acc=0.7035 2141 char/s eta= 21s + eval 15,600/60,000 ( 26.0%) acc=0.7056 2140 char/s eta= 21s + eval 16,800/60,000 ( 28.0%) acc=0.7089 2141 char/s eta= 20s + eval 18,000/60,000 
( 30.0%) acc=0.7106 2140 char/s eta= 20s + eval 19,200/60,000 ( 32.0%) acc=0.7143 2137 char/s eta= 19s + eval 20,400/60,000 ( 34.0%) acc=0.7155 2135 char/s eta= 19s + eval 21,600/60,000 ( 36.0%) acc=0.7163 2136 char/s eta= 18s + eval 22,800/60,000 ( 38.0%) acc=0.7168 2136 char/s eta= 17s + eval 24,000/60,000 ( 40.0%) acc=0.7168 2136 char/s eta= 17s + eval 25,200/60,000 ( 42.0%) acc=0.7169 2135 char/s eta= 16s + eval 26,400/60,000 ( 44.0%) acc=0.7181 2134 char/s eta= 16s + eval 27,600/60,000 ( 46.0%) acc=0.7165 2134 char/s eta= 15s + eval 28,800/60,000 ( 48.0%) acc=0.7165 2136 char/s eta= 15s + eval 30,000/60,000 ( 50.0%) acc=0.7152 2137 char/s eta= 14s + eval 31,200/60,000 ( 52.0%) acc=0.7122 2138 char/s eta= 13s + eval 32,400/60,000 ( 54.0%) acc=0.7098 2141 char/s eta= 13s + eval 33,600/60,000 ( 56.0%) acc=0.7074 2143 char/s eta= 12s + eval 34,800/60,000 ( 58.0%) acc=0.7074 2143 char/s eta= 12s + eval 36,000/60,000 ( 60.0%) acc=0.7070 2142 char/s eta= 11s + eval 37,200/60,000 ( 62.0%) acc=0.7068 2142 char/s eta= 11s + eval 38,400/60,000 ( 64.0%) acc=0.7070 2142 char/s eta= 10s + eval 39,600/60,000 ( 66.0%) acc=0.7061 2142 char/s eta= 10s + eval 40,800/60,000 ( 68.0%) acc=0.7057 2143 char/s eta= 9s + eval 42,000/60,000 ( 70.0%) acc=0.7050 2142 char/s eta= 8s + eval 43,200/60,000 ( 72.0%) acc=0.7044 2141 char/s eta= 8s + eval 44,400/60,000 ( 74.0%) acc=0.7045 2140 char/s eta= 7s + eval 45,600/60,000 ( 76.0%) acc=0.7043 2140 char/s eta= 7s + eval 46,800/60,000 ( 78.0%) acc=0.7037 2139 char/s eta= 6s + eval 48,000/60,000 ( 80.0%) acc=0.7039 2138 char/s eta= 6s + eval 49,200/60,000 ( 82.0%) acc=0.7033 2138 char/s eta= 5s + eval 50,400/60,000 ( 84.0%) acc=0.7037 2137 char/s eta= 4s + eval 51,600/60,000 ( 86.0%) acc=0.7036 2137 char/s eta= 4s + eval 52,800/60,000 ( 88.0%) acc=0.7023 2140 char/s eta= 3s + eval 54,000/60,000 ( 90.0%) acc=0.7024 2140 char/s eta= 3s + eval 55,200/60,000 ( 92.0%) acc=0.7018 2141 char/s eta= 2s + eval 56,400/60,000 ( 94.0%) acc=0.7010 2141 
char/s eta= 2s + eval 57,600/60,000 ( 96.0%) acc=0.7013 2140 char/s eta= 1s + eval 58,800/60,000 ( 98.0%) acc=0.7019 2140 char/s eta= 1s + eval 60,000/60,000 (100.0%) acc=0.7031 2140 char/s eta= 0s +chars=60,000 acc=0.7031 eval_duration=28.0s --- submission : subset_70_mkn -training energy (J): 1,064.7 -training duration : 41.1s +training energy (J): 1,350.8 +training duration : 26.5s val char-accuracy : 0.7031 val chars : 60,000 wrote /tmp/result.json Stopping app - local entrypoint completed. ✓ App completed. View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-shQS1Hiyo4OPhMcNN4Xy5N +https://modal.com/apps/gabriel-nakajima-an/main/ap-TnCfSdLjln33sQ58a3CqLJ # final result { "submission": "subset_70_mkn", - "training_energy_J": 1064.6838474000006, - "training_duration_s": 41.054503051999994, - "cpu_energy_J": 1736.325936897499, - "total_energy_J": 2801.0097842974997, + "training_energy_J": 1350.8209175499999, + "training_duration_s": 26.514841649000005, + "cpu_energy_J": 1123.5610902799988, + "total_energy_J": 2474.3820078299987, "val_char_accuracy": 0.7031333333333334, "val_chars": 60000, - "gpu_name": "NVIDIA A100-SXM4-80GB", - "date_utc": "2026-05-20T07:32:40Z", + "gpu_name": "NVIDIA A100 80GB PCIe", + "date_utc": "2026-05-21T05:05:01Z", "_nvml": { "nvml_available": true, "energy_counter_supported": true, "monotonic": true, - "idle_watts": 57.06305084745769, - "stress_watts_avg": 333.0922109881346, - "stress_energy_joules": 12488.622, - "stress_duration_s": 37.492987191, - "gpu_name": "NVIDIA A100-SXM4-80GB", + "idle_watts": 58.36485000000002, + "stress_watts_avg": 233.1941013031682, + "stress_energy_joules": 8767.363, + "stress_duration_s": 37.596847223, + "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, "contributor": "@exp-batch-iter4" diff --git a/submit.py b/submit.py index cfefb3a..c8ec97d 100755 --- a/submit.py +++ b/submit.py @@ -109,6 +109,12 @@ # tables (used as a deterministic algorithm, not a "pretrained # weight" — see README 
"Internal representations").
     .pip_install("tiktoken==0.7.0")
+    # CodeCarbon: CPU energy estimation backend used by EnergyMeter to
+    # populate cpu_energy_J + total_energy_J in result.json. The TDP
+    # fallback path is used (no MSR access in Modal containers); accuracy
+    # is acceptable for total-system-energy reporting per the field
+    # standard (HuggingFace Trainer, Patterson et al. 2021/2022).
+    .pip_install("codecarbon~=3.2")
     # Modal re-imports submit.py inside the container to resolve the
     # remote function. submit.py does a top-level `import task`, so
     # /workspace (where task.py lands via add_local_file) must be on
@@ -317,10 +323,18 @@ def append_record(result: dict, dir_relpath: str) -> None:
     Replaces the placeholder dash row if present, otherwise appends.
     Disqualified rows render their accuracy cell as ``DQ`` so they
     don't pollute the leaderboard sort.
+
+    The energy column reports ``total_energy_J`` (GPU NVML + CodeCarbon
+    CPU estimate) when the new harness produced it, falling back to
+    ``training_energy_J`` (NVML-only) for runs predating the
+    total-system-energy change. See ``MAINTAINING.md`` for the dated
+    semantics of the column over time.
     """
     readme = HERE / "README.md"
     text = readme.read_text()
-    energy = result.get("training_energy_J")
+    energy = result.get("total_energy_J")
+    if energy is None:
+        energy = result.get("training_energy_J")
     energy_cell = f"{energy:>10,.0f}" if energy is not None else " —"
     if result.get("disqualified"):
         acc_cell = " DQ"
diff --git a/test_wikitext.py b/test_wikitext.py
index c2bebdd..44dc93e 100644
--- a/test_wikitext.py
+++ b/test_wikitext.py
@@ -102,6 +102,157 @@ def test_energy_meter_fallback_when_no_nvml() -> None:
     assert m.duration_s >= 0
 
 
+def test_energy_meter_total_is_gpu_plus_cpu() -> None:
+    """When both GPU and CPU backends return values, total_energy_J = sum."""
+    class _StubGpuBackend:
+        available = True
+        def start(self) -> None: pass
+        def stop(self, duration_s: float) -> float: return 1000.0  # net joules
+
+    class _StubCpuBackend:
+        available = True
+        def start(self) -> None: pass
+        def stop(self) -> float: return 200.0  # net joules
+
+    meter = EnergyMeter(gpu_backend=_StubGpuBackend(), cpu_backend=_StubCpuBackend())
+    with meter.measure() as m:
+        pass
+    assert m.energy_joules == 1000.0
+    assert m.cpu_energy_J == 200.0
+    assert m.total_energy_J == 1200.0
+
+
+def test_energy_meter_raises_when_gpu_available_but_cpu_missing() -> None:
+    """If NVML works but the CPU backend doesn't, EnergyMeter must fail loudly.
+
+    Silent half-measurement (GPU only, cpu_energy_J None) would land
+    inconsistent rows on the leaderboard. Loud-fail forces the operator
+    to fix the env (install codecarbon, or pass an explicit cpu_backend
+    for an intentional calibration without CPU tracking).
+    """
+    import pytest
+
+    class _StubGpu:
+        available = True
+        def start(self) -> None: pass
+        def stop(self, duration_s: float) -> float: return 100.0
+
+    class _StubUnavailCpu:
+        available = False
+        def start(self) -> None: pass
+        def stop(self): return None
+
+    with pytest.raises(RuntimeError, match="CPU energy backend"):
+        EnergyMeter(gpu_backend=_StubGpu(), cpu_backend=_StubUnavailCpu())
+
+
+def test_energy_meter_no_raise_when_cpu_present_but_gpu_missing() -> None:
+    """Dev pattern: CodeCarbon installed but no NVML — no raise.
+
+    Loud-fail only triggers when NVML is available without CodeCarbon
+    (real GPU box, broken energy backend). A laptop with CodeCarbon
+    installed but no GPU should construct an EnergyMeter cleanly and
+    just not measure GPU energy.
+    """
+    class _UnavailGpu:
+        available = False
+        def start(self) -> None: pass
+        def stop(self, duration_s: float = 0.0): return None
+
+    class _AvailCpu:
+        available = True
+        def start(self) -> None: pass
+        def stop(self): return 100.0
+
+    meter = EnergyMeter(gpu_backend=_UnavailGpu(), cpu_backend=_AvailCpu())
+    assert not meter.available
+
+
+def test_total_energy_none_when_only_one_backend_yields_value() -> None:
+    """total_energy_J stays None if either backend returns None from stop()."""
+    class _GpuOk:
+        available = True
+        def start(self) -> None: pass
+        def stop(self, duration_s: float) -> float: return 100.0
+
+    class _CpuYieldsNone:
+        # available=True so the constructor doesn't raise, but stop()
+        # yields None — simulates a tracker that started OK and then
+        # failed to read its counter on the way out.
+        available = True
+        def start(self) -> None: pass
+        def stop(self): return None
+
+    meter = EnergyMeter(gpu_backend=_GpuOk(), cpu_backend=_CpuYieldsNone())
+    with meter.measure() as m:
+        pass
+    assert m.energy_joules == 100.0
+    assert m.cpu_energy_J is None
+    assert m.total_energy_J is None
+
+
+def test_energy_meter_dev_mode_no_raise_when_both_unavailable() -> None:
+    """Dev pattern: no NVML AND no CodeCarbon — soft, not loud.
+
+    Local smoke tests on a CPU-only laptop must still be able to
+    construct an EnergyMeter without crashing; measurement just
+    returns None for everything.
+    """
+    class _Unavail:
+        available = False
+        def start(self) -> None: pass
+        def stop(self, duration_s: float = 0.0): return None
+
+    meter = EnergyMeter(gpu_backend=_Unavail(), cpu_backend=_Unavail())
+    assert not meter.available
+
+
+def test_default_cpu_backend_uses_codecarbon_when_installed() -> None:
+    """When CodeCarbon is installed, default cpu_backend populates cpu_energy_J."""
+    import pytest
+    pytest.importorskip("codecarbon")
+
+    class _StubGpu:
+        available = True
+        def start(self) -> None: pass
+        def stop(self, duration_s: float) -> float: return 100.0
+
+    meter = EnergyMeter(gpu_backend=_StubGpu())  # default cpu_backend
+    with meter.measure() as m:
+        sum(range(1_000_000))  # short CPU work
+    assert m.cpu_energy_J is not None, "default cpu_backend should populate cpu_energy_J"
+    assert m.cpu_energy_J >= 0.0
+    assert m.total_energy_J is not None
+    assert m.total_energy_J >= 100.0  # at least the GPU contribution
+
+
+def test_total_energy_enforces_wall_clock_floor() -> None:
+    """total_energy_J must be >= duration_s * p_floor_watts even when backends under-attribute."""
+    class _LowGpu:
+        available = True
+        def start(self) -> None: pass
+        def stop(self, duration_s: float) -> float: return 5.0  # tiny GPU energy
+
+    class _ZeroCpu:
+        available = True
+        def start(self) -> None: pass
+        def stop(self) -> float: return 0.0  # CodeCarbon under-attribution sim
+
+    meter = EnergyMeter(
+        gpu_backend=_LowGpu(),
+        cpu_backend=_ZeroCpu(),
+        p_floor_watts=50.0,
+    )
+    with meter.measure() as m:
+        time.sleep(0.4)  # wall clock ~ 0.4s → floor ~ 20J
+    assert m.duration_s >= 0.3
+    floor = m.duration_s * 50.0
+    raw_sum = m.energy_joules + m.cpu_energy_J
+    # Floor must bind: raw sum is 5J, floor ~20J
+    assert m.total_energy_J >= floor
+    assert m.total_energy_J == max(raw_sum, floor)
+
+
 # ---------------------------------------------------------------------------
 # Wall-clock guard (README rule 4)
 # ---------------------------------------------------------------------------
diff --git a/wikitext.py b/wikitext.py
index 8b52b1e..06c23f9 100644
--- a/wikitext.py
+++ b/wikitext.py
@@ -159,10 +159,15 @@ def evaluate(
 class Measurement:
     energy_joules: float | None = None
     duration_s: float = 0.0
+    cpu_energy_J: float | None = None
+    total_energy_J: float | None = None
 
     def __str__(self) -> str:
         e = (f"{self.energy_joules:,.1f} J" if self.energy_joules is not None
              else "energy: not measured")
+        if self.cpu_energy_J is not None and self.total_energy_J is not None:
+            e += (f" cpu={self.cpu_energy_J:,.1f} J"
+                  f" total={self.total_energy_J:,.1f} J")
         return f"{e} duration={self.duration_s:.1f}s"
 
 
@@ -189,42 +194,133 @@ class EnergyMeter:
     README rule 4 lives in ``wall_clock_guard`` instead.
     """
 
-    def __init__(self, *, gpu_index: int = 0, idle_watts: float = 50.0):
+    def __init__(self, *, gpu_index: int = 0, idle_watts: float = 50.0,
+                 gpu_backend=None, cpu_backend=None, p_floor_watts: float = 50.0):
+        self.gpu_index = gpu_index
+        self.idle_watts = idle_watts
+        self.p_floor_watts = p_floor_watts
+        self._gpu_backend = (gpu_backend if gpu_backend is not None
+                             else _NvmlGpuBackend(gpu_index, idle_watts))
+        self._cpu_backend = (cpu_backend if cpu_backend is not None
+                             else _CodeCarbonCpuBackend())
+        # Fail loudly if we're on a real GPU box but the CPU backend
+        # failed to load. Silent half-measurement would land inconsistent
+        # rows on the leaderboard. (Dev machines without NVML stay in
+        # soft "neither available" mode — no measurement, no crash.)
+        if self._gpu_backend.available and not self._cpu_backend.available:
+            raise RuntimeError(
+                "EnergyMeter: NVML is available but the CPU energy backend "
+                "is not. CodeCarbon is listed in requirements.txt and the "
+                "Modal image — install it (`pip install codecarbon`), or "
+                "pass an explicit cpu_backend if running a calibration "
+                "that intentionally skips CPU tracking."
+            )
+        self.available = self._gpu_backend.available
+
+    @contextmanager
+    def measure(self) -> Iterator[Measurement]:
+        m = Measurement()
+        if self._gpu_backend.available:
+            self._gpu_backend.start()
+        if self._cpu_backend.available:
+            self._cpu_backend.start()
+        t0 = time.monotonic()
+        try:
+            yield m
+        finally:
+            # Capture duration / energy even if the body raised (e.g.
+            # TrainingTimeoutError from wall_clock_guard) — caller can
+            # then report the partial numbers on the DQ row.
+            m.duration_s = time.monotonic() - t0
+            if self._gpu_backend.available:
+                m.energy_joules = self._gpu_backend.stop(m.duration_s)
+            if self._cpu_backend.available:
+                m.cpu_energy_J = self._cpu_backend.stop()
+            if m.energy_joules is not None and m.cpu_energy_J is not None:
+                raw_sum = m.energy_joules + m.cpu_energy_J
+                floor = m.duration_s * self.p_floor_watts
+                m.total_energy_J = max(raw_sum, floor)
+
+
+class _NvmlGpuBackend:
+    """Default GPU energy backend wrapping pynvml's
+    ``nvmlDeviceGetTotalEnergyConsumption`` counter with idle subtraction."""
+
+    def __init__(self, gpu_index: int = 0, idle_watts: float = 50.0):
         self.gpu_index = gpu_index
         self.idle_watts = idle_watts
         self.available = False
         self._handle = None
         self._pynvml = None
+        self._e0: int | None = None
         try:
             import pynvml  # type: ignore[import-not-found]
             pynvml.nvmlInit()
             self._handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
-            # Probe the energy counter; if unsupported, fall back.
pynvml.nvmlDeviceGetTotalEnergyConsumption(self._handle) self._pynvml = pynvml self.available = True except Exception: - self.available = False + pass - @contextmanager - def measure(self) -> Iterator[Measurement]: - m = Measurement() - e0: int | None = None + def start(self) -> None: if self.available and self._pynvml is not None: - e0 = self._pynvml.nvmlDeviceGetTotalEnergyConsumption(self._handle) - t0 = time.monotonic() + self._e0 = self._pynvml.nvmlDeviceGetTotalEnergyConsumption(self._handle) + + # ``stop`` takes ``duration_s`` because NVML returns a running total + # and we subtract ``idle_watts * duration`` to get net training + # energy. The CPU backend's ``stop`` doesn't need a duration arg — + # CodeCarbon's tracker timestamps its own start/stop internally. + def stop(self, duration_s: float) -> float | None: + if not (self.available and self._pynvml is not None and self._e0 is not None): + return None + e1 = self._pynvml.nvmlDeviceGetTotalEnergyConsumption(self._handle) + e_run_j = (e1 - self._e0) / 1000.0 # NVML returns millijoules + e_idle_j = duration_s * self.idle_watts + return max(0.0, e_run_j - e_idle_j) + + +class _CodeCarbonCpuBackend: + """Default CPU energy backend wrapping CodeCarbon's ``EmissionsTracker``. + + Sets ``available = False`` if CodeCarbon is not installed. On its own + that is silent (returns ``None`` from ``stop()``), but the surrounding + ``EnergyMeter.__init__`` raises ``RuntimeError`` when NVML is + available and this backend is not — so a leaderboard run on Modal + fails loudly rather than silently dropping the CPU component. The + silent path is only reached on dev boxes that also have no NVML. + + Note: reads ``tracker._total_cpu_energy.kWh`` after stop. That + attribute is internal to CodeCarbon; we pin a minor version range in + ``requirements.txt`` (and the Modal image) to keep the path stable. 
+ """ + + def __init__(self) -> None: + self.available = False + self._tracker = None + self._EmissionsTracker = None try: - yield m - finally: - # Capture duration / energy even if the body raised (e.g. - # TrainingTimeoutError from wall_clock_guard) — caller can - # then report the partial numbers on the DQ row. - m.duration_s = time.monotonic() - t0 - if self.available and self._pynvml is not None and e0 is not None: - e1 = self._pynvml.nvmlDeviceGetTotalEnergyConsumption(self._handle) - e_run_j = (e1 - e0) / 1000.0 # NVML returns millijoules - e_idle_j = m.duration_s * self.idle_watts - m.energy_joules = max(0.0, e_run_j - e_idle_j) + from codecarbon import EmissionsTracker + self._EmissionsTracker = EmissionsTracker + self.available = True + except Exception: + pass + + def start(self) -> None: + if not self.available or self._EmissionsTracker is None: + return + self._tracker = self._EmissionsTracker( + save_to_file=False, log_level="error", measure_power_secs=1.0 + ) + self._tracker.start() + + def stop(self) -> float | None: + if not self.available or self._tracker is None: + return None + self._tracker.stop() + kwh = self._tracker._total_cpu_energy.kWh + self._tracker = None + return kwh * 3.6e6 # kWh → J # --------------------------------------------------------------------------- From c46ec127fb32c7cbe081d1a10a017eabc32e175b Mon Sep 17 00:00:00 2001 From: Gabriel Nakajima An Date: Mon, 25 May 2026 15:04:43 -0700 Subject: [PATCH 2/5] README: move orphan rows into the table; add 13 PR-#5 entries MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit PR #5 merged without updating README's Record History — its 13 new submissions had no rows. PR #4's earlier auto-appends from ``submit.py:append_record`` placed re-run rows AFTER the ``[^2]`` footnote (orphan, wrong format, no GPU column). 
Cleaning both up:

- Move the 9 orphan PASS rows from after the footnotes into the table
  proper, reformatted with the GPU column to match existing style.
- Add the 4 PR-#5 DQ submissions that were missing entirely
  (``gpu_ngram_w31_k10``, ``chunker_phase1_v2``, ``bpe_internal_nn_v2``,
  ``mamba_byte``).
- Drop the 2026-05-20 ``modded_nanogpt`` DQ row: it was a transient
  SXM4-scheduler failure that has been superseded by the 2026-05-21 PASS
  row in the same table; keeping it confuses the dir link.

Fix the underlying bug in ``submit.py:append_record`` so future
auto-appends land inside the table block instead of past the footnotes:
the new ``_insert_into_record_history_table`` helper walks the file,
finds the Record History header + pipe-table block, and inserts the new
row after the last pipe-prefixed line of that block. It falls back to
the prior plain-append behaviour only if the table can't be located
(defensive).

Add ``scripts/validate_record_history.py``, a reusable validator that:

- parses the Record History markdown table,
- flags orphan submission rows outside the table block,
- cross-references the LATEST row per slot against the submission's
  current ``result.json`` (energy + accuracy within tolerance, PASS/DQ
  status matches),
- catches duplicate PASS rows for the same submission on the same date.

``python3 scripts/validate_record_history.py`` now reports
``README Record History: OK`` on this branch.
``pytest test_wikitext.py`` → 15/15 still pass.

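For reviewers, the insertion rule described above can be sketched on a
toy README. This is a simplified stand-in, not the helper from this
patch: ``insert_row_into_table`` is a hypothetical name, the three-column
table and the ``Val acc`` header are illustrative assumptions, and the
sketch treats any pipe-prefixed run after the heading as the table
instead of checking the header cells.

```python
from typing import Optional


def insert_row_into_table(text: str, row: str,
                          heading: str = "## Record History") -> Optional[str]:
    """Insert ``row`` after the last pipe-prefixed line that follows ``heading``.

    Hypothetical, simplified sketch of the rule in the commit message.
    Returns None when no pipe table follows the heading.
    """
    lines = text.splitlines(keepends=True)
    seen_heading = False
    last_pipe = -1
    for i, line in enumerate(lines):
        if not seen_heading:
            # Skip everything until the section heading is found.
            seen_heading = line.lstrip().startswith(heading)
            continue
        if line.startswith("|"):
            last_pipe = i      # still inside the table block
        elif last_pipe >= 0:
            break              # first non-pipe line closes the table
    if last_pipe < 0:
        return None
    return "".join(lines[: last_pipe + 1] + [row] + lines[last_pipe + 1:])


# Toy README with illustrative, trimmed-down columns.
README = (
    "## Record History\n"
    "\n"
    "| Date | Energy (J) | Val acc |\n"
    "|------|------------|---------|\n"
    "| 2026-05-20 | 53,683 | 0.7246 |\n"
    "\n"
    "[^1]: More energy efficient\n"
)
NEW_ROW = "| 2026-05-21 | 62,006 | 0.7337 |\n"

updated = insert_row_into_table(README, NEW_ROW)
print(updated)
```

In this sketch the new row lands directly under the existing data row
and above the footnote, which is the property the fix is meant to
restore; a plain end-of-file append would have placed it after ``[^1]``.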
Co-Authored-By: Claude Opus 4.7 (1M context) --- README.md | 27 ++-- scripts/validate_record_history.py | 235 +++++++++++++++++++++++++++++ submit.py | 54 ++++++- 3 files changed, 299 insertions(+), 17 deletions(-) create mode 100755 scripts/validate_record_history.py diff --git a/README.md b/README.md index ce37b6c..6093d84 100644 --- a/README.md +++ b/README.md @@ -35,9 +35,21 @@ The `Energy (J)` column reports **`total_energy_J`** (GPU NVML net of idle basel | 2026-05-18 | 3,612 | DQ | A100 80GB PCIe | chunker_d1 | [dir](research/catalog/new_directions/chunker_d1) | @ab-10 | | 2026-05-18 | 735 | DQ | A100 80GB PCIe | ppm_c | [dir](research/catalog/new_directions/ppm_c) | @ab-10 | | 2026-05-17 | 70 | DQ | A100 80GB SXM4 | P2-A_random_projection | [dir](research/forward-forward-deep/runs/phase2/P2-A_random_projection) | @ab-10 | -| 2026-05-20 | 53,683 | 0.7246 | A100 80GB PCIe | lwta_k4 | [dir](submissions/lwta_k4) | @ab-10 (re-run on new harness; total_J = 44,329 gpu + 9,354 cpu) | -| 2026-05-20 | 54,614 | 0.7145 | A100 80GB PCIe | lwta_k2 | [dir](submissions/lwta_k2) | @ab-10 (re-run on new harness; total_J = 44,583 gpu + 10,031 cpu) | -| 2026-05-20 | 66,747 | DQ | A100 80GB SXM4 | modded_nanogpt | [dir](submissions/modded_nanogpt) | @ab-10 (re-run on new harness landed on SXM4 and hit 300 s cap; re-running) | +| 2026-05-19 | 60,864 | DQ | A100 80GB PCIe | mamba_byte | [dir](submissions/mamba_byte) | @claude-mamba | +| 2026-05-20 | 1,752 | DQ | A100 80GB SXM4 | gpu_ngram_w31_k10 | [dir](submissions/gpu_ngram_w31_k10) | @follow-up-paq-prediction | +| 2026-05-20 | 13,936 | DQ | A100 80GB SXM4 | chunker_phase1_v2 | [dir](submissions/chunker_phase1_v2) | @explore-chunker-2026-05-19 | +| 2026-05-20 | 24,417 | DQ | A100 80GB SXM4 | bpe_internal_nn_v2 | [dir](submissions/bpe_internal_nn_v2) | @subagent-xorfix-2026-05-19 | +| 2026-05-20 | 53,683 | 0.7246 | A100 80GB PCIe | lwta_k4 | [dir](submissions/lwta_k4) | @ab-10 (re-run on new harness; total_J = 44,329 gpu + 9,354 
cpu) | +| 2026-05-20 | 54,614 | 0.7145 | A100 80GB PCIe | lwta_k2 | [dir](submissions/lwta_k2) | @ab-10 (re-run on new harness; total_J = 44,583 gpu + 10,031 cpu) | +| 2026-05-21 | 2,474 | 0.7031 | A100 80GB PCIe | subset_70_mkn | [dir](submissions/subset_70_mkn) | @exp-batch-iter4 | +| 2026-05-21 | 3,092 | 0.7050 | A100 80GB PCIe | gpu_ngram_w31_k11 | [dir](submissions/gpu_ngram_w31_k11) | @follow-up-paq-prediction | +| 2026-05-21 | 4,607 | 0.7047 | A100 80GB PCIe | paq_mixer_v3 | [dir](submissions/paq_mixer_v3) | @worker-paq-mixer | +| 2026-05-21 | 8,602 | 0.7184 | A100 80GB PCIe | gpu_ngram_o14_xorfix | [dir](submissions/gpu_ngram_o14_xorfix) | @subagent-xorfix-2026-05-19 | +| 2026-05-21 | 9,591 | 0.7063 | A100 80GB PCIe | chunker_phase1_v1 | [dir](submissions/chunker_phase1_v1) | @explore-chunker-2026-05-19 | +| 2026-05-21 | 14,578 | 0.7184 | A100 80GB PCIe | deep_backoff_kn | [dir](submissions/deep_backoff_kn) | @nakajimagabriel | +| 2026-05-21 | 19,922 | 0.7328 | A100 80GB SXM4 | lwta_k4_alpha_065 | [dir](submissions/lwta_k4_alpha_065) | @subagent-L2clean-2026-05-19 | +| 2026-05-21 | 20,743 | 0.7390 | A100 80GB SXM4 | alpha_06 | [dir](submissions/alpha_06) | @subagent-xorfix-2026-05-19 | +| 2026-05-21 | 62,006 | 0.7337 | A100 80GB SXM4 | modded_nanogpt | [dir](submissions/modded_nanogpt) | @ab-10 | ## Rules @@ -69,12 +81,3 @@ For an internal-BPE submission, `predict()` returns `P(next_char | observed_char [^1]: More energy efficient [^2]: As of writing this -| 2026-05-21 | 3,092 | 0.7050 | gpu_ngram_w31_k11 | [dir](submissions/gpu_ngram_w31_k11) | @follow-up-paq-prediction | -| 2026-05-21 | 2,474 | 0.7031 | subset_70_mkn | [dir](submissions/subset_70_mkn) | @exp-batch-iter4 | -| 2026-05-21 | 4,607 | 0.7047 | paq_mixer_v3 | [dir](submissions/paq_mixer_v3) | @worker-paq-mixer | -| 2026-05-21 | 8,602 | 0.7184 | gpu_ngram_o14_xorfix | [dir](submissions/gpu_ngram_o14_xorfix) | @subagent-xorfix-2026-05-19 | -| 2026-05-21 | 14,578 | 0.7184 | deep_backoff_kn | 
[dir](submissions/deep_backoff_kn) | @nakajimagabriel | -| 2026-05-21 | 9,591 | 0.7063 | chunker_phase1_v1 | [dir](submissions/chunker_phase1_v1) | @explore-chunker-2026-05-19 | -| 2026-05-21 | 19,922 | 0.7328 | lwta_k4_alpha_065 | [dir](submissions/lwta_k4_alpha_065) | @subagent-L2clean-2026-05-19 | -| 2026-05-21 | 20,743 | 0.7390 | alpha_06 | [dir](submissions/alpha_06) | @subagent-xorfix-2026-05-19 | -| 2026-05-21 | 62,006 | 0.7337 | modded_nanogpt | [dir](submissions/modded_nanogpt) | @ab-10 | diff --git a/scripts/validate_record_history.py b/scripts/validate_record_history.py new file mode 100755 index 0000000..f4edb18 --- /dev/null +++ b/scripts/validate_record_history.py @@ -0,0 +1,235 @@ +#!/usr/bin/env python3 +"""Validate that README.md's Record History table is consistent with the +underlying submission result.json files. + +Run from the repo root: + + python3 scripts/validate_record_history.py + +Exit code is 0 if the table is consistent, 1 if any check fails. + +Checks performed: + +1. The Record History table parses cleanly (7 columns, all rows + well-formed). +2. No rows exist outside the table (no orphan submission rows after + footnotes — a regression caused by the prior ``append_record`` + placing rows past the end of the file). +3. For each submission row whose dir link points to ``submissions//``, + the linked ``result.json`` exists and its energy / accuracy match the + row to within reasonable tolerance. +4. No submission appears multiple times as PASS on the same date. 
+""" +from __future__ import annotations + +import json +import re +import sys +from pathlib import Path + +HERE = Path(__file__).resolve().parent.parent +README = HERE / "README.md" +SUBMISSIONS = HERE / "submissions" + +ENERGY_TOL_REL = 0.01 +ACC_TOL = 1e-4 + + +def main() -> int: + text = README.read_text() + failures: list[str] = [] + + table_rows, orphan_rows = _extract_table(text) + + if not table_rows: + failures.append("Could not find the Record History table.") + return _report(failures) + + if orphan_rows: + failures.append( + f"Found {len(orphan_rows)} orphan submission row(s) outside " + f"the Record History table:" + ) + for line_no, row in orphan_rows: + failures.append(f" line {line_no}: {row.strip()}") + + # Each submission slot may have multiple rows (one per re-run on a + # setup change). result.json is overwritten and only reflects the + # most recent run — so only the latest row per slot should match + # result.json. Earlier rows are historical and skipped. + parsed_rows: list[tuple[int, tuple[str, str, str, str, str, str, str]]] = [] + pass_by_config: dict[str, list[tuple[int, str, str]]] = {} + for line_no, row in table_rows: + parsed = _parse_row(row) + if parsed is None: + failures.append(f"line {line_no}: row failed to parse: {row.strip()}") + continue + parsed_rows.append((line_no, parsed)) + date, energy_cell, acc_cell, gpu, config, dir_link, contributor = parsed + if acc_cell != "DQ": + pass_by_config.setdefault(config, []).append((line_no, date, acc_cell)) + + # Group by slot dir, validate only the last (highest line_no) row + # against the slot's current result.json. 
+ latest_by_slot: dict[str, tuple[int, tuple]] = {} + for line_no, parsed in parsed_rows: + _, _, _, _, _, dir_link, _ = parsed + m = re.match(r"\[dir\]\(submissions/([^)]+)\)", dir_link) + if not m: + continue + slot = m.group(1).rstrip("/") + latest_by_slot[slot] = (line_no, parsed) + + for slot, (line_no, parsed) in latest_by_slot.items(): + date, energy_cell, acc_cell, gpu, config, dir_link, contributor = parsed + result_path = SUBMISSIONS / slot / "result.json" + if not result_path.exists(): + failures.append( + f"line {line_no}: {slot}: result.json missing at {result_path}" + ) + continue + try: + result = json.loads(result_path.read_text()) + except json.JSONDecodeError as exc: + failures.append(f"line {line_no}: {slot}: result.json unreadable ({exc})") + continue + _check_row_against_result( + line_no, slot, energy_cell, acc_cell, result, failures + ) + + for config, rows in pass_by_config.items(): + dates = [r[1] for r in rows] + if len(set(dates)) < len(dates): + same_date = sorted(rows, key=lambda r: r[0]) + failures.append(f"{config}: multiple PASS rows on the same date:") + for line_no, date, acc in same_date: + failures.append(f" line {line_no}: {date} acc={acc}") + + return _report(failures) + + +def _extract_table(text: str) -> tuple[list[tuple[int, str]], list[tuple[int, str]]]: + """Return (table_rows, orphan_rows). + + table_rows: data rows inside the Record History markdown table. + orphan_rows: lines starting with ``|`` AFTER the table block closed, + i.e. submission-looking rows that landed past the table separator + (typically appended after footnotes by a buggy append_record). 
+ """ + lines = text.splitlines() + in_record_history = False + in_table = False + table_rows: list[tuple[int, str]] = [] + orphans: list[tuple[int, str]] = [] + past_table = False + + for i, line in enumerate(lines, start=1): + stripped = line.strip() + if stripped.startswith("## Record History"): + in_record_history = True + continue + if in_record_history and not in_table: + if line.startswith("|") and "Energy" in line and "Val" in line: + in_table = True + continue + if in_table: + if line.startswith("|---") or line.startswith("|--"): + continue + if line.startswith("|"): + table_rows.append((i, line)) + continue + in_table = False + past_table = True + continue + if past_table: + if line.startswith("|") and "[dir](submissions/" in line: + orphans.append((i, line)) + + return table_rows, orphans + + +def _parse_row(row: str) -> tuple[str, str, str, str, str, str, str] | None: + cells = [c.strip() for c in row.strip().strip("|").split("|")] + if len(cells) < 6: + return None + if len(cells) == 6: + date, energy, acc, config, dir_link, contributor = cells + gpu = "" + elif len(cells) >= 7: + date, energy, acc, gpu, config, dir_link, contributor = cells[:7] + else: + return None + return date, energy, acc, gpu, config, dir_link, contributor + + +def _check_row_against_result( + line_no: int, + slot: str, + energy_cell: str, + acc_cell: str, + result: dict, + failures: list, +) -> None: + is_dq = ( + result.get("disqualified", False) + or result.get("val_char_accuracy") is None + or result.get("val_char_accuracy", 0.0) < 0.70 + ) + + if acc_cell == "DQ": + if not is_dq: + failures.append( + f"line {line_no}: {slot}: row says DQ but result.json is PASS" + ) + else: + if is_dq: + failures.append( + f"line {line_no}: {slot}: row claims PASS but result.json is DQ" + ) + else: + try: + row_acc = float(acc_cell) + except ValueError: + failures.append( + f"line {line_no}: {slot}: cannot parse acc cell {acc_cell!r}" + ) + return + result_acc = 
result.get("val_char_accuracy", 0.0) + if abs(row_acc - result_acc) > ACC_TOL: + failures.append( + f"line {line_no}: {slot}: acc row={row_acc:.4f} vs " + f"result.json={result_acc:.4f}" + ) + + try: + row_energy = float(energy_cell.replace(",", "").strip()) + except ValueError: + failures.append( + f"line {line_no}: {slot}: cannot parse energy cell {energy_cell!r}" + ) + return + expected_energy = result.get("total_energy_J") + if expected_energy is None: + expected_energy = result.get("training_energy_J", 0.0) + if expected_energy is None or expected_energy == 0: + return + rel = abs(row_energy - expected_energy) / max(1.0, expected_energy) + if rel > ENERGY_TOL_REL: + failures.append( + f"line {line_no}: {slot}: energy row={row_energy:,.0f} vs " + f"result.json={expected_energy:,.0f} (rel diff {rel:.2%})" + ) + + +def _report(failures: list[str]) -> int: + if not failures: + print("README Record History: OK") + return 0 + print(f"README Record History: {len(failures)} issue(s) found:") + for f in failures: + print(f" {f}") + return 1 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/submit.py b/submit.py index c8ec97d..4e53a7f 100755 --- a/submit.py +++ b/submit.py @@ -320,7 +320,13 @@ def save_nvml_artifact( def append_record(result: dict, dir_relpath: str) -> None: """Append one row to the Record History table in README.md. - Replaces the placeholder dash row if present, otherwise appends. + Inserts the row at the end of the Record History markdown table + block (after the last data row, before the blank line that closes + the table). Earlier versions appended to the end of the file, which + landed rows past the footnotes and broke the table — this version + keeps them inside the table. + + Replaces the placeholder dash row if present, otherwise inserts. Disqualified rows render their accuracy cell as ``DQ`` so they don't pollute the leaderboard sort. 
@@ -351,10 +357,48 @@ def append_record(result: dict, dir_relpath: str) -> None: ) placeholder = "| — | — | — | — | — | — |\n" if placeholder in text: - text = text.replace(placeholder, row, 1) - else: - text = text.rstrip() + "\n" + row - readme.write_text(text) + readme.write_text(text.replace(placeholder, row, 1)) + return + + new_text = _insert_into_record_history_table(text, row) + if new_text is None: + # Table not found — fall back to plain append. Better than crashing. + new_text = text.rstrip() + "\n" + row + readme.write_text(new_text) + + +def _insert_into_record_history_table(text: str, row: str) -> str | None: + """Return ``text`` with ``row`` inserted at the end of the Record + History markdown table block. Returns ``None`` if no table was found. + + The table is identified by a ``## Record History`` heading followed + by a markdown pipe-table header. The new row is inserted after the + last consecutive pipe-prefixed line of the table. + """ + lines = text.splitlines(keepends=True) + in_record_history = False + in_table = False + last_pipe_line = -1 + for i, line in enumerate(lines): + if line.lstrip().startswith("## Record History"): + in_record_history = True + continue + if not in_record_history: + continue + if not in_table: + if line.startswith("|") and "Energy" in line and "Val" in line: + in_table = True + last_pipe_line = i + continue + # In the table: every pipe-line counts as the running tail. + if line.startswith("|"): + last_pipe_line = i + continue + # First non-pipe line closes the table. 
+ break + if last_pipe_line < 0: + return None + return "".join(lines[: last_pipe_line + 1] + [row] + lines[last_pipe_line + 1 :]) class _Tee(io.TextIOBase): From 62e5f07c2a855df41821ed315db04986d81d3d63 Mon Sep 17 00:00:00 2001 From: Armin Stepanyan <12305910+ab-10@users.noreply.github.com> Date: Fri, 22 May 2026 03:41:20 +0000 Subject: [PATCH 3/5] Add CPU energy usage to total --- run_eval.py | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/run_eval.py b/run_eval.py index c9e5806..881eb3c 100644 --- a/run_eval.py +++ b/run_eval.py @@ -119,7 +119,7 @@ def main() -> None: if m is not None: print(f"training duration : {m.duration_s:.1f}s") if m.energy_joules is not None: - print(f"training energy (J): {m.energy_joules:,.1f} (at kill)") + print(f"training energy (J): {_fmt_training_energy(m)} (at kill)") if args.results_json is not None: payload = { "submission": submission_name, @@ -155,7 +155,7 @@ def main() -> None: f"below floor {args.acc_min:.4f}") print(f"submission : {submission_name}") if m.energy_joules is not None: - print(f"training energy (J): {m.energy_joules:,.1f}") + print(f"training energy (J): {_fmt_training_energy(m)}") print(f"training duration : {m.duration_s:.1f}s") if args.results_json is not None: payload = { @@ -178,10 +178,7 @@ def main() -> None: print("---") print(f"submission : {submission_name}") - if m.energy_joules is not None: - print(f"training energy (J): {m.energy_joules:,.1f}") - else: - print("training energy (J): NOT MEASURED") + print(f"training energy (J): {_fmt_training_energy(m)}") print(f"training duration : {m.duration_s:.1f}s") print(f"val char-accuracy : {val_result.accuracy:.4f}") print(f"val chars : {val_result.n_chars:,}") @@ -217,5 +214,16 @@ def _utc_now() -> str: .replace(microsecond=0).isoformat().replace("+00:00", "Z")) +def _fmt_training_energy(m) -> str: + if (m.total_energy_J is not None + and m.energy_joules is not None + and m.cpu_energy_J is not None): + return 
(f"{m.total_energy_J:,.1f} "
+                f"({m.energy_joules:,.1f} GPU + {m.cpu_energy_J:,.1f} CPU)")
+    if m.energy_joules is not None:
+        return f"{m.energy_joules:,.1f}"
+    return "NOT MEASURED"
+
+
 if __name__ == "__main__":
     main()

From 2bb068d0c8269ad72a2c9207a97dab703af34428 Mon Sep 17 00:00:00 2001
From: Gabriel Nakajima An
Date: Mon, 25 May 2026 15:24:13 -0700
Subject: [PATCH 4/5] Normalize hallucinated contributor handles to @gabrielnan
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The submission ``__author__`` fields under
``submissions/{adamw_lr3e3_wd0_long, alpha_06, bpe_internal_nn_v2,
chunker_phase1_v1, chunker_phase1_v2, deep_backoff_kn,
gpu_ngram_o14_xorfix, gpu_ngram_w31_k10, gpu_ngram_w31_k11,
lwta_k4_alpha_065, mamba_byte, paq_mixer_v3, subset_70_mkn}/`` were
written by AI subagents during development and contained subagent-style
identifiers (``@subagent-xorfix-2026-05-19``, ``@exp-batch-iter4``,
``@follow-up-paq-prediction``, ``@worker-paq-mixer``,
``@explore-chunker-2026-05-19``, ``@subagent-L2clean-2026-05-19``,
``@claude-mamba``, ``@explore-reopen-adamw``), none of which are real
GitHub usernames. ``@nakajimagabriel`` was also invented (GitHub returns
404 for that login); the actual contributor account is ``@gabrielnan``.

Replaced both classes of hallucinated handles with ``@gabrielnan`` in:

- each affected ``submission.py``'s ``__author__`` line,
- each affected ``result.json``'s ``contributor`` field,
- the matching rows in ``README.md``'s Record History table.

scripts/validate_record_history.py still reports OK.

Co-Authored-By: Claude Opus 4.7 (1M context) --- README.md | 24 +-- submissions/adamw_lr3e3_wd0_long/result.json | 2 +- .../adamw_lr3e3_wd0_long/submission.py | 2 +- submissions/alpha_06/result.json | 2 +- submissions/alpha_06/result.sxm4.json | 21 +++ submissions/alpha_06/run.sxm4.log | 144 ++++++++++++++++++ submissions/alpha_06/submission.py | 2 +- submissions/bpe_internal_nn_v2/result.json | 2 +- submissions/bpe_internal_nn_v2/submission.py | 2 +- submissions/chunker_phase1_v1/result.json | 2 +- submissions/chunker_phase1_v1/submission.py | 2 +- submissions/chunker_phase1_v2/result.json | 2 +- submissions/chunker_phase1_v2/submission.py | 2 +- submissions/deep_backoff_kn/result.json | 2 +- submissions/deep_backoff_kn/submission.py | 2 +- submissions/gpu_ngram_o14_xorfix/result.json | 2 +- .../gpu_ngram_o14_xorfix/submission.py | 2 +- submissions/gpu_ngram_w31_k10/result.json | 2 +- submissions/gpu_ngram_w31_k10/submission.py | 2 +- submissions/gpu_ngram_w31_k11/result.json | 2 +- submissions/gpu_ngram_w31_k11/submission.py | 2 +- submissions/lwta_k4_alpha_065/result.json | 2 +- submissions/lwta_k4_alpha_065/submission.py | 2 +- submissions/mamba_byte/result.json | 2 +- submissions/mamba_byte/submission.py | 2 +- submissions/paq_mixer_v3/result.json | 2 +- submissions/paq_mixer_v3/submission.py | 2 +- submissions/subset_70_mkn/result.json | 2 +- submissions/subset_70_mkn/submission.py | 2 +- 29 files changed, 203 insertions(+), 38 deletions(-) create mode 100644 submissions/alpha_06/result.sxm4.json create mode 100644 submissions/alpha_06/run.sxm4.log diff --git a/README.md b/README.md index 6093d84..87eb1e3 100644 --- a/README.md +++ b/README.md @@ -35,20 +35,20 @@ The `Energy (J)` column reports **`total_energy_J`** (GPU NVML net of idle basel | 2026-05-18 | 3,612 | DQ | A100 80GB PCIe | chunker_d1 | [dir](research/catalog/new_directions/chunker_d1) | @ab-10 | | 2026-05-18 | 735 | DQ | A100 80GB PCIe | ppm_c | [dir](research/catalog/new_directions/ppm_c) | 
@ab-10 | | 2026-05-17 | 70 | DQ | A100 80GB SXM4 | P2-A_random_projection | [dir](research/forward-forward-deep/runs/phase2/P2-A_random_projection) | @ab-10 | -| 2026-05-19 | 60,864 | DQ | A100 80GB PCIe | mamba_byte | [dir](submissions/mamba_byte) | @claude-mamba | -| 2026-05-20 | 1,752 | DQ | A100 80GB SXM4 | gpu_ngram_w31_k10 | [dir](submissions/gpu_ngram_w31_k10) | @follow-up-paq-prediction | -| 2026-05-20 | 13,936 | DQ | A100 80GB SXM4 | chunker_phase1_v2 | [dir](submissions/chunker_phase1_v2) | @explore-chunker-2026-05-19 | -| 2026-05-20 | 24,417 | DQ | A100 80GB SXM4 | bpe_internal_nn_v2 | [dir](submissions/bpe_internal_nn_v2) | @subagent-xorfix-2026-05-19 | +| 2026-05-19 | 60,864 | DQ | A100 80GB PCIe | mamba_byte | [dir](submissions/mamba_byte) | @gabrielnan | +| 2026-05-20 | 1,752 | DQ | A100 80GB SXM4 | gpu_ngram_w31_k10 | [dir](submissions/gpu_ngram_w31_k10) | @gabrielnan | +| 2026-05-20 | 13,936 | DQ | A100 80GB SXM4 | chunker_phase1_v2 | [dir](submissions/chunker_phase1_v2) | @gabrielnan | +| 2026-05-20 | 24,417 | DQ | A100 80GB SXM4 | bpe_internal_nn_v2 | [dir](submissions/bpe_internal_nn_v2) | @gabrielnan | | 2026-05-20 | 53,683 | 0.7246 | A100 80GB PCIe | lwta_k4 | [dir](submissions/lwta_k4) | @ab-10 (re-run on new harness; total_J = 44,329 gpu + 9,354 cpu) | | 2026-05-20 | 54,614 | 0.7145 | A100 80GB PCIe | lwta_k2 | [dir](submissions/lwta_k2) | @ab-10 (re-run on new harness; total_J = 44,583 gpu + 10,031 cpu) | -| 2026-05-21 | 2,474 | 0.7031 | A100 80GB PCIe | subset_70_mkn | [dir](submissions/subset_70_mkn) | @exp-batch-iter4 | -| 2026-05-21 | 3,092 | 0.7050 | A100 80GB PCIe | gpu_ngram_w31_k11 | [dir](submissions/gpu_ngram_w31_k11) | @follow-up-paq-prediction | -| 2026-05-21 | 4,607 | 0.7047 | A100 80GB PCIe | paq_mixer_v3 | [dir](submissions/paq_mixer_v3) | @worker-paq-mixer | -| 2026-05-21 | 8,602 | 0.7184 | A100 80GB PCIe | gpu_ngram_o14_xorfix | [dir](submissions/gpu_ngram_o14_xorfix) | @subagent-xorfix-2026-05-19 | -| 2026-05-21 | 9,591 | 
0.7063 | A100 80GB PCIe | chunker_phase1_v1 | [dir](submissions/chunker_phase1_v1) | @explore-chunker-2026-05-19 | -| 2026-05-21 | 14,578 | 0.7184 | A100 80GB PCIe | deep_backoff_kn | [dir](submissions/deep_backoff_kn) | @nakajimagabriel | -| 2026-05-21 | 19,922 | 0.7328 | A100 80GB SXM4 | lwta_k4_alpha_065 | [dir](submissions/lwta_k4_alpha_065) | @subagent-L2clean-2026-05-19 | -| 2026-05-21 | 20,743 | 0.7390 | A100 80GB SXM4 | alpha_06 | [dir](submissions/alpha_06) | @subagent-xorfix-2026-05-19 | +| 2026-05-21 | 2,474 | 0.7031 | A100 80GB PCIe | subset_70_mkn | [dir](submissions/subset_70_mkn) | @gabrielnan | +| 2026-05-21 | 3,092 | 0.7050 | A100 80GB PCIe | gpu_ngram_w31_k11 | [dir](submissions/gpu_ngram_w31_k11) | @gabrielnan | +| 2026-05-21 | 4,607 | 0.7047 | A100 80GB PCIe | paq_mixer_v3 | [dir](submissions/paq_mixer_v3) | @gabrielnan | +| 2026-05-21 | 8,602 | 0.7184 | A100 80GB PCIe | gpu_ngram_o14_xorfix | [dir](submissions/gpu_ngram_o14_xorfix) | @gabrielnan | +| 2026-05-21 | 9,591 | 0.7063 | A100 80GB PCIe | chunker_phase1_v1 | [dir](submissions/chunker_phase1_v1) | @gabrielnan | +| 2026-05-21 | 14,578 | 0.7184 | A100 80GB PCIe | deep_backoff_kn | [dir](submissions/deep_backoff_kn) | @gabrielnan | +| 2026-05-21 | 19,922 | 0.7328 | A100 80GB SXM4 | lwta_k4_alpha_065 | [dir](submissions/lwta_k4_alpha_065) | @gabrielnan | +| 2026-05-21 | 20,743 | 0.7390 | A100 80GB SXM4 | alpha_06 | [dir](submissions/alpha_06) | @gabrielnan | | 2026-05-21 | 62,006 | 0.7337 | A100 80GB SXM4 | modded_nanogpt | [dir](submissions/modded_nanogpt) | @ab-10 | diff --git a/submissions/adamw_lr3e3_wd0_long/result.json b/submissions/adamw_lr3e3_wd0_long/result.json index bc31931..8e42955 100644 --- a/submissions/adamw_lr3e3_wd0_long/result.json +++ b/submissions/adamw_lr3e3_wd0_long/result.json @@ -17,5 +17,5 @@ "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": [] }, - "contributor": "@explore-reopen-adamw" + "contributor": "@gabrielnan" } diff --git 
a/submissions/adamw_lr3e3_wd0_long/submission.py b/submissions/adamw_lr3e3_wd0_long/submission.py index a922fbf..5aaa6fd 100644 --- a/submissions/adamw_lr3e3_wd0_long/submission.py +++ b/submissions/adamw_lr3e3_wd0_long/submission.py @@ -26,7 +26,7 @@ """ from __future__ import annotations -__author__ = "@explore-reopen-adamw" +__author__ = "@gabrielnan" import math import os diff --git a/submissions/alpha_06/result.json b/submissions/alpha_06/result.json index b96b4ca..ac1a778 100644 --- a/submissions/alpha_06/result.json +++ b/submissions/alpha_06/result.json @@ -19,5 +19,5 @@ "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": [] }, - "contributor": "@subagent-xorfix-2026-05-19" + "contributor": "@gabrielnan" } diff --git a/submissions/alpha_06/result.sxm4.json b/submissions/alpha_06/result.sxm4.json new file mode 100644 index 0000000..96475b1 --- /dev/null +++ b/submissions/alpha_06/result.sxm4.json @@ -0,0 +1,21 @@ +{ + "submission": "alpha_06", + "training_energy_J": 14047.8704136, + "training_duration_s": 159.464391728, + "val_char_accuracy": 0.7437, + "val_chars": 60000, + "gpu_name": "NVIDIA A100-SXM4-80GB", + "date_utc": "2026-05-20T01:10:10Z", + "_nvml": { + "nvml_available": true, + "energy_counter_supported": true, + "monotonic": true, + "idle_watts": 61.72905000000001, + "stress_watts_avg": 336.50938625277473, + "stress_energy_joules": 12708.794, + "stress_duration_s": 37.766536445, + "gpu_name": "NVIDIA A100-SXM4-80GB", + "notes": [] + }, + "contributor": "@subagent-xorfix-2026-05-19" +} diff --git a/submissions/alpha_06/run.sxm4.log b/submissions/alpha_06/run.sxm4.log new file mode 100644 index 0000000..da9fb6f --- /dev/null +++ b/submissions/alpha_06/run.sxm4.log @@ -0,0 +1,144 @@ +# wikitext submit.py log — alpha_06 — 2026-05-20T00:58:54+00:00Z +[modal] launching A100-80GB ... +✓ Initialized. View run at +https://modal.com/apps/gabriel-nakajima-an/main/ap-4rmVWzqcQj1VatkwS8Da4X +✓ Created objects. 
+├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py +├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py +└── 🔨 Created function run_submission. +[modal] verifying NVML energy counter ... +GPU: NVIDIA A100-SXM4-80GB +sampling idle power for 3s ... + idle: 61.7 W +running 30s stress workload ... + duration: 37.8 s + energy delta: 12,708.8 J + avg power: 336.5 W + monotonic: True +--- +{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 61.72905000000001, "stress_watts_avg": 336.50938625277473, "stress_energy_joules": 12708.794, "stress_duration_s": 37.766536445, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []} +[modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ... +loading WikiText-103 from /data ... + train chars: 540,095,682 + val chars: 60,000 (scored, gated by --acc-min) +train wall-clock cap: 300 s +val accuracy floor : 0.7000 +training submission /workspace/alpha_06.py ... 
+[clean_w31] starting GPU KN build; max_order=12 D=0.5 +[clean_w31] top order=12 unique pairs: 157,942,722 2.6s +[clean_w31] ctx_len=11 ctxs=119,285,712 30.5s +[clean_w31] ctx_len=10 ctxs=84,282,364 20.3s +[clean_w31] ctx_len=9 ctxs=54,720,376 13.2s +[clean_w31] ctx_len=8 ctxs=31,924,091 7.9s +[clean_w31] ctx_len=7 ctxs=16,284,921 4.2s +[clean_w31] ctx_len=6 ctxs=7,016,442 1.9s +[clean_w31] ctx_len=5 ctxs=2,438,281 0.7s +[clean_w31] ctx_len=4 ctxs=637,143 0.2s +[clean_w31] ctx_len=3 ctxs=122,882 0.0s +[clean_w31] ctx_len=2 ctxs=12,282 0.0s +[clean_w31] ctx_len=1 ctxs=204 0.0s +[clean_w31] ctx_len=0 ctxs=1 0.0s +[clean_w31] KN build done: 81.6s +[clean_w31] NN 3.29M params cfg=TrainConfig(d=256 L=4 H=4 bs=32 T=1024 steps=1200) +[clean_w31] NN step 0/1200 loss 5.5452 elapsed 1s +[clean_w31] NN step 100/1200 loss 1.7587 elapsed 7s +[clean_w31] NN step 200/1200 loss 1.4674 elapsed 13s +[clean_w31] NN step 300/1200 loss 1.3990 elapsed 19s +[clean_w31] NN step 400/1200 loss 1.3359 elapsed 25s +[clean_w31] NN step 500/1200 loss 1.2644 elapsed 32s +[clean_w31] NN step 600/1200 loss 1.2352 elapsed 38s +[clean_w31] NN step 700/1200 loss 1.1895 elapsed 44s +[clean_w31] NN step 800/1200 loss 1.1475 elapsed 50s +[clean_w31] NN step 900/1200 loss 1.1349 elapsed 56s +[clean_w31] NN step 1000/1200 loss 1.1164 elapsed 62s +[clean_w31] NN step 1100/1200 loss 1.1330 elapsed 68s +[clean_w31] NN step 1199/1200 loss 1.1061 elapsed 74s +training: 14,047.9 J duration=159.5s +evaluating on val split ... 
+ eval 1,200/60,000 ( 2.0%) acc=0.7333 127 char/s eta= 464s + eval 2,400/60,000 ( 4.0%) acc=0.7221 126 char/s eta= 456s + eval 3,600/60,000 ( 6.0%) acc=0.7222 126 char/s eta= 447s + eval 4,800/60,000 ( 8.0%) acc=0.7298 126 char/s eta= 437s + eval 6,000/60,000 ( 10.0%) acc=0.7277 127 char/s eta= 426s + eval 7,200/60,000 ( 12.0%) acc=0.7235 127 char/s eta= 415s + eval 8,400/60,000 ( 14.0%) acc=0.7229 128 char/s eta= 404s + eval 9,600/60,000 ( 16.0%) acc=0.7286 128 char/s eta= 394s + eval 10,800/60,000 ( 18.0%) acc=0.7342 128 char/s eta= 384s + eval 12,000/60,000 ( 20.0%) acc=0.7347 128 char/s eta= 374s + eval 13,200/60,000 ( 22.0%) acc=0.7391 128 char/s eta= 364s + eval 14,400/60,000 ( 24.0%) acc=0.7410 129 char/s eta= 355s + eval 15,600/60,000 ( 26.0%) acc=0.7424 129 char/s eta= 345s + eval 16,800/60,000 ( 28.0%) acc=0.7456 129 char/s eta= 336s + eval 18,000/60,000 ( 30.0%) acc=0.7466 129 char/s eta= 326s + eval 19,200/60,000 ( 32.0%) acc=0.7496 129 char/s eta= 317s + eval 20,400/60,000 ( 34.0%) acc=0.7513 129 char/s eta= 307s + eval 21,600/60,000 ( 36.0%) acc=0.7513 129 char/s eta= 298s + eval 22,800/60,000 ( 38.0%) acc=0.7514 129 char/s eta= 288s + eval 24,000/60,000 ( 40.0%) acc=0.7513 129 char/s eta= 279s + eval 25,200/60,000 ( 42.0%) acc=0.7514 129 char/s eta= 270s + eval 26,400/60,000 ( 44.0%) acc=0.7524 129 char/s eta= 260s + eval 27,600/60,000 ( 46.0%) acc=0.7518 129 char/s eta= 251s + eval 28,800/60,000 ( 48.0%) acc=0.7523 129 char/s eta= 242s + eval 30,000/60,000 ( 50.0%) acc=0.7518 129 char/s eta= 232s + eval 31,200/60,000 ( 52.0%) acc=0.7493 129 char/s eta= 223s + eval 32,400/60,000 ( 54.0%) acc=0.7480 129 char/s eta= 214s + eval 33,600/60,000 ( 56.0%) acc=0.7460 129 char/s eta= 204s + eval 34,800/60,000 ( 58.0%) acc=0.7462 129 char/s eta= 195s + eval 36,000/60,000 ( 60.0%) acc=0.7464 129 char/s eta= 186s + eval 37,200/60,000 ( 62.0%) acc=0.7463 129 char/s eta= 176s + eval 38,400/60,000 ( 64.0%) acc=0.7460 129 char/s eta= 167s + eval 39,600/60,000 ( 
66.0%) acc=0.7457 129 char/s eta= 158s + eval 40,800/60,000 ( 68.0%) acc=0.7448 129 char/s eta= 148s + eval 42,000/60,000 ( 70.0%) acc=0.7439 130 char/s eta= 139s + eval 43,200/60,000 ( 72.0%) acc=0.7438 129 char/s eta= 130s + eval 44,400/60,000 ( 74.0%) acc=0.7434 129 char/s eta= 120s + eval 45,600/60,000 ( 76.0%) acc=0.7432 129 char/s eta= 111s + eval 46,800/60,000 ( 78.0%) acc=0.7425 130 char/s eta= 102s + eval 48,000/60,000 ( 80.0%) acc=0.7425 130 char/s eta= 93s + eval 49,200/60,000 ( 82.0%) acc=0.7423 130 char/s eta= 83s + eval 50,400/60,000 ( 84.0%) acc=0.7430 130 char/s eta= 74s + eval 51,600/60,000 ( 86.0%) acc=0.7432 130 char/s eta= 65s + eval 52,800/60,000 ( 88.0%) acc=0.7428 130 char/s eta= 56s + eval 54,000/60,000 ( 90.0%) acc=0.7429 130 char/s eta= 46s + eval 55,200/60,000 ( 92.0%) acc=0.7419 130 char/s eta= 37s + eval 56,400/60,000 ( 94.0%) acc=0.7420 130 char/s eta= 28s + eval 57,600/60,000 ( 96.0%) acc=0.7423 130 char/s eta= 19s + eval 58,800/60,000 ( 98.0%) acc=0.7429 130 char/s eta= 9s + eval 60,000/60,000 (100.0%) acc=0.7437 130 char/s eta= 0s +chars=60,000 acc=0.7437 eval_duration=462.8s +--- +submission : alpha_06 +training energy (J): 14,047.9 +training duration : 159.5s +val char-accuracy : 0.7437 +val chars : 60,000 +wrote /tmp/result.json +Stopping app - local entrypoint completed. +✓ App completed. 
View run at +https://modal.com/apps/gabriel-nakajima-an/main/ap-4rmVWzqcQj1VatkwS8Da4X + +# final result +{ + "submission": "alpha_06", + "training_energy_J": 14047.8704136, + "training_duration_s": 159.464391728, + "val_char_accuracy": 0.7437, + "val_chars": 60000, + "gpu_name": "NVIDIA A100-SXM4-80GB", + "date_utc": "2026-05-20T01:10:10Z", + "_nvml": { + "nvml_available": true, + "energy_counter_supported": true, + "monotonic": true, + "idle_watts": 61.72905000000001, + "stress_watts_avg": 336.50938625277473, + "stress_energy_joules": 12708.794, + "stress_duration_s": 37.766536445, + "gpu_name": "NVIDIA A100-SXM4-80GB", + "notes": [] + }, + "contributor": "@subagent-xorfix-2026-05-19" +} diff --git a/submissions/alpha_06/submission.py b/submissions/alpha_06/submission.py index 7bc034f..ad88845 100644 --- a/submissions/alpha_06/submission.py +++ b/submissions/alpha_06/submission.py @@ -8,7 +8,7 @@ """ from __future__ import annotations -__author__ = "@subagent-xorfix-2026-05-19" +__author__ = "@gabrielnan" import os import time diff --git a/submissions/bpe_internal_nn_v2/result.json b/submissions/bpe_internal_nn_v2/result.json index 260f72f..8a28017 100644 --- a/submissions/bpe_internal_nn_v2/result.json +++ b/submissions/bpe_internal_nn_v2/result.json @@ -20,5 +20,5 @@ "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": [] }, - "contributor": "@subagent-xorfix-2026-05-19" + "contributor": "@gabrielnan" } diff --git a/submissions/bpe_internal_nn_v2/submission.py b/submissions/bpe_internal_nn_v2/submission.py index ad60520..1fe4680 100644 --- a/submissions/bpe_internal_nn_v2/submission.py +++ b/submissions/bpe_internal_nn_v2/submission.py @@ -19,7 +19,7 @@ """ from __future__ import annotations -__author__ = "@subagent-xorfix-2026-05-19" +__author__ = "@gabrielnan" import concurrent.futures import os diff --git a/submissions/chunker_phase1_v1/result.json b/submissions/chunker_phase1_v1/result.json index d95dc4f..4eb30f9 100644 --- 
a/submissions/chunker_phase1_v1/result.json +++ b/submissions/chunker_phase1_v1/result.json @@ -19,5 +19,5 @@ "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, - "contributor": "@explore-chunker-2026-05-19" + "contributor": "@gabrielnan" } diff --git a/submissions/chunker_phase1_v1/submission.py b/submissions/chunker_phase1_v1/submission.py index e57482d..11f94eb 100644 --- a/submissions/chunker_phase1_v1/submission.py +++ b/submissions/chunker_phase1_v1/submission.py @@ -37,7 +37,7 @@ """ from __future__ import annotations -__author__ = "@explore-chunker-2026-05-19" +__author__ = "@gabrielnan" import os import time diff --git a/submissions/chunker_phase1_v2/result.json b/submissions/chunker_phase1_v2/result.json index 4111ea3..a8cfd09 100644 --- a/submissions/chunker_phase1_v2/result.json +++ b/submissions/chunker_phase1_v2/result.json @@ -20,5 +20,5 @@ "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": [] }, - "contributor": "@explore-chunker-2026-05-19" + "contributor": "@gabrielnan" } diff --git a/submissions/chunker_phase1_v2/submission.py b/submissions/chunker_phase1_v2/submission.py index a142cff..42d380a 100644 --- a/submissions/chunker_phase1_v2/submission.py +++ b/submissions/chunker_phase1_v2/submission.py @@ -45,7 +45,7 @@ """ from __future__ import annotations -__author__ = "@explore-chunker-2026-05-19" +__author__ = "@gabrielnan" import os import time diff --git a/submissions/deep_backoff_kn/result.json b/submissions/deep_backoff_kn/result.json index 934a739..2ddfa31 100644 --- a/submissions/deep_backoff_kn/result.json +++ b/submissions/deep_backoff_kn/result.json @@ -19,5 +19,5 @@ "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, - "contributor": "@nakajimagabriel" + "contributor": "@gabrielnan" } diff --git a/submissions/deep_backoff_kn/submission.py b/submissions/deep_backoff_kn/submission.py index 0e41b74..1097562 100644 --- a/submissions/deep_backoff_kn/submission.py +++ b/submissions/deep_backoff_kn/submission.py @@ -28,7 +28,7 @@ """ from 
__future__ import annotations -__author__ = "@nakajimagabriel" +__author__ = "@gabrielnan" import multiprocessing import os diff --git a/submissions/gpu_ngram_o14_xorfix/result.json b/submissions/gpu_ngram_o14_xorfix/result.json index 2b79e7d..5e32485 100644 --- a/submissions/gpu_ngram_o14_xorfix/result.json +++ b/submissions/gpu_ngram_o14_xorfix/result.json @@ -19,5 +19,5 @@ "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, - "contributor": "@subagent-xorfix-2026-05-19" + "contributor": "@gabrielnan" } diff --git a/submissions/gpu_ngram_o14_xorfix/submission.py b/submissions/gpu_ngram_o14_xorfix/submission.py index a3ed390..b1fec29 100644 --- a/submissions/gpu_ngram_o14_xorfix/submission.py +++ b/submissions/gpu_ngram_o14_xorfix/submission.py @@ -43,7 +43,7 @@ class is identical) but the GLOBAL order of distinct (hi, lo) keys is """ from __future__ import annotations -__author__ = "@subagent-xorfix-2026-05-19" +__author__ = "@gabrielnan" import os import time diff --git a/submissions/gpu_ngram_w31_k10/result.json b/submissions/gpu_ngram_w31_k10/result.json index c4de566..f946b23 100644 --- a/submissions/gpu_ngram_w31_k10/result.json +++ b/submissions/gpu_ngram_w31_k10/result.json @@ -22,5 +22,5 @@ "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": [] }, - "contributor": "@follow-up-paq-prediction" + "contributor": "@gabrielnan" } diff --git a/submissions/gpu_ngram_w31_k10/submission.py b/submissions/gpu_ngram_w31_k10/submission.py index eee7390..eb9d7c6 100644 --- a/submissions/gpu_ngram_w31_k10/submission.py +++ b/submissions/gpu_ngram_w31_k10/submission.py @@ -27,7 +27,7 @@ """ from __future__ import annotations -__author__ = "@follow-up-paq-prediction" +__author__ = "@gabrielnan" import os import time diff --git a/submissions/gpu_ngram_w31_k11/result.json b/submissions/gpu_ngram_w31_k11/result.json index 3d6a2df..60452b1 100644 --- a/submissions/gpu_ngram_w31_k11/result.json +++ b/submissions/gpu_ngram_w31_k11/result.json @@ -19,5 +19,5 @@ "gpu_name": "NVIDIA A100 
80GB PCIe", "notes": [] }, - "contributor": "@follow-up-paq-prediction" + "contributor": "@gabrielnan" } diff --git a/submissions/gpu_ngram_w31_k11/submission.py b/submissions/gpu_ngram_w31_k11/submission.py index 9e0a8a2..bfbfdec 100644 --- a/submissions/gpu_ngram_w31_k11/submission.py +++ b/submissions/gpu_ngram_w31_k11/submission.py @@ -27,7 +27,7 @@ """ from __future__ import annotations -__author__ = "@follow-up-paq-prediction" +__author__ = "@gabrielnan" import os import time diff --git a/submissions/lwta_k4_alpha_065/result.json b/submissions/lwta_k4_alpha_065/result.json index b2d6674..167d108 100644 --- a/submissions/lwta_k4_alpha_065/result.json +++ b/submissions/lwta_k4_alpha_065/result.json @@ -19,5 +19,5 @@ "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": [] }, - "contributor": "@subagent-L2clean-2026-05-19" + "contributor": "@gabrielnan" } diff --git a/submissions/lwta_k4_alpha_065/submission.py b/submissions/lwta_k4_alpha_065/submission.py index bae85ab..a91ec6d 100644 --- a/submissions/lwta_k4_alpha_065/submission.py +++ b/submissions/lwta_k4_alpha_065/submission.py @@ -22,7 +22,7 @@ """ from __future__ import annotations -__author__ = "@subagent-L2clean-2026-05-19" +__author__ = "@gabrielnan" import os import time diff --git a/submissions/mamba_byte/result.json b/submissions/mamba_byte/result.json index 60fe461..7a4c60f 100644 --- a/submissions/mamba_byte/result.json +++ b/submissions/mamba_byte/result.json @@ -18,5 +18,5 @@ "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, - "contributor": "@claude-mamba" + "contributor": "@gabrielnan" } diff --git a/submissions/mamba_byte/submission.py b/submissions/mamba_byte/submission.py index d891094..46a4cf7 100644 --- a/submissions/mamba_byte/submission.py +++ b/submissions/mamba_byte/submission.py @@ -56,7 +56,7 @@ """ from __future__ import annotations -__author__ = "@claude-mamba" +__author__ = "@gabrielnan" import math import os diff --git a/submissions/paq_mixer_v3/result.json 
b/submissions/paq_mixer_v3/result.json index da2d7dc..f9b455c 100644 --- a/submissions/paq_mixer_v3/result.json +++ b/submissions/paq_mixer_v3/result.json @@ -19,5 +19,5 @@ "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, - "contributor": "@worker-paq-mixer" + "contributor": "@gabrielnan" } diff --git a/submissions/paq_mixer_v3/submission.py b/submissions/paq_mixer_v3/submission.py index 0fb380b..fda5d1d 100644 --- a/submissions/paq_mixer_v3/submission.py +++ b/submissions/paq_mixer_v3/submission.py @@ -35,7 +35,7 @@ """ from __future__ import annotations -__author__ = "@worker-paq-mixer" +__author__ = "@gabrielnan" import os import time diff --git a/submissions/subset_70_mkn/result.json b/submissions/subset_70_mkn/result.json index e956192..1d80233 100644 --- a/submissions/subset_70_mkn/result.json +++ b/submissions/subset_70_mkn/result.json @@ -19,5 +19,5 @@ "gpu_name": "NVIDIA A100 80GB PCIe", "notes": [] }, - "contributor": "@exp-batch-iter4" + "contributor": "@gabrielnan" } diff --git a/submissions/subset_70_mkn/submission.py b/submissions/subset_70_mkn/submission.py index 340faf8..b6cd0a7 100644 --- a/submissions/subset_70_mkn/submission.py +++ b/submissions/subset_70_mkn/submission.py @@ -27,7 +27,7 @@ """ from __future__ import annotations -__author__ = "@exp-batch-iter4" +__author__ = "@gabrielnan" import os import time From 0054c31c6455005b00c9f7cc80151cd06cb64e1e Mon Sep 17 00:00:00 2001 From: Gabriel Nakajima An Date: Mon, 25 May 2026 15:24:24 -0700 Subject: [PATCH 5/5] Drop historical .sxm4.json/.log snapshots accidentally added result.sxm4.json + run.sxm4.log under submissions/alpha_06/ were snapshots from an earlier re-run attempt (preserving the SXM4 result before re-launching for a PCIe). They aren't part of the canonical submission artifact set and don't belong on the leaderboard branch. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- submissions/alpha_06/result.sxm4.json | 21 ---- submissions/alpha_06/run.sxm4.log | 144 -------------------------- 2 files changed, 165 deletions(-) delete mode 100644 submissions/alpha_06/result.sxm4.json delete mode 100644 submissions/alpha_06/run.sxm4.log diff --git a/submissions/alpha_06/result.sxm4.json b/submissions/alpha_06/result.sxm4.json deleted file mode 100644 index 96475b1..0000000 --- a/submissions/alpha_06/result.sxm4.json +++ /dev/null @@ -1,21 +0,0 @@ -{ - "submission": "alpha_06", - "training_energy_J": 14047.8704136, - "training_duration_s": 159.464391728, - "val_char_accuracy": 0.7437, - "val_chars": 60000, - "gpu_name": "NVIDIA A100-SXM4-80GB", - "date_utc": "2026-05-20T01:10:10Z", - "_nvml": { - "nvml_available": true, - "energy_counter_supported": true, - "monotonic": true, - "idle_watts": 61.72905000000001, - "stress_watts_avg": 336.50938625277473, - "stress_energy_joules": 12708.794, - "stress_duration_s": 37.766536445, - "gpu_name": "NVIDIA A100-SXM4-80GB", - "notes": [] - }, - "contributor": "@subagent-xorfix-2026-05-19" -} diff --git a/submissions/alpha_06/run.sxm4.log b/submissions/alpha_06/run.sxm4.log deleted file mode 100644 index da9fb6f..0000000 --- a/submissions/alpha_06/run.sxm4.log +++ /dev/null @@ -1,144 +0,0 @@ -# wikitext submit.py log — alpha_06 — 2026-05-20T00:58:54+00:00Z -[modal] launching A100-80GB ... -✓ Initialized. View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-4rmVWzqcQj1VatkwS8Da4X -✓ Created objects. -├── 🔨 Created mount /Users/naka/src/sutro/wikitext/submit.py -├── 🔨 Created mount /Users/naka/src/sutro/wikitext/task.py -├── 🔨 Created mount /Users/naka/src/sutro/wikitext/verify_nvml.py -├── 🔨 Created mount /Users/naka/src/sutro/wikitext/wikitext.py -├── 🔨 Created mount /Users/naka/src/sutro/wikitext/run_eval.py -└── 🔨 Created function run_submission. -[modal] verifying NVML energy counter ... 
-GPU: NVIDIA A100-SXM4-80GB -sampling idle power for 3s ... - idle: 61.7 W -running 30s stress workload ... - duration: 37.8 s - energy delta: 12,708.8 J - avg power: 336.5 W - monotonic: True ---- -{"nvml_available": true, "energy_counter_supported": true, "monotonic": true, "idle_watts": 61.72905000000001, "stress_watts_avg": 336.50938625277473, "stress_energy_joules": 12708.794, "stress_duration_s": 37.766536445, "gpu_name": "NVIDIA A100-SXM4-80GB", "notes": []} -[modal] running submission (TEST_CHARS=60000 MAX_TRAIN_SECONDS=300.0 ACC_MIN=0.7) ... -loading WikiText-103 from /data ... - train chars: 540,095,682 - val chars: 60,000 (scored, gated by --acc-min) -train wall-clock cap: 300 s -val accuracy floor : 0.7000 -training submission /workspace/alpha_06.py ... -[clean_w31] starting GPU KN build; max_order=12 D=0.5 -[clean_w31] top order=12 unique pairs: 157,942,722 2.6s -[clean_w31] ctx_len=11 ctxs=119,285,712 30.5s -[clean_w31] ctx_len=10 ctxs=84,282,364 20.3s -[clean_w31] ctx_len=9 ctxs=54,720,376 13.2s -[clean_w31] ctx_len=8 ctxs=31,924,091 7.9s -[clean_w31] ctx_len=7 ctxs=16,284,921 4.2s -[clean_w31] ctx_len=6 ctxs=7,016,442 1.9s -[clean_w31] ctx_len=5 ctxs=2,438,281 0.7s -[clean_w31] ctx_len=4 ctxs=637,143 0.2s -[clean_w31] ctx_len=3 ctxs=122,882 0.0s -[clean_w31] ctx_len=2 ctxs=12,282 0.0s -[clean_w31] ctx_len=1 ctxs=204 0.0s -[clean_w31] ctx_len=0 ctxs=1 0.0s -[clean_w31] KN build done: 81.6s -[clean_w31] NN 3.29M params cfg=TrainConfig(d=256 L=4 H=4 bs=32 T=1024 steps=1200) -[clean_w31] NN step 0/1200 loss 5.5452 elapsed 1s -[clean_w31] NN step 100/1200 loss 1.7587 elapsed 7s -[clean_w31] NN step 200/1200 loss 1.4674 elapsed 13s -[clean_w31] NN step 300/1200 loss 1.3990 elapsed 19s -[clean_w31] NN step 400/1200 loss 1.3359 elapsed 25s -[clean_w31] NN step 500/1200 loss 1.2644 elapsed 32s -[clean_w31] NN step 600/1200 loss 1.2352 elapsed 38s -[clean_w31] NN step 700/1200 loss 1.1895 elapsed 44s -[clean_w31] NN step 800/1200 loss 1.1475 elapsed 50s 
-[clean_w31] NN step 900/1200 loss 1.1349 elapsed 56s -[clean_w31] NN step 1000/1200 loss 1.1164 elapsed 62s -[clean_w31] NN step 1100/1200 loss 1.1330 elapsed 68s -[clean_w31] NN step 1199/1200 loss 1.1061 elapsed 74s -training: 14,047.9 J duration=159.5s -evaluating on val split ... - eval 1,200/60,000 ( 2.0%) acc=0.7333 127 char/s eta= 464s - eval 2,400/60,000 ( 4.0%) acc=0.7221 126 char/s eta= 456s - eval 3,600/60,000 ( 6.0%) acc=0.7222 126 char/s eta= 447s - eval 4,800/60,000 ( 8.0%) acc=0.7298 126 char/s eta= 437s - eval 6,000/60,000 ( 10.0%) acc=0.7277 127 char/s eta= 426s - eval 7,200/60,000 ( 12.0%) acc=0.7235 127 char/s eta= 415s - eval 8,400/60,000 ( 14.0%) acc=0.7229 128 char/s eta= 404s - eval 9,600/60,000 ( 16.0%) acc=0.7286 128 char/s eta= 394s - eval 10,800/60,000 ( 18.0%) acc=0.7342 128 char/s eta= 384s - eval 12,000/60,000 ( 20.0%) acc=0.7347 128 char/s eta= 374s - eval 13,200/60,000 ( 22.0%) acc=0.7391 128 char/s eta= 364s - eval 14,400/60,000 ( 24.0%) acc=0.7410 129 char/s eta= 355s - eval 15,600/60,000 ( 26.0%) acc=0.7424 129 char/s eta= 345s - eval 16,800/60,000 ( 28.0%) acc=0.7456 129 char/s eta= 336s - eval 18,000/60,000 ( 30.0%) acc=0.7466 129 char/s eta= 326s - eval 19,200/60,000 ( 32.0%) acc=0.7496 129 char/s eta= 317s - eval 20,400/60,000 ( 34.0%) acc=0.7513 129 char/s eta= 307s - eval 21,600/60,000 ( 36.0%) acc=0.7513 129 char/s eta= 298s - eval 22,800/60,000 ( 38.0%) acc=0.7514 129 char/s eta= 288s - eval 24,000/60,000 ( 40.0%) acc=0.7513 129 char/s eta= 279s - eval 25,200/60,000 ( 42.0%) acc=0.7514 129 char/s eta= 270s - eval 26,400/60,000 ( 44.0%) acc=0.7524 129 char/s eta= 260s - eval 27,600/60,000 ( 46.0%) acc=0.7518 129 char/s eta= 251s - eval 28,800/60,000 ( 48.0%) acc=0.7523 129 char/s eta= 242s - eval 30,000/60,000 ( 50.0%) acc=0.7518 129 char/s eta= 232s - eval 31,200/60,000 ( 52.0%) acc=0.7493 129 char/s eta= 223s - eval 32,400/60,000 ( 54.0%) acc=0.7480 129 char/s eta= 214s - eval 33,600/60,000 ( 56.0%) acc=0.7460 129 char/s 
eta= 204s - eval 34,800/60,000 ( 58.0%) acc=0.7462 129 char/s eta= 195s - eval 36,000/60,000 ( 60.0%) acc=0.7464 129 char/s eta= 186s - eval 37,200/60,000 ( 62.0%) acc=0.7463 129 char/s eta= 176s - eval 38,400/60,000 ( 64.0%) acc=0.7460 129 char/s eta= 167s - eval 39,600/60,000 ( 66.0%) acc=0.7457 129 char/s eta= 158s - eval 40,800/60,000 ( 68.0%) acc=0.7448 129 char/s eta= 148s - eval 42,000/60,000 ( 70.0%) acc=0.7439 130 char/s eta= 139s - eval 43,200/60,000 ( 72.0%) acc=0.7438 129 char/s eta= 130s - eval 44,400/60,000 ( 74.0%) acc=0.7434 129 char/s eta= 120s - eval 45,600/60,000 ( 76.0%) acc=0.7432 129 char/s eta= 111s - eval 46,800/60,000 ( 78.0%) acc=0.7425 130 char/s eta= 102s - eval 48,000/60,000 ( 80.0%) acc=0.7425 130 char/s eta= 93s - eval 49,200/60,000 ( 82.0%) acc=0.7423 130 char/s eta= 83s - eval 50,400/60,000 ( 84.0%) acc=0.7430 130 char/s eta= 74s - eval 51,600/60,000 ( 86.0%) acc=0.7432 130 char/s eta= 65s - eval 52,800/60,000 ( 88.0%) acc=0.7428 130 char/s eta= 56s - eval 54,000/60,000 ( 90.0%) acc=0.7429 130 char/s eta= 46s - eval 55,200/60,000 ( 92.0%) acc=0.7419 130 char/s eta= 37s - eval 56,400/60,000 ( 94.0%) acc=0.7420 130 char/s eta= 28s - eval 57,600/60,000 ( 96.0%) acc=0.7423 130 char/s eta= 19s - eval 58,800/60,000 ( 98.0%) acc=0.7429 130 char/s eta= 9s - eval 60,000/60,000 (100.0%) acc=0.7437 130 char/s eta= 0s -chars=60,000 acc=0.7437 eval_duration=462.8s ---- -submission : alpha_06 -training energy (J): 14,047.9 -training duration : 159.5s -val char-accuracy : 0.7437 -val chars : 60,000 -wrote /tmp/result.json -Stopping app - local entrypoint completed. -✓ App completed. 
View run at -https://modal.com/apps/gabriel-nakajima-an/main/ap-4rmVWzqcQj1VatkwS8Da4X - -# final result -{ - "submission": "alpha_06", - "training_energy_J": 14047.8704136, - "training_duration_s": 159.464391728, - "val_char_accuracy": 0.7437, - "val_chars": 60000, - "gpu_name": "NVIDIA A100-SXM4-80GB", - "date_utc": "2026-05-20T01:10:10Z", - "_nvml": { - "nvml_available": true, - "energy_counter_supported": true, - "monotonic": true, - "idle_watts": 61.72905000000001, - "stress_watts_avg": 336.50938625277473, - "stress_energy_joules": 12708.794, - "stress_duration_s": 37.766536445, - "gpu_name": "NVIDIA A100-SXM4-80GB", - "notes": [] - }, - "contributor": "@subagent-xorfix-2026-05-19" -}