wikitext

Train a character-level language model from scratch on WikiText-103. Submissions are scored by lowest energy to reach 0.7 character prediction accuracy. Scorer is agnostic to backprop. Submissions of novel learning algorithms are encouraged!

Quickstart

Prerequisites:

Modal account.
Python 3.11

# From wip-wikitext/

python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt

modal token new

python submit.py submissions/modded_nanogpt

Record History

Date	Energy (J)	Val char-acc	GPU	Config	Submission	Contributor
2026-05-12	51,704	0.7374	A100 80GB PCIe	modded_nanogpt	dir	@KellerJordan
2026-05-18	46,222	0.7238	A100 80GB PCIe	lwta_k4	dir	@ab-10
2026-05-18	46,132	0.7146	A100 80GB PCIe	lwta_k2	dir	@ab-10
2026-05-18	55,459	DQ	A100 80GB SXM4	hyena	dir	@ab-10
2026-05-18	20,348	DQ	A100 80GB SXM4	pointer_sentinel	dir	@ab-10
2026-05-18	3,612	DQ	A100 80GB PCIe	chunker_d1	dir	@ab-10
2026-05-18	735	DQ	A100 80GB PCIe	ppm_c	dir	@ab-10
2026-05-17	70	DQ	A100 80GB SXM4	P2-A_random_projection	dir	@ab-10

Rules

Train a character-level language model from scratch on WikiText-103-raw-v1. Submissions that meet the constraints below are ranked by training energy (joules), lower wins. Greedy-argmax char-accuracy is computed on the first 60,000 chars of each split; val is gated by rule 5, test is reported alongside but not gated.

Submissions must:

Train from scratch. (No pre-trained weights — WikiText overlaps WebText, so pre-trained init poisons the comparison.)
Use the standard WikiText-103 train/valid/test split. (You can change batch size, sequence length, attention structure, etc.; just don't change the underlying streams of characters.)
Expose a streaming next-character distribution via the CharModel API. (The runner calls predict() for position i strictly before observe() commits the ground-truth at position i — within-document future-peeking is structurally impossible.) a. Implementing CharModel ABC from wikitext.py is the most straightforward way to do this.
Finish training in < 300 s wall-clock on the pinned Modal A100-80GB PCIe, measured from the first call into train() to its return. (Eval is not charged against this budget.)
Attain val char-acc ≥ 0.70 on the first 60,000 chars of the val split.

Internal representations

The char-level scoring contract (rules 2 + 3) constrains the eval-facing interface, not the model's internal representation. Submissions should pick whichever internal unit (bytes, BPE, WordPiece, words, ...) best optimises their architecture's energy-to-accuracy ratio, and translate to per-char probabilities at the CharModel boundary.

Allowed. Internal BPE / WordPiece / word vocabulary; subword-aware decoders; char-level marginalisation over partial-token continuations; hybrid architectures that mix per-char and per-token computation.
Encouraged where it helps. Sub-quadratic mixers (Hyena, Mamba, etc.) benefit from shorter sequences; BPE typically gives 4–5× shorter context vs bytes, which can lift the per-step budget enough for those architectures to actually converge inside 300 s.
Tokeniser merge tables (GPT-2 BPE, SentencePiece, etc.) are not "pretrained weights". They are deterministic algorithms over byte/codepoint streams, allowed under rule 1.
Pretrained tokeniser model weights are not allowed — e.g., shipping a SentencePiece model that was trained on external data — same rationale as rule 1.

For an internal-BPE submission, predict() returns P(next_char | observed_chars) by marginalising over BPE tokens consistent with the current byte buffer: for an active partial buffer p and candidate next char c, P(c) = Σ_t P(t | committed_context) · 1[bytes(t) starts with p+c], normalised over the candidate set whose tokens start with p.

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
fixtures/tiny		fixtures/tiny
research		research
submissions		submissions
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
bake_wikitext.py		bake_wikitext.py
fetch_data.py		fetch_data.py
requirements.txt		requirements.txt
run_eval.py		run_eval.py
submit.py		submit.py
task.py		task.py
test_wikitext.py		test_wikitext.py
verify_nvml.py		verify_nvml.py
wikitext.py		wikitext.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

wikitext

Quickstart

Record History

Rules

Internal representations

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

wikitext

Quickstart

Record History

Rules

Internal representations

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages