Motivation
Why is this evaluation important for HOPE/TITAN reproduction?
To see whether Nested Learning lives up to the hype around emergent learning when compared with autoregressive (AR) and diffusion training methods for basic pre-training.
Task details
Implementation sketch
Outline scripts/flags needed (e.g., extend scripts/eval/zeroshot.py).
Acceptance criteria
Describe what needs to be captured (JSON fields, plots, etc.).
Reports showing that heavily optimized autoregressive training cannot outperform Nested Learning when both are given the same wall-clock time and/or FLOPs on the same dataset, i.e. that Nested Learning reaches lower perplexity under a matched budget.
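One possible shape for the captured report, as a minimal sketch rather than a settled schema: each run records its summed negative log-likelihood, token count, training FLOPs, and wall-clock time, and perplexity is derived as exp(mean NLL per token). The field names, the `RunResult` and `build_report` helpers, and the 5% FLOPs tolerance are all assumptions made for illustration; none of them come from the existing scripts.

```python
import json
import math
from dataclasses import dataclass, asdict


@dataclass
class RunResult:
    """One training run evaluated on the shared dataset (hypothetical schema)."""
    method: str          # illustrative labels such as "nested", "ar", "diffusion"
    total_nll: float     # summed negative log-likelihood over eval tokens, in nats
    num_tokens: int      # number of evaluated tokens
    train_flops: float   # total training FLOPs spent
    wall_clock_s: float  # total training wall-clock time in seconds


def perplexity(total_nll: float, num_tokens: int) -> float:
    """Perplexity is the exponential of the mean per-token NLL (in nats)."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return math.exp(total_nll / num_tokens)


def build_report(runs: list[RunResult], flops_tolerance: float = 0.05) -> dict:
    """Compare runs against the first run's FLOPs budget.

    A run counts as budget-matched if its FLOPs are within
    `flops_tolerance` (relative) of the first run. The tolerance
    value is an assumption, not a project convention.
    """
    if not runs:
        raise ValueError("need at least one run")
    reference = runs[0]
    rows = []
    for run in runs:
        row = asdict(run)
        row["perplexity"] = perplexity(run.total_nll, run.num_tokens)
        row["flops_matched"] = (
            abs(run.train_flops - reference.train_flops)
            <= flops_tolerance * reference.train_flops
        )
        rows.append(row)
    best = min(rows, key=lambda r: r["perplexity"])
    return {"runs": rows, "lowest_perplexity_method": best["method"]}


def to_json(report: dict) -> str:
    """Serialize the report so it can be attached to the issue or plotted."""
    return json.dumps(report, indent=2)
```

Only runs flagged `flops_matched` (or matched on `wall_clock_s` by an analogous check) would be meaningful for the comparison above. For diffusion models the likelihood is usually available only as a bound, so their perplexity would need to be reported as such.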