Explore latent thought training for data efficiency

Ref: https://arxiv.org/abs/2503.18866

**Why**
We are data-constrained but have plenty of compute at the model scales we're interested in. Generating more training data in a smart way may be key to actually improving models. BoLT, the method from the paper referenced above, looks like a promising way to do that.

**Approach**
Implement the method and run experiments at roughly 7B scale, e.g. continued pretraining (CPT) on Dynaword or data of similar scale. The method can soak up a lot of compute, so experiments need an explicit compute cap, maybe 20k MI250X hours, within which we should be able to show positive results.
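
To make the compute cap concrete, here is a minimal sketch of a GPU-hour ledger that refuses any run that would exceed the budget. The `ComputeBudget` class, its method names, and the 64-GPU job size are hypothetical and only illustrate the bookkeeping; the 20k figure is the tentative cap mentioned above, and how GPU hours are actually billed on the cluster is not covered here.

```python
from dataclasses import dataclass


@dataclass
class ComputeBudget:
    """Hypothetical ledger for a fixed GPU-hour budget (illustration only)."""

    total_gpu_hours: float
    used_gpu_hours: float = 0.0

    @property
    def remaining(self) -> float:
        # GPU hours still available under the cap.
        return self.total_gpu_hours - self.used_gpu_hours

    def max_wall_hours(self, num_gpus: int) -> float:
        # Longest wall-clock run that still fits for a job of this size.
        if num_gpus <= 0:
            raise ValueError("num_gpus must be positive")
        return self.remaining / num_gpus

    def charge(self, num_gpus: int, wall_hours: float) -> None:
        # Record a run; reject it without charging if it would exceed the cap.
        if num_gpus <= 0 or wall_hours < 0:
            raise ValueError("num_gpus must be positive and wall_hours non-negative")
        cost = num_gpus * wall_hours
        if cost > self.remaining:
            raise RuntimeError(
                f"run needs {cost} GPU hours, only {self.remaining} remain"
            )
        self.used_gpu_hours += cost


if __name__ == "__main__":
    # Assumed numbers: the tentative 20k-hour cap and a hypothetical 64-GPU job.
    budget = ComputeBudget(total_gpu_hours=20_000)
    budget.charge(num_gpus=64, wall_hours=100)
    print(budget.remaining, budget.max_wall_hours(64))
```

Checking `max_wall_hours` before launching each experiment gives a quick answer to how long a run of a given size can last before the cap is reached.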
