Hi! Thank you for sharing this repository—really impressive work, and I enjoyed reading the paper.
While reviewing the codebase, I noticed that the provided example training script appears to be set up for the Alpaca dataset. Would you be able to release the training code used for the main paper results, based on the LLaMA-Nemotron post-training dataset, including:
- Dataset preprocessing details, especially how you handle thinking tokens (e.g., Qwen2.5 does not natively support a "thinking" mode, yet think tokens are present in the training dataset).
- The full set of hyperparameters used in training (optimizer, LR schedule, batch size, warmup, sequence length, seed, etc.), ideally in a single config file or command line for reproducibility. Many are noted in the paper, but some are still missing, which makes the results hard to reproduce.
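For reference, here is a minimal sketch of the kind of think-token preprocessing I mean. The `<think>`/`</think>` markers, field names, and helper are my assumptions for illustration, not the repository's actual scheme:

```python
# Hypothetical sketch: wrapping a reasoning trace in explicit think tokens
# for a model (e.g. Qwen2.5) whose tokenizer has no native "thinking" mode.
# The marker strings and function name below are assumptions, not the
# repository's actual preprocessing.

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"

def build_training_text(prompt: str, reasoning: str, answer: str) -> str:
    """Concatenate prompt, think-token-delimited reasoning, and final answer
    into a single training string."""
    return f"{prompt}\n{THINK_OPEN}\n{reasoning}\n{THINK_CLOSE}\n{answer}"

example = build_training_text(
    prompt="What is 2 + 2?",
    reasoning="2 + 2 is simple addition, giving 4.",
    answer="4",
)
print(example)
```

Presumably the markers would also need to be registered as special tokens (e.g. via `tokenizer.add_special_tokens` followed by `model.resize_token_embeddings`), which is exactly the kind of detail I am hoping the released code would pin down.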
Separately, it looks like parts of the current implementation rely heavily on the modeling.py shipped with the HuggingFace model code. Do you have any plans to handle the modeling changes directly in this repository? Having the modeling modifications visible here would make it much easier for others to read, compare, and contribute, especially if you plan to expand support to additional model architectures.
Thanks again for the great work, and I’d really appreciate any guidance or updates you can share.