Hi! Thank you for sharing this repository—really impressive work, and I enjoyed reading the paper.
While reviewing the codebase, I noticed that the provided example training script appears to be set up for the Alpaca dataset. Would you be able to release the training code used for the main paper results, based on the LLaMA-Nemotron post-training dataset, including:
- Dataset preprocessing details, especially how you handle thinking tokens (e.g., Qwen2.5 does not natively support a "thinking" mode, yet think tokens are present in the training dataset).
- The full set of hyperparameters used in training (optimizer, LR schedule, batch size, warmup, sequence length, seed, etc.), ideally in a single config file or command line for reproducibility. Many are noted in the paper, but some are still missing, which makes the results hard to reproduce.
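For reference, here is a minimal sketch of the kind of think-token preprocessing I mean. The `<think>`/`</think>` markers, field names, and helper are my assumptions for illustration, not the repository's actual scheme:

```python
# Hypothetical sketch: wrapping a reasoning trace in explicit think tokens
# for a model (e.g. Qwen2.5) whose tokenizer has no native "thinking" mode.
# The marker strings and function name below are assumptions, not the
# repository's actual preprocessing.

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"

def build_training_text(prompt: str, reasoning: str, answer: str) -> str:
    """Concatenate prompt, think-token-delimited reasoning, and final answer
    into a single training string."""
    return f"{prompt}\n{THINK_OPEN}\n{reasoning}\n{THINK_CLOSE}\n{answer}"

example = build_training_text(
    prompt="What is 2 + 2?",
    reasoning="2 + 2 is simple addition, giving 4.",
    answer="4",
)
print(example)
```

Presumably the markers would also need to be registered as special tokens (e.g. via `tokenizer.add_special_tokens` followed by `model.resize_token_embeddings`), which is exactly the kind of detail I am hoping the released code would pin down.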
Separately, it looks like parts of the current implementation rely heavily on the modeling.py shipped with the HuggingFace model code. Do you have any plans to handle the modeling changes directly in this repository? Having the modeling modifications visible here would make it much easier for others to read, compare, and contribute, especially if you plan to expand support to additional model architectures.
Thanks again for the great work, and I’d really appreciate any guidance or updates you can share.