[v2] Reproducibility: release paper training script, preprocessing, and exact hyperparameters #62

@yeonjoon-jung01

Description

Hi! Thank you for sharing this repository—really impressive work, and I enjoyed reading the paper.

While reviewing the codebase, I noticed that the provided example training script appears to be set up for the Alpaca dataset. Would you be able to release the training code used for the main paper results, based on the LLaMA-Nemotron post-training dataset, including:

  • Dataset preprocessing details, especially how you handle thinking tokens (e.g., Qwen2.5 does not natively support a “thinking” mode, yet think tokens appear in the training dataset).
  • The full set of training hyperparameters (optimizer, LR schedule, batch size, warmup, sequence length, seed, etc.), ideally in a single config file or command line for reproducibility. Many are noted in the paper, but some are still missing, which makes the results hard to reproduce.
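To make the second point concrete, a config along these lines would be ideal. Every field name and value below is a hypothetical placeholder to illustrate the level of detail being requested, not the authors' actual settings:

```yaml
# Hypothetical training config sketch -- all names and values are
# placeholders, not the settings used for the paper results.
optimizer: adamw
learning_rate: 1.0e-5
lr_schedule: cosine
warmup_ratio: 0.03
global_batch_size: 128
max_sequence_length: 8192
num_epochs: 2
seed: 42
```

Even a single command line with all flags spelled out would serve the same purpose.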

Separately, it looks like parts of the current implementation rely heavily on modeling.py in the HuggingFace model code. Do you have any plans to handle the modeling changes directly in this repository? Having the modeling modifications visible here would make it much easier for others to read, compare, and contribute—especially if you plan to expand support to additional model architectures.

Thanks again for the great work, and I’d really appreciate any guidance or updates you can share.
