Thank you for your excellent work on the recent paper. I noticed that some training hyperparameters were provided, such as:
- Learning rate: 1e-4
- Iterations: 1500
- Number of data pairs: 6,000
I’m writing to ask whether you could kindly share a few more details about the DPO training setup—specifically the batch size, the number of gradient accumulation steps, and the value of beta_dpo used during training.
Thank you in advance for your time and assistance.
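For context on why I'm asking about beta_dpo: I assume it refers to the β temperature in the standard DPO objective, which scales the log-ratio margin between the chosen and rejected responses before the sigmoid. A minimal sketch of the per-pair loss (the log-probability values below are purely illustrative):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is a sequence log-probability; beta controls how strongly
    the model is pushed to widen the chosen-vs-rejected margin relative to
    the frozen reference model.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy matches the reference (margin = 0), the loss is log(2)
# regardless of beta; a larger beta sharpens the loss around the margin.
print(dpo_loss(-1.0, -2.0, -1.5, -1.5, beta=0.1))
```

Since the effective batch size (per-device batch × accumulation steps) and β together determine the update dynamics, knowing all three would help reproduce the reported results.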