Thank you for your excellent work on the recent paper. I noticed that some training hyperparameters were provided, such as:
- Learning rate: 1e-4
- Iterations: 1500
- Number of data pairs: 6,000
I’m writing to ask whether you could kindly share a few more details about the DPO training setup—specifically the batch size, the number of gradient accumulation steps, and the value of beta_dpo used during training.
Thank you in advance for your time and assistance.
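For context on why I'm asking about beta_dpo: I assume it refers to the β temperature in the standard DPO objective, which scales the log-ratio margin between the chosen and rejected responses before the sigmoid. A minimal sketch of the per-pair loss (the log-probability values below are purely illustrative):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is a sequence log-probability; beta controls how strongly
    the model is pushed to widen the chosen-vs-rejected margin relative to
    the frozen reference model.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy matches the reference (margin = 0), the loss is log(2)
# regardless of beta; a larger beta sharpens the loss around the margin.
print(dpo_loss(-1.0, -2.0, -1.5, -1.5, beta=0.1))
```

Since the effective batch size (per-device batch × accumulation steps) and β together determine the update dynamics, knowing all three would help reproduce the reported results.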