Hello,
I have tried training my own version of DiffuLLaMA, but unfortunately I have run into model degeneration when attempting to generate text outputs. When scoring answers with the scoring function on datasets like HSwag, Wino, and SIQA, I get performance somewhat comparable to the paper (~40-50% accuracy per dataset), but I am unable to produce any reasonable text output for generative tasks like GSM8K: the model only emits runs of newline and comma tokens.
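For context, by "scoring" I mean the usual multiple-choice recipe of ranking each candidate answer by its length-normalized log-likelihood given the question. A minimal sketch of that recipe is below, written for a standard causal HuggingFace model just for illustration; the helper name `score_choice` and the mean-log-probability normalization are my assumptions, and the actual scoring code in this repo (and how it handles the diffusion objective) may differ:

```python
import torch
import torch.nn.functional as F

def score_choice(model, tokenizer, context, choice):
    """Length-normalized log-likelihood of `choice` given `context`
    (hypothetical helper; the repo's scoring function may differ)."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    ids = tokenizer(context + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Token t is predicted from position t-1; keep only the choice tokens.
    logp = F.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    tok_logp = logp[torch.arange(targets.numel()), targets]
    return tok_logp[-(ids.shape[1] - ctx_len):].mean().item()

# Pick the highest-scoring candidate, e.g. for a HellaSwag-style item:
# best = max(candidates, key=lambda c: score_choice(model, tok, question, c))
```

The point is that this evaluation never requires the model to decode fluent text, which is why it can look fine while free-form generation degenerates.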
I trained from Llama-7b on one split of SlimPajama with these hyperparameters:
batch-size 16
gradient-accumulate-every 8
max-train-steps 1000
learning-rate 1.5e-5
seq-length 2048
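For reference, here is a quick back-of-the-envelope count of the tokens these settings cover (assuming the effective batch is batch-size times gradient-accumulate-every and a single data-parallel replica, which is my assumption about how the script works):

```python
# Rough token budget under the settings above (assumes effective batch
# = batch_size * grad_accum and one data-parallel replica).
batch_size, grad_accum, steps, seq_len = 16, 8, 1000, 2048
total_tokens = batch_size * grad_accum * steps * seq_len
print(f"{total_tokens:,} tokens")  # 262,144,000 -> roughly 0.26B tokens
```

If the paper's adaptation run used substantially more tokens than this, that alone might explain the degeneration.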
I was wondering whether I need to change any of these hyperparameters, or whether training on more than one split is required to produce meaningful text outputs. I should also note that I had to change some things in the training script to get it running on my end, so I may have introduced errors that were not in the original.
Would just love some guidance here, as my lab and I are trying to do some follow-up work on this paper.
Thank you!