Hello,
I have tried training my own version of DiffuLLaMA, but unfortunately I have run into model degeneration when attempting to generate text outputs. When scoring answers with the scoring function on datasets like HSwag, Wino, and SIQA, I get performance somewhat comparable to the paper (~40-50% accuracy per dataset), but I am unable to produce any reasonable text output for generative tasks like GSM8K: the model only emits runs of newline and comma tokens.
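For context, by "scoring" I mean the usual multiple-choice recipe of ranking each candidate answer by its length-normalized log-likelihood given the question. A minimal sketch of that recipe is below, written for a standard causal HuggingFace model just for illustration; the helper name `score_choice` and the mean-log-probability normalization are my assumptions, and the actual scoring code in this repo (and how it handles the diffusion objective) may differ:

```python
import torch
import torch.nn.functional as F

def score_choice(model, tokenizer, context, choice):
    """Length-normalized log-likelihood of `choice` given `context`
    (hypothetical helper; the repo's scoring function may differ)."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    ids = tokenizer(context + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Token t is predicted from position t-1; keep only the choice tokens.
    logp = F.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    tok_logp = logp[torch.arange(targets.numel()), targets]
    return tok_logp[-(ids.shape[1] - ctx_len):].mean().item()

# Pick the highest-scoring candidate, e.g. for a HellaSwag-style item:
# best = max(candidates, key=lambda c: score_choice(model, tok, question, c))
```

The point is that this evaluation never requires the model to decode fluent text, which is why it can look fine while free-form generation degenerates.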
I trained from Llama-7b on one split of SlimPajama with these hyperparameters:
batch-size 16
gradient-accumulate-every 8
max-train-steps 1000
learning-rate 1.5e-5
seq-length 2048
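For reference, here is a quick back-of-the-envelope count of the tokens these settings cover (assuming the effective batch is batch-size times gradient-accumulate-every and a single data-parallel replica, which is my assumption about how the script works):

```python
# Rough token budget under the settings above (assumes effective batch
# = batch_size * grad_accum and one data-parallel replica).
batch_size, grad_accum, steps, seq_len = 16, 8, 1000, 2048
total_tokens = batch_size * grad_accum * steps * seq_len
print(f"{total_tokens:,} tokens")  # 262,144,000 -> roughly 0.26B tokens
```

If the paper's adaptation run used substantially more tokens than this, that alone might explain the degeneration.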
I was wondering whether I need to change any of these hyperparameters, or whether training on more than one split is required to produce meaningful text outputs. I should also note that I had to change some things in the training script to get it running on my end, so I may have introduced errors that were not in the original.
Would just love some guidance here, as my lab and I are trying to do some follow-up work on this paper.
Thank you!