Model degeneration issue? #19

@astradzhao

Description

Hello,

I have tried training my own version of DiffuLLaMA, but unfortunately I have run into model degeneration when generating text outputs. When generating answers via the scoring function on datasets like HSwag, Wino, SIQA, etc., I get performance somewhat comparable to the paper's (~40-50% accuracy on each dataset), but I cannot produce any reasonable text output on tasks like GSM8K (the model only generates runs of newline and comma tokens).

I trained LLaMA-7B on 1 split of SlimPajama with these hyperparameters:
batch-size 16
gradient-accumulate-every 8
max-train-steps 1000
learning-rate 1.5e-5
seq-length 2048

I was wondering whether I need to change any of these hyperparameters, or whether I need to train on more than just 1 split, to produce meaningful text outputs. I also had to change some things in the training script to get it running on my end, so there may be errors in my version that are not in the original.
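For context, here is a rough token-budget estimate implied by the settings above (a sketch that assumes batch-size counts sequences per micro-step, gradient accumulation multiplies the effective batch, and every sequence is packed to the full seq-length):

```python
# Back-of-the-envelope token count for the run described above.
# Assumption: tokens per optimizer step =
#   batch_size * grad_accum_steps * seq_length
batch_size = 16        # --batch-size
grad_accum = 8         # --gradient-accumulate-every
max_steps = 1000       # --max-train-steps
seq_length = 2048      # --seq-length

tokens_per_step = batch_size * grad_accum * seq_length
total_tokens = tokens_per_step * max_steps

print(tokens_per_step)  # 262144 tokens per optimizer step
print(total_tokens)     # 262144000, i.e. ~0.26B tokens total
```

So under these assumptions the run only sees on the order of a quarter-billion tokens, which is part of why I suspect more data or more steps may be needed.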

Would just love some guidance here, as my lab and I are trying to do some follow-up work on this paper.

Thank you!
