
diffugpt-s (127M) generations aren't fluent and repeat #23

@nitsanluke

Description

Hi @summmeer,

Thanks for all the great contributions to the new dLLMs. I tested the small model (diffugpt-s) with the provided inference script. It yields repetitive and incomplete generations. Is this something you've observed, or am I missing something in the inference settings?

  1. If this is indeed the case, could you elaborate on the model size at which generations start to become more fluent and useful?
  2. Also, the small models were CPT'd for ~130B tokens while the larger Llama model was CPT'd on only ~60B; was that a compute limitation, or does the larger model converge earlier?

Thanks!

Inference settings:

script: inf_diffugpt.py

    # Conditional generation: 16 new tokens in 16 diffusion steps.
    diffusion_steps = 16
    gen_len = 16

    print("="*20, "Prefix gen...")
    prefix = [tokenizer.bos_token_id] + tokenizer.encode("obama is the president")

    # src_mask: 1 marks fixed prefix (conditioning) positions, 0 marks positions to generate.
    src_mask = [1]*len(prefix) + [0]*gen_len
    # x0: the prefix followed by gen_len placeholder tokens to be denoised.
    x0 = prefix + [0]*gen_len

    inputs = {
        "input_ids": torch.tensor([x0]),
        "src_mask": torch.tensor([src_mask])
    }
    print(inputs)
    torch.manual_seed(1234)
    res = generate_samples(model, args, tokenizer, inputs, verbose=args.verbose)
    pred = tokenizer.decode(res.tolist()[0])
    print(pred)
"obama is the president that is being president the president of assistant assistant vice is and assistant vice is the"
