
diffugpt-s (127M) generations aren't fluent and repeat #23

@nitsanluke

Description

Hi @summmeer,

Thanks for all the great contributions to the new dLLMs. I tested the small model (diffugpt-s) with the provided inference script. It yields repetitive and incomplete generations. Is this something you've observed, or am I missing something in the inference settings?

  1. If this is indeed the case, could you elaborate on the model size at which generations start to become more fluent and useful?
  2. Also, the small models were CPT'd for ~130B tokens while the larger Llama model was CPT'd on only ~60B; was that a compute limitation, or does the larger model converge earlier?

Thanks!

Inference settings:

script: inf_diffugpt.py

    # Conditional generation: 16 new tokens in 16 diffusion steps.
    diffusion_steps = 16
    gen_len = 16

    print("="*20, "Prefix gen...")
    prefix = [tokenizer.bos_token_id] + tokenizer.encode("obama is the president")

    # src_mask: 1 marks fixed prefix (conditioning) positions, 0 marks positions to generate.
    src_mask = [1]*len(prefix) + [0]*gen_len
    # x0: the prefix followed by gen_len placeholder tokens to be denoised.
    x0 = prefix + [0]*gen_len

    inputs = {
        "input_ids": torch.tensor([x0]),
        "src_mask": torch.tensor([src_mask])
    }
    print(inputs)
    torch.manual_seed(1234)
    res = generate_samples(model, args, tokenizer, inputs, verbose=args.verbose)
    pred = tokenizer.decode(res.tolist()[0])
    print(pred)
"obama is the president that is being president the president of assistant assistant vice is and assistant vice is the"
