Following your example code:
import torch

input_ids = torch.LongTensor([[1, 2, 3, 4, 1, 2, 5, 5],
                              [5, 5, 1, 2, 3, 4, 1, 2]]).to(device)  # device assumed to be set beforehand, e.g. 'cuda'
retention_mask = torch.LongTensor([[1, 1, 1, 1, 1, 1, 0, 0],  # 1 = real token, 0 = padding position
                                   [0, 0, 1, 1, 1, 1, 1, 1]]).to(device)
parallel_outputs = model(input_ids, retention_mask=retention_mask, forward_impl='parallel', use_cache=True)
If I want to do SFT (supervised fine-tuning), should I pass in retention_mask, or do I just need input_ids and labels? For example, something like the sketch below:
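To make the question concrete, here is a rough sketch of the two options I mean. I'm assuming the model follows the usual Hugging Face causal-LM convention of computing the loss internally when labels is passed and ignoring -100 positions, but I'm not sure that applies here:

# Option A: input_ids and labels only, assuming the model computes the loss
# internally when labels is given (HF causal-LM style convention)
labels = input_ids.clone()
labels[retention_mask == 0] = -100  # ignore padded positions in the loss
outputs_a = model(input_ids, labels=labels)

# Option B: additionally pass retention_mask so padding is masked in the forward pass
outputs_b = model(input_ids, retention_mask=retention_mask, labels=labels)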