Description
Hi there. I want to understand how to use RetNet to train a model with a longer context. It is not clear from the available documentation how to train the model on a large context. There are no relevant parameters in Trainer or TrainingArguments, so how does one actually pass a 100k-token text with a 512 chunk size into the model during training? Should one chunk the text oneself into overlapping fragments (something like the first sketch below), or is there a way to pass the full 100k text and have it processed as a whole?
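To make the question concrete, this is roughly what I imagine for the first option. `tokenizer`, `long_text`, and the overlap size are my own placeholders, not anything documented in this repo:

```python
# Option (a) as I understand it: tokenize the whole ~100k text, slice it into
# overlapping 512-token chunks, and treat each chunk as an independent sample.
chunk_size = 512
overlap = 64                      # placeholder overlap, chosen arbitrarily
stride = chunk_size - overlap

token_ids = tokenizer(long_text, return_tensors="pt").input_ids[0]  # ~100k tokens
chunks = [token_ids[i:i + chunk_size] for i in range(0, len(token_ids), stride)]
# (the last chunk may be shorter and would presumably need padding)
```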
What exactly can these outputs be used for?

```python
chunk_outputs = model(input_ids, forward_impl='chunkwise', use_cache=True, recurrent_chunk_size=4)
chunk_state = chunk_outputs.last_hidden_state
chunk_cache = chunk_outputs.past_key_values
```
Should one feed these outputs back in at the beginning of the next input during data preparation, e.g. something like the second sketch below?
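For that second option, I imagine a loop like the one below, where the cache from one chunk is re-injected into the next forward call. Passing the cache back through a `past_key_values=` argument is purely my assumption based on the usual Hugging Face convention; I could not find it documented for this implementation, and `model` / `input_ids` are placeholders:

```python
import torch

# Option (b) as I imagine it: keep the 100k sequence contiguous and carry the
# recurrent cache from one 512-token chunk into the next forward call.
input_ids = torch.randint(0, 1000, (1, 100_000))  # placeholder for the tokenized 100k text
chunk_size = 512

past = None
for start in range(0, input_ids.size(1), chunk_size):
    chunk = input_ids[:, start:start + chunk_size]
    outputs = model(
        chunk,
        forward_impl='chunkwise',
        use_cache=True,
        recurrent_chunk_size=4,
        past_key_values=past,   # assumed keyword for re-injecting the previous cache
    )
    past = outputs.past_key_values
    # ... compute the loss on this chunk and backpropagate here?
```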
Please help.