Info/Documentation on chunkwise training #30

@pkpro

Description

Hi there. I want to understand how to use RetNet to train a model with a longer context. It is not clear from the available documentation how to train the model on a large context. There are no parameters in Trainer or TrainingArguments for this, so how does one actually pass a 100k-token text with a 512 chunk size into the model during training? Should one chunk the text oneself, with overlapping text fragments? Or is there a method to pass the 100k text and have it processed automatically?
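For example, is manual chunking along these lines what is expected? (The tokenizer name, the lack of overlap, and the 512 chunk size below are just placeholders of mine, not something taken from this repo.)

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer
long_text = "..."  # a document of roughly 100k tokens
chunk_size = 512
ids = tokenizer(long_text, return_tensors="pt").input_ids[0]
# Split into fixed-size, non-overlapping pieces; should these overlap instead?
chunks = [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]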

What exactly can these outputs be used for?

chunk_outputs = model(input_ids, forward_impl='chunkwise', use_cache=True, recurrent_chunk_size=4)
chunk_state = chunk_outputs.last_hidden_state  # hidden states for the tokens in input_ids
chunk_cache = chunk_outputs.past_key_values    # cache returned because use_cache=True

Should one feed these outputs back in at the beginning of the next input during data preparation?
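For instance, is something like the following loop the intended way to carry the state forward? I am assuming here that forward() accepts past_key_values on a later call; I could not confirm that from the docs.

past_key_values = None
for chunk_ids in chunks:  # chunks as produced above
    outputs = model(
        chunk_ids.unsqueeze(0),  # add a batch dimension
        forward_impl='chunkwise',
        use_cache=True,
        recurrent_chunk_size=4,
        past_key_values=past_key_values,  # assumed kwarg: carry the cache from the previous chunk
    )
    past_key_values = outputs.past_key_values  # state to pass into the next chunk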

Please help.
