Info/Documentation on chunkwise training #30

@pkpro

Description

Hi there. I want to understand how to use RetNet to train a model with a longer context. It is not clear from the available documentation how to train the model on a large context. There are no parameters in Trainer or TrainingArguments for this, so how does one actually pass a 100k-token text with a 512 chunk size into the model during training? Should one chunk the text oneself, with overlapping text fragments? Or is there a method to pass the 100k text and have it processed automatically?
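For example, is manual chunking along these lines what is expected? (The tokenizer name, the lack of overlap, and the 512 chunk size below are just placeholders of mine, not something taken from this repo.)

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer
long_text = "..."  # a document of roughly 100k tokens
chunk_size = 512
ids = tokenizer(long_text, return_tensors="pt").input_ids[0]
# Split into fixed-size, non-overlapping pieces; should these overlap instead?
chunks = [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]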

What exactly can these outputs be used for?

chunk_outputs = model(input_ids, forward_impl='chunkwise', use_cache=True, recurrent_chunk_size=4)
chunk_state = chunk_outputs.last_hidden_state  # hidden states for the tokens in input_ids
chunk_cache = chunk_outputs.past_key_values    # cache returned because use_cache=True

Should one feed these outputs back in at the beginning of the next input during data preparation?
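For instance, is something like the following loop the intended way to carry the state forward? I am assuming here that forward() accepts past_key_values on a later call; I could not confirm that from the docs.

past_key_values = None
for chunk_ids in chunks:  # chunks as produced above
    outputs = model(
        chunk_ids.unsqueeze(0),  # add a batch dimension
        forward_impl='chunkwise',
        use_cache=True,
        recurrent_chunk_size=4,
        past_key_values=past_key_values,  # assumed kwarg: carry the cache from the previous chunk
    )
    past_key_values = outputs.past_key_values  # state to pass into the next chunk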

Please help.
