Support multiple dataloaders for GRPO #1603

@yuki-97

Description

  1. Support multiple dataloaders for multiple datasets so that we can control how much to load from each dataset.
  2. Provide an interface and a simplified example for controlling the ratio of each dataset:
    1. E.g. at one training step, we can load 2 sub-batches from dataloader1 and 3 sub-batches from dataloader2; the task ratio in the final training batch is then 2:3 (assuming all sub-batches are the same size).
    2. The implementation will be similar to custom-parallel-plan: write the custom logic in a file, then point to that file in the config.
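The per-step ratio mechanism described in item 2 could look roughly like the Python below. This is a minimal, hypothetical sketch, not the implementation from this issue: `MultiDataloader`, `load_fn_from_file`, and the `ratio_fn(step) -> {name: num_subbatches}` signature are all assumptions, and plain lists of sub-batches stand in for real dataloaders. The custom ratio logic lives in its own file and is imported by path, mirroring how a (hypothetical) config entry could point at it.

```python
import importlib.util
from typing import Any, Callable, Dict, Iterable, Iterator, List

# Hypothetical user-provided function: given the training step, return how many
# sub-batches to draw from each named dataloader, e.g. {"math": 2, "code": 3}.
RatioFn = Callable[[int], Dict[str, int]]


def load_fn_from_file(path: str, fn_name: str) -> Callable:
    """Import `fn_name` from the Python file at `path` (e.g. a path read from the config)."""
    spec = importlib.util.spec_from_file_location("custom_ratio_module", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, fn_name)


class MultiDataloader:
    """Builds each training batch from a per-step number of sub-batches per dataloader."""

    def __init__(self, dataloaders: Dict[str, Iterable[List[Any]]], ratio_fn: RatioFn):
        self.dataloaders = dataloaders
        self.ratio_fn = ratio_fn
        self._iters: Dict[str, Iterator[List[Any]]] = {
            name: iter(dl) for name, dl in dataloaders.items()
        }

    def _next_subbatch(self, name: str) -> List[Any]:
        try:
            return next(self._iters[name])
        except StopIteration:
            # Restart only the exhausted dataloader (a new epoch for that dataset).
            self._iters[name] = iter(self.dataloaders[name])
            return next(self._iters[name])

    def get_batch(self, step: int) -> List[Any]:
        batch: List[Any] = []
        for name, num_subbatches in self.ratio_fn(step).items():
            for _ in range(num_subbatches):
                batch.extend(self._next_subbatch(name))
        return batch


if __name__ == "__main__":
    # Sub-batches of 4 samples each; 2 vs. 3 sub-batches per step gives a 2:3 ratio.
    loader = MultiDataloader(
        {"math": [["math"] * 4 for _ in range(3)], "code": [["code"] * 4 for _ in range(3)]},
        ratio_fn=lambda step: {"math": 2, "code": 3},
    )
    batch = loader.get_batch(step=0)
    print(batch.count("math"), batch.count("code"))
```

Keeping one iterator per dataset means each dataset is consumed and restarted independently, so a small dataset being oversampled does not force an epoch boundary on the others.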
