Dynamic/variable batch size support

For the model I am training, I rely on a custom [Sampler](https://pytorch.org/docs/stable/data.html#torch.utils.data.Sampler) that returns batches of variable size. My task is translation, and following [Attention is all you need (2017)](https://papers.nips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf) I create batches based on the total token count per batch. Because the inputs vary in length, each batch contains a different number of examples (an example here being one source/target translation pair).
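To make the setup concrete, here is a minimal sketch of token-budget batching. It is not my actual sampler: the class name `TokenBudgetBatchSampler`, the `lengths` list, and the `max_tokens` value are all hypothetical, and it is written in plain Python so it runs without PyTorch (a real version would typically be passed as `batch_sampler` to a `DataLoader`).

```python
from typing import Iterator, List, Sequence


class TokenBudgetBatchSampler:
    """Yield lists of example indices whose total token count fits a budget.

    Hypothetical illustration only; `lengths[i]` is assumed to be the
    token count of example i.
    """

    def __init__(self, lengths: Sequence[int], max_tokens: int) -> None:
        self.lengths = lengths
        self.max_tokens = max_tokens

    def __iter__(self) -> Iterator[List[int]]:
        batch: List[int] = []
        tokens_in_batch = 0
        for index, length in enumerate(self.lengths):
            # Close the current batch if this example would exceed the budget.
            if batch and tokens_in_batch + length > self.max_tokens:
                yield batch
                batch, tokens_in_batch = [], 0
            batch.append(index)
            tokens_in_batch += length
        # Emit whatever is left over as the final batch.
        if batch:
            yield batch


lengths = [5, 7, 3, 12, 4, 4, 9]  # made-up token counts
batches = list(TokenBudgetBatchSampler(lengths, max_tokens=16))
print(batches)  # [[0, 1, 2], [3, 4], [5, 6]]
```

The token budget is constant, but the number of examples per batch (3, 2, 2 here) is not.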

With regular DDP-based training this worked fine: I simply created a distributed version of the sampler that splits each variable-size batch into sub-batches by GPU rank. With DeepSpeed, however, I am forced to provide either `train_micro_batch_size_per_gpu` or `train_batch_size`, and as far as I understand both are defined in terms of the number of examples in a batch.
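The per-rank split in the DDP case can be sketched roughly as follows. The helper `shard_batch` and the strided slicing are assumptions for illustration, not the exact code I use; in practice `rank` and `world_size` would come from `torch.distributed`.

```python
from typing import List


def shard_batch(batch: List[int], rank: int, world_size: int) -> List[int]:
    """Return the sub-batch of example indices that one rank processes.

    Hypothetical helper: a strided slice gives every rank roughly the same
    number of examples, but it does not balance token counts across ranks.
    """
    return batch[rank::world_size]


batch = [10, 11, 12, 13, 14]  # one variable-size batch of example indices
world_size = 2                # assumed number of GPUs
shards = [shard_batch(batch, rank, world_size) for rank in range(world_size)]
print(shards)  # [[10, 12, 14], [11, 13]]
```

Note that the sub-batch size differs between ranks and between steps, and a rank receives an empty sub-batch whenever a batch has fewer examples than there are ranks.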

Since the number of examples varies from batch to batch, I just want to configure gradient accumulation by batch count rather than batch size, and I'm not sure how to achieve this with DeepSpeed's configuration.
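For reference, this is how I understand the batch-size settings to relate to each other, written as a Python dict instead of the JSON config file. The concrete numbers and the `world_size` value are made up for illustration.

```python
world_size = 4  # assumed number of GPUs

# Illustrative values only; the keys are DeepSpeed config options.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,  # examples per GPU per forward/backward pass
    "gradient_accumulation_steps": 4,     # micro-batches accumulated per optimizer step
}

# DeepSpeed expects the global batch size to satisfy:
# train_batch_size = micro batch size * accumulation steps * number of GPUs
implied_train_batch_size = (
    ds_config["train_micro_batch_size_per_gpu"]
    * ds_config["gradient_accumulation_steps"]
    * world_size
)
print(implied_train_batch_size)  # 128
```

Every quantity here is a fixed number of examples, whereas what I would like to fix is only `gradient_accumulation_steps`, with the number of examples per micro-batch left free.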

Am I misunderstanding the impact of the configuration variables, missing some other configuration, or is this not possible to achieve at the moment?

Dynamic/variable batch size support #1051
