I've tried reproducing your results, which works well on a single GPU; it just takes a long time to train. So your recent DDP addition was an exciting new feature I had to try out.
Unfortunately, scaling from 1x H100 to 8x H100 (in the same DGX-H100 node) drops throughput from 8.71 it/sec to 0.61 it/sec. Even assuming each step now does 8x as much work, that's an effective 4.88 it/sec, still slower than the single-GPU baseline.
Did I miss some config options?
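For reference, here's how I'm computing the effective throughput and scaling efficiency above (the it/sec numbers are from my runs; the "8x work per step" assumption is mine, i.e. that DDP keeps the per-GPU batch size and scales the global batch):

```python
# Work-equivalent throughput under DDP, assuming each step processes
# world_size times the single-GPU batch (my assumption, see above).
single_gpu_it_per_sec = 8.71   # 1x H100 baseline
ddp_it_per_sec = 0.61          # 8x H100, per-step rate
world_size = 8

# Effective it/sec if we credit each DDP step with 8x the work.
effective = ddp_it_per_sec * world_size

# Fraction of ideal linear scaling (ideal = baseline rate on all 8 GPUs).
efficiency = effective / (single_gpu_it_per_sec * world_size)

print(f"effective throughput: {effective:.2f} it/sec")   # 4.88
print(f"scaling efficiency:   {efficiency:.1%}")         # 7.0%
```

So even under the most generous accounting, 8 GPUs deliver about 7% of linear scaling.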