DDP Performance #37

@ClashLuke

I've tried reproducing your results, and it works well on a single GPU; it just takes a long time to train. So I was excited to try out your recent DDP addition.
Unfortunately, scaling from 1x H100 to 8x H100 (in the same DGX-H100 node) decreases throughput from 8.71 it/sec to 0.61 it/sec. Even assuming each DDP step does 8x as much work, that amounts to roughly 4.9 single-GPU-equivalent it/sec (0.61 × 8), which is still slower than the single-GPU baseline.
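
To make the comparison explicit, here is a minimal sketch of the arithmetic. It assumes that one DDP step covers `world_size` single-GPU batches, which I have not verified against the repo's config; the variable names are only illustrative.

```python
# Throughput comparison, assuming one DDP step processes world_size
# single-GPU batches (an assumption, not confirmed from the repo).
single_gpu_its = 8.71   # it/sec measured on 1x H100
ddp_its = 0.61          # it/sec measured on 8x H100
world_size = 8

# Convert DDP steps into single-GPU-equivalent steps per second.
effective_its = ddp_its * world_size

# Ratio to the single-GPU baseline; anything below 1.0 is a net slowdown.
speedup = effective_its / single_gpu_its

# Fraction of ideal linear scaling (8x the baseline) actually achieved.
scaling_efficiency = effective_its / (single_gpu_its * world_size)

print(f"effective throughput: {effective_its:.2f} it/sec")
print(f"speedup vs. 1 GPU:    {speedup:.2f}x")
print(f"scaling efficiency:   {scaling_efficiency:.1%}")
```

If the per-step work is not actually 8x (for example, if the global batch size is held fixed and split across ranks), the effective throughput would be even lower than this estimate.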

Did I miss some config options?
