
[Feature]: Update sharding config dims #11583

@greg-kwasniewski1

Description


🚀 The feature, motivation and pitch

The MoE-all-to-all PR #10985 cleaned up the definitions of `sharding_dims` and `sharding_source`.

  • `sharding_dims: ['tp', 'ep', 'bmm']` specifies which sharding transformations will be applied
  • `sharding_source: ['manual', 'factory']` specifies the source of the TP sharding heuristics.

What it means:

Previously

    sharding_dims: ['ep', 'bmm']
    sharding_source: ['manual', 'factory']

meant that
a) the heuristics pattern matcher was applied to the ep and bmm transformations, but not to tp;
b) for TP, no heuristic pass ran; only the manual and/or factory TP config was applied.

This was confusing: TP sharding was still applied even though tp was not listed in `sharding_dims: ['ep', 'bmm']`.
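Spelled out as a commented config, the old behavior looked roughly like this (a sketch; only the two keys quoted in this issue are real, the comments are my reading of the pre-PR semantics):

```yaml
# Old (pre-#10985) interpretation:
sharding_dims: ['ep', 'bmm']            # heuristic pass runs for ep and bmm only
sharding_source: ['manual', 'factory']  # ...yet TP sharding is STILL applied from
                                        # the manual/factory config, even though
                                        # 'tp' is absent from sharding_dims
```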

The above PR reversed the priorities, but the dedicated configs still need to be updated. Now:

`sharding_dims: ['tp', 'ep', 'bmm']`

should almost always be set to include tp. To use a manual TP sharding configuration instead of the heuristic pass,

 `sharding_source: ['manual']`

should be specified.
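Under the new semantics, the two settings combine roughly as follows (a sketch assuming the YAML layout quoted above; the comments are illustrative, not from the PR):

```yaml
# New (post-#10985) interpretation:
sharding_dims: ['tp', 'ep', 'bmm']  # include 'tp' so TP sharding runs at all
sharding_source: ['manual']         # take the TP sharding plan from the manual
                                    # config instead of the heuristic pass
```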

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

  • Scale-out<NV>: Multi-GPU and distributed inference scaling issues, tensor/pipeline/data parallelism
  • feature request: New feature or request. This includes new model, dtype, functionality support

Status

In review