Description
🚀 The feature, motivation and pitch
The MoE-all-to-all PR #10985 cleaned up the definitions of `sharding_dims` and `sharding_source`:
- `sharding_dims: ['tp', 'ep', 'bmm']` specifies which sharding transformations are applied.
- `sharding_source: ['manual', 'factory']` specifies the source of the TP sharding heuristics.
What this means:
Previously,
`sharding_dims: ['ep', 'bmm']`
`sharding_source: ['manual', 'factory']`
meant that:
a) the heuristic pattern matcher was applied to the ep and bmm transformations, but not to tp;
b) for TP, no heuristic pass ran, and only the manual and/or factory TP config was applied.
This was confusing, because TP sharding was applied even though tp was not listed in `sharding_dims: ['ep', 'bmm']`.
The above PR reversed the priorities, but dedicated configs still need to be updated. Now,
`sharding_dims: ['tp', 'ep', 'bmm']`
should almost always be set to include tp. For a manual TP sharding configuration,
`sharding_source: ['manual']`
should be specified as well.
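As an illustration, a dedicated config that relies on manual TP sharding might be migrated roughly as sketched below. This assumes the two keys sit side by side in a YAML config. The exact nesting and file location are not stated in this issue and may differ.

```yaml
# Before the PR (old semantics): tp is not listed,
# yet the manual/factory TP config was still applied.
# sharding_dims: ['ep', 'bmm']
# sharding_source: ['manual', 'factory']

# After the PR (new semantics): list tp explicitly so that
# TP sharding is applied, and select the manual TP source.
sharding_dims: ['tp', 'ep', 'bmm']
sharding_source: ['manual']
```

Under the new semantics described above, a config that still omits tp from `sharding_dims` would presumably no longer get TP sharding, which is why the dedicated configs need this update.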
Alternatives
No response
Additional context
No response