[Feature][AutoDeploy]: Explore optimal sharding configurations

### 🚀 The feature, motivation and pitch

Currently, our sharding config is driven by `LLMArgs` fields such as:
```yaml
sharding_source: ['manual', 'factory', 'heuristic']
support_partial_config: true
sharding_dims: ['tp', 'ep', 'bmm']
shard_all_unprocessed: false
dist_mapping: {'tp': 2, 'ep': 2}
```
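
Because these fields are plain key/value data, an unknown source or dimension name, or a parallel size that does not fit the GPU count, is easy to miss until deployment. Below is a minimal sketch of early validation. It assumes the allowed values are exactly the ones listed above and that every `dist_mapping` size must evenly divide the world size. `ShardingConfig` and `validate` are hypothetical names and are not the real `LLMArgs` API:

```python
from dataclasses import dataclass, field

# Allowed values as listed in the example config; assumed exhaustive here.
VALID_SOURCES = {"manual", "factory", "heuristic"}
VALID_DIMS = {"tp", "ep", "bmm"}


@dataclass
class ShardingConfig:
    """Hypothetical mirror of the sharding fields, NOT the real LLMArgs class."""

    sharding_source: list[str] = field(
        default_factory=lambda: ["manual", "factory", "heuristic"])
    support_partial_config: bool = True
    sharding_dims: list[str] = field(default_factory=lambda: ["tp", "ep", "bmm"])
    shard_all_unprocessed: bool = False
    dist_mapping: dict[str, int] = field(default_factory=dict)

    def validate(self, world_size: int) -> None:
        """Raise ValueError on unknown names or sizes that do not fit world_size."""
        bad_sources = set(self.sharding_source) - VALID_SOURCES
        if bad_sources:
            raise ValueError(f"unknown sharding_source: {sorted(bad_sources)}")
        bad_dims = set(self.sharding_dims) - VALID_DIMS
        if bad_dims:
            raise ValueError(f"unknown sharding_dims: {sorted(bad_dims)}")
        # Assumption: each parallel size must evenly divide the number of ranks.
        for dim, size in self.dist_mapping.items():
            if size < 1 or world_size % size != 0:
                raise ValueError(
                    f"dist_mapping[{dim!r}]={size} does not divide world_size={world_size}")
```

For example, `ShardingConfig(dist_mapping={'tp': 2, 'ep': 2}).validate(world_size=4)` passes. A typo such as `'pp'` in `sharding_dims` raises `ValueError` before any graph transformation runs.
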
There are still open questions about sharding, especially around MoE, namely what the optimal strategy is for:
- shared experts
- latent projections (for MoLE)
- MLA: latent projections

The PyTorch (PT) backend does not expose these configurations; its only source of truth for sharding is the `Mapping` object. Figure out what the PT backend does with these layers and whether that is truly optimal.
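
To frame the shared-expert question, here is a back-of-the-envelope sketch of the tradeoff. TP-sharding a gated shared-expert MLP divides the weights each rank holds (and reads per decode step) by the TP size, but it adds an all-reduce of the output activations. Replicating the MLP avoids that collective. The helpers and `GatedMlpShape` are hypothetical. The sketch assumes a gate/up/down MLP and a ring all-reduce, and it ignores any token gather needed when attention runs data-parallel. It does not describe how the PT backend actually decides:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatedMlpShape:
    """Hypothetical shape descriptor for a shared-expert MLP (gate, up, down)."""

    hidden: int            # model hidden size H
    intermediate: int      # shared-expert intermediate size I
    dtype_bytes: int = 2   # e.g. 2 for bf16


def weight_bytes_per_rank(shape: GatedMlpShape, tp: int) -> int:
    """Weight bytes one rank holds; tp=1 means the MLP is replicated.

    gate/up are column-sharded and down is row-sharded along I, so I must be
    divisible by tp.
    """
    if tp < 1 or shape.intermediate % tp != 0:
        raise ValueError(f"tp={tp} must evenly divide intermediate={shape.intermediate}")
    return 3 * shape.hidden * (shape.intermediate // tp) * shape.dtype_bytes


def allreduce_bytes_per_rank(shape: GatedMlpShape, tp: int, tokens: int) -> float:
    """Bytes each rank sends in a ring all-reduce of the [tokens, H] output.

    A ring all-reduce sends 2 * (tp - 1) / tp of the payload per rank, which is
    0 when tp == 1 (replicated, no collective needed).
    """
    payload = tokens * shape.hidden * shape.dtype_bytes
    return 2 * (tp - 1) / tp * payload
```

Sweeping `tp` for a given model shape and token count shows roughly where the weight savings stop paying for the extra collective. The actual answer still has to come from profiling.
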

On the other hand, based on Pareto plots, we know that different parallel configurations are optimal depending on the throughput-latency tradeoff, transitioning from DEP to TEP to TP. Determine the inflection points and configure the runtime to switch configurations dynamically based on runtime parameters.
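
Here is one simple way to turn benchmark sweep results into switch points. It assumes the runtime parameter is request concurrency and that the objective is the best throughput per GPU within a latency budget. The `Measurement` format, the function names, and the choice of objective are all hypothetical:

```python
from bisect import bisect_right
from typing import NamedTuple


class Measurement(NamedTuple):
    """One point from a benchmark sweep (hypothetical format)."""

    config: str           # parallel layout, e.g. "DEP", "TEP", "TP"
    concurrency: int      # runtime parameter that was swept
    tput_per_gpu: float   # output tokens/s per GPU
    latency_ms: float     # e.g. per-user time between tokens


def best_config_per_concurrency(ms: list[Measurement],
                                latency_budget_ms: float) -> dict[int, str]:
    """Highest-throughput config at each concurrency that meets the latency budget."""
    best: dict[int, Measurement] = {}
    for m in ms:
        if m.latency_ms > latency_budget_ms:
            continue  # violates the budget, not eligible
        cur = best.get(m.concurrency)
        if cur is None or m.tput_per_gpu > cur.tput_per_gpu:
            best[m.concurrency] = m
    return {c: best[c].config for c in sorted(best)}


def switch_points(best: dict[int, str]) -> list[tuple[int, str]]:
    """Concurrency values at which the winning config changes, in ascending order."""
    points: list[tuple[int, str]] = []
    for c in sorted(best):
        if not points or points[-1][1] != best[c]:
            points.append((c, best[c]))
    return points


def select_config(points: list[tuple[int, str]], concurrency: int) -> str:
    """Runtime lookup: the config whose concurrency range contains `concurrency`."""
    if not points:
        raise ValueError("no configuration satisfies the latency budget")
    idx = bisect_right([c for c, _ in points], concurrency) - 1
    return points[max(idx, 0)][1]  # below the first point, fall back to the first config
```

At runtime, `select_config(points, current_concurrency)` returns the layout to switch to. In practice, some hysteresis around each switch point would likely be needed so the runtime does not flap between layouts.
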
    

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.

[Feature][AutoDeploy]: Explore optimal sharding configurations #11656
