v1.0
This release delivers a major improvement in the accuracy of memory and performance estimation for large models. It also introduces several features that enhance model compatibility, flexibility, and user experience.
Highlights
- Dramatically Improved Estimation Accuracy:
  - Memory Estimation: Expanded test coverage for both Dense and MoE models; memory estimation error is now consistently within 1%.
  - Performance Estimation: On NVIDIA A100-PCIE, performance estimation error is consistently below 3% (the error metric is sketched below).
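For reference, the accuracy figures above are relative errors of the estimate against a measured baseline. A minimal sketch of that metric, using made-up numbers rather than actual measurements:

```python
def relative_error(estimated: float, measured: float) -> float:
    """Relative estimation error, as a fraction of the measured value."""
    return abs(estimated - measured) / measured

# Hypothetical numbers for illustration only (not real measurements):
# a 79.4 GB memory estimate against an 80.0 GB measurement.
err = relative_error(estimated=79.4, measured=80.0)
print(f"{err:.2%}")  # 0.75% -- within the 1% bound cited above
```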
New Features & Enhancements
- MLA Support:
  - Introduced support for the Multi-head Latent Attention (MLA) model architecture.
- Enhanced Layer Specification:
  - Added granular control over the number of first-stage and last-stage layers in pipeline parallelism, allowing for more optimized model partitioning (sketched after this list).
- Advanced MoE Customization:
  - Support for customizable dense layers in Mixture-of-Experts (MoE) models, providing greater flexibility in model design (a configuration sketch follows the list).
- Megatron Compatibility Layer:
  - Launched a simplified migration pipeline for straightforward conversion and analysis of models built with NVIDIA's Megatron framework.
- Optimized Recomputation Strategy:
  - Implemented finer-grained selective recomputation, enabling more precise control over the memory-for-compute trade-off when optimizing for larger models or higher throughput (illustrated below).
- Comprehensive Efficiency Analysis:
  - New capability to measure and analyze efficiency and utilization across various tensor shapes and memory layouts (see the benchmark sketch below).
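To make the pipeline-stage layer control concrete, here is a minimal sketch of how explicit first/last stage sizes change a layer partition. The helper and parameter names are hypothetical, not this tool's actual API:

```python
def partition_layers(num_layers: int, pp_size: int,
                     first_stage_layers: int, last_stage_layers: int) -> list[int]:
    """Split num_layers across pp_size pipeline stages, with explicit
    layer counts for the first and last stages (hypothetical helper)."""
    middle = num_layers - first_stage_layers - last_stage_layers
    middle_stages = pp_size - 2
    assert middle % middle_stages == 0, "middle layers must divide evenly"
    per_stage = middle // middle_stages
    return [first_stage_layers] + [per_stage] * middle_stages + [last_stage_layers]

# Embedding and loss overheads often motivate lighter end stages:
print(partition_layers(num_layers=32, pp_size=4,
                       first_stage_layers=7, last_stage_layers=7))
# [7, 9, 9, 7]
```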
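For the MoE dense-layer customization, a common pattern is to keep the first k transformer blocks dense and make the rest MoE, as several published MoE models do. The config below is a hypothetical illustration, not this tool's schema:

```python
# Hypothetical model config: keep the first 3 transformer blocks dense,
# use MoE FFNs everywhere else.
moe_config = {
    "num_layers": 28,
    "first_k_dense": 3,   # hypothetical knob: dense blocks before MoE starts
    "num_experts": 64,
    "top_k": 6,
}

def is_moe_layer(layer_idx: int, cfg: dict) -> bool:
    """MoE applies only after the leading dense blocks."""
    return layer_idx >= cfg["first_k_dense"]

print([is_moe_layer(i, moe_config) for i in range(5)])
# [False, False, False, True, True]
```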
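Finer-grained selective recomputation trades memory for compute by re-running chosen sub-modules during the backward pass instead of storing their activations. As a generic PyTorch sketch of the idea (not this tool's implementation), recomputation can be applied per layer:

```python
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.ff = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim), torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.ff(x)

# Hypothetical policy: recompute only the first 2 of 4 blocks,
# freeing their activations while keeping the rest fast.
blocks = torch.nn.ModuleList(Block(256) for _ in range(4))
recompute = {0, 1}

x = torch.randn(8, 256, requires_grad=True)
for i, blk in enumerate(blocks):
    if i in recompute:
        # Activations are discarded here and recomputed in backward.
        x = checkpoint(blk, x, use_reentrant=False)
    else:
        x = blk(x)
x.sum().backward()
```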
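The efficiency analysis measures achieved throughput against hardware peak across tensor shapes. A minimal GEMM benchmark sketch, where the peak figure and shapes are illustrative assumptions (requires a CUDA device):

```python
import time
import torch

PEAK_TFLOPS = 312.0  # illustrative: A100 dense FP16 tensor-core peak

def matmul_utilization(m: int, n: int, k: int, iters: int = 50) -> float:
    """Achieved FLOP/s of an FP16 GEMM as a fraction of the assumed peak."""
    a = torch.randn(m, k, device="cuda", dtype=torch.float16)
    b = torch.randn(k, n, device="cuda", dtype=torch.float16)
    for _ in range(5):                 # warm-up
        a @ b
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    achieved = 2 * m * n * k * iters / elapsed / 1e12  # TFLOP/s
    return achieved / PEAK_TFLOPS

for shape in [(4096, 4096, 4096), (4096, 4096, 1024), (8192, 128, 8192)]:
    print(shape, f"{matmul_utilization(*shape):.1%}")
```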
Bug Fixes
- Fixed an incorrect token-count calculation when etp > 1.
- Corrected the FLOPs and memory-access (e.g., HBM access volume) calculations for several operators (the standard counting convention is sketched below).
- Resolved inaccuracies in estimated communication volumes and their associated data types.
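For context on the operator-cost fix, FLOPs and HBM traffic for a GEMM are conventionally counted as follows. This is the standard cost model, not necessarily the tool's exact code:

```python
def gemm_costs(m: int, n: int, k: int, bytes_per_elem: int = 2):
    """Standard cost model for C[m,n] = A[m,k] @ B[k,n].

    FLOPs: one multiply + one add per (m, n, k) triple.
    HBM traffic (lower bound): read A and B once, write C once.
    """
    flops = 2 * m * n * k
    mem_bytes = (m * k + k * n + m * n) * bytes_per_elem
    return flops, mem_bytes

flops, mem = gemm_costs(4096, 4096, 4096)
print(f"{flops / 1e12:.2f} TFLOPs, {mem / 1e6:.1f} MB")
# 0.14 TFLOPs, 100.7 MB
```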