
Releases: ROCm/madengine

v2.0.0

09 Apr 13:59
04aac39


🎉 What's New

madengine v2.0 is a complete rewrite of the MAD orchestration engine with a modern, production-ready architecture. This major release replaces the legacy v1.x codebase with a unified CLI, comprehensive error handling, and support for distributed AI workloads across Kubernetes and SLURM.

🚀 Key Highlights

Unified CLI Experience

One command to rule them all: madengine now provides a consistent interface for all operations.

Multi-Target Deployment

Run AI workloads wherever you need them:

  • Local: Direct Docker execution for development and single-GPU jobs
  • Kubernetes: Production-ready K8s Jobs with full launcher support
  • SLURM: HPC cluster integration with intelligent job scheduling

Distributed Framework Support

Native support for 6 distributed training and inference frameworks:

Training:

  • torchrun (PyTorch DDP/FSDP)
  • DeepSpeed (ZeRO optimization)
  • Megatron-LM (large-scale transformers)
  • TorchTitan (LLM pre-training with FSDP2+TP+PP+CP)

Inference:

  • vLLM (high-throughput LLM inference)
  • SGLang (structured generation)

All launchers work seamlessly with both Kubernetes and SLURM deployments.
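To make the launcher idea concrete, here is a minimal sketch of how a job description might be turned into a `torchrun` command line. It is not madengine's implementation: the `LaunchSpec` class and `build_torchrun_command` function are hypothetical names introduced only for illustration, and the release notes do not describe how madengine builds launcher commands internally. Only the `torchrun` flags themselves (`--nnodes`, `--nproc_per_node`, `--node_rank`, `--master_addr`, `--master_port`) come from PyTorch.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class LaunchSpec:
    """Hypothetical job description; not a madengine type."""
    nnodes: int           # total number of nodes in the job
    nproc_per_node: int   # worker processes (typically one per GPU) on each node
    node_rank: int        # index of this node, 0-based
    master_addr: str      # address of the rank-0 node used for rendezvous
    master_port: int      # port on the rank-0 node used for rendezvous
    script: str           # training script to launch


def build_torchrun_command(spec: LaunchSpec) -> List[str]:
    """Return the argv list for a multi-node torchrun invocation."""
    return [
        "torchrun",
        f"--nnodes={spec.nnodes}",
        f"--nproc_per_node={spec.nproc_per_node}",
        f"--node_rank={spec.node_rank}",
        f"--master_addr={spec.master_addr}",
        f"--master_port={spec.master_port}",
        spec.script,
    ]


if __name__ == "__main__":
    # Illustrative values only: 2 nodes with 8 GPUs each, launched on node 0.
    spec = LaunchSpec(
        nnodes=2,
        nproc_per_node=8,
        node_rank=0,
        master_addr="10.0.0.1",
        master_port=29500,
        script="train.py",
    )
    print(shlex.join(build_torchrun_command(spec)))
```

On Kubernetes or SLURM, values such as the node rank and master address would normally come from the scheduler's environment rather than being hard-coded as they are in this sketch.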

Advanced Profiling

Comprehensive ROCm profiling suite for AMD GPUs:

  • 8 pre-configured profiles: compute, memory, communication, full analysis, and more
  • ROCprofv3 support: Latest ROCm 7.0+ profiling capabilities
  • Perfetto integration: Generate traces for Perfetto UI visualization
  • Ready-to-use configs: 6 example configurations in examples/profiling-configs/

Production-Grade Quality

  • 4.5/5 code quality rating (detailed metrics in CODE_QUALITY_REPORT_v2.md)
  • 71% type hint coverage with mypy validation
  • No outstanding debt markers: the codebase contains no TODO/FIXME/HACK comments
  • Pre-commit hooks: Automated quality checks (black, isort, flake8, mypy, bandit)
  • Security fixes: SQL injection vulnerability patched, improved exception handling
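The release notes do not say where the SQL injection vulnerability was or how it was patched. As a general illustration of this class of fix, the sketch below uses Python's standard `sqlite3` module as a stand-in database and binds user input as a query parameter instead of formatting it into the SQL string. The `runs` table and the `find_runs_by_model` function are hypothetical and are not taken from madengine.

```python
import sqlite3
from typing import List, Tuple


def find_runs_by_model(conn: sqlite3.Connection, model_name: str) -> List[Tuple[int, str]]:
    """Look up rows by model name using a bound parameter.

    The `?` placeholder makes the driver treat `model_name` strictly as data,
    so quotes or SQL keywords inside it cannot change the query structure.
    """
    cursor = conn.execute(
        "SELECT id, model FROM runs WHERE model = ?",
        (model_name,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # In-memory database with a hypothetical table, for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, model TEXT)")
    conn.executemany(
        "INSERT INTO runs (model) VALUES (?)",
        [("llama",), ("bert",)],
    )

    # A normal lookup matches the stored row.
    print(find_runs_by_model(conn, "llama"))

    # A classic injection payload is compared as a literal string and matches nothing.
    print(find_runs_by_model(conn, "llama' OR '1'='1"))
```

Building the same query with string formatting would let the second input rewrite the `WHERE` clause and return every row, which is the behavior parameter binding prevents.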

What's Changed

  • madengine v2 with unified framework for local and distribution by @coketaste in #57

Full Changelog: v1.0.0...v2.0.0

v1.0.0

08 Apr 21:07
4438d32


What's Changed

New Contributors

Full Changelog: https://github.com/ROCm/madengine/commits/v1.0.0