
[Dev] Add Llama3 training example and fix cache save & static shape marking #13

Closed
wtr0504 wants to merge 5 commits into SandAI-org:main from wtr0504:dev/training

Conversation

Collaborator

@wtr0504 wtr0504 commented Apr 1, 2026

🗂️ PR Category

  • ✨ New Feature
  • 🚀 Optimization (performance, memory, etc.)
  • 💥 Breaking Change
  • 🐛 Bug Fix
  • 🛠️ Development / Refactoring
  • 📚 Documentation
  • 🧹 Chore (Dependencies, CI/CD, Configuration, etc.)
  • 🧪 Testing

📝 Description

Summary

  • Add an end-to-end Llama3 training example (example/training/) with FSDP support, a distributed training script, and an Nsys profiling launch script.
  • Fix a cache-save bug where aot_autograd artifacts were empty, so compiled graphs were never persisted correctly and could not be reloaded from the cache.
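The cache-save fix can be sketched roughly as follows. Note that `save_compiled_graph`, the pickle-based storage, and the artifact dict layout are hypothetical stand-ins for illustration, not the actual magi_compiler API; the point is the guard against persisting empty aot_autograd artifacts.

```python
import pickle
from pathlib import Path


def save_compiled_graph(cache_dir: Path, key: str, aot_artifacts: dict) -> bool:
    """Persist compiled-graph artifacts, refusing saves that would be empty.

    Hypothetical sketch: before a fix like this, an empty ``aot_artifacts``
    dict could be written anyway, producing cache entries that fail to load
    on the next run.
    """
    if not aot_artifacts:
        # Workaround: skip the save entirely when aot_autograd produced
        # no artifacts, rather than writing an unusable cache entry.
        return False
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"{key}.pkl").write_bytes(pickle.dumps(aot_artifacts))
    return True
```

A caller would then fall back to recompilation whenever `save_compiled_graph` returns `False`, instead of later failing on a corrupt cache entry.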

Changes

  • example/training/llama3.py — Llama3 model definition adapted to use magi_compile
  • example/training/train.py — distributed training loop with FSDP and NVTX profiling hooks
  • example/training/train.sh — torchrun launcher with optional Nsys profiling
  • magi_compiler/magi_backend/piecewise_compiler.py — workaround for empty aot_autograd artifacts on cache save
  • magi_compiler/utils/nvtx.py — NVTX-based per-iteration profiling utility
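The actual flags in example/training/train.sh are not shown on this page; the following is a hypothetical dry-run sketch of how a torchrun launcher with optional Nsys wrapping might assemble its command. The variable names, the `llama3_train` report name, and the `echo` (instead of executing) are all assumptions for illustration.

```shell
# Hypothetical launcher sketch: build a torchrun command and optionally
# wrap it in Nsight Systems to capture CUDA and NVTX ranges.
NGPUS=${NGPUS:-8}
PROFILE=${PROFILE:-0}

CMD="torchrun --standalone --nproc_per_node=${NGPUS} train.py"

if [ "${PROFILE}" = "1" ]; then
  # nsys emits a .nsys-rep report named after -o for later inspection.
  CMD="nsys profile -t cuda,nvtx -o llama3_train ${CMD}"
fi

# Dry run: print the command instead of executing it.
echo "${CMD}"
```

Because the NVTX hooks in train.py only annotate ranges, the same script runs unchanged with or without profiling; `PROFILE=1` merely adds the `nsys` wrapper.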

@wtr0504 wtr0504 closed this Apr 1, 2026
@wtr0504 wtr0504 deleted the dev/training branch April 1, 2026 11:40