
[Dev] Add Llama3 training example and fix cache save & static shape marking #13

Closed
wtr0504 wants to merge 5 commits into SandAI-org:main from wtr0504:dev/training

Conversation

Collaborator

@wtr0504 wtr0504 commented Apr 1, 2026

🗂️ PR Category

  • ✨ New Feature
  • 🚀 Optimization (performance, memory, etc.)
  • 💥 Breaking Change
  • 🐛 Bug Fix
  • 🛠️ Development / Refactoring
  • 📚 Documentation
  • 🧹 Chore (Dependencies, CI/CD, Configuration, etc.)
  • 🧪 Testing

📝 Description

Summary

  • Add an end-to-end Llama3 training example (example/training/) with FSDP support, a distributed training script, and an Nsys profiling launch script.
  • Fix a cache-save bug where aot_autograd artifacts were empty, so compiled graphs were never persisted correctly and could not be reloaded from the cache.
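The cache-save fix can be sketched roughly as follows. Note that `save_compiled_graph`, the pickle-based storage, and the artifact dict layout are hypothetical stand-ins for illustration, not the actual magi_compiler API; the point is the guard against persisting empty aot_autograd artifacts.

```python
import pickle
from pathlib import Path


def save_compiled_graph(cache_dir: Path, key: str, aot_artifacts: dict) -> bool:
    """Persist compiled-graph artifacts, refusing saves that would be empty.

    Hypothetical sketch: before a fix like this, an empty ``aot_artifacts``
    dict could be written anyway, producing cache entries that fail to load
    on the next run.
    """
    if not aot_artifacts:
        # Workaround: skip the save entirely when aot_autograd produced
        # no artifacts, rather than writing an unusable cache entry.
        return False
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"{key}.pkl").write_bytes(pickle.dumps(aot_artifacts))
    return True
```

A caller would then fall back to recompilation whenever `save_compiled_graph` returns `False`, instead of later failing on a corrupt cache entry.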

Changes

  • example/training/llama3.py — Llama3 model definition adapted to use magi_compile
  • example/training/train.py — distributed training loop with FSDP and NVTX profiling hooks
  • example/training/train.sh — torchrun launcher with optional Nsys profiling
  • magi_compiler/magi_backend/piecewise_compiler.py — workaround for empty aot_autograd artifacts on cache save
  • magi_compiler/utils/nvtx.py — NVTX-based per-iteration profiling utility
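The actual flags in example/training/train.sh are not shown on this page; the following is a hypothetical dry-run sketch of how a torchrun launcher with optional Nsys wrapping might assemble its command. The variable names, the `llama3_train` report name, and the `echo` (instead of executing) are all assumptions for illustration.

```shell
# Hypothetical launcher sketch: build a torchrun command and optionally
# wrap it in Nsight Systems to capture CUDA and NVTX ranges.
NGPUS=${NGPUS:-8}
PROFILE=${PROFILE:-0}

CMD="torchrun --standalone --nproc_per_node=${NGPUS} train.py"

if [ "${PROFILE}" = "1" ]; then
  # nsys emits a .nsys-rep report named after -o for later inspection.
  CMD="nsys profile -t cuda,nvtx -o llama3_train ${CMD}"
fi

# Dry run: print the command instead of executing it.
echo "${CMD}"
```

Because the NVTX hooks in train.py only annotate ranges, the same script runs unchanged with or without profiling; `PROFILE=1` merely adds the `nsys` wrapper.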

@wtr0504 wtr0504 closed this Apr 1, 2026
@wtr0504 wtr0504 deleted the dev/training branch April 1, 2026 11:40