[Dev] Add Llama3 training example and fix cache save #14
Open
wtr0504 wants to merge 3 commits into SandAI-org:main from
Conversation
jiahy0825 (Collaborator) reviewed on Apr 1, 2026
```python
from dataclasses import dataclass
from typing import Optional, Tuple

import fairscale.nn.model_parallel.initialize as fs_init
```
Comment: Do we need to add this package to requirements-test.txt?
```python
device = torch.device("cpu")

# Initialize a small config for testing
config = ModelArgs(n_layers=10, max_batch_size=2, max_seq_len=1024)
```
Comment: Use the official config for profiling.
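For comparison, an official-scale configuration might look like the sketch below. The field names follow the `ModelArgs` usage in the snippet above, and the default values are the published Llama3-8B hyperparameters; since the full `ModelArgs` definition is not shown in this diff, both are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelArgs:
    # Field names mirror the test snippet above; defaults are the published
    # Llama3-8B hyperparameters (assumed here for illustration).
    dim: int = 4096
    n_layers: int = 32
    n_heads: int = 32
    n_kv_heads: Optional[int] = 8
    vocab_size: int = 128256
    max_batch_size: int = 2
    max_seq_len: int = 8192

# Official-scale config for profiling, instead of the reduced n_layers=10 one.
config = ModelArgs()
```

Profiling the reduced 10-layer model can hide memory and scheduling behavior that only shows up at full depth, which is presumably why the reviewer asks for the official config.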
```bash
export MAGI_ENABLE_FX_GRAPH_VIZ=${MAGI_ENABLE_FX_GRAPH_VIZ:-false}

$NSYS_CMD torchrun $DISTRIBUTED_ARGS $SCRIPT_DIR/train.py \
    $NSYS_ARGS
```
Comment: NSYS_ARGS does not appear to be set anywhere. Check again, and try to simplify this script.
🗂️ PR Category
📝 Description
Summary
- Add an end-to-end Llama3 training example (example/training/) with FSDP support, a distributed training script, and an Nsys profiling launch script.
- Fix a cache save bug where aot_autograd artifacts were empty, causing compiled graphs to fail to persist correctly.
Changes
- example/training/llama3.py — Llama3 model definition adapted to use magi_compile
- example/training/train.py — distributed training loop with FSDP and NVTX profiling hooks
- example/training/train.sh — torchrun launcher with optional Nsys profiling
- magi_compiler/magi_backend/piecewise_compiler.py — workaround for empty aot_autograd artifacts on cache save
- magi_compiler/utils/nvtx.py — per-iteration profiling utilities
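The cache-save workaround described above can be sketched as a guard that refuses to persist an entry whose aot_autograd artifacts are empty. The function and serialization format below are hypothetical, not the actual piecewise_compiler.py API:

```python
import json

def save_cache_entry(path: str, artifacts: dict) -> bool:
    """Persist compiled-graph artifacts, skipping empty payloads.

    Hypothetical sketch of the fix: previously an empty aot_autograd
    artifact dict could be written as-is, so the compiled graph failed
    to load from cache on the next run.
    """
    if not artifacts:
        # Nothing to persist; fall back to recompilation instead of
        # poisoning the cache with an empty entry.
        return False
    with open(path, "w") as f:
        json.dump(artifacts, f)
    return True
```

Under this sketch, an empty-artifact save becomes a no-op and the next run simply recompiles, rather than loading a corrupt cache entry.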