These scripts are a profiling kit that lives alongside the learning path.
Run everything from the executorch_sme2_kit/ directory:
cd executorch_sme2_kit/

The scripts will create (or reuse):
- `.venv/`: Python virtual environment
- `executorch/`: ExecuTorch checkout (tracks `main`)
- `executorch/cmake-out/`: built `executor_runner` binaries (runners stay with their ExecuTorch version for traceability)
- `out_<model>/artifacts/`: exported `.pte` and `.etrecord` files
- `model_profiling/configs/`: JSON pipeline configs
- `out_<model>/runs/`: results, logs, traces, and manifests
Note: Replace <model> with your actual model name (e.g., out_mobilenet/, out_edgetam/).
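Put together, a populated working directory looks roughly like this (illustrative sketch using `out_mobilenet/`; exact contents depend on your model and export options):

```
executorch_sme2_kit/
├── .venv/
├── executorch/
│   └── cmake-out/          # executor_runner binaries
├── model_profiling/
│   └── configs/            # JSON pipeline configs
└── out_mobilenet/
    ├── artifacts/          # .pte and .etrecord files
    └── runs/               # results, logs, traces, manifests
```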
- Setup: ~20–30 min
- Build: ~15–25 min
- Export + pipeline + analyze: ~10 min
- Total first success: ~60–75 min
After setup + build, validate the end-to-end flow with a tiny model:
source .venv/bin/activate
python model_profiling/scripts/run_quick_test.py

This runs the full workflow (validate → build → export toy model → pipeline → validate results) in ~10–15 minutes and confirms your setup works.
# 1. Setup (one-time)
bash model_profiling/scripts/setup_repo.sh
# 2. Build runners (one-time, or when CMake configs change)
bash model_profiling/scripts/build_runners.sh
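# Optional sanity check (illustrative, not part of the kit): confirm the
# runner binaries exist before proceeding. The exact layout under
# cmake-out/ varies by ExecuTorch version, hence the find.
runners="$(find executorch/cmake-out -type f -name 'executor_runner*' 2>/dev/null || true)"
if [ -n "$runners" ]; then echo "$runners"; else echo "no executor_runner found; re-run the build"; fi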
# 3. Activate venv
source .venv/bin/activate
# 4. Export model
python model_profiling/export/export_model.py \
--model mobilenet_v3_small \
--dtype fp16 \
--outdir out_mobilenet/artifacts/
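# Optional check (illustrative): confirm the export produced a .pte file.
# The exact file name depends on the model, dtype, and backend.
pte_count="$(ls out_mobilenet/artifacts/*.pte 2>/dev/null | wc -l)"
echo "exported .pte files: ${pte_count}"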
# 5. Create config
cp model_profiling/configs/templates/mac_template.json \
model_profiling/configs/mac_mobilenet.json
# Edit config: set "model" to "out_mobilenet/artifacts/mobilenet_v3_small_xnnpack_fp16.pte"
# Edit config: set "output_root" to "out_mobilenet/runs/mac"
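# For reference, after these two edits the config contains fields like
# the following (only the edited fields are shown; keep the template's
# remaining fields as-is):
#   {
#     "model": "out_mobilenet/artifacts/mobilenet_v3_small_xnnpack_fp16.pte",
#     "output_root": "out_mobilenet/runs/mac"
#   }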
# 6. Run pipeline (automatically runs analysis and generates CSV files)
python3 model_profiling/scripts/mac_pipeline.py \
--config model_profiling/configs/mac_mobilenet.json
# 7. Generate report (base report with latency + category breakdown)
python3 model_profiling/scripts/generate_report.py \
--run-dir out_mobilenet/runs/mac
# 7a. Operator-specific bottleneck analysis
# Identifies top operators by E2E weight, portable vs delegated operators
python3 model_profiling/tools/analyze_etdump_csv.py \
--timeline-csv out_mobilenet/runs/mac/mac_sme2_off/*_all_runs_timeline.csv \
--compare out_mobilenet/runs/mac/mac_sme2_on/*_all_runs_timeline.csv \
--name1 "SME2-Off" \
--name2 "SME2-On" \
--output-dir out_mobilenet/runs/mac/ \
--verbose
# 8. Validate results (optional)
python3 model_profiling/scripts/validate_results.py \
--results out_mobilenet/runs/mac

See `pipeline_commands.md` for a detailed command reference.
- On macOS Apple Silicon, you can learn the workflow and get operator-level breakdowns. SME2 acceleration requires Armv9 hardware: Apple M4 Macs and Armv9 Android devices will show SME2 deltas; earlier Apple Silicon will not.
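On macOS you can probe the hardware before spending time on a full run. This is a sketch that assumes macOS exposes the `hw.optional.arm.FEAT_SME` sysctl key on SME-capable chips (M4-class); on older chips or other operating systems the key is simply absent:

```shell
# Probe for SME support. Assumption: macOS publishes the
# hw.optional.arm.FEAT_SME sysctl key on SME-capable chips.
if [ "$(uname -s)" = "Darwin" ]; then
  sme="$(sysctl -n hw.optional.arm.FEAT_SME 2>/dev/null || echo 0)"
else
  sme="n/a (not macOS)"
fi
echo "SME reported by OS: ${sme}"
```

A value of `1` suggests SME2 deltas should be observable; `0` or a missing key means the workflow still runs but without SME2 acceleration.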
- To observe SME2 deltas and `__neonsme2` kernel paths, use an SME2-capable Armv9 device (Android or Apple M4):
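Before running the Android pipeline, you can check whether the attached device reports SME at all. This is an illustrative sketch, assuming the kernel lists `sme` in the `Features` line of `/proc/cpuinfo` and that `adb` is on your PATH with a device attached:

```shell
# Probe the attached Android device for SME support via /proc/cpuinfo.
if command -v adb >/dev/null 2>&1; then
  features="$(adb shell grep -m1 '^Features' /proc/cpuinfo 2>/dev/null || true)"
  case "$features" in
    *sme*) status="device reports SME" ;;
    *)     status="SME not reported (or no device attached)" ;;
  esac
else
  status="adb not found; install Android platform-tools"
fi
echo "$status"
```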
export ANDROID_NDK=/path/to/android-ndk
bash model_profiling/scripts/build_runners.sh
cp model_profiling/configs/templates/android_template.json \
model_profiling/configs/android.json
# Edit config: set "model" to your .pte path
# Edit config: set "output_root" to "out_<model>/runs/android"
python3 model_profiling/scripts/android_pipeline.py \
--config model_profiling/configs/android.json
# Pipeline automatically runs analysis - no separate step needed
# Generate base report:
python3 model_profiling/scripts/generate_report.py \
--run-dir out_<model>/runs/android
# Operator-specific bottleneck analysis
python3 model_profiling/tools/analyze_etdump_csv.py \
--timeline-csv out_<model>/runs/android/android_sme2_off/*_all_runs_timeline.csv \
--compare out_<model>/runs/android/android_sme2_on/*_all_runs_timeline.csv \
--name1 "SME2-Off" \
--name2 "SME2-On" \
--output-dir out_<model>/runs/android/ \
--verbose

- Model-agnostic pipeline: Once you have a `.pte` file, the same pipeline commands work for any model
- Config-driven experiments: JSON configs define what to run; scripts execute them
- Output organization: Results go under `out_<model>/runs/<platform>/` for clear organization
- Version traceability: Runners stay in `executorch/cmake-out/` to track the ExecuTorch version
- Command reference: See `pipeline_commands.md` for the detailed workflow
- Model onboarding: See the learning path documentation for adding new models
- Report generation: See the agent skill `agent_skill_ml_profiling/07_report_generation.md` for the workflow, including operator-specific bottleneck analysis, portable vs delegated operator identification, and kernel-level insights