ANE (Apple Neural Engine) profiling toolkit for Core ML models. Two complementary tools:
- `anemll-profile`: runtime-measured profiler. Reports per-op cost estimates, device placement, actual prediction throughput, graph interruptions, bandwidth, and memory/compute-bound classification.
- `ane-costplan`: compile-time cost analyzer. Reports per-op cost weights from Apple's `MLComputePlan` API (the ANE compiler's internal cost model), with device support/preference per operation.
Use both together: anemll-profile shows what actually runs slow,
ane-costplan shows what the compiler thinks is expensive. Disagreements
between the two reveal optimization opportunities.
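One way to surface such disagreements is to normalize both views to shares of their totals and rank ops by the gap. The sketch below is illustrative only: the dictionaries are hypothetical hand-entered numbers, not the JSON schema of either tool, and matching ops by name across the two reports is an assumption.

```python
def shares(values):
    """Normalize raw per-op numbers so they sum to 1.0."""
    total = sum(values.values())
    return {op: v / total for op, v in values.items()} if total else {}


def rank_mismatches(measured_ms, compiler_cost):
    """Rank ops by the gap between measured share and compiler cost share.

    A positive gap means the op takes a larger share of measured time than
    the compiler's cost model predicted; a negative gap means the opposite.
    """
    m, c = shares(measured_ms), shares(compiler_cost)
    common = m.keys() & c.keys()  # only ops present in both views
    gaps = [(op, m[op] - c[op]) for op in common]
    return sorted(gaps, key=lambda g: abs(g[1]), reverse=True)


# Hypothetical inputs: op names and values are made up for illustration.
measured_ms = {"conv_1": 4.0, "softmax_3": 5.0, "matmul_7": 1.0}
compiler_cost = {"conv_1": 0.5, "softmax_3": 0.1, "matmul_7": 0.4}

for op, gap in rank_mismatches(measured_ms, compiler_cost):
    print(f"{op:12s} {gap:+.2f}")
```

With these made-up numbers, `softmax_3` ranks first: it takes half the measured time but only a tenth of the compiler's cost, which is the kind of op worth inspecting first.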
For a compact agent-oriented workflow, see AGENTS.md.
This fork adds `ane-costplan`, a compile-time cost analyzer using Apple's `MLComputePlan` API.
Build both binaries from source:
git clone https://github.com/shipstuff/anemll-profile.git
cd anemll-profile
make # builds anemll-profile and ane-costplan
sudo make install

Verify:

anemll-profile --version
ane-costplan --help 2>&1 | head -3

Building from source is the only way to get both tools together. (The upstream
Homebrew tap installs anemll-profile only.)
anemll-profile model.mlmodelc
anemll-profile model.mlpackage
anemll-profile /path/to/model # auto-detects .mlmodelc or .mlpackage
anemll-profile -a model.mlmodelc # include GPU in device assignment
anemll-profile --interrupt-ms 150 model.mlmodelc # change heuristic ANE boundary cost
anemll-profile -j report.json model.mlmodelc # write structured JSON report

Reports:

- Op-Type Runtime Breakdown — per-op-type estimated runtime, GFLOP/s, GB/s, memory/compute bound
- ANE Graph Interruptions — non-ANE islands that interrupt ANE execution, ranked by estimated latency tax
- Measured Prediction — actual wall-clock time, iter/s, weight bandwidth GB/s
- Top Expensive Ops — the 20 slowest operations
- Conv Detail — convolution ops with channel counts and work unit efficiency
- CPU/GPU Fallback — ops not on ANE with specific compiler reasons
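The Measured Prediction figures are related by simple arithmetic. The sketch below shows one plausible derivation, assuming every prediction streams all model weights from DRAM exactly once; the function names and the example numbers are hypothetical, and the tool's exact formula may differ.

```python
def iterations_per_second(ms_per_prediction):
    """Convert wall-clock milliseconds per prediction into iter/s."""
    return 1000.0 / ms_per_prediction


def weight_bandwidth_gbs(weight_bytes, ms_per_prediction):
    """Weight-only bandwidth in GB/s.

    Assumes all weights are read once per prediction, so bandwidth is
    weight bytes multiplied by predictions per second.
    """
    return weight_bytes * 1000.0 / ms_per_prediction / 1e9


# Hypothetical model: 500 MB of weights, 10 ms per prediction.
print(iterations_per_second(10.0))
print(weight_bandwidth_gbs(500e6, 10.0))
```

A model that reaches high weight bandwidth but low GFLOP/s is likely limited by memory traffic rather than arithmetic, which is what the memory/compute-bound column is meant to flag.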
ane-costplan model.mlmodelc
ane-costplan -j report.json model.mlmodelc # structured JSON output

Reports:

- Per-op cost weights from `MLComputePlan.estimatedCost(of:)`: Apple's own cost model that drives ANE placement and scheduling decisions
- Device support/preference per op: whether ANE, GPU, or CPU is preferred and what's supported
- Top 10 by cost — highlights the compiler's view of the expensive ops
| Question | Tool |
|---|---|
| How fast does this model actually run? | anemll-profile |
| What bandwidth is each op hitting? | anemll-profile |
| Why did these ops fall back to CPU? | anemll-profile |
| What does the compiler think is expensive? | ane-costplan |
| Does placement match my expectations per-op? | ane-costplan |
| Will a graph change reduce compiler cost? | ane-costplan (cheap, no prediction needed) |
| End-to-end optimization: find mismatches | both |
ane-costplan is especially useful as a fast feedback loop — it returns in
seconds (no prediction run), so you can iterate on graph modifications and
see the cost model's response before paying the full compile + profile cycle.
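A feedback loop of this kind could be scripted as below. This is a minimal sketch: it uses only the `ane-costplan -j REPORT MODEL` invocation shown above, while the helper names, the `runner` parameter, and the output layout are hypothetical. The JSON reports are written but not parsed, since their schema is not described here.

```python
import subprocess
from pathlib import Path


def costplan_command(model, report):
    """Build the invocation: ane-costplan -j REPORT MODEL."""
    return ["ane-costplan", "-j", str(report), str(model)]


def analyze_variants(models, out_dir, runner=subprocess.run):
    """Run ane-costplan once per model variant.

    Returns a dict mapping each model path to its JSON report path.
    `runner` is injectable so the loop can be exercised without the binary.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    reports = {}
    for model in map(Path, models):
        report = out_dir / f"{model.stem}.json"
        # check=True raises if ane-costplan exits with a non-zero status.
        runner(costplan_command(model, report), check=True)
        reports[str(model)] = report
    return reports
```

For example, `analyze_variants(["baseline.mlmodelc", "fused.mlmodelc"], "reports")` (hypothetical file names) would leave one report per variant in `reports/`, ready to compare before running a full `anemll-profile` pass.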
- Loads `MLComputePlan` to get per-op device assignment and cost weights
- Captures Espresso `[CostModelFeature]` logs via forked `/usr/bin/log stream`
- Parses `Unsupported op` compiler messages for ANE fallback reasons
- Analyzes ordered MLComputePlan ops to find ANE graph interruption islands, including CPU or GPU detours
- Applies a heuristic ANE boundary penalty (300 ms by default) to rank interruption hot spots
- Runs actual predictions with dummy inputs to measure real throughput
- Computes weight-only DRAM bandwidth (excludes L2-resident activations)
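The interruption analysis in the steps above can be sketched as a scan over the ordered per-op device list. The version below is an illustration, not the tool's implementation: it treats an island as a run of non-ANE ops with ANE ops on both sides, and it assumes each island is charged one boundary penalty plus the estimated runtime of its ops. The `Island` type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Island:
    start: int      # index of the first non-ANE op in the island
    end: int        # index of the last non-ANE op (inclusive)
    devices: set    # devices used inside the island, e.g. {"CPU"}
    tax_ms: float   # estimated latency tax for this island


def find_islands(devices, op_ms, interrupt_ms=300.0):
    """Find non-ANE runs that interrupt ANE execution, ranked by tax.

    devices: per-op device names in execution order ("ANE", "CPU", "GPU").
    op_ms: per-op estimated runtime in milliseconds, same order.
    interrupt_ms: heuristic boundary penalty (300 ms mirrors the default).
    """
    islands = []
    i, n = 0, len(devices)
    seen_ane = False
    while i < n:
        if devices[i] == "ANE":
            seen_ane = True
            i += 1
            continue
        j = i
        while j < n and devices[j] != "ANE":
            j += 1
        # Leading or trailing non-ANE runs do not interrupt ANE execution.
        if seen_ane and j < n:
            tax = interrupt_ms + sum(op_ms[i:j])
            islands.append(Island(i, j - 1, set(devices[i:j]), tax))
        i = j
    return sorted(islands, key=lambda isl: isl.tax_ms, reverse=True)


# Hypothetical op sequence for illustration.
devices = ["CPU", "ANE", "ANE", "CPU", "ANE", "GPU", "GPU", "ANE", "CPU"]
op_ms = [1, 1, 1, 2, 1, 3, 4, 1, 1]
for isl in find_islands(devices, op_ms):
    print(isl.start, isl.end, sorted(isl.devices), isl.tax_ms)
```

Passing a different `interrupt_ms` corresponds to the `--interrupt-ms` flag: it shifts every island's tax by the same amount, so it changes how heavily interruptions weigh against regular op cost rather than the order of islands under this particular scoring.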
- Loads `MLComputePlan` for the compiled model
- Walks every operation in the MIL program
- Queries `plan.estimatedCost(of: op)` for the compiler's cost weight per op
- Queries `plan.deviceUsage(for: op)` for supported + preferred device per op
- Sorts by cost weight descending to surface bottlenecks
- Optionally emits structured JSON for agent workflows
No prediction runs, no log capture — pure compile-time analysis. Runs in seconds
on any compiled .mlmodelc.
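The ranking step is simple enough to show on plain records. The sketch below assumes each op has already been reduced to a dictionary with hypothetical `name`, `cost`, and `preferred` fields; it does not call `MLComputePlan` and does not reflect the tool's actual JSON layout.

```python
def top_by_cost(ops, n=10):
    """Return the n ops with the highest compiler cost weight."""
    return sorted(ops, key=lambda op: op["cost"], reverse=True)[:n]


def preferred_off_ane(ops):
    """Return ops whose preferred device is something other than ANE."""
    return [op for op in ops if op["preferred"] != "ANE"]


# Hypothetical records for illustration.
ops = [
    {"name": "conv_1", "cost": 0.42, "preferred": "ANE"},
    {"name": "gather_2", "cost": 0.05, "preferred": "CPU"},
    {"name": "matmul_7", "cost": 0.31, "preferred": "ANE"},
    {"name": "softmax_3", "cost": 0.22, "preferred": "GPU"},
]

for op in top_by_cost(ops, n=3):
    print(f'{op["name"]:10s} {op["cost"]:.2f} {op["preferred"]}')
print([op["name"] for op in preferred_off_ane(ops)])
```

Ops that appear near the top of the cost ranking and also prefer a non-ANE device are natural first candidates for a graph rewrite.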
- `ane-costplan` binary: compile-time cost analyzer (this fork)
What's new upstream (also in this fork):
- Agent guide (`AGENTS.md`): first-class workflow guide for agents
- ANE graph interruptions: interruption analysis highlights non-ANE islands
- Latency-ranked interruption hot spots: rank by estimated switch penalty
- Function timeline view: compact accelerator timeline per function
- Configurable switch heuristic: `--interrupt-ms` / `--interrupt-boundary-ms`
- Structured JSON export: `-j` / `--json FILE`
- macOS 14+ (Sonoma): requires the `MLComputePlan` API
- Xcode Command Line Tools (for `clang` and `swiftc`)
- `Foundation` and `CoreML` system frameworks (included with macOS)
Recommended for agent automation:
- `jq` for parsing `-j` JSON reports
MIT