This project uses Apple's private, undocumented APIs (`_ANEClient`, `_ANECompiler`, `_ANEInMemoryModelDescriptor`). These APIs are not covered by any public stability guarantee and may change or break with any macOS update. This is independent research into Apple Neural Engine architecture, using APIs discovered through runtime introspection for research and educational purposes under fair use and interoperability provisions (see *Sega v. Accolade*, 1992; DMCA §1201(f)). No Apple proprietary code or binaries are included in this repository. This project is not affiliated with or endorsed by Apple Inc. Use at your own risk.
## Hardware Characterization: Apple M5 (2026)
The M5 (Apple 10 family) introduces specific ANE behavioral constraints that differ from earlier M-series chips. This section documents the key findings from reverse-engineering efforts.
### Key M5 ANE Constraints
| Constraint | Value | Notes |
|:---|:---|:---|
| **IOSurface Alignment** | 128 bytes | All input, output, and weight surfaces must be 128-byte aligned. Failure results in silent evaluation errors or compiler rejection. |
| **MIL Version** | `program(1.5)` | M5 is optimized for MIL 1.5. Use `ios17` or `ios18` function targets. For packed single-input formats, `program(1.3)` remains compatible. |
| **Max Dynamic Dimension** | 4096 × 4096 | Maximum dimension for dynamic weight tensors passed as inputs. |
| **Peak Throughput** | ~1.0 TFLOPS | Pure ANE compute for 4096-dim matmul operations (measured: 0.86-1.53 TFLOPS). |
| **Update Latency** | ~1.8 ms | CPU-to-IOSurface `memcpy` + ANE eval for weight updates at 4096 dims (measured: 1.7-1.9 ms). |
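
The 128-byte alignment rule in the table can be sketched in portable C. This is a minimal illustration, not the project's `ane_create_surface()`: the names `align_up`, `padded_surface_bytes`, and `is_surface_aligned` are hypothetical, and the sketch assumes the constraint applies to the base address and total byte length (the table does not say whether row stride is also affected).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Alignment value taken from the constraint table. */
#define ANE_SURFACE_ALIGNMENT 128u

/* Round `size` up to the next multiple of `alignment` (a power of two). */
size_t align_up(size_t size, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: allocation size for `element_count` elements,
 * padded so the total length is a multiple of 128 bytes. */
size_t padded_surface_bytes(size_t element_count, size_t bytes_per_element) {
    return align_up(element_count * bytes_per_element, ANE_SURFACE_ALIGNMENT);
}

/* Returns 1 when both the base address and the length are 128-byte aligned. */
int is_surface_aligned(const void *base, size_t length) {
    return ((uintptr_t)base % ANE_SURFACE_ALIGNMENT == 0) &&
           (length % ANE_SURFACE_ALIGNMENT == 0);
}
```

A check like `is_surface_aligned` before evaluation may help catch the silent evaluation errors mentioned above, since a misaligned surface is not guaranteed to be rejected loudly.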
### Dynamic Weight Injection
On M5, the traditional approach of baking weights into the compiled model (via `BLOBFILE`) does not support runtime updates, because the ANE snapshots weights into private memory at load time. The only viable path for real-time weight updates is:

**Treat weights as input tensors using the `matmul` operator.**
```objc
// MIL pattern for dynamic weights (M5 compatible)
// ...
```
- **Zero-copy weight swapping**: Update weights via `memcpy` into the input IOSurface
- **Roughly 20x to 95x faster updates** than a recompile-and-load cycle (1.8 ms vs. 40-170 ms)
- **On-device training**: Foundation for gradient descent on ANE
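
The in-place update described in the first bullet can be sketched as follows. This is a portable stand-in under stated assumptions: the `weight_buffer` type and its functions are hypothetical, and plain 128-byte-aligned host memory takes the place of the IOSurface. On macOS the destination would presumably be the address returned by `IOSurfaceGetBaseAddress`, with the copy bracketed by `IOSurfaceLock` and `IOSurfaceUnlock`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ALIGNMENT 128u

/* Stand-in for the weight input surface: aligned host memory. */
typedef struct {
    void  *base;      /* 128-byte aligned buffer */
    size_t capacity;  /* size in bytes, a multiple of ALIGNMENT */
} weight_buffer;

/* Allocate room for `bytes` of weights; returns 0 on success, -1 on failure. */
int weight_buffer_init(weight_buffer *buf, size_t bytes) {
    size_t padded = (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    if (padded == 0) padded = ALIGNMENT;
    buf->base = aligned_alloc(ALIGNMENT, padded);
    buf->capacity = buf->base ? padded : 0;
    return buf->base ? 0 : -1;
}

/* Overwrite the weights in place; no model recompile or reload is involved. */
int weight_buffer_update(weight_buffer *buf, const void *weights, size_t bytes) {
    assert(buf != NULL && weights != NULL);
    if (bytes > buf->capacity) return -1;  /* refuse to overrun the buffer */
    memcpy(buf->base, weights, bytes);
    return 0;
}

/* Release the buffer and reset the struct. */
void weight_buffer_free(weight_buffer *buf) {
    free(buf->base);
    buf->base = NULL;
    buf->capacity = 0;
}
```

Because the buffer is allocated once and reused, each training step only pays for the copy and the evaluation, which is where the latency advantage over recompiling comes from.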
### M5 Performance Benchmarks
Run the benchmark suite:
```bash
cd training
make m5_performance_suite
./m5_performance_suite
```
Expected output on M5:
```
Max Dynamic Dimension: 4096 x 4096
Peak Throughput: 1.02 TFLOPS
Weight Update Latency: 1.78 ms
Max Weight Tensor Size: 67.11 MB
```
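
The size and throughput figures can be sanity-checked with simple arithmetic. The sketch below uses hypothetical helper names and is not part of the benchmark suite. The 67.11 MB figure is consistent with 4096 × 4096 elements at 4 bytes each in decimal megabytes; the 4-byte element size is an inference from the numbers, not something the output states. Throughput uses the usual 2·m·k·n FLOP count for a dense matmul.

```c
#include <assert.h>
#include <stddef.h>

/* Size of a dense rows x cols tensor in decimal megabytes (1 MB = 1e6 bytes). */
double tensor_megabytes(size_t rows, size_t cols, size_t bytes_per_element) {
    return (double)rows * (double)cols * (double)bytes_per_element / 1e6;
}

/* Throughput in TFLOPS for an (m x k) by (k x n) matmul that took `seconds`.
 * A dense matmul performs one multiply and one add per (m, k, n) triple. */
double matmul_tflops(size_t m, size_t k, size_t n, double seconds) {
    assert(seconds > 0.0);
    return 2.0 * (double)m * (double)k * (double)n / seconds / 1e12;
}
```

For example, `tensor_megabytes(4096, 4096, 4)` evaluates to about 67.11, matching the last line of the expected output. The matmul shape used by the suite is not given here, so `matmul_tflops` is shown only as the general formula.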
### Implementation Notes
1. **Alignment Helper**: Use `ane_create_surface()`, which automatically applies 128-byte alignment and remains backward compatible with M3/M4.
2. **MIL Generation**: Use `mil_gen_dynamic_matmul()` from `ane_mil_gen.h` for M5-compatible dynamic weight layers.
3. **Weight Surface**: For large weights (> 16 MB), use `ane_create_weights_surface()`, which adds `kIOSurfaceIsGlobal` for ANE hardware access.
4. **Matmul vs. Conv**: For dynamic weights, `matmul` is more stable than `conv` on M5 due to flexible hardware tiling on the NCE (Neural Compute Engine).