---
title: Overview
weight: 1
layout: learningpathall
---

This Learning Path covers deploying PyTorch neural network models on the **Alif Ensemble E8 DevKit** using ExecuTorch with Ethos-U55 NPU acceleration.

## What you'll build

A complete pipeline to:

1. Export PyTorch models to ExecuTorch format (`.pte`)
2. Optimize models for the Ethos-U55 NPU using the Vela compiler
3. Build the ExecuTorch runtime for Cortex-M55
4. Deploy and run inference on Alif E8 hardware

## Hardware Overview - Alif Ensemble E8 Series

Selecting the right hardware for machine learning (ML) models depends on effective evaluation tools. You can assess Arm Ethos-U performance early in the development cycle using Alif's [Ensemble E8 Series Development Kit](https://alifsemi.com/ensemble-e8-series/).

<center>
<iframe src='https://www.youtube.com/embed/jAvi2xKxkE4?si=Wd-E1PUCM4Y49uXM' allowfullscreen frameborder=0 width="800" height="400"></iframe>

*Alif Ensemble Series Overview*
</center>

![Alif Ensemble E8 Board SoC Highlighted alt-text#center](./alif-ensemble-e8-board-soc-highlighted.jpg "Arm Ethos-U NPU location")

### Alif Ensemble E8 DevKit (DK-E8-Alpha)

| Component | Specification |
|-----------|---------------|
| **CPU** | Arm Cortex-M55 (HE core @ 160 MHz) |
| **NPU** | Arm Ethos-U55 (128-MAC configuration) |
| **ITCM** | 256 KB (fast instruction memory) |
| **DTCM** | 256 KB (fast data memory) |
| **SRAM0** | 4 MB (general purpose) |
| **SRAM1** | 4 MB (NPU accessible) |
| **MRAM** | 2-5.5 MB (non-volatile code storage) |

{{% notice Note %}}
The DK-E8-Alpha DevKit may use E7 silicon (AE722F80F55D5AS), which has 5.5 MB MRAM and 13.5 MB SRAM in total. SETOOLS will auto-detect your actual chip variant; always build for the detected silicon type.
{{% /notice %}}

### Alif's Ensemble E8 Processor Decoded

![Alif's Ensemble E8 Processor alt-text#center](./ensemble-application-processor.png "Alif's Ensemble E8 Processor")

**Alif's Processor Labeling Convention:**

|Line|Meaning|
|----|-------|
|AE101F|• AE – Ensemble E-series family<br>• 101F – Specific device SKU within the E8 series (quad-core Fusion processors: x2 Cortex-A32 + x2 Cortex-M55 + Ethos-U85 + x2 Ethos-U55)|
|4Q|• Usually denotes package type and temperature grade|
|71542LH|• Likely a lot code / internal wafer lot number used for traceability|
|B4ADKA 2508|• B4ADKA – assembly site & line identifier<br>• 2508 – year + week of manufacture (week 08 of 2025)|
|UASA37002.000.03|• UASA37002 – identifies the silicon mask set<br>• .000.03 – revision 3 of that mask set|

## Software Stack

```
┌────────────────────────────────────────────────────┐
│                  Your Application                  │
├────────────────────────────────────────────────────┤
│                 ExecuTorch Runtime                 │
│   ├── Program Loader                               │
│   ├── Executor                                     │
│   └── Memory Manager                               │
├────────────────────────────────────────────────────┤
│                Delegates & Kernels                 │
│   ├── Ethos-U Delegate (NPU acceleration)          │
│   ├── Cortex-M Kernels (CPU fallback)              │
│   └── Quantized Kernels (INT8 ops)                 │
├────────────────────────────────────────────────────┤
│                  Alif SDK / CMSIS                  │
│   ├── Device HAL                                   │
│   ├── UART Driver                                  │
│   └── GPIO Driver                                  │
├────────────────────────────────────────────────────┤
│          Hardware: Cortex-M55 + Ethos-U55          │
└────────────────────────────────────────────────────┘
```

## Prerequisites

### Required hardware

- Alif Ensemble E8 DevKit (DK-E8-Alpha)
- USB-C cable (connect to the **PRG USB** port)
- Optional: USB-to-Serial adapter for UART debugging

### Required software

| Tool | Version | Purpose |
|------|---------|---------|
| Docker | Latest | Development container |
| Arm GCC | 13.x or 14.x | Cross-compiler |
| CMSIS-Toolbox | 2.6.0+ | Build system |
| J-Link | 7.x+ | Programming/debugging |
| SETOOLS | 1.107.x | Alif flashing tools |
| Python | 3.10+ | ExecuTorch export |

## Key concepts

### Model quantization

ExecuTorch uses **INT8 quantization** for Ethos-U55:

- Reduced memory footprint (4x smaller than FP32)
- Faster inference on the NPU
- Minimal accuracy loss with proper calibration

### Memory layout

{{% notice Warning %}}
Large tensors and model weights must be placed in **SRAM0** (4 MB), not DTCM (256 KB). Failing to do this causes linker overflow errors.
{{% /notice %}}

Place large buffers in SRAM0 using the section attribute:

```c
static uint8_t __attribute__((section(".bss.noinit"), aligned(16)))
    tensor_arena[512 * 1024]; // 512 KB arena placed in SRAM0
```

### SRAM0 power management

{{% notice Important %}}
SRAM0 must be powered on before use via the Secure Enclave services. Accessing unpowered SRAM causes HardFault crashes.
{{% /notice %}}

```c
#include "se_services_port.h"
#include "services_lib_api.h"

uint32_t mem_error = 0;

// Ask the Secure Enclave to power on SRAM0 before any access.
SERVICES_power_memory_req(
    se_services_s_handle,
    POWER_MEM_SRAM_0_ENABLE,
    &mem_error);
```

## Example: MNIST digit classification

The included MNIST example demonstrates:

- Loading a quantized CNN model (~100 KB)
- INT8 input preprocessing (28x28 grayscale image)
- NPU-accelerated inference (~10-20 ms)
- Output processing (argmax over 10 classes)

```
Input: 28x28 grayscale image (784 bytes INT8)

┌─────────────────────────────────────────┐
│ Conv2d(1→16)  → ReLU → MaxPool          │ NPU
│ Conv2d(16→32) → ReLU → MaxPool          │ accelerated
│ Linear(1568→64) → ReLU                  │
│ Linear(64→10)                           │
└─────────────────────────────────────────┘

Output: 10 class scores (10 bytes INT8)
```

## Benefits and applications

NPUs like Arm's [Ethos-U55](https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-u55) and [Ethos-U85](https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-u85) provide significant advantages for embedded ML applications:

- **Hardware acceleration**: 10-50x faster inference compared to CPU-only execution
- **Power efficiency**: Lower power consumption per inference operation
- **Real-time capable**: Suitable for latency-sensitive applications
- **On-device processing**: No cloud dependency, enhanced privacy
- **Visual feedback**: RGB LED indicators provide immediate status confirmation
- **Debug capabilities**: UART and RTT output for detailed performance analysis

The Alif [Ensemble E8 Series Development Kit](https://alifsemi.com/ensemble-e8-series/) integrates the Ethos-U55 NPU with Cortex-M55 and Cortex-A32 cores, making it ideal for prototyping TinyML applications that require both ML acceleration and general-purpose processing.
