---
title: Overview
weight: 1
layout: learningpathall
---

This Learning Path covers deploying PyTorch neural network models on the **Alif Ensemble E8 DevKit** using ExecuTorch with Ethos-U55 NPU acceleration.

## What you'll build

A complete pipeline to:
1. Export PyTorch models to ExecuTorch format (`.pte`)
2. Optimize models for the Ethos-U55 NPU using the Vela compiler
3. Build the ExecuTorch runtime for Cortex-M55
4. Deploy and run inference on Alif E8 hardware

## Hardware Overview - Alif Ensemble E8 Series

Selecting the best hardware for machine learning (ML) models depends on having effective evaluation tools. With Alif's [Ensemble E8 Series Development Kit](https://alifsemi.com/ensemble-e8-series/), you can measure Arm Ethos-U performance early in the development cycle.

<center>
<iframe src='https://www.youtube.com/embed/jAvi2xKxkE4?si=Wd-E1PUCM4Y49uXM' allowfullscreen frameborder=0 width="800" height="400"></iframe>

*Alif Ensemble Series Overview*
</center>

### Alif Ensemble E8 DevKit (DK-E8-Alpha)

| Component | Specification |
|-----------|---------------|
| **CPU** | Arm Cortex-M55 (HE core @ 160 MHz) |
| **NPU** | Arm Ethos-U55 (128 MAC configuration) |
| **ITCM** | 256 KB (fast instruction memory) |
| **DTCM** | 256 KB (fast data memory) |
| **SRAM0** | 4 MB (general purpose) |
| **SRAM1** | 4 MB (NPU accessible) |
| **MRAM** | 2-5.5 MB (non-volatile code storage) |

{{% notice Note %}}
The DK-E8-Alpha DevKit may use E7 silicon (AE722F80F55D5AS), which has 5.5 MB MRAM and 13.5 MB total SRAM. SETOOLS auto-detects the actual chip variant; always build for the detected silicon type.
{{% /notice %}}

### Alif's Ensemble E8 Processor Decoded

**Alif's Processor Labeling Convention:**

| Line | Meaning |
|------|---------|
| AE101F | • AE – Ensemble E-series family<br>• 101F – specific device SKU within the E8 series (quad-core Fusion processors: x2 Cortex-A32 + x2 Cortex-M55 + Ethos-U85 + x2 Ethos-U55) |
| 4Q | • Usually denotes package type and temperature grade |
| 71542LH | • Likely a lot code / internal wafer lot number used for traceability |
| B4ADKA 2508 | • B4ADKA – assembly site & line identifier<br>• 2508 – year + week of manufacture (week 08 of 2025) |
| UASA37002.000.03 | • UASA37002 – identifies the silicon mask set<br>• .000.03 – revision 3 of that mask |

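The year-week date code described above can be decoded mechanically. A minimal illustrative sketch (the `parse_date_code` helper is hypothetical, not an Alif tool, and assumes a 20xx two-digit year):

```python
def parse_date_code(code: str) -> tuple[int, int]:
    """Decode a YYWW date code, e.g. '2508' -> (2025, 8)."""
    year = 2000 + int(code[:2])   # two-digit year, assumed 20xx
    week = int(code[2:])          # week of manufacture
    return year, week

print(parse_date_code("2508"))  # -> (2025, 8), i.e. week 08 of 2025
```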
## Software Stack

```
┌──────────────────────────────────────────────┐
│ Your Application                             │
├──────────────────────────────────────────────┤
│ ExecuTorch Runtime                           │
│  ├── Program Loader                          │
│  ├── Executor                                │
│  └── Memory Manager                          │
├──────────────────────────────────────────────┤
│ Delegates & Kernels                          │
│  ├── Ethos-U Delegate (NPU acceleration)     │
│  ├── Cortex-M Kernels (CPU fallback)         │
│  └── Quantized Kernels (INT8 ops)            │
├──────────────────────────────────────────────┤
│ Alif SDK / CMSIS                             │
│  ├── Device HAL                              │
│  ├── UART Driver                             │
│  └── GPIO Driver                             │
├──────────────────────────────────────────────┤
│ Hardware: Cortex-M55 + Ethos-U55             │
└──────────────────────────────────────────────┘
```

## Prerequisites

### Required hardware
- Alif Ensemble E8 DevKit (DK-E8-Alpha)
- USB-C cable (connect to the **PRG USB** port)
- Optional: USB-to-serial adapter for UART debugging

### Required software

| Tool | Version | Purpose |
|------|---------|---------|
| Docker | Latest | Development container |
| Arm GCC | 13.x or 14.x | Cross-compiler |
| CMSIS-Toolbox | 2.6.0+ | Build system |
| J-Link | 7.x+ | Programming/debugging |
| SETOOLS | 1.107.x | Alif flashing tools |
| Python | 3.10+ | ExecuTorch export |

## Key concepts

### Model quantization

ExecuTorch uses **INT8 quantization** for Ethos-U55:
- Reduced memory footprint (4x smaller than FP32)
- Faster inference on the NPU
- Minimal accuracy loss with proper calibration

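The INT8 memory saving and the quantize/dequantize round trip can be sketched in plain Python. The scale and zero-point below are illustrative values, not the output of a real calibration run:

```python
# Affine quantization: real = scale * (q - zero_point)
scale, zero_point = 0.02, -10   # illustrative calibration results

def quantize(x: float) -> int:
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))   # clamp to the INT8 range

def dequantize(q: int) -> float:
    return scale * (q - zero_point)

x = 0.5
q = quantize(x)                        # 0.5 / 0.02 = 25, plus (-10) -> 15
assert abs(dequantize(q) - x) < scale  # round-trip error under one quantization step

# FP32 -> INT8 shrinks storage 4x: 4 bytes per value down to 1
print(4 * 1000, "->", 1 * 1000, "bytes for 1000 weights")
```

Calibration in a real flow chooses `scale` and `zero_point` from representative input data, which is why the accuracy loss stays minimal.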
### Memory layout

{{% notice Warning %}}
Large tensors and model weights must be placed in **SRAM0** (4 MB), not DTCM (256 KB). Failing to do this causes linker overflow errors.
{{% /notice %}}

Place large buffers in SRAM0 using the section attribute:

```c
static uint8_t __attribute__((section(".bss.noinit"), aligned(16)))
    tensor_arena[512 * 1024]; // 512 KB arena, placed in SRAM0 by the linker script
```
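A quick budget check against the memory map above helps catch this before the linker does. The sizes come from the DevKit table and the examples in this page (the ~100 KB model size is the MNIST example below):

```python
KB, MB = 1024, 1024 * 1024

DTCM  = 256 * KB   # fast data memory
SRAM0 = 4 * MB     # general-purpose SRAM

tensor_arena = 512 * KB   # arena from the snippet above
model_size   = 100 * KB   # approximate quantized MNIST model

assert tensor_arena > DTCM                 # would overflow DTCM -> linker error
assert tensor_arena + model_size < SRAM0   # fits comfortably in SRAM0
print("SRAM0 headroom:", (SRAM0 - tensor_arena - model_size) // KB, "KB")  # -> 3484 KB
```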
### SRAM0 power management

{{% notice Important %}}
SRAM0 must be powered on before use via Secure Enclave services. Accessing unpowered SRAM causes a HardFault.
{{% /notice %}}

```c
#include "se_services_port.h"
#include "services_lib_api.h"

// Ask the Secure Enclave to power on SRAM0 before touching it
uint32_t mem_error = 0;
SERVICES_power_memory_req(
    se_services_s_handle,
    POWER_MEM_SRAM_0_ENABLE,
    &mem_error);
```

## Example: MNIST digit classification

The included MNIST example demonstrates:
- Loading a quantized CNN model (~100 KB)
- INT8 input preprocessing (28x28 grayscale image)
- NPU-accelerated inference (~10-20 ms)
- Output processing (argmax over 10 classes)

```
Input: 28x28 grayscale image (784 bytes INT8)
                    │
                    ▼
┌─────────────────────────────────────────┐
│ Conv2d(1→16)  → ReLU → MaxPool          │  NPU
│ Conv2d(16→32) → ReLU → MaxPool          │  accelerated
│ Linear(1568→64) → ReLU                  │
│ Linear(64→10)                           │
└─────────────────────────────────────────┘
                    │
                    ▼
Output: 10 class scores (10 bytes INT8)
```

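The 1568 figure feeding the first Linear layer follows from the conv/pool arithmetic, and the final output-processing step is a plain argmax. Both can be checked in a few lines, assuming the convolutions preserve spatial size (for example 3x3 kernels with padding 1) and the max-pools are 2x2 with stride 2, as the diagram implies:

```python
# Feature-map size through the diagram above
h = w = 28                 # input image
channels = 32              # after the second Conv2d
h, w = h // 2, w // 2      # first MaxPool  -> 14x14
h, w = h // 2, w // 2      # second MaxPool -> 7x7
assert channels * h * w == 1568   # matches Linear(1568→64)

# Output processing: argmax over the 10 INT8 class scores
scores = [-45, -12, 88, -3, -60, 14, -9, -128, 27, 5]  # illustrative NPU output
predicted_digit = max(range(10), key=lambda i: scores[i])
print(predicted_digit)  # -> 2
```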
## Benefits and applications

NPUs like Arm's [Ethos-U55](https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-u55) and [Ethos-U85](https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-u85) provide significant advantages for embedded ML applications:

- **Hardware acceleration**: 10-50x faster inference compared to CPU-only execution
- **Power efficiency**: lower power consumption per inference operation
- **Real-time capable**: suitable for latency-sensitive applications
- **On-device processing**: no cloud dependency, enhanced privacy
- **Visual feedback**: RGB LED indicators provide immediate status confirmation
- **Debug capabilities**: UART and RTT output for detailed performance analysis

The Alif [Ensemble E8 Series Development Kit](https://alifsemi.com/ensemble-e8-series/) integrates the Ethos-U55 NPU with Cortex-M55 and Cortex-A32 cores, making it ideal for prototyping TinyML applications that require both ML acceleration and general-purpose processing.