diff --git a/assets/contributors.csv b/assets/contributors.csv
index dedf06545e..d0ce9ef7a2 100644
--- a/assets/contributors.csv
+++ b/assets/contributors.csv
@@ -97,7 +97,7 @@ Waheed Brown,Arm,https://github.com/armwaheed,https://www.linkedin.com/in/waheed
Aryan Bhusari,Arm,,https://www.linkedin.com/in/aryanbhusari,,
Ken Zhang,Insyde,,kai-di-zhang-b1642a266,,
Ann Cheng,Arm,anncheng-arm,hello-ann,,
-Fidel Makatia Omusilibwa,,,,,
+Fidel Makatia Omusilibwa,,fidel-makatia,fidel-makatia-hsc-mieee,,
Ker Liu,,,,,
Rui Chang,,,,,
Alejandro Martinez Vicente,Arm,,,,
@@ -113,5 +113,6 @@ Steve Suzuki,Arm,,,,
Qixiang Xu,Arm,,,,
Phalani Paladugu,Arm,phalani-paladugu,phalani-paladugu,,
Richard Burton,Arm,Burton2000,,,
+Brendan Long,Arm,bccbrendan,https://www.linkedin.com/in/brendan-long-5817924/,,
Asier Arranz,NVIDIA,,asierarranz,,asierarranz.com
-Prince Agyeman,Arm,,,,
\ No newline at end of file
+Prince Agyeman,Arm,,,,
diff --git a/content/install-guides/codex-cli.md b/content/install-guides/codex-cli.md
index 52eec06472..014e3337c4 100644
--- a/content/install-guides/codex-cli.md
+++ b/content/install-guides/codex-cli.md
@@ -233,6 +233,18 @@ The Arm MCP server is listed in the output. If the arm-mcp server indicates it's
You can also verify the tools are available by asking Codex to list the available Arm MCP tools.
+### Use Arm prompt files with the MCP Server
+
+The Arm MCP Server provides a rich set of tools and a knowledge base, but to make the best use of it, you should pair it with Arm-specific prompt files. These prompt files supply task-oriented context, best practices, and structured workflows that guide the agent in using MCP tools more effectively across common Arm development tasks.
+
+#### Get the prompt files
+
+Browse the [agent integrations directory for Codex](https://github.com/arm/mcp/tree/main/agent-integrations/codex) to find prompt files for specific use cases:
+
+- **Arm migration** ([arm-migration.md](https://github.com/arm/mcp/blob/main/agent-integrations/codex/arm-migration.md)): Helps the agent systematically migrate applications from x86 to Arm, including dependency analysis, compatibility checks, and optimization recommendations.
+
+Each prompt file is a Markdown configuration that you can reference in your Codex CLI sessions to enable more targeted, task-specific assistance.
+
If you're facing issues or have questions, reach out to mcpserver@arm.com.
-You're now ready to use Codex CLI with Arm-specific development assistance.
+You're now ready to use Codex CLI with the Arm MCP server for Arm-specific development assistance.
diff --git a/content/install-guides/gemini.md b/content/install-guides/gemini.md
index 6757831b8f..094c259530 100644
--- a/content/install-guides/gemini.md
+++ b/content/install-guides/gemini.md
@@ -407,6 +407,18 @@ Configured MCP servers:
- sysreport_instructions
```
+### Use Arm prompt files with the MCP Server
+
+The Arm MCP Server provides a rich set of tools and a knowledge base, but to make the best use of it, you should pair it with Arm-specific prompt files. These prompt files supply task-oriented context, best practices, and structured workflows that guide the agent in using MCP tools more effectively across common Arm development tasks.
+
+#### Get the prompt files
+
+Browse the [agent integrations directory](https://github.com/arm/mcp/tree/main/agent-integrations/gemini) to find prompt files for specific use cases:
+
+- **Arm migration** ([arm-migration.toml](https://github.com/arm/mcp/blob/main/agent-integrations/gemini/arm-migration.toml)): Helps the agent systematically migrate applications from x86 to Arm, including dependency analysis, compatibility checks, and optimization recommendations.
+
+Each prompt file is a TOML configuration that you can reference in your Gemini CLI sessions to enable more targeted, task-specific assistance.
+
If you're facing issues or have questions, reach out to mcpserver@arm.com.
-You're now ready to use Gemini CLI with the Arm MCP server for Arm-specific development assistance.
\ No newline at end of file
+You're now ready to use Gemini CLI with the Arm MCP server for Arm-specific development assistance.
diff --git a/content/install-guides/github-copilot.md b/content/install-guides/github-copilot.md
index 299ca09f2a..682f293966 100644
--- a/content/install-guides/github-copilot.md
+++ b/content/install-guides/github-copilot.md
@@ -335,6 +335,20 @@ Example prompts that use the Arm MCP Server:
- `Search the Arm knowledge base for Neon intrinsics examples`
- `Find learning resources about migrating from x86 to Arm`
+## Use Arm prompt files with the MCP Server
+
+The Arm MCP Server provides a rich set of tools and a knowledge base, but to make the best use of it, you should pair it with Arm-specific prompt files. These prompt files supply task-oriented context, best practices, and structured workflows that guide the agent in using MCP tools more effectively across common Arm development tasks.
+
+### Get the prompt files
+
+Browse the [agent integrations directory for Visual Studio Code](https://github.com/arm/mcp/tree/main/agent-integrations/vs-code) to find prompt files for specific use cases:
+
+- **Arm migration** ([arm-migration.prompt.md](https://github.com/arm/mcp/blob/main/agent-integrations/vs-code/arm-migration.prompt.md)): Helps the agent systematically migrate applications from x86 to Arm, including dependency analysis, compatibility checks, and optimization recommendations.
+
+Each prompt file is a Markdown configuration that you can reference in your GitHub Copilot sessions to enable more targeted, task-specific assistance.
+
+If you're facing issues or have questions, reach out to mcpserver@arm.com.
+
## Troubleshooting MCP Server connections
This section helps you resolve common issues when installing and using GitHub Copilot with the Arm MCP Server on Arm systems. If you encounter problems not covered here, contact [mcpserver@arm.com](mailto:mcpserver@arm.com) for support.
@@ -349,4 +363,4 @@ If the Arm MCP Server doesn't connect:
-You're now ready to use GitHub Copilot with the Arm MCP Server to enhance your Arm development workflow!
+You're now ready to use GitHub Copilot with the Arm MCP server for Arm-specific development assistance.
diff --git a/content/install-guides/kiro-cli.md b/content/install-guides/kiro-cli.md
index 99d8b1a7ba..9cc8ca8429 100644
--- a/content/install-guides/kiro-cli.md
+++ b/content/install-guides/kiro-cli.md
@@ -262,6 +262,18 @@ Use the `/tools` command to list the available tools:
You should see the Arm MCP server tools listed in the output. If the arm-mcp server says it's still loading, wait a moment and run `/tools` again.
-If you are facing issues or have questions, reach out to mcpserver@arm.com.
-You're ready to use Kiro CLI.
+### Use Arm prompt files with the MCP Server
+
+The Arm MCP Server provides a rich set of tools and a knowledge base, but to make the best use of it, you should pair it with Arm-specific prompt files. These prompt files supply task-oriented context, best practices, and structured workflows that guide the agent in using MCP tools more effectively across common Arm development tasks.
+
+#### Get the prompt files
+
+Browse the [agent integrations directory for Kiro](https://github.com/arm/mcp/tree/main/agent-integrations/kiro) to find prompt files for specific use cases:
+
+- **Arm migration** ([arm-migration.md](https://github.com/arm/mcp/blob/main/agent-integrations/kiro/arm-migration.md)): Helps the agent systematically migrate applications from x86 to Arm, including dependency analysis, compatibility checks, and optimization recommendations.
+
+Each prompt file is a Markdown configuration that you can reference in your Kiro CLI sessions to enable more targeted, task-specific assistance.
+
+If you're facing issues or have questions, reach out to mcpserver@arm.com.
+
+You're now ready to use Kiro CLI with the Arm MCP server for Arm-specific development assistance.
diff --git a/content/install-guides/multipass.md b/content/install-guides/multipass.md
index 80c3d85df2..30073a8e45 100644
--- a/content/install-guides/multipass.md
+++ b/content/install-guides/multipass.md
@@ -24,7 +24,7 @@ ecosystem_dashboard: https://developer.arm.com/ecosystem-dashboard/linux?package
test_images:
- ubuntu:latest
-test_maintenance: true
+test_maintenance: false
### PAGE SETUP
weight: 1 # Defines page ordering. Must be 1 for first (or only) page.
diff --git a/content/install-guides/perf.md b/content/install-guides/perf.md
index b42d5d8fc9..a21a841f55 100644
--- a/content/install-guides/perf.md
+++ b/content/install-guides/perf.md
@@ -22,7 +22,7 @@ ecosystem_dashboard: https://developer.arm.com/ecosystem-dashboard/linux?package
test_images:
- ubuntu:latest
-test_maintenance: true
+test_maintenance: false
### PAGE SETUP
weight: 1 # Defines page ordering. Must be 1 for first (or only) page.
diff --git a/content/install-guides/pytorch.md b/content/install-guides/pytorch.md
index 8ec473b7fb..5b10ca1065 100644
--- a/content/install-guides/pytorch.md
+++ b/content/install-guides/pytorch.md
@@ -14,7 +14,7 @@ ecosystem_dashboard: https://developer.arm.com/ecosystem-dashboard/linux?package
test_images:
- ubuntu:latest
test_link: null
-test_maintenance: true
+test_maintenance: false
title: PyTorch
tool_install: true
weight: 1
diff --git a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/_index.md b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/_index.md
index 83256c0b0c..39919a39a6 100644
--- a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/_index.md
+++ b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/_index.md
@@ -1,5 +1,7 @@
---
-title: Run image classification on an Alif Ensemble E8 DevKit with ExecuTorch and Ethos-U85
+title: Run image classification on an Alif Ensemble E8 DevKit using ExecuTorch and Ethos-U85
+
+description: Deploy a MobileNetV2 image classification model to an Alif Ensemble E8 DevKit and run inference on the Ethos-U85 NPU.
draft: true
cascade:
@@ -7,25 +9,23 @@ cascade:
minutes_to_complete: 120
-who_is_this_for: This Learning Path is for embedded developers who want to deploy a neural network on an Arm Cortex-M55 microcontroller with an Ethos-U85 NPU. You will compile a MobileNetV2 model using ExecuTorch, embed it into bare-metal firmware, and run image classification on the Alif Ensemble E8 DevKit.
+who_is_this_for: This is an advanced topic for embedded developers who want to deploy a neural network model to an Arm Cortex-M55 microcontroller using ExecuTorch and an Ethos-U85 NPU.
learning_objectives:
- Compile a MobileNetV2 model for the Ethos-U85 NPU using ExecuTorch's ahead-of-time (AOT) compiler on an Arm-based cloud instance.
- Build ExecuTorch static libraries for bare-metal Cortex-M55 targets.
- - Configure CMSIS project files, memory layout, and linker scripts for a large ML workload on the Alif Ensemble E8.
- - Run real-time image classification inference on the Ethos-U85 NPU and verify results through SEGGER RTT.
+ - Configure CMSIS project files, memory layout, and linker scripts for an ML workload on the Alif Ensemble E8.
+ - Run real-time image classification inference on the Ethos-U85 NPU and verify results using SEGGER Real-Time Transfer (RTT).
prerequisites:
- - An Alif Ensemble E8 DevKit with a USB-C cable.
- - A SEGGER J-Link debug probe (the DevKit has one built in).
- - A development machine running macOS (Apple Silicon) or Linux.
- - (Optional) An AWS account or access to an Arm-based cloud instance (Graviton c7g.4xlarge recommended). You can also build ExecuTorch locally on an Arm-based machine, though the steps will differ.
- - Basic familiarity with C/C++ and embedded development concepts.
- - VS Code installed on your development machine.
+ - Experience with C/C++ and embedded development concepts.
+ - An [Alif Ensemble E8 DevKit](https://alifsemi.com/support/kits/ensemble-e8devkit/) with a USB-C cable.
+ - A SEGGER J-Link debug probe (included in the DevKit).
+ - A development machine running macOS on Apple Silicon with Visual Studio Code installed.
+ - An AWS account or access to an Arm-based cloud instance for native Arm compilation.
author: Gabriel Peterson
-### Tags
skilllevels: Advanced
subjects: ML
armips:
diff --git a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/_review.md b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/_review.md
deleted file mode 100644
index b9f9bfdebb..0000000000
--- a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/_review.md
+++ /dev/null
@@ -1,40 +0,0 @@
----
-title: Knowledge check
-weight: 20
-
-layout: "learningpathall"
-
-review:
- - questions:
- question: "Which NPU peripheral on the Alif Ensemble E8 is used for the Ethos-U85?"
- explanation: "NPU_HG (High-Grade) at base address 0x49042000 is the Ethos-U85. NPU_HP is an Ethos-U55 at a different address. Using the wrong base address causes a product mismatch error."
- correct_answer: 2
- answers:
- - "NPU_HP"
- - "NPU_HG"
- - "NPU_HE"
- - questions:
- question: "Why do some ExecuTorch libraries need to be linked with --whole-archive?"
- explanation: "Libraries like libexecutorch and libcortex_m_ops_lib contain static registration constructors that register operators and PAL symbols at startup. Without --whole-archive, the linker sees these constructors as unused and discards them, causing missing operator errors at runtime."
- correct_answer: 3
- answers:
- - "Because they are too large for normal linking"
- - "Because the linker requires it for all C++ libraries"
- - "Because they contain static registration constructors that would otherwise be discarded"
- - questions:
- question: "What does the GOT (Global Offset Table) fix in the linker script address?"
- explanation: "The precompiled ExecuTorch libraries use position-independent code (PIC) that relies on the GOT for indirect function calls and vtable lookups. If the GOT isn't copied from flash to RAM at startup, these lookups resolve to address zero, causing BusFaults."
- correct_answer: 1
- answers:
- - "BusFaults caused by uninitialized indirect function call tables"
- - "Stack overflow errors during inference"
- - "Incorrect NPU command stream alignment"
- - questions:
- question: "What input data type does the MobileNetV2 model expect?"
- explanation: "The model's first operator is cortex_m::quantize_per_tensor, which converts float32 input to int8 for the NPU. The image is stored as int8 in the header to save flash space, but the application code converts it to float32 before passing it to the model."
- correct_answer: 2
- answers:
- - "int8"
- - "float32"
- - "uint8"
----
diff --git a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/application-code.md b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/application-code.md
index 35d5c95590..84579d964a 100644
--- a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/application-code.md
+++ b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/application-code.md
@@ -5,7 +5,7 @@ weight: 5
layout: "learningpathall"
---
-## Overview
+## What the application code does
The application code initializes the Ethos-U85 NPU, loads the MobileNetV2 model through ExecuTorch, runs inference on an embedded test image, and prints the classification result over SEGGER RTT.
@@ -16,7 +16,7 @@ Rather than building this code line by line, you download the complete `main.cpp
Download the working `main.cpp` from the workshop repository and place it in your project:
```bash
-cd ~/repo/alif/alif_vscode-template/mv2_runner
+cd ~/alif/alif_vscode-template/mv2_runner
curl -L -o main.cpp \
https://raw.githubusercontent.com/ArmDeveloperEcosystem/workshop-ethos-u/main/main.cpp
```
@@ -25,7 +25,7 @@ curl -L -o main.cpp \
If you prefer, you can clone the full repository with `git clone https://github.com/ArmDeveloperEcosystem/workshop-ethos-u.git` and copy `main.cpp` from there.
{{% /notice %}}
-The following sections explain what the code does. You don't need to modify anything; the downloaded file is ready to build.
+The following sections explain what the code does. The downloaded file is ready to build as-is.
## Fault handlers
@@ -124,19 +124,12 @@ The method allocator holds the loaded model graph. The temp allocator provides s
## The inference pipeline
-The `run_inference()` function follows a 10-step pipeline:
+The `run_inference()` function handles the full pipeline from model loading to output. It starts by initializing the ExecuTorch runtime and creating a zero-copy data loader that reads the compiled `.pte` model directly from flash memory. The program is then parsed and method metadata queried to determine how much planned memory the model needs.
-1. **Initialize** the ExecuTorch runtime.
-2. **Create a data loader** that reads the model directly from flash memory (zero-copy).
-3. **Load the program** (parse the `.pte` flatbuffer).
-4. **Query method metadata** to find out how many planned buffers the model needs and how large they are.
-5. **Set up planned memory** by carving sub-allocations from the SRAM1 pool.
-6. **Create the memory manager** that ties together the method, temp, and planned allocators.
-7. **Load the method** (the `forward` function of the model).
-8. **Prepare the input tensor**: convert the embedded int8 image data to float32 (the model's first operator is `quantize_per_tensor`, which expects float input).
-9. **Execute inference**: the quantize op runs on the CPU, the entire MobileNetV2 backbone runs as a single NPU command stream on the Ethos-U85, and the dequantize op runs back on the CPU.
-10. **Read the output**: find the argmax of the 1000-class output vector to get the predicted ImageNet class.
+Memory is set up next: sub-allocations are carved from the SRAM1 pool for planned buffers, and a memory manager ties together the method, temp, and planned allocators. Once memory is in place, the `forward` method is loaded.
-The NPU handles the bulk of the computation. The CPU-side overhead (ExecuTorch loading, input conversion, quantize/dequantize) is small compared to the NPU workload.
+Before inference runs, the input tensor is prepared by converting the embedded int8 image data to float32. This is needed because the model's first operator is `quantize_per_tensor`, which expects float input. Inference then runs in three stages: the quantize operator executes on the CPU, the entire MobileNetV2 backbone runs as a single NPU command stream on the Ethos-U85, and the dequantize operator runs back on the CPU. Finally, the argmax of the 1000-class output vector gives the predicted ImageNet class.
-You now have the application code in place. The next section configures the memory layout to accommodate the model and ExecuTorch runtime.
+The NPU handles the bulk of the computation. The CPU-side overhead of ExecuTorch loading, input conversion, and quantize/dequantize is small compared to the NPU workload.
+
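+The input conversion and output readout described above can be sketched in Python. This is a host-side NumPy illustration of the data flow, not the actual C++ firmware code; the 256x256 input size is inferred from the model filename and the placeholder data is random:

```python
import numpy as np

# Placeholder for the embedded test image, stored as int8 in flash.
# The 1x3x256x256 shape is an assumption based on the model name.
rng = np.random.default_rng(0)
int8_image = rng.integers(-128, 128, size=(1, 3, 256, 256), dtype=np.int8)

# The model's first operator is quantize_per_tensor, which expects float32,
# so the int8 bytes are widened to float32 before inference.
float_input = int8_image.astype(np.float32)

# After inference, the 1000-class output vector is reduced with argmax
# to get the predicted ImageNet class index.
logits = rng.standard_normal(1000).astype(np.float32)
predicted_class = int(np.argmax(logits))
```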
+The application code is in place. The next section configures the memory layout to accommodate the model and ExecuTorch runtime.
diff --git a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/aws-ec2-setup.md b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/aws-ec2-setup.md
index c19ed3d126..73e314e440 100644
--- a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/aws-ec2-setup.md
+++ b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/aws-ec2-setup.md
@@ -21,7 +21,7 @@ Create an AWS EC2 instance with the following configuration:
The 16 cores speed up the ExecuTorch build significantly, and the 50 GB disk accommodates the repository, submodules, and build artifacts.
-Set up your SSH config so you can connect with a short alias (for example, `ssh alif`). This makes the `scp` commands later more convenient.
+Connect to the EC2 instance over SSH. Setting up an SSH config alias (for example, `alif`) makes the `scp` commands later in this Learning Path more convenient.
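+For reference, a minimal `~/.ssh/config` entry defining the `alif` alias used by the later `scp` commands might look like the following. The hostname and key path are placeholders for your own instance; `ubuntu` is the default user on Ubuntu AMIs:

```
Host alif
    HostName <ec2-public-ip>
    User ubuntu
    IdentityFile ~/.ssh/your-key.pem
```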
## Install system dependencies
@@ -74,7 +74,7 @@ pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install pillow
```
-You use CPU-only PyTorch because this instance has no GPU. You only need PyTorch for model export and ahead-of-time compilation; the actual inference happens on the microcontroller.
+This instance has no GPU, so the install uses CPU-only PyTorch. PyTorch is only needed for model export and ahead-of-time compilation. The actual inference runs on the microcontroller.
Verify the installation:
@@ -85,8 +85,16 @@ print(torch.__version__, torchvision.__version__)
PY
```
+The output is similar to:
+
+```output
+2.10.0+cpu 0.25.0+cpu
+```
+
## Clone and install ExecuTorch
+Clone the ExecuTorch repository and pin it to a known-working commit:
+
```bash
mkdir -p ~/alif
cd ~/alif
@@ -111,26 +119,37 @@ python -m pip install -e . --no-build-isolation
ExecuTorch includes a setup script that downloads the Arm GNU toolchain, CMSIS, and the Vela compiler:
```bash
-cd ~/alif/executorch
./examples/arm/setup.sh --i-agree-to-the-contained-eula
```
-{{% notice Note %}}
-The setup script may fail at the `tosa_serialization_lib` build step due to a pybind11 version incompatibility. If you see an error containing `def_property family does not currently support keep_alive`, run the following commands to complete the setup manually:
+The script fails at the `tosa_serialization_lib` build step due to a pybind11 version incompatibility. This is a known issue. When you see an error containing `def_property family does not currently support keep_alive`, fix the dependency and complete the setup manually.
+
+First, install a compatible version of pybind11 and the required build tools:
```bash
pip install "pybind11<2.14" scikit-build-core setuptools_scm
+```
+
+Next, build and install the serialization library using those local packages:
+```bash
CMAKE_POLICY_VERSION_MINIMUM=3.5 pip install --no-build-isolation \
--no-dependencies \
~/alif/executorch/examples/arm/arm-scratch/tosa-tools/serialization
+```
+
+Then install the Ethos-U Vela compiler, which the setup script didn't reach due to the earlier failure:
+```bash
pip install --no-dependencies \
-r ~/alif/executorch/backends/arm/requirements-arm-ethos-u.txt
```
-The first command installs a pybind11 version that doesn't have the breaking change, along with the build tools that the serialization library needs. The second command builds and installs the serialization library using those local packages instead of downloading new ones. The third command installs the Ethos-U Vela compiler, which the setup script never reached due to the earlier failure.
-{{% /notice %}}
+Re-run the setup script to complete the remaining steps:
+
+```bash
+./examples/arm/setup.sh --i-agree-to-the-contained-eula
+```
Source the environment paths that the setup script generated:
@@ -147,12 +166,9 @@ pip install "torchao==0.15.0"
## Compile MobileNetV2 for Ethos-U85
-Source the setup paths and run the ahead-of-time compiler:
+Run the ahead-of-time compiler:
```bash
-cd ~/alif/executorch
-source examples/arm/arm-scratch/setup_path.sh
-
mkdir -p ~/alif/models
python -m examples.arm.aot_arm_compiler \
@@ -196,15 +212,13 @@ This step takes several minutes. When complete, list the output libraries:
find arm_test/cmake-out -type f -name "*.a" | sort
```
-You should see approximately 13 libraries, including `libexecutorch.a`, `libexecutorch_core.a`, `libexecutorch_delegate_ethos_u.a`, `libcortex_m_ops_lib.a`, and `libcmsis-nn.a`.
+The output lists approximately 13 libraries, including `libexecutorch.a`, `libexecutorch_core.a`, `libexecutorch_delegate_ethos_u.a`, `libcortex_m_ops_lib.a`, and `libcmsis-nn.a`.
## Package headers and libraries
Bundle the headers and libraries for transfer to your development machine:
```bash
-cd ~/alif/executorch
-
rm -rf ~/alif/et_bundle
mkdir -p ~/alif/et_bundle
cp -a arm_test/cmake-out/include ~/alif/et_bundle/
@@ -216,33 +230,33 @@ ls -lh ~/alif/et_bundle.tar.gz
## Transfer artifacts to your development machine
-Run these commands on your Mac or Linux development machine (not on the EC2 instance). The paths below use `~/repo/alif/` as the working directory; adjust these to match your own project location:
+Run these commands on your development machine, not on the EC2 instance. The paths below use `~/alif/` as the working directory; adjust these to match your own project location:
```bash
-mkdir -p ~/repo/alif/models
-mkdir -p ~/repo/alif/third_party/executorch/lib
+mkdir -p ~/alif/models
+mkdir -p ~/alif/third_party/executorch/lib
-scp alif:/home/ubuntu/alif/models/mv2_ethosu85_256.pte ~/repo/alif/models/
-scp alif:/home/ubuntu/alif/et_bundle.tar.gz ~/repo/alif/models/
+scp alif:/home/ubuntu/alif/models/mv2_ethosu85_256.pte ~/alif/models/
+scp alif:/home/ubuntu/alif/et_bundle.tar.gz ~/alif/models/
scp 'alif:/home/ubuntu/alif/executorch/arm_test/cmake-out/lib/*.a' \
- ~/repo/alif/third_party/executorch/lib/
+ ~/alif/third_party/executorch/lib/
```
Verify the transfer:
```bash
-ls -lh ~/repo/alif/models/mv2_ethosu85_256.pte
-ls ~/repo/alif/third_party/executorch/lib/*.a | wc -l
+ls -lh ~/alif/models/mv2_ethosu85_256.pte
+ls ~/alif/third_party/executorch/lib/*.a | wc -l
```
-You should see the 3.7 MB model file and 13 library files.
+The output shows the 3.7 MB model file and 13 library files.
## Convert the model to a C header
The firmware embeds the model as a byte array in flash memory. Use `xxd` to generate a C header:
```bash
-cd ~/repo/alif/models
+cd ~/alif/models
xxd -i mv2_ethosu85_256.pte > mv2_ethosu85_256_pte.h
```
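If `xxd` isn't available on your system, an equivalent header can be generated with a short Python script. This is a minimal sketch that follows the `xxd -i` naming convention (array named after the input file, plus a `_len` variable):

```python
def bytes_to_c_header(data: bytes, name: str) -> str:
    """Emit a C header declaring the given bytes, xxd -i style."""
    # Group the bytes 12 per line, formatted as 0xNN literals.
    body = ",\n  ".join(
        ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
        for i in range(0, len(data), 12)
    )
    return (
        f"unsigned char {name}[] = {{\n  {body}\n}};\n"
        f"unsigned int {name}_len = {len(data)};\n"
    )

# Example with placeholder bytes; in practice, read the .pte file instead.
header = bytes_to_c_header(b"\x00\x01\x02", "mv2_ethosu85_256_pte")
print(header)
```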
@@ -266,9 +280,9 @@ The `aligned(16)` attribute is required because the Ethos-U85 needs the Vela com
On your development machine, extract the ExecuTorch headers into the VS Code template project:
```bash
-cd ~/repo/alif/alif_vscode-template
+cd ~/alif/alif_vscode-template
mkdir -p third_party/executorch
-tar -C third_party/executorch -xzf ~/repo/alif/models/et_bundle.tar.gz
+tar -C third_party/executorch -xzf ~/alif/models/et_bundle.tar.gz
```
Verify the headers are in place:
diff --git a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/board-setup.md b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/board-setup.md
index 213b0106aa..73d6c3728c 100644
--- a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/board-setup.md
+++ b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/board-setup.md
@@ -5,46 +5,65 @@ weight: 2
layout: "learningpathall"
---
-## Overview
+## Understand the Alif Ensemble E8 hardware
The Alif Ensemble E8 DevKit features a dual-core Arm Cortex-M55 processor and three neural processing units (NPUs): two Ethos-U55 and one Ethos-U85. In this Learning Path, you use the Cortex-M55 High-Performance (HP) core running at 400 MHz to orchestrate inference on the Ethos-U85 NPU.
-Before writing any ML code, you need to verify that your toolchain, debug probe, and flashing workflow all function correctly. This section walks you through hardware setup, software installation, and a sanity check build.
+Before writing any ML code, you need to verify that your toolchain, debug probe, and flashing workflow all function correctly. This section covers DevKit hardware setup, software installation, and a short validation build.
-## Connect the board
+The instructions assume macOS on Apple Silicon. If you use Arm Linux, links are provided for the equivalent Linux packages.
+
+## Connect the DevKit
+
+1. Unplug all USB cables from the DevKit before changing any jumpers.
+
+2. Verify that the jumpers are in their factory default positions, as shown in the Alif Ensemble E8 DevKit (DK-E8) User Guide on [alifsemi.com](https://alifsemi.com/support/kits/ensemble-e8devkit/).
+
+3. Connect a USB-C cable from your computer to the **PRG USB** port on the bottom edge of the DevKit.
-1. Unplug all USB cables from the board before changing any jumpers.
-2. Verify the jumpers are in their factory default positions, as shown in the Alif Ensemble E8 DevKit (DK-E8) User Guide, available on [alifsemi.com](https://alifsemi.com/support/kits/ensemble-e8devkit/).
-3. Connect a USB-C cable from your computer to the **PRG USB** port on the bottom edge of the board.
4. Confirm that a green LED illuminates near the E1 device and the UART switch (SW4).
-Leave **SW4** in its default position. This routes the on-board USB UART to **SEUART**, which the Alif Security Toolkit uses for programming.
+Leave **SW4** in its default position. This routes the on-board USB UART to **SEUART**, which the Alif Security Toolkit (SETOOLS) uses for programming.
{{% notice Note %}}
-Don't have a terminal application (PuTTY, minicom, screen) attached to SEUART while using the Security Toolkit. There is only one SEUART on the device, and two applications can't share the port.
+Close any terminal application that is connected to SEUART, such as PuTTY, minicom, or screen, before you use the Security Toolkit (SETOOLS). The DevKit exposes only one SEUART interface, so SETOOLS cannot access the port if another application is already using it.
{{% /notice %}}
+5. Create a project directory:
+
+```bash
+mkdir ~/alif
+```
+
## Install the Alif Security Toolkit
-The Security Toolkit (SETOOLS) programs firmware images onto the board.
+The Security Toolkit (SETOOLS) programs firmware images onto the DevKit.
+
+1. Download the macOS version of SETOOLS from the [Alif Ensemble E8 DevKit support page](https://alifsemi.com/support/kits/ensemble-e8devkit/).
+
+2. Extract it into `~/alif`. This creates the toolkit directory under a stable location, for example `~/alif/app-release-exec-macos/`.
+
+```bash
+cd ~/Downloads
+tar xvf APFW0003-app-release-exec-macos-SW_FW_1.107.00_DEV-4.tar -C ~/alif
+```
-1. Download SETOOLS v1.107.000 from the [Alif Ensemble E8 DevKit support page](https://alifsemi.com/support/kits/ensemble-e8devkit/).
-2. Extract it to a stable location, for example `~/alif/app-release-exec-macos/`.
3. Open a terminal in the SETOOLS directory and run:
```bash
+cd ~/alif/app-release-exec-macos
./updateSystemPackage -d
```
-On macOS, the system blocks this unsigned binary the first time. Open **System Settings > Privacy & Security**, scroll to the **Security** section, and select **Allow Anyway**. Then re-run the command.
+On macOS, the system blocks this unsigned binary the first time. After that happens, open **System Settings > Privacy & Security**, scroll to the **Security** section, and select **Allow Anyway**. Run the command again.
-When prompted for a serial port, enter the DevKit's USB modem port. It usually appears as `/dev/cu.usbmodemXXXXXXX`. If SETOOLS detects the Ensemble E8 and asks to set it as default, answer `y`.
+When prompted for a serial port, enter the DevKit's USB modem port. On macOS, it usually appears as `/dev/cu.usbmodemXXXXXXX`. If SETOOLS detects the Ensemble E8 and asks to set it as the default, answer `y`.
## Install SEGGER J-Link
SEGGER J-Link provides the debug connection for RTT (Real-Time Transfer) output, which you use later to view inference results.
-On macOS, install it with Homebrew:
+Install it with Homebrew:
```bash
brew install --cask segger-jlink
@@ -54,10 +73,10 @@ Alternatively, download it from the [SEGGER website](https://www.segger.com/down
## Set up VS Code and the Alif template
-1. Clone the Alif VS Code template repository:
+### Clone the Alif VS Code template repository
```bash
-cd ~/repo/alif
+cd ~/alif
git clone https://github.com/alifsemi/alif_vscode-template.git
cd alif_vscode-template
git checkout 8b1aa0b09eacf68a28850af00c11f0b5af03c100
@@ -68,38 +87,60 @@ git submodule update --init
The `git checkout` command pins the template to a known-working commit. This avoids breakage if the upstream template is updated.
{{% /notice %}}
-2. Open the `alif_vscode-template/` folder in VS Code.
-3. Install the recommended extensions when prompted:
+### Open the project in VS Code
+
+Open the project in VS Code from the `alif_vscode-template/` directory:
+
+```bash
+code . &
+```
+
+### Install the recommended extensions when prompted
+
+VS Code might prompt you to install the recommended extensions for this workspace. If it does, install the following:
+
- Arm CMSIS Solution
- Arm Tools Environment Manager
- Cortex-Debug
- Microsoft C/C++ Extension Pack
-4. When prompted, select **Always Allow** or **Allow for Selected Workspace**.
-5. Restart VS Code if prompted.
+
+When prompted, select **Always Allow** or **Allow for Selected Workspace**.
+
+The recommended VS Code extensions are listed in `.vscode/extensions.json`.
+
+If you aren't prompted automatically, open the Extensions view and look for the **Workspace Recommendations** section to install or enable the extensions manually.
+
+Restart VS Code if prompted.
## Install CMSIS packs
-Press **F1** in VS Code, type `Tasks: Run Task`, and select **First time pack installation**. Press **A** to accept all licenses when prompted.
+Open the Command Palette by pressing **Command+Shift+P** (or **Fn+F1**) in VS Code, type `Tasks: Run Task`, and select **First time pack installation**. Press **A** to accept all licenses when prompted.
+
+If you do not see the task in the list, open the Command Palette (**Command+Shift+P** or **Fn+F1**) and run the **Reload Window** command.
## Configure VS Code settings
-Press **F1**, select **Preferences: Open User Settings (JSON)**, and add the following entries (update the paths for your system):
+Press **Fn+F1**, select **Preferences: Open User Settings (JSON)**, and add the following entries.
+
+Update both paths for your system, including using your username:
```json
{
- "alif.setools.root": "/path/to/your/app-release-exec-macos",
+ "alif.setools.root": "/Users/username/alif/app-release-exec-macos",
"cortex-debug.JLinkGDBServerPath": "/Applications/SEGGER/JLink/JLinkGDBServerCLExe"
}
```
-## Sanity check: build and flash Blinky
+If you have existing settings, add only these two entries inside the existing braces.
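
If it helps to picture the merge, here is a standalone sketch (the `editor.fontSize` entry is only a placeholder for whatever settings you already have):

```python
import json

# Placeholder for your current user settings
existing = {"editor.fontSize": 13}

# The two entries this guide asks you to add
additions = {
    "alif.setools.root": "/Users/username/alif/app-release-exec-macos",
    "cortex-debug.JLinkGDBServerPath": "/Applications/SEGGER/JLink/JLinkGDBServerCLExe",
}

# The merged settings keep your existing keys and add the new ones
merged = {**existing, **additions}
print(json.dumps(merged, indent=2))
```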
+
+## Verify your toolchain: build and flash Blinky
Before moving on to ML code, verify your entire toolchain works end to end with the built-in Blinky example.
1. In VS Code, select the **CMSIS** icon in the left sidebar.
2. Select the gear icon, then set **Active Target** to **E8-HP** and **Active Project** to **blinky**.
3. Select the **Build** (hammer) icon.
-4. Press **F1**, select **Tasks: Run Task**, then select **Program with Security Toolkit (select COM port)**.
+4. Press **Fn+F1**, select **Tasks: Run Task**, then select **Program with Security Toolkit (select COM port)**.
5. Choose the DevKit's port when prompted.
-If the board's red LED blinks, your toolchain, SETOOLS, and board connection are all working correctly. You're ready to move on to model compilation.
+If the DevKit's red LED blinks, your toolchain, SETOOLS, and DevKit connection are all working correctly. You are ready to move on to model compilation.
diff --git a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/build-flash-verify.md b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/build-flash-verify.md
index 1cbb5221c9..48a85e77f6 100644
--- a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/build-flash-verify.md
+++ b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/build-flash-verify.md
@@ -10,7 +10,7 @@ layout: "learningpathall"
If you've built other projects (like Blinky), delete the cached build files. CMSIS Toolbox caches aggressively and won't pick up YAML configuration changes unless you clean first:
```bash
-cd ~/repo/alif/alif_vscode-template
+cd ~/alif/alif_vscode-template
rm -rf tmp/ out/
```
@@ -29,9 +29,9 @@ You can also clean from VS Code: press **F1** and select **CMSIS: Clean all out
If you prefer to build from the terminal, set the required environment variables first. The exact paths depend on where the Arm Tools Environment Manager installed the tools:
```bash
-export PATH="$HOME/.vcpkg/artifacts/2139c4c6/tools.open.cmsis.pack.cmsis.toolbox/2.12.0/bin:$HOME/.vcpkg/artifacts/2139c4c6/compilers.arm.arm.none.eabi.gcc/13.3.1/bin:$HOME/.vcpkg/artifacts/2139c4c6/tools.kitware.cmake/3.31.5/bin:$HOME/.vcpkg/artifacts/2139c4c6/tools.ninja.build.ninja/1.13.2:$PATH"
-export CMSIS_COMPILER_ROOT="$HOME/.vcpkg/artifacts/2139c4c6/tools.open.cmsis.pack.cmsis.toolbox/2.12.0/etc"
-export GCC_TOOLCHAIN_13_3_1="$HOME/.vcpkg/artifacts/2139c4c6/compilers.arm.arm.none.eabi.gcc/13.3.1/bin"
+export PATH="$HOME/.vcpkg/artifacts/2139c4c6/tools.open.cmsis.pack.cmsis.toolbox/2.12.0/bin:$HOME/.vcpkg/artifacts/2139c4c6/compilers.arm.arm.none.eabi.gcc/13.3.1/bin:$HOME/.vcpkg/artifacts/2139c4c6/tools.kitware.cmake/3.31.5/bin:$HOME/.vcpkg/artifacts/2139c4c6/tools.ninja.build.ninja/1.13.2:$PATH"
+export CMSIS_COMPILER_ROOT="$HOME/.vcpkg/artifacts/2139c4c6/tools.open.cmsis.pack.cmsis.toolbox/2.12.0/etc"
+export GCC_TOOLCHAIN_13_3_1="$HOME/.vcpkg/artifacts/2139c4c6/compilers.arm.arm.none.eabi.gcc/13.3.1/bin"
cbuild alif.csolution.yml --context mv2_runner.debug+E8-HP
```
@@ -103,18 +103,12 @@ Each line tells you something about what's happening:
If you don't see the expected output, check these common issues:
-**RTT Viewer shows nothing**: The code starts running as soon as it's flashed. If you connect RTT Viewer too late, you might miss the output. Press the board's reset button after connecting RTT Viewer.
+- **RTT Viewer shows nothing**: The code starts running as soon as it's flashed. If you connect RTT Viewer too late, you might miss the output. Press the board's reset button after connecting RTT Viewer.
+- **"ethosu_init failed"**: The NPU base address is wrong. Verify the code uses `NPU_HG_BASE` (0x49042000), not `NPU_HP_BASE`.
+- **BusFault at a low address**: The GOT sections are missing from the linker script. Verify that `*(.got)` and `*(.got.plt)` are in the `.data.at_dtcm` section.
+- **"Missing operator: cortex_m::quantize_per_tensor.out"**: `libcortex_m_ops_lib` is not in the `--whole-archive` block. Check `mv2_runner.cproject.yml`.
+- **"Memory allocation failed: 1505280B requested"**: The temp allocator pool is too small. The Ethos-U85 scratch buffer needs approximately 1.44 MB. Verify `TEMP_ALLOC_POOL_SIZE` is at least `1536 * 1024`.
+- **MRAM overflow linker error**: Verify `APP_MRAM_HP_SIZE` is set to `0x00580000` in `app_mem_regions.h`.
+- **"Vela bin ptr not aligned to 16 bytes"**: The model array in the header needs `__attribute__((aligned(16)))`.
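
As a quick sanity check on the allocation numbers above (a standalone sketch; the values come from the error message and the suggested pool size):

```python
# Values from the troubleshooting entries above
requested = 1505280        # bytes reported by "Memory allocation failed"
pool_size = 1536 * 1024    # suggested minimum TEMP_ALLOC_POOL_SIZE

print(requested / (1024 * 1024))  # 1.435546875, approximately 1.44 MB
print(pool_size >= requested)     # True: the suggested pool covers the request
```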
-**"ethosu_init failed"**: The NPU base address is wrong. Verify the code uses `NPU_HG_BASE` (0x49042000), not `NPU_HP_BASE`.
-
-**BusFault at a low address**: The GOT sections are missing from the linker script. Verify that `*(.got)` and `*(.got.plt)` are in the `.data.at_dtcm` section.
-
-**"Missing operator: cortex_m::quantize_per_tensor.out"**: `libcortex_m_ops_lib` is not in the `--whole-archive` block. Check `mv2_runner.cproject.yml`.
-
-**"Memory allocation failed: 1505280B requested"**: The temp allocator pool is too small. The Ethos-U85 scratch buffer needs approximately 1.44 MB. Verify `TEMP_ALLOC_POOL_SIZE` is at least `1536 * 1024`.
-
-**MRAM overflow linker error**: Verify `APP_MRAM_HP_SIZE` is set to `0x00580000` in `app_mem_regions.h`.
-
-**"Vela bin ptr not aligned to 16 bytes"**: The model array in the header needs `__attribute__((aligned(16)))`.
-
-You've now built, flashed, and verified MobileNetV2 image classification running on the Ethos-U85 NPU through ExecuTorch. The model went from PyTorch, through the Vela compiler, into a `.pte` flatbuffer embedded in firmware, and produced a correct classification result on real hardware.
+This completes the setup and deployment of MobileNetV2 image classification on the Ethos-U85 NPU using ExecuTorch. The model went from PyTorch, through the Vela compiler, into a `.pte` flatbuffer embedded in firmware, and produced a correct classification result on real hardware.
diff --git a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/create-project.md b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/create-project.md
index da8b4708ae..94b7118c26 100644
--- a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/create-project.md
+++ b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/create-project.md
@@ -1,20 +1,20 @@
---
-title: Create the mv2_runner firmware project
+title: Create the image classification firmware project
weight: 4
layout: "learningpathall"
---
-## Overview
+## What you'll build in this section
-You now create a new CMSIS project called `mv2_runner` by duplicating the existing Blinky example and configuring it to include ExecuTorch libraries, the compiled model, and SEGGER RTT for debug output.
+In this section, you duplicate the existing Blinky example to create a new CMSIS project called `mv2_runner`, configured to include ExecuTorch libraries, the compiled model, and SEGGER RTT for debug output.
## Duplicate the Blinky project
Start by copying the working Blinky project as a template:
```bash
-cd ~/repo/alif/alif_vscode-template
+cd ~/alif/alif_vscode-template
cp -R blinky/ mv2_runner
```
@@ -44,12 +44,12 @@ Create an assets directory and copy the model header into the project:
```bash
mkdir -p mv2_runner/assets
-cp ~/repo/alif/models/mv2_ethosu85_256_pte.h mv2_runner/assets/
+cp ~/alif/models/mv2_ethosu85_256_pte.h mv2_runner/assets/
```
## Create the SEGGER RTT configuration
-Create a file called `mv2_runner/SEGGER_RTT_Conf.h` with the following content:
+RTT (Real-Time Transfer) works through the J-Link debug probe, reading and writing a memory buffer through the debug interface. It's faster than UART and doesn't require extra wiring. Create the configuration file `mv2_runner/SEGGER_RTT_Conf.h`:
```c
#ifndef SEGGER_RTT_CONF_H
@@ -68,14 +68,11 @@ Create a file called `mv2_runner/SEGGER_RTT_Conf.h` with the following content:
#endif
```
-RTT (Real-Time Transfer) works through the J-Link debug probe. It reads and writes a memory buffer through the debug interface, which is much faster than UART and doesn't need extra wiring.
-
## Install additional CMSIS packs
The project depends on two CMSIS packs that aren't installed by default. Install them from the terminal:
```bash
-cd ~/repo/alif/alif_vscode-template
cpackget add ARM::CMSIS-Compiler@2.1.0
cpackget add Keil::MDK-Middleware@8.2.0
```
@@ -196,7 +193,7 @@ project:
```
{{% notice Warning %}}
-You must update the `-L` path to match the absolute path to your `third_party/executorch/lib` directory. Each developer's path is different.
+You must update the `-L` path to match the absolute path to your `third_party/executorch/lib` directory. For example: `-L/Users/username/alif/third_party/executorch/lib`.
{{% /notice %}}
There are several important details in this configuration:
@@ -207,4 +204,4 @@ There are several important details in this configuration:
- **`C10_USING_CUSTOM_GENERATED_MACROS`** tells ExecuTorch to skip looking for a `cmake_macros.h` header that doesn't exist in the bare-metal build.
- The **c10 include path** provides the tensor type definitions that ExecuTorch's headers depend on.
-You now have the project structure ready. The next sections cover the application code, memory configuration, and image preparation before you build and flash.
+The project structure is ready. The next sections cover the application code, memory configuration, and image preparation before you build and flash.
diff --git a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/image-preparation.md b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/image-preparation.md
index b48f988de3..bafa0a55dd 100644
--- a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/image-preparation.md
+++ b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/image-preparation.md
@@ -12,7 +12,7 @@ MobileNetV2 expects a specific input format:
- **Resolution**: 224 x 224 pixels
- **Color**: RGB (3 channels)
- **Normalization**: ImageNet mean/std
-- **Layout**: NCHW (channels first)
+- **Layout**: NCHW at runtime (`1 x 3 x 224 x 224`). The generated header stores the image in CHW order.
- **Data type**: int8 (stored in the header, converted to float32 at runtime)
You use a Python script to convert any JPEG or PNG image into a C header that the firmware includes at compile time.
@@ -22,7 +22,7 @@ You use a Python script to convert any JPEG or PNG image into a C header that th
Create a lightweight virtual environment on your development machine:
```bash
-cd ~/repo/alif
+cd ~/alif
python3 -m venv venv_image_prep
source venv_image_prep/bin/activate
pip install --upgrade pip
@@ -34,10 +34,10 @@ pip install numpy pillow
Create a directory for the image and script:
```bash
-mkdir -p ~/repo/alif/image
+mkdir -p ~/alif/image
```
-Place a test image in this directory. You can use any JPEG or PNG image. For this Learning Path, a photo of my cat is used as the example:
+Place a test image in this directory. You can use any JPEG or PNG image. This Learning Path uses the provided `cat.jpg` image as the example:

@@ -59,7 +59,7 @@ mean = np.array([0.485, 0.456, 0.406]) * 255
std = np.array([0.229, 0.224, 0.225]) * 255
x = (x - mean) / std
-# NHWC -> NCHW
+# HWC -> CHW
x = np.transpose(x, (2, 0, 1))
# Quantize to int8
@@ -78,17 +78,14 @@ with open("input_image.h", "w") as f:
f.write("const unsigned int input_image_len = 3 * 224 * 224;\n")
```
-The script performs these transformations:
-1. Resizes the image to 224x224 pixels.
-2. Applies ImageNet normalization (subtracts the dataset mean, divides by standard deviation).
-3. Transposes from HWC (height, width, channels) to NCHW (batch, channels, height, width) layout.
-4. Quantizes to int8 range (-128 to 127).
-5. Writes a C header with the pixel data as a constant array.
+The script resizes the image to 224 x 224 pixels and applies ImageNet normalization by subtracting the dataset mean and dividing by the standard deviation. It then transposes the layout from HWC (height, width, channels) to CHW (channels, height, width) format. The batch dimension is added later in the application. Finally, it quantizes the values to int8 range (-128 to 127) and writes the pixel data to a C header as a constant array.
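
The layout and quantization steps can be sketched in pure Python (a stand-in for the NumPy calls in the script, with the image shrunk to 2 x 2 pixels for readability):

```python
def hwc_to_chw(img):
    """Transpose a nested-list image from HWC to CHW order,
    the pure-Python equivalent of np.transpose(x, (2, 0, 1))."""
    h, w, c = len(img), len(img[0]), len(img[0][0])
    return [[[img[y][x][ch] for x in range(w)] for y in range(h)] for ch in range(c)]

def quantize_int8(v):
    """Round and clamp a normalized value to the int8 range."""
    return max(-128, min(127, round(v)))

# A 2x2 RGB image: each pixel is [R, G, B]
img = [[[1, 2, 3], [4, 5, 6]],
       [[7, 8, 9], [10, 11, 12]]]

chw = hwc_to_chw(img)
print(chw[0])                # the full R channel: [[1, 4], [7, 10]]
print(quantize_int8(300.0))  # out-of-range values clamp to 127
```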
## Run the script
+From the image directory, run the preprocessing script:
+
```bash
-cd ~/repo/alif/image
+cd ~/alif/image
python prepare_image.py
```
@@ -97,8 +94,8 @@ This generates `input_image.h` in the same directory. The file is approximately
## Copy the header to the project
```bash
-cp ~/repo/alif/image/input_image.h \
- ~/repo/alif/alif_vscode-template/mv2_runner/assets/
+cp ~/alif/image/input_image.h \
+ ~/alif/alif_vscode-template/mv2_runner/assets/
```
{{% notice Note %}}
diff --git a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/memory-configuration.md b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/memory-configuration.md
index 020e947130..0e9399c20e 100644
--- a/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/memory-configuration.md
+++ b/content/learning-paths/embedded-and-microcontrollers/alif-image-classification/memory-configuration.md
@@ -7,7 +7,7 @@ layout: "learningpathall"
## Why memory configuration matters
-The stock Alif VS Code template divides memory equally between the two Cortex-M55 cores and allocates modest stack/heap sizes suitable for simple examples like Blinky. A MobileNetV2 model with ExecuTorch needs significantly more:
+The default Alif VS Code template divides memory equally between the two Cortex-M55 cores and allocates modest stack/heap sizes suitable for simple examples like Blinky. A MobileNetV2 model with ExecuTorch needs significantly more:
- The embedded model is approximately 3.7 MB (stored in MRAM/flash).
- The ExecuTorch runtime, operator libraries, and application code add another 800 KB of code.
@@ -19,10 +19,10 @@ You need to reconfigure the MRAM allocation, stack/heap sizes, and linker script
Open `device/ensemble/RTE/Device/AE822FA0E5597LS0_M55_HP/app_mem_regions.h`.
-Change the following values from their stock defaults:
+Change the following values from their defaults:
-| Define | Stock value | New value | Purpose |
-|--------|------------|-----------|---------|
+| Define | Default value | New value | Purpose |
+|--------|--------------|-----------|---------|
| `APP_MRAM_HE_BASE` | `0x80000000` | `0x80580000` | Move HE core out of the way |
| `APP_MRAM_HE_SIZE` | `0x00200000` | `0x00000000` | Give HE core zero MRAM |
| `APP_MRAM_HP_BASE` | `0x80200000` | `0x80000000` | HP core starts at MRAM base |
@@ -30,13 +30,13 @@ Change the following values from their stock defaults:
| `APP_HP_STACK_SIZE` | `0x00002000` | `0x00004000` | 16 KB stack (doubled) |
| `APP_HP_HEAP_SIZE` | `0x00004000` | `0x00010000` | 64 KB heap (quadrupled) |
-The stock template splits MRAM 2 MB / 2 MB between the two cores. Since you're only using the HP core, you give it the entire 5.5 MB of available MRAM. The increased stack and heap accommodate ExecuTorch's initialization code, which uses more stack depth and a few small dynamic allocations.
+The default template splits MRAM 2 MB / 2 MB between the two cores. Since you're only using the HP core, you give it the entire 5.5 MB of available MRAM. The increased stack and heap accommodate ExecuTorch's initialization code, which uses more stack depth and a few small dynamic allocations.
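
You can check that the new map is self-consistent with a quick calculation (a standalone sketch using the values from the table above):

```python
# Values from the app_mem_regions.h table above
APP_MRAM_HP_BASE = 0x80000000
APP_MRAM_HP_SIZE = 0x00580000   # entire available MRAM for the HP core
APP_MRAM_HE_BASE = 0x80580000   # HE region moved just past the HP region

print(APP_MRAM_HP_SIZE / (1024 * 1024))          # 5.5 (MB for the HP core)
print(hex(APP_MRAM_HP_BASE + APP_MRAM_HP_SIZE))  # 0x80580000, matches the HE base
```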
## Edit the linker script
Open `device/ensemble/RTE/Device/AE822FA0E5597LS0_M55_HP/linker_gnu_mram.ld.src`.
-You need three changes to this file.
+Make the following three changes to this file.
### Add SRAM1 to the zero-initialization table
@@ -99,7 +99,7 @@ Add the GOT entries after `KEEP(*(.jcr*))`:
```
{{% notice Note %}}
-This was the hardest bug to find during development. Without these two lines, the firmware boots, loads the model, but crashes with a BusFault when ExecuTorch tries to call any virtual function. The GOT is like a phone book for indirect calls. If you don't copy it from flash to RAM at startup, every lookup finds address zero and the CPU faults.
+This issue can be difficult to diagnose. Without these two lines, the firmware boots and loads the model, but crashes with a BusFault when ExecuTorch calls a virtual function. The GOT stores addresses for indirect calls. If the startup code does not copy it from flash to RAM, those lookups resolve to address zero and the CPU faults.
{{% /notice %}}
### Add SRAM section wildcards
@@ -182,4 +182,4 @@ The key fields are:
You can view the completed versions of these edited files in the [workshop repository](https://github.com/ArmDeveloperEcosystem/workshop-ethos-u) for reference.
-The memory layout and flash configuration are now ready. The next section covers preparing the test image.
+The memory layout and flash configuration are complete. The next section covers preparing the test image.
diff --git a/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/_index.md b/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/_index.md
new file mode 100644
index 0000000000..34ab1ff490
--- /dev/null
+++ b/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/_index.md
@@ -0,0 +1,59 @@
+---
+title: Connect AI agents to edge devices using Device Connect and Strands
+
+minutes_to_complete: 30
+
+who_is_this_for: This is an introductory topic for software developers who want to connect AI agents to edge devices. You'll use Device Connect, Arm's platform for structured device access, and Strands, AWS's open-source agent SDK. The examples cover both physical and simulated devices.
+
+learning_objectives:
+ - Understand how Device Connect and Strands work together to give AI agents structured access to Arm-based edge devices
+ - Set up a Python environment with the Device Connect SDK and agent tools installed from source
+ - Start a simulated robot that registers itself on the local network and is discovered automatically by an agent
+ - Discover and invoke the robot using the Device Connect agent tools and the robot_mesh Strands tool
+
+prerequisites:
+ - A development machine with Python 3.12 installed
+ - Git installed
+ - Basic familiarity with Python virtual environments and command-line tools
+ - (Optional) A Raspberry Pi for testing a full device-to-device (D2D) setup
+
+author:
+ - Annie Tallund
+ - Kavya Sri Chennoju
+
+
+
+### Tags
+skilllevels: Introductory
+subjects: ML
+armips:
+ - Cortex-A
+ - Neoverse
+operatingsystems:
+ - Linux
+ - macOS
+tools_software_languages:
+ - Python
+ - Docker
+ - strands-agents
+
+further_reading:
+ - resource:
+ title: Strands Agents SDK documentation
+ link: https://strandsagents.com/
+ type: website
+ - resource:
+ title: Strands robots repository
+ link: https://github.com/strands-labs/robots/tree/dev
+ type: website
+ - resource:
+ title: Device Connect integration guide
+ link: https://github.com/atsyplikhin/robots/blob/feat/device-connect-integration-draft/strands_robots/device_connect/GUIDE.md
+ type: website
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1 # _index.md always has weight of 1 to order correctly
+layout: "learningpathall" # All files under learning paths have this same wrapper
+learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---
diff --git a/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/_next-steps.md b/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/_next-steps.md
new file mode 100644
index 0000000000..c3db0de5a2
--- /dev/null
+++ b/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/_next-steps.md
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+# FIXED, DO NOT MODIFY THIS FILE
+# ================================================================================
+weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
+title: "Next Steps" # Always the same, html page title.
+layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
+---
diff --git a/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/background.md b/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/background.md
new file mode 100644
index 0000000000..5f5b1e9328
--- /dev/null
+++ b/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/background.md
@@ -0,0 +1,71 @@
+---
+title: Learn Device Connect and Strands architecture for edge devices
+weight: 2
+
+# FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Why connect AI agents to edge devices?
+
+Arm processors are at the heart of a remarkable range of systems - from Cortex-M microcontrollers in industrial sensors to Neoverse servers running in the cloud. That breadth of hardware is one of Arm's greatest strengths, but it raises a practical question for AI developers: how do you give an agent structured, safe access to devices that are physically distributed and built on different software stacks?
+
+Device Connect is Arm's answer to that question. It's a platform layer that handles device registration, discovery, and remote procedure calls across a network of devices, with no bespoke networking code required. Strands is an open-source agent SDK from AWS that takes a model-driven approach to building AI agents - an LLM calls Python tools in a structured reasoning loop, and the SDK handles the rest. When you combine them, an agent can ask "which devices are online and what can they do?" and then invoke a function on a specific device, turning natural language intent into physical action.
+
+This Learning Path puts both tools through their paces. It starts with a single machine, for example a laptop, where a simulated robot and an agent discover each other automatically, then extends to a two-machine setup where a Raspberry Pi joins the same device mesh over the network.
+
+## Device Connect architecture layers
+
+**Device layer**
+
+A device is any process that registers itself on the mesh and exposes callable functions. In this Learning Path you'll create a simulated SO-100 robot arm from Hugging Face using the `strands-robots` SDK. The moment this object is created, it registers on the local network under a unique device ID (for example, `so100-abc123`) and begins publishing a presence heartbeat. No explicit registration call is required. Device Connect uses Zenoh as its underlying messaging transport, which handles low-level connectivity and routing automatically.
+
+**Agent layer**
+
+Two interfaces sit at this layer. The `device-connect-agent-tools` package exposes `discover_devices()` and `invoke_device()` as plain Python functions you can call directly from a script or REPL, with no LLM involved. The `robot_mesh` tool from `robots` wraps the same capabilities as a Strands agent tool, which means an LLM can also call them during a reasoning loop. Both share the same underlying Device Connect transport, so anything you can do with one you can do with the other.
+
+The diagram below shows how these layers communicate at runtime:
+
+```
+┌──────────────────────────────────────┐
+│ Agent layer                          │
+│   discover_devices · invoke_device   │
+│   robot_mesh Strands tool            │
+└──────────────┬───────────────────────┘
+               │ Device Connect
+               │ (device-to-device discovery & RPC)
+┌──────────────▼───────────────────────┐
+│ Device layer                         │
+│   Simulated SO-100 arm               │
+│     - so100-abc123                   │
+│   heartbeat · execute · getStatus    │
+└──────────────────────────────────────┘
+```
+
+## How device discovery works
+
+When the `SO-100 arm` instance starts, Device Connect automatically announces the device on the local network. Any process running `discover_devices()` or `robot_mesh(action='peers')` on the same network will hear the announcement and add the device to its live table of available hardware.
+
+## What the simulated robot provides
+
+When you run `Robot('so100')`, the SDK downloads the MuJoCo physics model for the SO-100 arm (this happens once on first run) and starts a local simulation. The robot exposes three functions that any agent can call via RPC:
+
+- `execute` - start a task with a given instruction and policy provider
+- `getStatus` - query the current task state
+- `stop` - halt the current task
+
+This Learning Path uses the `policy_provider='mock'` argument, which means `execute` accepts the call and returns `{'status': 'accepted'}` without actually running a motion policy. This keeps the focus on connectivity and invocation patterns rather than on robotics.
+
+Once you have the flow working end to end, replacing `'mock'` with a real policy is a one-line change.
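
Conceptually, the mock provider just acknowledges the call. A hypothetical sketch of that contract (the real `strands-robots` interface may differ) looks like this:

```python
def mock_execute(instruction: str, policy_provider: str = "mock") -> dict:
    """Hypothetical stand-in for the robot's execute RPC with the mock
    provider: accept the task without running any motion policy."""
    if policy_provider != "mock":
        raise NotImplementedError("only the mock provider is sketched here")
    return {"status": "accepted", "instruction": instruction}

print(mock_execute("pick up the cube"))
```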
+
+## What you'll learn in this Learning Path
+
+By working through the remaining sections you'll:
+
+- Clone the sample repository and install the Device Connect SDK, agent tools, and Strands robot runtime from source into a single virtual environment.
+- Start a simulated robot that registers itself on the local device mesh.
+- Discover and invoke the robot using `device-connect-agent-tools` directly.
+- Discover and command the robot through the `robot_mesh` Strands tool, including an emergency stop.
+- Optionally extend the setup to a Raspberry Pi connected over the network, discovering and commanding it from your laptop through the Device Connect infrastructure.
+
+The next section covers the environment setup.
\ No newline at end of file
diff --git a/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/run-example.md b/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/run-example.md
new file mode 100644
index 0000000000..759306eac5
--- /dev/null
+++ b/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/run-example.md
@@ -0,0 +1,196 @@
+---
+title: Run the example end to end
+weight: 4
+
+# FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Overview
+
+This section runs Device Connect's device-to-device discovery end to end. You can walk through the setup in one of two ways: on a single machine, or with an external device on the same network.
+
+### Option 1: run on a single machine
+
+For a proof-of-concept, follow the steps using two terminal windows on your machine with the virtual environment set up.
+
+### Option 2: run with real hardware
+
+If you have access to an external device, you can use it with this setup as well. You'll need:
+
+- Your machine with the virtual environment set up. This machine will be referred to as the host.
+- A Raspberry Pi (or any similar device) connected to the same network as your host machine. This machine is your target.
+- An SSH connection or keyboard and monitor attached to the target.
+
+You'll need two terminal windows open at the same time: one to keep the simulated robot running, and one to invoke it from the agent side. Both terminals need the virtual environment activated.
+
+| Machine | Terminal | Purpose |
+|---------|----------|---------|
+| Host or Target | 1 | Simulated robot - keep running throughout |
+| Host | 2 | Agent tool invocations |
+
+Make sure you are in the repository directory and that your virtual environment is activated:
+
+```bash
+cd ~/strands-device-connect/robots
+source .venv/bin/activate
+```
+
+## Start the simulated robot
+
+In terminal 1, run the following command to create and start the simulated SO-100 robot arm:
+
+```bash
+python <<'PY'
+import logging
+logging.basicConfig(level=logging.INFO)
+from strands_robots import Robot
+r = Robot('so100')
+r.run()
+PY
+```
+
+When the `Robot('so100')` object is created, the SDK downloads the MuJoCo physics model for the SO-100 arm. This download happens only on the first run and takes a minute or two. After that, it starts the simulation and registers the robot on the Device Connect device mesh. The robot publishes a presence heartbeat every 0.5 seconds under a unique device ID, for example `so100-abc123`.
+
+You should see INFO-level log output similar to:
+
+```output
+device_connect_sdk.device.so100-abc123 - INFO - Using ZENOH messaging backend
+device_connect_sdk.device.so100-abc123 - INFO - Connected to ZENOH broker: []
+device_connect_sdk.device.so100-abc123 - INFO - Driver connected: strands_sim
+device_connect_sdk.device.so100-abc123 - INFO - Subscribed to commands on device-connect.default.so100-abc123.cmd
+🤖 so100-abc123 is online. Ctrl+C to stop.
+```
+
+Leave this process running. The simulated robot is only discoverable as long as this process is alive.
+
+## Control the robot using the robot_mesh Strands tool
+
+The `robot_mesh` tool wraps the same discovery and invocation primitives as a Strands agent tool. You can call it directly from a Python script or attach it to an LLM agent; the API is identical either way.
+
+### Discover available robots on the device mesh
+
+Start by confirming which robots are currently visible on the mesh:
+
+```bash
+python <<'PY'
+from strands_robots.tools.robot_mesh import robot_mesh
+print(robot_mesh(action='peers'))
+PY
+```
+
+The output is similar to:
+
+```output
+Discovered 1 device(s):
+ [robot] so100-abc123 - idle
+ Functions: execute, getFeatures, getStatus, reset, step, stop
+```
+
+The peer ID (for example `so100-abc123`) is assigned at startup and changes each run. Note the actual ID shown in your terminal - you'll need it in the next step.
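
Because the peer ID changes each run, scripts shouldn't hardcode it. One way to pull it out of the `peers` output is a small regex (a sketch; it assumes the `[robot] <id> - <state>` line format shown above):

```python
import re

# Example output from robot_mesh(action='peers')
peers_output = """Discovered 1 device(s):
  [robot] so100-abc123 - idle
    Functions: execute, getFeatures, getStatus, reset, step, stop"""

# Capture the device ID between "[robot]" and the state suffix
match = re.search(r"\[robot\]\s+(\S+)\s+-", peers_output)
peer_id = match.group(1) if match else None
print(peer_id)  # so100-abc123
```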
+
+### Execute an instruction
+
+Send a task to one of the discovered robots. Replace `so100-abc123` with the peer ID shown in your `peers` output:
+
+```python
+python <<'PY'
+from strands_robots.tools.robot_mesh import robot_mesh
+print(robot_mesh(
+ action='tell',
+ target='so100-abc123',
+ instruction='pick up the cube',
+ policy_provider='mock',
+))
+PY
+```
+
+The output is similar to:
+
+```output
+-> so100-abc123: pick up the cube
+ {"status": "success", "content": [...]}
+```
+
+{{% notice Robot output in terminal 1 %}}
+The robot also logs event updates as it processes the task. If you switch back to terminal 1, you'll see execution logs similar to:
+
+```output
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*** EVENT so100-abc123::stateUpdate [111aaabb]
+ payload: sim_time=4.34, step_count=2070, running_policies={'so100': {'steps': 814, 'instruction': 'pick up the cube'}}
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+```
+{{% /notice %}}
+
+### Broadcast emergency stop to all devices
+
+The emergency stop broadcasts a halt command to every device on the mesh simultaneously. This is useful when you want to stop all robots without knowing their individual peer IDs:
+
+```python
+python <<'PY'
+from strands_robots.tools.robot_mesh import robot_mesh
+print(robot_mesh(action='emergency_stop'))
+PY
+```
+
+The output is similar to:
+
+```output
+E-STOP: 1/1 devices stopped
+```
+
+## Optional: discover and invoke the robot using the agent tools
+
+The `device-connect-agent-tools` package gives you direct programmatic access to the mesh, without involving an LLM. This is useful for testing, scripting, or validating the stack before wiring it up to an agent. Open terminal 2, activate the virtual environment, then run:
+
+```python
+python <<'PY'
+from device_connect_agent_tools import connect, discover_devices, invoke_device
+
+connect()
+
+devices = discover_devices(device_type='')
+print(f'Found {len(devices)} robot(s):')
+for d in devices:
+ print(f' {d["device_id"]}')
+
+if devices:
+ result = invoke_device(
+ devices[0]['device_id'],
+ 'execute',
+ {'instruction': 'pick up the cube', 'policy_provider': 'mock'},
+ )
+ print(f'Execute result: {result}')
+
+ status = invoke_device(devices[0]['device_id'], 'getStatus')
+ print(f'Status: {status}')
+PY
+```
+
+{{% notice About the snippet %}}
+When an agent calls `execute(instruction="pick up the cup", policy_provider="groot")`, Device Connect handles the RPC delivery, and the policy handles the actual arm movement.
+
+`discover_devices(device_type='')` returns all devices on the mesh regardless of type. If you pass `device_type='strands_robot'` you can filter to only `Robot()` instances. `invoke_device` sends an RPC to the named device; here `policy_provider='mock'` tells the robot to accept the task without executing real motion, which is appropriate for this connectivity test.
+{{% /notice %}}
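+
+As an illustration, you can also filter a `discover_devices()`-style result list locally in plain Python. The `device_id` and `device_type` field names below are assumptions based on the snippet above:
+
+```python
+# Hypothetical device list in the shape returned by discover_devices()
+devices = [
+    {"device_id": "so100-abc123", "device_type": "strands_robot"},
+    {"device_id": "sensor-42", "device_type": "sensor"},
+]
+
+# Keep only Robot() instances, mirroring device_type='strands_robot' filtering
+robots = [d for d in devices if d["device_type"] == "strands_robot"]
+print([d["device_id"] for d in robots])
+```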
+
+The output is similar to:
+
+```output
+Found 1 robot(s):
+ so100-abc123 - idle
+Execute result: {'success': True, 'result': {'status': 'success', 'content': [...]}}
+Status: {'success': True, 'result': {...}} # full sim state dict
+```
+
+## What you've learned and what's next
+
+In this section you've:
+
+- Started a simulated SO-100 robot that registered itself on the Device Connect device mesh.
+- Used `device-connect-agent-tools` to discover the robot and invoke an RPC call against it.
+- Used the `robot_mesh` Strands tool to list peers, send an instruction, and trigger an emergency stop.
+
+This demonstrates how easily you can set up a mesh on a local network. In the next section, you'll extend the configuration with a Docker-based infrastructure stack, which lets devices on other machines join the same mesh and opens up new possibilities for agent integration.
\ No newline at end of file
diff --git a/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/run-infra-example.md b/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/run-infra-example.md
new file mode 100644
index 0000000000..7d3adbc131
--- /dev/null
+++ b/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/run-infra-example.md
@@ -0,0 +1,199 @@
+---
+title: Run with full Device Connect infrastructure (optional)
+weight: 5
+
+# FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Why add infrastructure?
+
+The previous section ran entirely on a local network, with Device Connect handling device-to-device discovery automatically. That approach is fast and requires zero configuration, but it has natural limits: both the robot and the agent must be on the same LAN, device state is ephemeral, and there's no registry you can query by device type.
+
+This section goes one step further. You'll run the Zenoh router, an etcd state store, and a registry service on your machine using Docker, then connect a Raspberry Pi on the same network as a remote device. You can use a different device as long as you can reach it over the network, but this Learning Path uses a Raspberry Pi as the example. This device is also referred to as the target.
+
+The agent running on your machine will discover the robot running on the Pi through the infrastructure, as if both were part of the same managed fleet. The Pi doesn't need to run Docker - it just needs Python and the packages from setup.
+
+Confirm Docker and Docker Compose v2 are available on the host before continuing:
+
+```bash
+docker --version
+docker compose version
+```
+
+## Clone the Device Connect repository
+
+```bash
+cd ~/strands-device-connect
+git clone --depth 1 https://github.com/arm/device-connect.git
+```
+
+## Machine and terminal layout
+
+This section involves two machines. Keep track of which commands run where:
+
+| Machine | Terminal | Purpose |
+|---------|----------|---------|
+| Host | 1 | Docker Compose infrastructure |
+| Host | 2 | Agent tool invocations |
+| Target | - | Robot process |
+
+## Step 1 - start the infrastructure on your host machine
+
+In host terminal 1, bring up the Device Connect infrastructure stack. The Compose file is inside the `device-connect` repository you cloned above:
+
+```bash
+cd ~/strands-device-connect/device-connect/packages/device-connect-server
+docker compose -f infra/docker-compose-dev.yml up -d
+```
+
+Confirm the services are healthy:
+
+```bash
+docker compose -f infra/docker-compose-dev.yml ps
+```
+
+The output is similar to:
+
+```output
+NAME STATUS PORTS
+zenoh-router running 0.0.0.0:7447->7447/tcp
+etcd running 0.0.0.0:2379->2379/tcp
+device-registry running 0.0.0.0:8080->8080/tcp
+```
+
+All three services must show `running` before you continue. The router on port 7447 is the single rendezvous point for all device traffic. Every device on any machine points at this address to join the mesh.
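+
+Before pointing remote devices at the router, you can sanity-check that port 7447 is reachable. This sketch uses only the Python standard library; run it on the host with `127.0.0.1`, or swap in your host's IP address when testing from another machine:
+
+```python
+import socket
+
+def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
+    """Return True if a TCP connection to host:port succeeds."""
+    try:
+        with socket.create_connection((host, port), timeout=timeout):
+            return True
+    except OSError:
+        return False
+
+# On the host itself, the router listens on localhost:7447
+print(port_open("127.0.0.1", 7447))
+```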
+
+## Step 2 - find your host's IP address
+
+The Raspberry Pi needs to connect to the Device Connect router on your host by IP address. Find it now:
+
+```bash
+# macOS
+ipconfig getifaddr en0
+
+# Linux
+hostname -I | awk '{print $1}'
+```
+
+Note the address returned - for the rest of this section it's referred to as `HOST_IP`. For example, if the command returns `192.168.1.42`, replace every occurrence of `HOST_IP` below with that address.
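+
+To avoid typos, you can capture the address in a shell variable and build the router endpoint string from it, using the example address above:
+
+```bash
+# Example address - replace with the output of the command for your host
+HOST_IP=192.168.1.42
+ZENOH_ENDPOINT="tcp/${HOST_IP}:7447"
+echo "$ZENOH_ENDPOINT"
+```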
+
+## Step 3 - prepare the Raspberry Pi
+
+On the Raspberry Pi, follow the same repository and environment setup from the setup section of this Learning Path: install Python 3.12, clone the `robots` repository, create the virtual environment, and install the packages with the same editable install commands.
+
+Once the environment is ready, export the three variables that tell the SDK to route traffic through the Device Connect router on your host rather than using local network discovery:
+
+```bash
+export MESSAGING_BACKEND=zenoh
+export ZENOH_CONNECT=tcp/HOST_IP:7447
+export DEVICE_CONNECT_ALLOW_INSECURE=true
+```
+
+Replace `HOST_IP` with the address you noted in Step 2. `DEVICE_CONNECT_ALLOW_INSECURE=true` disables mTLS for this local development setup - don't use this flag in production.
+
+## Step 4 - start the robot on the Raspberry Pi
+
+On the Raspberry Pi, with the environment active and the variables set, start the simulated SO-100 robot:
+
+```python
+python <<'PY'
+import logging
+logging.basicConfig(level=logging.INFO)
+from strands_robots import Robot
+r = Robot('so100')
+r.run()
+PY
+```
+
+Because `ZENOH_CONNECT` points at your host, the SDK routes traffic through the Device Connect router instead of using local network discovery. The robot registers with the persistent registry and you should see output similar to:
+
+```output
+INFO:strands_robots.mesh:Zenoh session started
+INFO:strands_robots.mesh:Peer ID: so100-abc123
+device_connect_sdk.device.so100-abc123 - INFO - Using ZENOH messaging backend
+device_connect_sdk.device.so100-abc123 - INFO - Connected to ZENOH broker: ['tcp/192.168.1.42:7447']
+device_connect_sdk.device.so100-abc123 - INFO - Device registered: registration_id=ecfff6a7-...
+```
+
+Note the peer ID (for example `so100-abc123`). You'll need it in the `tell` command below. Leave this process running on the Pi.
+
+## Discover and invoke using the robot_mesh Strands tool
+
+In host terminal 2, export the same three environment variables from Step 3 (on the host, `ZENOH_CONNECT=tcp/127.0.0.1:7447` also works because the router runs locally), then use `robot_mesh` to confirm the Pi's robot is visible as a peer:
+
+```python
+python <<'PY'
+from strands_robots.tools.robot_mesh import robot_mesh
+print(robot_mesh(action='peers'))
+PY
+```
+
+The output is similar to:
+
+```output
+Discovered 1 device(s):
+ [robot] so100-abc123 - idle
+ Functions: execute, getFeatures, getState, getStatus, stop
+```
+
+Send an instruction to the robot. Replace `so100-abc123` with the peer ID shown in your output:
+
+```python
+python <<'PY'
+from strands_robots.tools.robot_mesh import robot_mesh
+print(robot_mesh(
+ action='tell',
+ target='so100-abc123',
+ instruction='pick up the cube',
+ policy_provider='mock',
+))
+PY
+```
+
+The output is similar to:
+
+```output
+-> so100-abc123: pick up the cube
+ {"status": "accepted"}
+```
+
+Trigger an emergency stop across every device registered with the infrastructure:
+
+```python
+python <<'PY'
+from strands_robots.tools.robot_mesh import robot_mesh
+print(robot_mesh(action='emergency_stop'))
+PY
+```
+
+The output is similar to:
+
+```output
+E-STOP: 1/1 devices stopped
+```
+
+The stop broadcast reaches every device in the registry - whether it is running on your host, the Pi on your desk, or a machine in a remote lab.
+
+## Shut down cleanly
+
+Stop the robot process on the Pi with `Ctrl+C`, then bring down the infrastructure on your host:
+
+```bash
+cd ~/strands-device-connect/device-connect/packages/device-connect-server
+docker compose -f infra/docker-compose-dev.yml down
+```
+
+This removes the containers and clears the in-container etcd state. Your virtual environment and cloned repositories remain intact on both machines.
+
+## What you've learned and what's next
+
+In this section you've:
+
+- Started a persistent Device Connect infrastructure stack on your host - a router, etcd, and a device registry.
+- Connected a Raspberry Pi as a remote device by pointing its SDK at the router's TCP address.
+- Discovered the Pi's robot from your host by querying the persistent registry and sent commands to it across the network.
+
+This is a deliberately simple two-device setup, but it demonstrates the foundation for something much larger. Once devices register through a shared infrastructure, agents can discover and command any of them without caring where they run - a fleet of robot arms, a network of sensors, or a mix of physical and simulated devices all become equally reachable. Adding more devices is just a matter of pointing them at the same router. That's the core of what Device Connect makes possible: a mesh of heterogeneous devices that agents can reason about and act on, at any scale.
diff --git a/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/setup.md b/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/setup.md
new file mode 100644
index 0000000000..9730e2282a
--- /dev/null
+++ b/content/learning-paths/embedded-and-microcontrollers/device-connect-strands/setup.md
@@ -0,0 +1,69 @@
+---
+title: Set up the developer environment
+weight: 3
+
+# FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Verify the required tools
+
+Before cloning any repositories, confirm that Python 3.12 and Git are available:
+
+```bash
+python3.12 --version
+git --version
+```
+
+These instructions are tested on Python 3.12. Earlier versions of Python 3 may work but are not validated against the `feat/device-connect-integration-draft` branch used in this Learning Path.
+
+## Clone the repository
+
+The code you run in this Learning Path lives in the `robots` repository. It contains the robot runtime and the `robot_mesh` Strands tool.
+
+```bash
+mkdir -p ~/strands-device-connect
+cd ~/strands-device-connect
+git clone https://github.com/atsyplikhin/robots.git
+```
+
+## Check out the integration branch
+
+The Device Connect integration code for `robots` lives on the `feat/device-connect-integration-draft` branch. This branch adds the `RobotDeviceDriver` adapter and the updated `robot_mesh` tool that routes calls through the Device Connect SDK rather than the raw Zenoh mesh.
+
+```bash
+cd ~/strands-device-connect/robots
+git checkout feat/device-connect-integration-draft
+cd ..
+```
+
+## Create a Python virtual environment
+
+Create a single virtual environment at the workspace root, then activate it:
+
+```bash
+python3.12 -m venv .venv
+source .venv/bin/activate
+```
+
+Now install the packages in editable mode from inside the repository, and make it available on your `PYTHONPATH` environment variable:
+
+```bash
+cd ~/strands-device-connect/robots
+pip install -e ".[sim]"
+export PYTHONPATH="$PWD:$PYTHONPATH"
+cd ..
+```
+
+## How discovery works - no configuration needed
+
+The `strands-robots` SDK uses Device Connect's built-in device-to-device discovery: every `Robot()` instance announces itself on the local network at startup, and any process running `discover_devices()` or `robot_mesh(action='peers')` on the same network segment will find it automatically.
+
+This means discovery works as long as the device process and the agent process are on the same LAN or on the same machine. Discovery is typically available on home and office networks. If you are behind a firewall or VPN that blocks local network traffic, devices will not discover each other - that scenario requires the infrastructure-backed setup with a Zenoh router, which is covered later in this Learning Path.
+
+## What you've set up and what's next
+
+At this point you've:
+
+- Cloned `robots` with the `feat/device-connect-integration-draft` branch checked out.
+- Created a Python 3.12 virtual environment with the Device Connect SDK, agent tools, and robot simulation runtime all installed.
+
+The next section walks you through starting a simulated robot and invoking it from both the agent tools and the `robot_mesh` Strands tool.
\ No newline at end of file
diff --git a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/1-overview.md b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/1-overview.md
index 788cb474a7..0e87ef1f6f 100644
--- a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/1-overview.md
+++ b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/1-overview.md
@@ -1,55 +1,55 @@
---
-title: Overview
+title: Understand ExecuTorch deployment on NXP with Ethos-U
weight: 2
### FIXED, DO NOT MODIFY
layout: learningpathall
---
-## Hardware Overview - NXP's FRDM i.MX 93 Board
+## Before you begin
-Selecting the best hardware for machine learning (ML) models depends on effective tools. You can visualize ML performance early in the development cycle by using NXP's [FRDM i.MX 93](https://www.nxp.com/design/design-center/development-boards-and-designs/frdm-i-mx-93-development-board:FRDM-IMX93) board.
+This Learning Path assumes your FRDM i.MX 93 board is already set up and you can transfer files between your host machine and the board.
-
-
+If you still need to set up Linux, serial console access, and file transfer, follow the Learning Path [Linux on an NXP FRDM i.MX 93 board](/learning-paths/embedded-and-microcontrollers/linux-nxp-board/) before continuing.
-*Unboxing NXP's FRDM i.MX 93 board*
-
+ExecuTorch is designed to scale from servers to endpoints, and Arm systems often scale within a *single device*.
+The FRDM i.MX 93 platform combines:
-
+- An application processor running Linux (Cortex-A) that handles system services and orchestration
+- A Cortex-M33 microcontroller core that runs real-time firmware
+- An Ethos-U65 NPU that accelerates TinyML inference
-### NXP's FRDM i.MX 93 Processor Decoded
+This Learning Path focuses on a concrete milestone: successfully bringing up an ExecuTorch `executor_runner` firmware on the Cortex-M33 core of this NXP platform.
-
+This example keeps the Linux side intentionally simple. Linux loads and starts the Cortex-M33 firmware through RemoteProc, and you stage a compiled ExecuTorch `.pte` model so the firmware can run it.
-**NXP's Processor Labeling Convention:**
-|Line|Meaning|
-|----|-------|
-|MIMX9352|• MI – Microcontroller IC
• MX93 – i.MX 93 family
• 52 – Variant:
• Dual-core Arm Cortex-A55
• Single Cortex-M33
• Includes **Ethos-U65 NPU**|
-|CVVXMAB|• C - Commercial temperature grade (0°C to 95°C)
• VVX - Indicates package type and pinout (BGA, pitch, etc.)
• MAB - Specific configuration (e.g., NPU present, security level, memory interfaces)
-|
-|1P87F|• Silicon mask set identifier|
-|SBBM2410E|• NXP traceability code|
+## What you'll build and validate
-## Software Overview - NXP's MCUXpresso IDE
+By the end of this Learning Path, you'll have:
-NXP generously provides free software for working with their boards, the [MCUXpresso Integrated Development Environment (IDE)](https://www.nxp.com/design/design-center/software/development-software/mcuxpresso-software-and-tools-/mcuxpresso-integrated-development-environment-ide:MCUXpresso-IDE). In this learning path, you will instead use [MCUXpresso for Visual Studio Code](https://www.nxp.com/design/design-center/software/development-software/mcuxpresso-software-and-tools-/mcuxpresso-for-visual-studio-code:MCUXPRESSO-VSC).
+- A `.pte` model artifact compiled for `ethos-u65-256`
+- A Cortex-M33 `executor_runner` firmware image built against prebuilt ExecuTorch libraries
+- A repeatable deployment flow that loads the firmware, runs inference, and reports results through the remoteproc trace buffer
-## Software Overview - Visual Studio Code
+## What you need before you continue
-[Visual Studio Code](https://code.visualstudio.com/) is a free integrated development environment provided by Microsoft. It is platform independent, full featured, and accomodating of many engineering frameworks. You will use Visual Studio Code to both configure NXP's software and connect to NXP's hardware.
+After you complete the Linux setup Learning Path, you should have:
-## Software Overview - TinyML.
+- A way to log in to the board (serial console and/or SSH)
+- A way to transfer files (for example, `scp`)
+
+## NXP's MCUXpresso IDE
+
+NXP provides free software for working with their boards, the [MCUXpresso Integrated Development Environment (IDE)](https://www.nxp.com/design/design-center/software/development-software/mcuxpresso-software-and-tools-/mcuxpresso-integrated-development-environment-ide:MCUXpresso-IDE). In this Learning Path, you use [MCUXpresso for Visual Studio Code](https://www.nxp.com/design/design-center/software/development-software/mcuxpresso-software-and-tools-/mcuxpresso-for-visual-studio-code:MCUXPRESSO-VSC).
+
+MCUXpresso matters here because it gives you a predictable way to build and manage Cortex-M firmware on a platform where Linux is running at the same time.
+
+## TinyML
This Learning Path uses TinyML. TinyML is machine learning tailored to function on devices with limited resources, constrained memory, low power, and fewer processing capabilities.
-For a learning path focused on creating and deploying your own TinyML models, please see [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/)
+For a Learning Path focused on creating and deploying your own TinyML models, see [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/).
-## Benefits and applications
+In this Learning Path, you focus on deployment and observation: building the two runtime artifacts (the `.pte` model and the `executor_runner` firmware), bringing them up on the board, and confirming the Ethos-U acceleration path is active.
-NPUs, like Arm's [Ethos-U65](https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-u65) NPU are available on physical devices specifically made for developers. Development boards like NXP's [FRDM i.MX 93](https://www.nxp.com/design/design-center/development-boards-and-designs/frdm-i-mx-93-development-board:FRDM-IMX93) also connect to displays via a HDMI cable. Additionally the board accepts video inputs. This is useful for for ML performance visualization due to:
-- visual confirmation that your ML model is running on the physical device
-- image and video inputs for computer vision models running on the device
-- clearly indicated instruction counts
-- confirmation of total execution time and
-- visually appealing output for prototypes and demos
+The next section covers booting the FRDM i.MX 93 and establishing a console connection so you can log in and transfer files.
diff --git a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/10-deploy-executorchrunner-nxp-board.md b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/10-deploy-executorchrunner-nxp-board.md
index ccc9e90bfa..da44cc3a28 100644
--- a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/10-deploy-executorchrunner-nxp-board.md
+++ b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/10-deploy-executorchrunner-nxp-board.md
@@ -6,283 +6,222 @@ weight: 11
layout: learningpathall
---
-## Connect to the FRDM-IMX93 board
+## Deployment overview
-The FRDM-IMX93 board runs Linux on the Cortex-A55 cores. You need network or serial access to deploy the firmware.
+This section is where the heterogeneous system comes together.
+Linux on the application cores manages the lifecycle of Cortex-M33 through RemoteProc, and your Cortex-M33 firmware brings up ExecuTorch and the Ethos-U65 delegate.
-Find your board's IP address using the serial console or check your router's DHCP leases.
+Your success criterion is simple and observable: the remoteproc trace buffer shows a completed inference run with no bus errors.
-Connect via SSH:
+## Prerequisites
-{{< tabpane code=false >}}
-{{< tab header="Windows/Linux" >}}
-```bash
-ssh root@192.168.1.24
-```
+Before deploying, verify the following on your FRDM-IMX93 board:
-Alternative with PuTTY on Windows:
-- Host: `192.168.1.24`
-- Port: `22`
-- Connection type: SSH
-- Username: `root`
-{{< /tab >}}
-{{< tab header="macOS" >}}
-```bash
-ssh root@192.168.1.24
-```
-{{< /tab >}}
-{{< /tabpane >}}
+1. **Ethos-U kernel driver is loaded.** The Linux `ethosu` driver must be bound so the NPU is powered and clocked:
-Replace `192.168.1.24` with your board's IP address.
+ ```bash { command_line="root@frdm-imx93" output_lines="2" }
+ ls /dev/ethosu*
+ /dev/ethosu0
+ ```
-## Copy the firmware to the board
+ If `/dev/ethosu0` doesn't exist, the NPU isn't powered and the firmware will hang at NPU initialization.
-Copy the built firmware file to the board's firmware directory:
+2. **DDR memory is reserved for the CM33.** The NXP BSP reserves two DDR regions by default:
+
+ ```dts
+ reserved-memory {
+ model@c0000000 {
+ reg = <0 0xc0000000 0 0x400000>; /* 4MB for .pte model */
+ no-map;
+ };
+ ethosu_region@A8000000 {
+ reg = <0 0xa8000000 0 0x8000000>; /* 128MB for NPU working memory */
+ no-map;
+ };
+ };
+ ```
+
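+A quick way to confirm what those `reg` lengths mean in practice is converting `0x400000` and `0x8000000` bytes to MiB:
+
+```python
+# Convert the device-tree reg lengths to MiB
+model_region = 0x400000       # .pte model region at 0xC0000000
+ethosu_region = 0x8000000     # NPU working memory at 0xA8000000
+
+print(model_region // (1024 * 1024))
+print(ethosu_region // (1024 * 1024))
+```
+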
+## Copy files to the board
+
+From the `Executorch_runner_cm33` project, copy both the firmware and the `.pte` model to the board:
-{{< tabpane code=false >}}
-{{< tab header="Windows/Linux" >}}
-```bash
-scp debug/executorch_runner_cm33.elf root@192.168.1.24:/lib/firmware/
-```
-{{< /tab >}}
-{{< tab header="macOS" >}}
```bash
-scp debug/executorch_runner_cm33.elf root@192.168.1.24:/lib/firmware/
+scp debug/executorch_runner_cm33.elf root@<board-ip>:/lib/firmware/
+scp mobilenetv2_u65.pte root@<board-ip>:/tmp/
```
-{{< /tab >}}
-{{< /tabpane >}}
-Verify the file was copied:
+## Connect to the board
+
+SSH into the board for the remaining steps:
-```bash { command_line="root@frdm-imx93" output_lines="2" }
-ls -lh /lib/firmware/executorch_runner_cm33.elf
--rw-r--r-- 1 root root 601K Oct 24 10:30 /lib/firmware/executorch_runner_cm33.elf
+```bash
+ssh root@<board-ip>
+```
-## Load the firmware on Cortex-M33
+Replace `<board-ip>` with your board's actual IP address.
-The Cortex-M33 firmware is managed by the RemoteProc framework running on Linux.
+## Load the model to DDR
-Stop any currently running firmware:
+The `executor_runner` firmware reads the `.pte` model from DDR at address `0xC0000000`. Write the model into DDR using `/dev/mem`:
-```bash { command_line="root@frdm-imx93" }
-echo stop > /sys/class/remoteproc/remoteproc0/state
+```bash { command_line="root@frdm-imx93" output_lines="11" }
+python3 -c "
+import mmap, os
+pte = open('/tmp/mobilenetv2_u65.pte', 'rb').read()
+fd = os.open('/dev/mem', os.O_RDWR | os.O_SYNC)
+m = mmap.mmap(fd, len(pte), mmap.MAP_SHARED, mmap.PROT_WRITE, offset=0xC0000000)
+m.write(pte)
+m.close()
+os.close(fd)
+print(f'Wrote {len(pte)} bytes to 0xC0000000')
+"
+Wrote 3507872 bytes to 0xC0000000
```
-Set the new firmware:
+{{% notice Note %}}
+You can also load the model via U-Boot if the `.pte` file is on the SD card's first partition. At the U-Boot prompt, run `fatload mmc 0:1 0xc0000000 mobilenetv2_u65.pte` followed by `boot`. The model remains in DDR across Linux boot because the region is marked `no-map`.
+{{% /notice %}}
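+
+Before staging a model, you can sanity-check that the file carries the ExecuTorch program identifier the runtime looks for. FlatBuffer file identifiers sit at byte offset 4, which is where the `ET12` tag is expected; treat the exact offset as an assumption to verify against your SDK version:
+
+```python
+def looks_like_pte(header: bytes) -> bool:
+    """Check for the 'ET12' FlatBuffer file identifier at byte offset 4."""
+    return len(header) >= 8 and header[4:8] == b"ET12"
+
+# Synthetic headers for illustration (not real models)
+print(looks_like_pte(b"\x00\x00\x00\x00ET12"))
+print(looks_like_pte(b"\x00" * 8))
+```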
-```bash { command_line="root@frdm-imx93" }
-echo executorch_runner_cm33.elf > /sys/class/remoteproc/remoteproc0/firmware
-```
+## Run inference
-Start the Cortex-M33 with the new firmware:
+Start the Cortex-M33 firmware through RemoteProc. RemoteProc is the control plane for this platform: it gives you a consistent way to stop, replace, and start the Cortex-M33 image without manually resetting the system.
```bash { command_line="root@frdm-imx93" }
+echo stop > /sys/class/remoteproc/remoteproc0/state
+echo executorch_runner_cm33.elf > /sys/class/remoteproc/remoteproc0/firmware
echo start > /sys/class/remoteproc/remoteproc0/state
+sleep 15
+cat /sys/kernel/debug/remoteproc/remoteproc0/trace0
```
-Verify the firmware loaded successfully:
+{{% notice Note %}}
+If no firmware is running, the `stop` command prints an error. That is expected and can be ignored.
+{{% /notice %}}
+
+## Expected output
-```bash { command_line="root@frdm-imx93" output_lines="2-5" }
-dmesg | grep remoteproc | tail -n 5
-[12345.678] remoteproc remoteproc0: powering up imx-rproc
-[12345.679] remoteproc remoteproc0: Booting fw image executorch_runner_cm33.elf, size 614984
-[12345.680] remoteproc remoteproc0: header-less resource table
-[12345.681] remoteproc remoteproc0: remote processor imx-rproc is now up
+You should see output similar to:
+
+```output
+NPU config match
+NPU arch match
+cmd_end_reached 0x1
+bus_status_error 0x0
+1 inferences finished
+Output[0]: dtype=6, numel=1000, nbytes=4000
+Program complete, exiting.
```
-The message "remote processor imx-rproc is now up" confirms successful loading.
+The key indicators of a successful inference run:
-## Load a model to DDR memory
+| Output | Meaning |
+|--------|---------|
+| `NPU config match` | The compiled model's NPU configuration matches the hardware |
+| `NPU arch match` | The compiled model's architecture version matches the hardware |
+| `cmd_end_reached 0x1` | The NPU executed all 116 operators in the command stream |
+| `bus_status_error 0x0` | No AXI bus errors during NPU memory access |
+| `numel=1000` | MobileNet V2 output: 1000 ImageNet classification scores (one per class) |
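+
+If you script deployments, you can scan the trace text for those indicators automatically - a minimal sketch over the sample output above:
+
+```python
+# Sample trace text in the format shown above
+trace = """NPU config match
+NPU arch match
+cmd_end_reached 0x1
+bus_status_error 0x0
+1 inferences finished"""
+
+# Success requires command-stream completion and a clean bus status
+indicators = ("cmd_end_reached 0x1", "bus_status_error 0x0")
+print(all(line in trace for line in indicators))
+```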
-The executor_runner loads `.pte` model files from DDR memory at address 0x80100000.
+The model runs with uninitialized input data, so the output scores don't correspond to a real image classification. To get meaningful predictions, feed a real 224x224 RGB image as input.
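+
+For scale, a 224x224 RGB image in uint8 NHWC layout occupies exactly 150,528 bytes. The layout and quantization the firmware expects are assumptions to verify against your model export settings:
+
+```python
+# Size of a 224x224 RGB uint8 input tensor
+width, height, channels = 224, 224, 3
+input_buffer = bytearray(width * height * channels)  # zero-filled placeholder
+
+print(len(input_buffer))
+```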
-Copy your `.pte` model to the board:
+{{% notice Note %}}
+If the trace buffer shows `Program identifier '' != expected 'ET12'`, the `.pte` model wasn't loaded into DDR at `0xC0000000`. Reload the model using the steps above.
+{{% /notice %}}
-{{< tabpane code=false >}}
-{{< tab header="Windows/Linux" >}}
-```bash
-scp model.pte root@192.168.1.24:/tmp/
-```
-{{< /tab >}}
-{{< tab header="macOS" >}}
-```bash
-scp model.pte root@192.168.1.24:/tmp/
-```
-{{< /tab >}}
-{{< /tabpane >}}
+## Re-run inference
-Write the model to DDR memory:
+The `executor_runner` runs inference once when the firmware starts. To re-run, reload the firmware:
```bash { command_line="root@frdm-imx93" }
-dd if=/tmp/model.pte of=/dev/mem bs=1M seek=2049
+echo stop > /sys/class/remoteproc/remoteproc0/state
+sleep 2
+echo start > /sys/class/remoteproc/remoteproc0/state
+sleep 15
+cat /sys/kernel/debug/remoteproc/remoteproc0/trace0
```
-The seek value of 2049 corresponds to address 0x80100000 (2049 MB = 0x801 in hex).
+The trace buffer resets at the start of each firmware load, so you'll always see fresh output.
-Verify the model was written:
-
-```bash { command_line="root@frdm-imx93" output_lines="2-5" }
-xxd -l 64 -s 0x80100000 /dev/mem
-80100000: 504b 0304 1400 0000 0800 0000 2100 a3b4 PK..........!...
-80100010: 7d92 5801 0000 6c04 0000 1400 0000 7661 }.X...l.......va
-80100020: 6c75 652f 7061 7261 6d73 2e70 6b6c 6500 lue/params.pkl.
-80100030: ed52 cd4b 0241 1cfd 66de 49b6 9369 1ad9 .R.K.A..f.I..i..
-```
+## What you've accomplished and what's next
-Non-zero bytes confirm the model is present in memory.
+In this section:
-## Monitor Cortex-M33 output
+- You used Linux RemoteProc to load and boot a custom Cortex-M33 firmware image
+- You validated an end-to-end ExecuTorch inference run that delegates computation to the Ethos-U65 NPU
-The executor_runner outputs debug information via UART. Connect a USB-to-serial adapter to the M33 UART pins on the FRDM board.
+Next, you can iterate on `.pte` models (and measure how operator coverage and model shape affect runtime behavior) while keeping the firmware bring-up path stable.
-Open a serial terminal (115200 baud, 8N1):
+## Update the firmware
-{{< tabpane code=false >}}
-{{< tab header="Windows/Linux" >}}
-```bash
-screen /dev/ttyUSB0 115200
-```
+To deploy a new version of the firmware:
-Alternative with minicom:
-```bash
-minicom -D /dev/ttyUSB0 -b 115200
-```
-{{< /tab >}}
-{{< tab header="macOS" >}}
-```bash
-screen /dev/tty.usbserial-* 115200
-```
+1. Build the updated firmware on your development machine
+2. Copy to the board:
-Alternative with minicom:
```bash
-minicom -D /dev/tty.usbserial-* -b 115200
-```
-{{< /tab >}}
-{{< /tabpane >}}
-
-You should see output from the ExecuTorch runtime:
-
-```output
-ExecuTorch Runtime Starting...
-Loading model from 0x80100000
-Model loaded successfully
-Initializing Ethos-U NPU delegate
-NPU initialized
-Running inference...
-Inference complete: 45.2ms
+scp debug/executorch_runner_cm33.elf root@<board-ip>:/lib/firmware/
```
-{{% notice Tip %}}
-If you don't see UART output, verify the serial connection settings (115200 baud, 8N1) and check that the UART pins are correctly connected.
-{{% /notice %}}
-
-## Test inference
-
-The executor_runner automatically runs inference when it starts. Check the UART output for inference results and timing.
-
-To restart inference, you can reload the firmware:
+3. Re-run inference on the board:
```bash { command_line="root@frdm-imx93" }
echo stop > /sys/class/remoteproc/remoteproc0/state
+sleep 2
echo start > /sys/class/remoteproc/remoteproc0/state
+sleep 15
+cat /sys/kernel/debug/remoteproc/remoteproc0/trace0
```
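
These stop and start steps can be scripted. The sketch below is a hypothetical helper (assuming the standard remoteproc sysfs layout shown above) that cycles the remote processor and returns the trace buffer contents:

```python
import time
from pathlib import Path

def reload_firmware(remoteproc_dir, trace_path, stop_delay=2, boot_delay=15):
    """Cycle a remoteproc instance and return its trace buffer contents."""
    state = Path(remoteproc_dir) / "state"
    state.write_text("stop\n")
    time.sleep(stop_delay)    # let the Cortex-M33 halt cleanly
    state.write_text("start\n")
    time.sleep(boot_delay)    # give the firmware time to boot and run inference
    return Path(trace_path).read_text()
```

On the board you would pass `/sys/class/remoteproc/remoteproc0` and the debugfs `trace0` path; the default delays mirror the `sleep` values in the commands above.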
-Monitor the UART console to see the new inference run.
-
-## Verify deployment success
-
-Confirm your deployment is working correctly:
-
-1. **RemoteProc status shows "running":**
-
-```bash { command_line="root@frdm-imx93" output_lines="2" }
-cat /sys/class/remoteproc/remoteproc0/state
-running
-```
-
-2. **Firmware is loaded:**
-
-```bash { command_line="root@frdm-imx93" output_lines="2" }
-cat /sys/class/remoteproc/remoteproc0/firmware
-executorch_runner_cm33.elf
-```
-
-3. **Model is in DDR memory** (non-zero bytes at 0x80100000)
-
-4. **UART shows inference output** with timing information
-
-## Troubleshooting
-
+{{% notice Troubleshooting %}}
**RemoteProc fails to load firmware:**
-Check file permissions:
+Check that the file exists and has correct permissions:
```bash { command_line="root@frdm-imx93" }
+ls -la /lib/firmware/executorch_runner_cm33.elf
chmod 644 /lib/firmware/executorch_runner_cm33.elf
```
-Verify the file exists:
+**`Program identifier '' != expected 'ET12'`:**
-```bash { command_line="root@frdm-imx93" }
-ls -la /lib/firmware/executorch_runner_cm33.elf
-```
+The `.pte` model is not present at DDR address `0xC0000000`. Reload the model using the `/dev/mem` method or via U-Boot.
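
The `/dev/mem`-style reload can also be sketched as a small helper (illustrative only; the `0xC0000000` load address comes from this guide, and on real hardware kernel restrictions such as `CONFIG_STRICT_DEVMEM` may block raw physical-memory access):

```python
from pathlib import Path

MODEL_LOAD_ADDR = 0xC0000000  # DDR address the executor_runner expects

def write_model(mem_device, pte_path, offset=MODEL_LOAD_ADDR):
    """Write a .pte blob into a /dev/mem-style device at the given offset."""
    data = Path(pte_path).read_bytes()
    with open(mem_device, "r+b") as mem:
        mem.seek(offset)
        mem.write(data)
    return len(data)  # bytes written
```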
-**Model not found error:**
+**Firmware hangs (no trace output):**
-Verify the model was written to memory:
+Verify the Ethos-U kernel driver is loaded:
```bash { command_line="root@frdm-imx93" }
-xxd -l 256 -s 0x80100000 /dev/mem | head
+ls /dev/ethosu*
+dmesg | grep ethosu
```
-If all zeros, re-run the `dd` command to write the model.
-
-**No UART output:**
+If `/dev/ethosu0` does not exist, the NPU is not powered and the firmware cannot initialize it.
-Check the serial connection:
-- Baud rate: 115200
-- Data bits: 8
-- Parity: None
-- Stop bits: 1
+**Memory allocation failed for planned buffer:**
-Try a different USB port or serial terminal program.
+This occurs when a large model's activation tensors exceed the capacity of the DTCM method allocator. The firmware automatically uses DDR for models that need more than 12KB of planned buffers. If you see this error, verify that the `ethosu_region@A8000000` node (128 MB) is reserved in the device tree.
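
The allocator fallback described here can be illustrated with a short sketch (the 12 KB threshold is the one stated above; the function name is illustrative):

```python
DTCM_PLANNED_BUFFER_LIMIT = 12 * 1024  # bytes the DTCM allocator can serve

def pick_planned_buffer_region(planned_bytes):
    """Return which memory region backs the model's planned buffers."""
    if planned_bytes <= DTCM_PLANNED_BUFFER_LIMIT:
        return "DTCM"  # fast tightly-coupled memory
    return "DDR"       # falls back to the reserved ethosu_region
```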
-**Firmware crashes or hangs:**
+**BUS FAULT or vtable corruption:**
-Check kernel logs for errors:
+The SDK linker script patch has not been applied. Run the patch script and rebuild:
-```bash { command_line="root@frdm-imx93" }
-dmesg | grep -i error | tail
+```bash
+./patches/apply_patches.sh
+cmake --preset debug
+cmake --build debug
```
-This might indicate memory configuration issues. Reduce the memory pool sizes in `CMakeLists.txt` and rebuild.
+**Firmware crashes after NPU init:**
-## Update the firmware
+Check kernel logs:
-To deploy a new version of the firmware:
-
-1. Build the updated firmware on your development machine
-2. Copy to the board:
-
-{{< tabpane code=false >}}
-{{< tab header="Windows/Linux" >}}
-```bash
-scp debug/executorch_runner_cm33.elf root@:/lib/firmware/
-```
-{{< /tab >}}
-{{< tab header="macOS" >}}
-```bash
-scp debug/executorch_runner_cm33.elf root@:/lib/firmware/
+```bash { command_line="root@frdm-imx93" }
+dmesg | grep -i error | tail
```
-{{< /tab >}}
-{{< /tabpane >}}
-3. Restart RemoteProc:
+This might indicate memory configuration issues. Verify that both DDR regions (`0xC0000000`--`0xC03FFFFF` and `0xA8000000`--`0xAFFFFFFF`) are reserved in the device tree.
+{{% /notice %}}
-```bash { command_line="root@frdm-imx93" }
-echo stop > /sys/class/remoteproc/remoteproc0/state
-echo start > /sys/class/remoteproc/remoteproc0/state
-```
+## Summary
-4. Monitor UART output to verify the new firmware is running
+You've completed the full bring-up flow for ExecuTorch on the NXP FRDM i.MX 93. Along the way, you set up a reproducible build environment, compiled two `.pte` model artifacts targeting the Ethos-U65, built and patched the Cortex-M33 `executor_runner` firmware, and deployed it through Linux RemoteProc to confirm a successful end-to-end inference run. The remoteproc trace buffer confirmed zero bus errors and full NPU operator coverage, establishing a stable foundation for iterating on models and firmware independently.
\ No newline at end of file
diff --git a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/2-boot-nxp.md b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/2-boot-nxp.md
index 831aa0ec48..dd5ba0d52f 100644
--- a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/2-boot-nxp.md
+++ b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/2-boot-nxp.md
@@ -1,81 +1,77 @@
---
# User change
-title: "Boot the NXP FRDM i.MX 93 Board"
+title: "Boot the NXP FRDM i.MX 93 board"
weight: 3
# Do not modify these elements
layout: "learningpathall"
---
+## Connect to the board
-In this section, you will prepare the NXP [FRDM i.MX 93](https://www.nxp.com/design/design-center/development-boards-and-designs/frdm-i-mx-93-development-board:FRDM-IMX93) board for ML development.
+This section walks through powering on the board and establishing a serial console connection. If your board is already running Linux and you can log in, skip ahead to the next section.
-## Unbox the NXP Board
+You need a serial terminal to see the boot console and log in.
-Follow NXP's getting started instructions: [Getting Started with FRDM-IMX93](https://www.nxp.com/document/guide/getting-started-with-frdm-imx93:GS-FRDM-IMX93):
-* Stop when you complete section "1.6 Connect Power Supply"
+{{% notice macOS %}}
+If you're using macOS as your host, set up the following before getting started:
-## Connect to the NXP Board
+- Install the [Silicon Labs USB-to-UART driver](https://www.silabs.com/developer-tools/usb-to-uart-bridge-vcp-drivers?tab=downloads)
+- Install [picocom](https://github.com/npat-efault/picocom)
+ ```bash
+ brew install picocom
+ ```
+{{% /notice %}}
-Prior to logging in to the NXP board, you need to configure `picocom`. This allows you to connect to the board using a USB cable.
+Connect the board's **DEBUG** USB-C connector to your host machine.
-{{% notice macOS %}}
+Find the board's serial device:
-1. Install the Silicon Labs driver:
+ ```bash { output_lines = "2-5" }
+ ls /dev/tty.*
+ ...
+ /dev/tty.usbmodem<serial>
+ /dev/tty.usbmodem<serial>
+ ...
+ ```
- https://www.silabs.com/developer-tools/usb-to-uart-bridge-vcp-drivers?tab=downloads
-
-2. Install [picocom](https://github.com/npat-efault/picocom):
- ```bash
- brew install picocom
+ The exact device names vary per board. Look for entries containing `usbmodem`.
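
If several devices enumerate, you can narrow the candidates with a small helper (a sketch that simply filters names the same way the `ls` pattern above does):

```python
import glob
import os

def find_usbmodem_devices(dev_dir="/dev"):
    """Return tty entries that look like the board's debug UART."""
    pattern = os.path.join(dev_dir, "tty.usbmodem*")
    return sorted(glob.glob(pattern))
```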
+
+Open a serial connection using the first `usbmodem` device:
+
+ ```bash { output_lines = "2-4" }
+ sudo picocom -b 115200 /dev/tty.usbmodem<serial>
+ picocom v3.1
+ ...
+ Terminal ready
```
+
+Connect the board's **POWER** USB-C connector to your host machine. You should see a red and a white LED on the board.
+
+Wait for the boot log to scroll past in the picocom window. When it finishes, you'll see a login prompt:
+
+ ```output
+ NXP i.MX Release Distro 6.6-scarthgap imx93frdm ttyLP0
+
+ imx93frdm login:
+ ```
+
+{{% notice Tip %}}
+If you miss the login prompt, hold the board's power button for two seconds until the LEDs turn off, then hold it again for two seconds to power the board back on.
{{% /notice %}}
-1. Establish a USB-to-UART (serial) connection:
- - Connect the board's "DEBUG" USB-C connector to your laptop
- - Find the NXP board's USB connections in your computer's terminal:
- ```bash { output_lines = "2-7" }
- ls /dev/tty.*
- # output lines
- ...
- /dev/tty.debug-console
- /dev/tty.usbmodem56D70442811
- /dev/tty.usbmodem56D70442813
- ...
- ```
-
- - Connect to the NXP board:
- ```bash { output_lines = "2-5" }
- sudo picocom -b 115200 /dev/tty.usbmodem56D70442811
- # output lines
- picocom v3.1
- ...
- Terminal ready
- ```
-2. Log in to the NXP board:
- - Connect the board's "POWER" USB-C connector to your laptop
- - At this point you should see one red and one white light on the board
- - Next you should see scrolling text in your `picocom` window, as the NXP board boots
- - The last line should say `login:`
- ```bash { output_lines = "1-9" }
- # output lines
- ...
- [ OK ] Reached target Graphical Interface.
- Starting Record Runlevel Change in UTMP...
- [ OK ] Finished Record Runlevel Change in UTMP.
-
- NXP i.MX Release Distro 6.6-scarthgap imx93frdm ttyLP0
-
- imx93frdm login:
- ```
-
-3. [Optional] Troubleshooting:
- - Restart the NXP board, to get to the `login:` prompt:
- - Hold the NXP board's power button for 2-seconds, until the lights turn off
- - Hold the NXP board's power button again for 2-seconds, until the lights turn on
-
-## [Optional] Run the Built-In NXP Demos
-* Connect the NXP board to a monitor via HDMI
-* Connect a mouse to the NXP board's USB-A port
+## Run the built-in NXP demos (optional)
+
+Connect the board to a monitor via HDMI and plug a mouse into the board's USB-A port. NXP includes several ML demos that run out of the box.

+
+## What you've learned and what's next
+
+In this section you've:
+
+- Connected to the board via serial console
+- Booted the NXP FRDM i.MX 93 board and confirmed Linux is running
+- Verified you can access the login prompt
+
+With the board running and Linux accessible, the next step is setting up the build environment for ExecuTorch.
diff --git a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/4-environment-setup.md b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/4-environment-setup.md
index 47aa4bc6d2..2ff692e783 100644
--- a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/4-environment-setup.md
+++ b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/4-environment-setup.md
@@ -1,6 +1,6 @@
---
# User change
-title: "Enviroment Setup"
+title: "Set up the ExecuTorch build environment"
weight: 5 # 1 is first, 2 is second, etc.
@@ -8,30 +8,33 @@ weight: 5 # 1 is first, 2 is second, etc.
layout: "learningpathall"
---
-For detailed instructions on setting up your ExecuTorch build environment, please see the official PyTorch documentation: [Environment Setup](https://docs.pytorch.org/executorch/stable/using-executorch-building-from-source.html#environment-setup)
+## For macOS: build ExecuTorch in a Docker container
-{{% notice macOS %}}
+On macOS, it’s easiest to build ExecuTorch in an Ubuntu container. This keeps your toolchain consistent with the rest of the Learning Path and avoids gaps in macOS-native cross-compilers (for example, the Arm GNU Toolchain doesn’t provide an “AArch64 GNU/Linux target” for macOS).
-Use a Docker container to build ExecuTorch:
-* The [Arm GNU Toolchain](https://developer.arm.com/Tools%20and%20Software/GNU%20Toolchain) currently does not have a "AArch64 GNU/Linux target" for macOS
-* You will use this toolchain's `gcc-aarch64-linux-gnu` and `g++-aarch64-linux-gnu` compilers on the next page of this learning path
+This container isn’t part of the runtime deployment. It’s a build environment that produces the artifacts you later move onto the FRDM i.MX 93:
-1. Install and start [Docker Desktop](https://www.docker.com/)
+- Prebuilt ExecuTorch libraries you link into Cortex-M33 firmware
+- `.pte` model files compiled for Ethos-U65
-2. Create a directory for building a `ubuntu-24-container`:
+Keeping this step reproducible helps you focus on the actual bring-up milestone: booting custom firmware on Cortex-M33 and using Ethos-U65 for inference.
+
+Start by installing and launching [Docker Desktop](https://www.docker.com/).
+
+Next, create a working directory for your container build:
```bash
mkdir ubuntu-24-container
```
-3. Create a `dockerfile` in the `ubuntu-24-container` directory:
+Now create a `Dockerfile` and switch into the directory:
```bash
cd ubuntu-24-container
touch Dockerfile
```
-4. Add the following commands to your `Dockerfile`:
+Add the following content to your `Dockerfile` to install a few basic tools in the image:
```dockerfile
FROM ubuntu:24.04
@@ -44,51 +47,54 @@ Use a Docker container to build ExecuTorch:
curl vim git
```
- The `ubuntu:24.04` container image includes Python 3.12, which will be used for this learning path.
+ The `ubuntu:24.04` container image includes Python 3.12, which you use later in this Learning Path.
-5. Create the `ubuntu-24-container`:
+ Build the container image:
```bash
docker build -t ubuntu-24-container .
```
-6. Run the `ubuntu-24-container`:
+Run the container and open an interactive shell:
```bash { output_lines = "2-3" }
docker run -it ubuntu-24-container /bin/bash
# Output will be the Docker container prompt
- ubuntu@:/#
+ root@<container-id>:/#
```
- [OPTIONAL] If you already have an existing container:
- - Get the existing CONTAINER ID:
- ```bash { output_lines = "2-4" }
- docker ps -a
- # Output
- CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
- 0123456789ab ubuntu-24-container "/bin/bash" 27 hours ago Exited (255) 59 minutes ago. container_name
- ```
- - Log in to the existing container:
- ```bash
- docker start 0123456789ab
- docker exec -it 0123456789ab /bin/bash
- ```
+If you already created a container before, reuse it instead of creating a new one.
-{{% /notice %}}
+First, list your containers to find the container ID:
+
+```bash { output_lines = "2-4" }
+docker ps -a
+# Output
+CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
+0123456789ab ubuntu-24-container "/bin/bash" 27 hours ago Exited (255) 59 minutes ago container_name
+```
-After logging in to the Docker container, navigate to the ubuntu home directory:
+Then start the container and attach a shell:
```bash
-cd /home/ubuntu
+docker start 0123456789ab
+docker exec -it 0123456789ab /bin/bash
```
-1. **Install dependencies:**
+Once you’re inside the container, move to your home directory:
+
+```bash
+cd /root
+```
- ```bash { output_lines = "1" }
- # Use "sudo apt ..." if you are not logged in as root
+## Install dependencies
+
+Install the packages ExecuTorch needs to build. If you’re not running as root, prefix the commands with `sudo`.
+
+ ```bash
apt update
apt install -y \
- python-is-python3 python3.12-dev python3.12-venv \
+ python-is-python3 python3.12-dev python3.12-venv python3-pip \
gcc g++ \
make cmake \
build-essential \
@@ -96,25 +102,43 @@ cd /home/ubuntu
libboost-all-dev
```
-2. Clone ExecuTorch:
+## Create a Python virtual environment
+
+Create and activate a virtual environment so your Python packages stay scoped to this project:
+ ```bash { output_lines = "3" }
+ python3 -m venv .venv
+ source .venv/bin/activate
+ # Your prompt will prefix with (.venv)
+ ```
+
+## Get the ExecuTorch source code
+
+Clone ExecuTorch and initialize its submodules:
+
```bash
git clone https://github.com/pytorch/executorch.git
cd executorch
git fetch --tags
- git checkout v1.0.0
+ git checkout c70a742344e30158dc370d7d35d60ed07660fee0
git submodule sync
git submodule update --init --recursive
```
-3. Create a Virtual Environment:
- ```bash { output_lines = "3" }
- python3 -m venv .venv
- source .venv/bin/activate
- # Your prompt will prefix with (.venv)
- ```
+{{% notice EthosUCompileSpec parameters %}}
+The `EthosUCompileSpec` parameters used in this guide:
-4. Configure your git username and email globally:
- ```bash
- git config --global user.email "you@example.com"
- git config --global user.name "Your Name"
- ```
+| Parameter | Value | Description |
+| ----------------- | --------------------- | ---------------------------------------------- |
+| `target` | `ethos-u65-256` | Targets the Ethos-U65 with 256 MAC units |
+| `system_config` | `Ethos_U65_High_End` | High-end system configuration for optimal performance |
+| `memory_mode` | `Shared_Sram` | Uses shared SRAM memory mode |
+{{% /notice %}}
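
As an illustration of how these values fit together, the target string itself encodes the MAC configuration (a standalone sketch; the parser is illustrative and not the ExecuTorch implementation):

```python
# The values from the table above, collected for reference.
COMPILE_SPEC_PARAMS = {
    "target": "ethos-u65-256",
    "system_config": "Ethos_U65_High_End",
    "memory_mode": "Shared_Sram",
}

def parse_ethos_u_target(target):
    """Split a target string like 'ethos-u65-256' into family and MAC count."""
    family, macs = target.rsplit("-", 1)
    return family, int(macs)
```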
+
+## What you've learned and what's next
+
+In this section you've:
+
+- Set up an Ubuntu 24.04 Docker container for building ExecuTorch (macOS users)
+- Installed required dependencies and created a Python virtual environment
+- Cloned the ExecuTorch repository and checked out the correct version
+
+With your build environment configured and the ExecuTorch source checked out, the next step is building and installing the ExecuTorch package.
diff --git a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/6-build-executorch.md b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/6-build-executorch.md
index 48ef3db142..dd20c9a86e 100644
--- a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/6-build-executorch.md
+++ b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/6-build-executorch.md
@@ -1,6 +1,6 @@
---
# User change
-title: "Build ExecuTorch"
+title: "Build and install ExecuTorch"
weight: 7 # 1 is first, 2 is second, etc.
@@ -8,152 +8,78 @@ weight: 7 # 1 is first, 2 is second, etc.
layout: "learningpathall"
---
-For a full tutorial on building ExecuTorch please see learning path [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/).
+## Overview
-## Install ExecuTorch
+With the ExecuTorch source checked out and your virtual environment active, you can now build ExecuTorch and set up the Arm toolchain for Ethos-U cross-compilation.
+
+For a full tutorial on building ExecuTorch, see the Learning Path [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/).
-1. Upgrade pip and install build tools:
+## Install ExecuTorch
- ```bash
- pip install --upgrade pip setuptools wheel
- ```
+Upgrade pip and install build tools:
-2. Build and install the `executorch` pip package:
+```bash
+pip install --upgrade pip setuptools wheel
+```
- ```bash
- ./install_executorch.sh
- ```
+Build and install the `executorch` pip package:
-## Build troubleshooting
+```bash
+./install_executorch.sh
+```
-If the `install_executorch.sh` script fails, manually install the dependencies using the following commands:
+After the installation finishes, verify the package is available:
```bash
-pip install torch torchvision
-pip install --no-build-isolation .
-pip install --no-build-isolation third-party/ao
+pip list | grep executorch
```
## Set up the Arm toolchain
Initialize the Arm-specific environment and accept the EULA:
-1. Run the setup script:
-
- ```bash
- ./examples/arm/setup.sh --i-agree-to-the-contained-eula
- ```
-
-2. Source the environment variables:
-
- ```bash
- source ./examples/arm/arm-scratch/setup_path.sh
- ```
-
-## Apply the Ethos-U65 patch
-
-As of this writing, ExecuTorch does not officially support the Ethos-U65. You must patch the `compile_spec.py` file to enable U65 compilation targets within the build system.
-
-1. Create and apply the patch by running the following command block:
-
- ```bash
- cat > /tmp/patch_u65.py << 'PATCH'
- import os
-
- # Locate the file within the virtual environment
- filepath = "/root/executorch/.venv/lib/python3.12/site-packages/executorch/backends/arm/ethosu/compile_spec.py"
-
- with open(filepath, 'r') as f:
- content = f.read()
-
- # 1. Inject U65 Configuration Support
- old_code = ' elif "ethos-u85" in target_lower:'
- new_code = ''' elif "ethos-u65" in target_lower:
- self.tosa_spec = TosaSpecification.create_from_string("TOSA-1.0+INT")
- default_system_config = "Ethos_U65_High_End"
- default_memory_mode = "Shared_Sram"
- elif "ethos-u85" in target_lower:'''
-
- content = content.replace(old_code, new_code)
-
- # 2. Inject U65 Compile Spec Builder Logic
- old_check = ''' if "u55" in target_lower:
- return CompileSpecBuilder(
- TosaSpecification.create_from_string("TOSA-0.80+BI+u55")
- )
- if "u85" in self.target:'''
-
- new_check = ''' if "u55" in target_lower:
- return CompileSpecBuilder(
- TosaSpecification.create_from_string("TOSA-0.80+BI+u55")
- )
- if "u65" in target_lower:
- return CompileSpecBuilder(
- TosaSpecification.create_from_string("TOSA-1.0+INT")
- )
- if "u85" in self.target:'''
-
- content = content.replace(old_check, new_check)
-
- with open(filepath, 'w') as f:
- f.write(content)
-
- print(f"Patched {filepath}! U65 support added.")
- PATCH
-
- python3 /tmp/patch_u65.py
- ```
-
-2. Verify the patch by running:
-
- ```bash
- python3 -c "from executorch.backends.arm.ethosu import EthosUCompileSpec; EthosUCompileSpec(target='ethos-u65-256'); print('U65 OK')"
- ```
-
- If successful, you see the output `U65 OK`.
-
-## Additional troubleshooting
-
-1. Allocate at least 4 GB of swap space:
+```bash
+./examples/arm/setup.sh --i-agree-to-the-contained-eula
+```
- ```bash
- fallocate -l 4G /swapfile
- chmod 600 /swapfile
- mkswap /swapfile
- swapon /swapfile
- ```
+Source the environment variables:
- Deallocate the swap space after you complete this learning path (optional):
+```bash
+source ./examples/arm/ethos-u-scratch/setup_path.sh
+```
- ```bash
- swapoff /swapfile
- rm /swapfile
- ```
+{{% notice Troubleshooting %}}
+If `install_executorch.sh` fails, install the dependencies manually:
- {{% notice macOS %}}
+```bash
+pip install torch torchvision
+pip install --no-build-isolation .
+pip install --no-build-isolation third-party/ao
+```
- Increase the "Swap" space in Docker settings to 4 GB:
- 
+If `buck2` hangs during the build:
- {{% /notice %}}
+```bash
+ps aux | grep buck
+pkill -f buck
+```
-2. Kill the `buck2` process if it hangs:
+To clean the build environment and start fresh:
- ```bash
- ps aux | grep buck
- pkill -f buck
- ```
+```bash
+./install_executorch.sh --clean
+git submodule sync
+git submodule update --init --recursive
+./install_executorch.sh
+```
+{{% /notice %}}
-3. Clean the build environment and reinitialize all submodules:
+## What you've learned and what's next
- ```bash
- ./install_executorch.sh --clean
- git submodule sync
- git submodule update --init --recursive
- ```
+In this section you've:
-4. Try `install_executorch.sh` again:
+- Installed the ExecuTorch package and verified its availability
+- Set up the Arm toolchain with Ethos-U support
+- Configured the environment for cross-compiling to Ethos-U65
- ```bash
- ./install_executorch.sh
- ```
\ No newline at end of file
+With ExecuTorch installed and the Arm toolchain configured, you can now compile `.pte` model files targeting the Ethos-U65 NPU.
diff --git a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/7-build-executorch-pte.md b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/7-build-executorch-pte.md
index df634fed95..0802662121 100644
--- a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/7-build-executorch-pte.md
+++ b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/7-build-executorch-pte.md
@@ -1,6 +1,6 @@
---
# User change
-title: "Build the ExecuTorch .pte"
+title: "Build ExecuTorch models for Ethos-U65"
weight: 8 # 1 is first, 2 is second, etc.
@@ -8,9 +8,10 @@ weight: 8 # 1 is first, 2 is second, etc.
layout: "learningpathall"
---
-Embedded systems like the NXP board require two ExecuTorch runtime components: a `.pte` file and an `executor_runner` file.
+## ExecuTorch deployment components
+
+On the FRDM i.MX93, you deploy ExecuTorch as two cooperating artifacts:
-**ExecuTorch Runtime Files for Embedded Systems**
|Component|Role in Deployment|What It Contains|Why It's Required|
|---------|------------------|----------------|-----------------|
|**`.pte file`** (e.g., `mobilenetv2_u65.pte`)|The model itself, exported from ExecuTorch|Serialized and quantized operator graph + weights + metadata|Provides the neural network to be executed|
@@ -81,7 +82,13 @@ Embedded systems like the NXP board require two ExecuTorch runtime components: a
## Build the ExecuTorch .pte for Ethos-U65
-This section shows you how to build `.pte` files for the Ethos-U65 NPU. You will compile two models: a simple addition model to verify the setup, and MobileNet V2 for real-world inference.
+This section shows you how to build `.pte` files for the Ethos-U65 NPU.
+You compile two models:
+
+- A simple add model to validate the toolchain and U65 compile spec
+- MobileNet V2 to demonstrate a realistic workload and confirm NPU operator coverage
+
+The `.pte` file is the handoff point between “host build time” and “device run time”. In the later deployment steps, Linux is responsible for loading firmware, and the Cortex-M33 firmware is responsible for loading and executing the `.pte` using the Ethos-U delegate.
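
Before deploying, you can sanity-check that an exported file carries the ExecuTorch program identifier (`ET12`, the value the runtime error messages reference). This is a heuristic sketch, assuming the standard flatbuffer layout where the 4-byte file identifier follows the root offset:

```python
def looks_like_pte(path, expected_id=b"ET12"):
    """Heuristic check for the ExecuTorch flatbuffer file identifier."""
    with open(path, "rb") as f:
        header = f.read(8)
    # In a flatbuffer, the 4-byte file identifier sits at bytes 4-8.
    return len(header) == 8 and header[4:8] == expected_id
```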
### Compile a simple add model
@@ -155,6 +162,10 @@ This script creates a basic addition model, quantizes it, and compiles it for th
This script compiles the [MobileNet V2](https://pytorch.org/hub/pytorch_vision_mobilenet_v2/) computer vision model for the Ethos-U65. MobileNet V2 is a convolutional neural network (CNN) used for image classification and object detection.
+{{% notice Note %}}
+MobileNet V2 uses `export_for_training()` instead of `export()` because it contains batch normalization layers that must be traced in training mode before quantization can fuse them. The simple add model above has no such layers, so `export()` is sufficient.
+{{% /notice %}}
+
1. Create the MobileNet V2 compilation script:
```bash
@@ -204,7 +215,7 @@ This script compiles the [MobileNet V2](https://pytorch.org/hub/pytorch_vision_m
python3 compile_mv2_u65.py
```
- If successful, you see the Vela compiler summary indicating 100% NPU utilization:
+ If successful, you see the Vela compiler summary indicating 100% NPU utilization. This is an important bring-up signal because it shows the graph was lowered onto the Ethos-U path rather than falling back to CPU execution.
```output
Network summary for out
@@ -225,12 +236,45 @@ This script compiles the [MobileNet V2](https://pytorch.org/hub/pytorch_vision_m
ls -la mobilenetv2_u65.pte
```
-{{% notice Note %}}
-The `EthosUCompileSpec` parameters used in this guide:
+### Copy models to the SD card
+
+The `executor_runner` firmware loads `.pte` models from a fixed DDR address (`0xC0000000`), so the model must be written to DDR from U-Boot before Linux boots.
+
+1. Copy the `.pte` files out of the Docker container:
-| Parameter | Value | Description |
-| ----------------- | --------------------- | ---------------------------------------------- |
-| `target` | `ethos-u65-256` | Targets the Ethos-U65 with 256 MAC units |
-| `system_config` | `Ethos_U65_High_End` | High-end system configuration for optimal performance |
-| `memory_mode` | `Shared_Sram` | Uses shared SRAM memory mode |
+ ```bash
+ docker cp <container-id>:/root/executorch/model_u65.pte .
+ docker cp <container-id>:/root/executorch/mobilenetv2_u65.pte .
+ ```
+
+ Replace `<container-id>` with your Docker container ID.
+
+2. Write the `.pte` files to the first partition of the SD card so U-Boot can load them:
+
+{{< tabpane code=false >}}
+{{< tab header="Linux" >}}
+sudo mount /dev/sdX1 /mnt
+sudo cp model_u65.pte mobilenetv2_u65.pte /mnt/
+sudo umount /mnt
+{{< /tab >}}
+{{< tab header="macOS" >}}
+cp model_u65.pte mobilenetv2_u65.pte /Volumes/<volume-name>/
+{{< /tab >}}
+{{< /tabpane >}}
+
+{{% notice Parameter placeholders %}}
+For the commands above:
+- Replace `/dev/sdX1` with your SD card's first partition.
+- Replace `<volume-name>` with the mounted volume name.
{{% /notice %}}
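
After copying, you can confirm that the files on the card match the originals (a generic sketch using SHA-256; any checksum tool works equally well):

```python
import hashlib
from pathlib import Path

def files_match(src, dst):
    """Compare two files by SHA-256 digest to confirm a clean copy."""
    def digest(p):
        return hashlib.sha256(Path(p).read_bytes()).hexdigest()
    return digest(src) == digest(dst)
```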
+
+## What you've learned and what's next
+
+In this section you've:
+
+- Compiled a simple add model to verify the Ethos-U65 toolchain setup
+- Built a MobileNet V2 `.pte` file with 100% NPU operator coverage
+- Staged the models on your SD card for deployment
+
+With both `.pte` files ready, you can now build the Cortex-M33 `executor_runner` firmware that will load and execute them.
+
diff --git a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/9-build-executorch-runner-for-cm33.md b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/9-build-executorch-runner-for-cm33.md
index de7f900cae..961e90eecc 100644
--- a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/9-build-executorch-runner-for-cm33.md
+++ b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/9-build-executorch-runner-for-cm33.md
@@ -1,63 +1,44 @@
---
-title: Build the executor_runner firmware
+title: Build Cortex-M33 firmware for ExecuTorch
weight: 10
-
-### FIXED, DO NOT MODIFY
+# FIXED, DO NOT MODIFY
layout: learningpathall
---
+## Overview
+
+In this section, you build the Cortex-M33 `executor_runner` firmware and deploy it to the FRDM i.MX 93 board.
+
+This is a key milestone. On i.MX 93, Linux runs on the application cores, but the real-time ML runtime that talks to the Ethos-U65 runs as **firmware on Cortex-M33**. When you can build and boot your own `executor_runner`, you've proven that the microcontroller side of the system is under your control and ready to host ML workloads.
+
+The runner repository ships with prebuilt ExecuTorch static libraries, so this section focuses on cloning the project, applying the required SDK patches, wiring it up to your toolchain, and building the firmware.
+
+In architectural terms:
+
+- The application core (Linux) loads the firmware image and manages its lifecycle
+- The Cortex-M33 firmware owns model execution and the delegate path into Ethos-U65
+- The `.pte` model remains a separate artifact that you update independently
+
## Set up MCUXpresso for VS Code
-Install the MCUXpresso extension in VS Code:
+Install the [MCUXpresso extension for VS Code](https://marketplace.visualstudio.com/items?itemName=NXPSemiconductors.mcuxpresso).
-{{< tabpane code=false >}}
-{{< tab header="Windows/Linux" >}}
-1. Open VS Code and press `Ctrl+Shift+X` to open Extensions
-2. Search for "MCUXpresso for VS Code"
-3. Click **Install** on the NXP extension
-{{< /tab >}}
-{{< tab header="macOS" >}}
-1. Open VS Code and press `Cmd+Shift+X` to open Extensions
-2. Search for "MCUXpresso for VS Code"
-3. Click **Install** on the NXP extension
-{{< /tab >}}
-{{< /tabpane >}}
+In VS Code, open **Extensions**, search for "MCUXpresso", and install the extension published by NXP.
-Configure the ARM toolchain path:
+## Install MCUXpresso SDK and Arm toolchain
-{{< tabpane code=false >}}
-{{< tab header="Windows/Linux" >}}
-1. Open Settings with `Ctrl+,`
-2. Search for **MCUXpresso: Toolchain**
-3. Set the toolchain path to: `/opt/arm-gnu-toolchain-14.2.rel1-x86_64-arm-none-eabi/bin`
-{{< /tab >}}
-{{< tab header="macOS" >}}
-1. Open Settings with `Cmd+,`
-2. Search for **MCUXpresso: Toolchain**
-3. Set the toolchain path to: `/opt/arm-gnu-toolchain-14.2.rel1-x86_64-arm-none-eabi/bin`
-{{< /tab >}}
-{{< /tabpane >}}
+Use the MCUXpresso Installer to install the SDK and toolchain components.
-Install the MCUXpresso SDK for FRDM-MIMX93:
+In VS Code, open the Command Palette and run **MCUXpresso for VS Code: Open MCUXpresso Installer**. Select the following items, then select **Install**:
-{{< tabpane code=false >}}
-{{< tab header="Windows/Linux" >}}
-1. Open Command Palette: `Ctrl+Shift+P`
-2. Type: **MCUXpresso: Install MCUXpresso SDK**
-3. Search for "FRDM-MIMX93" or select **MCIMX93-EVK**
-4. Select the latest SDK and click **Install**
-{{< /tab >}}
-{{< tab header="macOS" >}}
-1. Open Command Palette: `Cmd+Shift+P`
-2. Type: **MCUXpresso: Install MCUXpresso SDK**
-3. Search for "FRDM-MIMX93" or select **MCIMX93-EVK**
-4. Select the latest SDK and click **Install**
-{{< /tab >}}
-{{< /tabpane >}}
+- **MCUXpresso SDK Developer** (under Software Kits)
+- **Arm GNU Toolchain (Latest)** (under Arm components)
+- **Standalone Toolchain Add-ons (Latest)** (under Arm components)
-{{% notice Note %}}
-If the FRDM-MIMX93 development board is not listed in the current MCUXpresso SDK catalog, you can alternatively select **MCIMX93-EVK** as they share the same i.MX93 SoC with Cortex-M33 core architecture. The SDK compatibility ensures seamless development across both platforms.
-{{% /notice %}}
+
## Clone the executor_runner repository
@@ -68,130 +49,159 @@ git clone https://github.com/fidel-makatia/Executorch_runner_cm33.git
cd Executorch_runner_cm33
```
-The repository contains the complete runtime source code and build configuration for Cortex-M33.
-
-## Copy ExecuTorch libraries
+The repository contains the complete runtime source code, pre-built ExecuTorch libraries, and build configuration for Cortex-M33 with Ethos-U65 NPU support.
-The executor_runner requires prebuilt ExecuTorch libraries with Ethos-U NPU support from your Docker container.
+## Configure the project for FRDM-MIMX93
-Find your ExecuTorch build container:
+Open the project in VS Code:
-```bash { output_lines = "2-3" }
-docker ps -a
-CONTAINER ID IMAGE COMMAND CREATED STATUS
-abc123def456 executorch "/bin/bash" 2 hours ago Exited
+```bash
+code .
```
-Copy the libraries:
+If the MCUXpresso extension doesn't automatically pick up the project, import it:
-```bash
-docker cp abc123def456:/home/ubuntu/executorch/cmake-out/lib/. ./executorch/lib/
-docker cp abc123def456:/home/ubuntu/executorch/. ./executorch/include/executorch/
-```
+1. Open the Command Palette
+2. Run **MCUXpresso for VS Code: Import Project**
+3. Select the `Executorch_runner_cm33` folder
+4. When prompted, choose **Arm GNU Toolchain**
-Replace `abc123def456` with your actual container ID.
+
-{{% notice Note %}}
-In some Docker containers, the `cmake-out` folder might not exist. If you don't see the libraries, run the following command to build them:
+## Set environment variables
-```bash
-./examples/arm/run.sh --build-only
-```
+Set three environment variables so the build can find your toolchain, your SDK, and the MCUXpresso Python environment. These must be set before building or running the SDK patch script.
-The libraries will be generated in `arm_test/cmake-out`.
-{{% /notice %}}
+Do this once for your user account (on Linux or macOS, add the `export` lines to your shell profile, for example `~/.bashrc` or `~/.zshrc`), then restart VS Code so the changes take effect.
-Verify the libraries:
+### Required variables
-```bash { output_lines = "2-5" }
-ls -lh executorch/lib/
--rw-r--r-- 1 user user 2.1M libexecutorch.a
--rw-r--r-- 1 user user 856K libexecutorch_core.a
--rw-r--r-- 1 user user 1.3M libexecutorch_delegate_ethos_u.a
-```
+| Variable | Description |
+|----------|-------------|
+| `ARMGCC_DIR` | Path to the Arm GCC toolchain root, that is, a directory whose name starts with `arm-gnu-toolchain-14.2.rel1` |
+| `SdkRootDirPath` | Path to the folder that contains the `mcuxsdk/` subdirectory |
+| `MCUX_VENV_PATH` | Path to the MCUXpresso Python venv executables |
-## Configure the project for FRDM-MIMX93
+Find the installed toolchain directory name:
-Open the project in VS Code:
+{{< tabpane code=false >}}
+{{< tab header="Linux/macOS" >}}
+ls ~/.mcuxpressotools/arm-gnu-toolchain-*
+{{< /tab >}}
+{{< tab header="Windows (PowerShell)" >}}
+dir $env:USERPROFILE\.mcuxpressotools\arm-gnu-toolchain-*
+{{< /tab >}}
+{{< /tabpane >}}
-```bash
-code .
-```
+Use the directory name from the output for the `ARMGCC_DIR` variable below. The name looks like `arm-gnu-toolchain-14.2.rel1-darwin-arm64-arm-none-eabi` (the version and host platform parts vary).
-Initialize the MCUXpresso project:
+{{< tabpane code=false >}}
+{{< tab header="Linux/macOS" >}}
+export ARMGCC_DIR="$HOME/.mcuxpressotools/<toolchain-directory-name>"
+export SdkRootDirPath="$HOME/mcuxsdk_root"
+export MCUX_VENV_PATH="$HOME/.mcuxpressotools/.venv/bin"
+{{< /tab >}}
+{{< tab header="Windows (PowerShell)" >}}
+[Environment]::SetEnvironmentVariable("ARMGCC_DIR", "$env:USERPROFILE\.mcuxpressotools\<toolchain-directory-name>", "User")
+[Environment]::SetEnvironmentVariable("SdkRootDirPath", "$env:USERPROFILE\mcuxsdk_root", "User")
+[Environment]::SetEnvironmentVariable("MCUX_VENV_PATH", "$env:USERPROFILE\.mcuxpressotools\.venv\Scripts", "User")
+{{< /tab >}}
+{{< /tabpane >}}
+
+These quick checks catch most path mistakes before you start debugging build errors:
{{< tabpane code=false >}}
-{{< tab header="Windows/Linux" >}}
-1. Press `Ctrl+Shift+P` to open Command Palette
-2. Type: **MCUXpresso: Import Repository**
-3. Select the current folder
-4. Choose **MIMX9352_cm33** as the target processor
+{{< tab header="Linux/macOS" >}}
+test -x "$ARMGCC_DIR/bin/arm-none-eabi-gcc" && echo "OK: toolchain" || echo "FAIL: ARMGCC_DIR"
+test -d "$SdkRootDirPath/mcuxsdk" && echo "OK: SDK" || echo "FAIL: SdkRootDirPath"
+test -x "$MCUX_VENV_PATH/python" && echo "OK: venv" || echo "FAIL: MCUX_VENV_PATH"
{{< /tab >}}
-{{< tab header="macOS" >}}
-1. Press `Cmd+Shift+P` to open Command Palette
-2. Type: **MCUXpresso: Import Repository**
-3. Select the current folder
-4. Choose **MIMX9352_cm33** as the target processor
+{{< tab header="Windows (PowerShell)" >}}
+if (Test-Path "$env:ARMGCC_DIR\bin\arm-none-eabi-gcc.exe") { "OK: toolchain" } else { "FAIL: ARMGCC_DIR" }
+if (Test-Path "$env:SdkRootDirPath\mcuxsdk") { "OK: SDK" } else { "FAIL: SdkRootDirPath" }
+if (Test-Path "$env:MCUX_VENV_PATH\python.exe") { "OK: venv" } else { "FAIL: MCUX_VENV_PATH" }
{{< /tab >}}
{{< /tabpane >}}
-VS Code generates the MCUXpresso configuration.
+## Apply required SDK patches
+
+The MCUXpresso SDK ships with a linker script and an Ethos-U driver log header that need two fixes before the firmware can run correctly. The patch script in the repository reads `SdkRootDirPath` to locate your SDK and applies both fixes automatically.
+
+Using a `bash` shell, navigate to the `Executorch_runner_cm33` directory and run:
+
+```bash
+./patches/apply_patches.sh
+```
+
+You should see output confirming each patch:
+
+```output
+=== ExecuTorch Runner SDK Patches ===
+
+[1/2] Linker script GOT fix: .../MIMX9352xxxxM_ram.ld
+ APPLIED: Added *(.got) and *(.got.plt) inside .data section.
+
+[2/2] Ethos-U driver log redirect: .../ethosu_log.h
+ APPLIED: LOG_ERR/LOG_WARN/LOG_INFO/LOG_DEBUG now write to remoteproc trace buffer.
-## Configure memory settings
+=== All patches applied successfully ===
+```
+
+The two patches and what they fix:
+
+| Patch | What it changes | What happens without it |
+|-------|----------------|------------------------|
+| **GOT initialization** | Adds `*(.got)` and `*(.got.plt)` inside the `.data` section of the SDK linker script so the startup code copies the Global Offset Table from flash to RAM | The GOT stays zeroed out after boot. Every C++ virtual call resolves to address zero, causing a **BUS FAULT** the first time the firmware calls `load_method` |
+| **NPU log redirect** | Replaces the Ethos-U driver's `LOG_ERR`/`LOG_INFO` macros so they write to the remoteproc trace buffer instead of the UART | NPU error messages go to UART only. When you read `trace0` from Linux, driver errors are invisible, making NPU failures difficult to diagnose |
+
+{{% notice Note %}}
+Run the patch script once after installing the SDK. If you run it again, the script detects that the patches are already in place and skips them. If you reinstall or update the SDK, run the script again.
+{{% /notice %}}
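+
+For reference, the patched `.data` section looks roughly like this (an illustrative sketch; the surrounding contents of the SDK's `MIMX9352xxxxM_ram.ld` differ):
+
+```text
+.data : AT(__DATA_ROM)
+{
+    . = ALIGN(4);
+    *(.data)
+    *(.data*)
+    *(.got)       /* GOT entries: copied from flash to RAM by the startup code */
+    *(.got.plt)   /* PLT portion of the GOT: needed for C++ virtual dispatch   */
+    . = ALIGN(4);
+} > m_data
+```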
+
+## Pre-built ExecuTorch libraries
+
+The repository includes pre-built static libraries in `executorch/lib/`, cross-compiled for Cortex-M33 with size optimization (`-Os`, MinSizeRel):
+
+| Library | Size | Purpose |
+|---------|------|---------|
+| `libexecutorch.a` | 52KB | ExecuTorch runtime |
+| `libexecutorch_core.a` | 217KB | Core runtime (gc-sections removes unused code) |
+| `libexecutorch_delegate_ethos_u.a` | 19KB | Ethos-U NPU delegate backend |
+| `libquantized_ops_lib_selective.a` | 7KB | Registers only `quantize_per_tensor.out` and `dequantize_per_tensor.out` |
+| `libquantized_kernels.a` | 242KB | Kernel implementations (gc-sections removes unused code) |
+| `libkernels_util_all_deps.a` | 308KB | Kernel utilities (gc-sections removes unused code) |
+
+{{% notice Note %}}
+The selective quantized ops library registers only the two CPU operators needed at the NPU delegation boundary. The full `libquantized_ops_lib.a` registers all quantized operators and pulls in approximately 92KB of kernel code, which overflows the 128KB ITCM. If you rebuild the libraries from source, you must build this selective library separately.
+{{% /notice %}}
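+
+Because the selective library's operator registrations are static initializers that nothing references directly, it must be linked with `--whole-archive`. The linking step can be sketched as follows (library paths and the `ProjDirPath` variable are illustrative; check the repository's `CMakeLists.txt` for the exact names):
+
+```cmake
+target_link_libraries(${MCUX_SDK_PROJECT_NAME} PRIVATE
+    # Keep every object in the selective ops library so the
+    # quantize/dequantize operator registrations are not discarded
+    -Wl,--whole-archive
+    ${ProjDirPath}/executorch/lib/libquantized_ops_lib_selective.a
+    -Wl,--no-whole-archive
+    ${ProjDirPath}/executorch/lib/libquantized_kernels.a
+    ${ProjDirPath}/executorch/lib/libkernels_util_all_deps.a
+)
+```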
-The Cortex-M33 has 108KB of RAM. The default memory configuration allocates:
-- 16KB for the method allocator (activation tensors)
-- 8KB for the scratch allocator (temporary operations)
+## Understand the memory configuration
-These settings are in `CMakeLists.txt`:
+The Cortex-M33 has 128KB of ITCM (code) and 108KB of DTCM (data). The firmware also uses reserved DDR regions for the model and NPU working memory. The key settings are defined in `CMakeLists.txt`:
```cmake
target_compile_definitions(${MCUX_SDK_PROJECT_NAME} PRIVATE
- ET_ARM_BAREMETAL_METHOD_ALLOCATOR_POOL_SIZE=0x4000 # 16KB
- ET_ARM_BAREMETAL_SCRATCH_TEMP_ALLOCATOR_POOL_SIZE=0x2000 # 8KB
- ET_MODEL_PTE_ADDR=0x80100000 # DDR address for model
+ ET_ARM_BAREMETAL_METHOD_ALLOCATOR_POOL_SIZE=0x6000 # 24KB method allocator
+ ET_ARM_BAREMETAL_SCRATCH_TEMP_ALLOCATOR_POOL_SIZE=0x200000 # 2MB scratch allocator
+ ET_MODEL_PTE_ADDR=0xC0000000 # DDR address where U-Boot loads the .pte model
)
```
+| Setting | Value | Description |
+|---------|-------|-------------|
+| Method allocator | 24KB (`0x6000`) | Method metadata and small model activations |
+| Scratch allocator | 2MB (`0x200000`) | NPU scratch buffer (MobileNet V2 needs approximately 1.5MB) |
+| Model address | `0xC0000000` | Start of the 4MB reserved DDR region |
+
{{% notice Note %}}
-If you see "region RAM overflowed" errors during build, reduce these pool sizes. For example, change to 0x2000 (8KB) and 0x1000 (4KB) respectively.
+The i.MX93 device tree reserves two DDR regions: `model@c0000000` (4MB for the `.pte` model) and `ethosu_region@A8000000` (128MB for NPU working memory). The NPU scratch buffer is placed at `0xA8000000` and planned buffers for large models at `0xA8200000`, both inside the 128MB ethosu_region. The Ethos-U65 accesses memory via the AXI bus and cannot reach the CM33's tightly-coupled DTCM, so all NPU buffers must be in DDR.
{{% /notice %}}
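+
+The two regions correspond to reserved-memory nodes in the i.MX93 device tree, roughly like this (an illustrative sketch; node names and properties in the actual device tree may differ):
+
+```dts
+reserved-memory {
+    #address-cells = <2>;
+    #size-cells = <2>;
+    ranges;
+
+    /* 4MB region where U-Boot loads the .pte model */
+    model@c0000000 {
+        reg = <0 0xC0000000 0 0x400000>;
+        no-map;
+    };
+
+    /* 128MB of Ethos-U working memory (scratch buffer at 0xA8000000) */
+    ethosu_region@a8000000 {
+        reg = <0 0xA8000000 0 0x8000000>;
+        no-map;
+    };
+};
+```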
## Build the firmware
-Configure the build system:
-
-{{< tabpane code=false >}}
-{{< tab header="Windows/Linux" >}}
-1. Press `Ctrl+Shift+P`
-2. Type: **CMake: Configure**
-3. Select **ARM GCC** as the kit
-4. Choose **Debug** or **Release**
-{{< /tab >}}
-{{< tab header="macOS" >}}
-1. Press `Cmd+Shift+P`
-2. Type: **CMake: Configure**
-3. Select **ARM GCC** as the kit
-4. Choose **Debug** or **Release**
-{{< /tab >}}
-{{< /tabpane >}}
-
-Build the project:
-
-Press `F7` or:
+Build the project from VS Code. In the left sidebar, open **Explorer**, then in the MCUXpresso **Projects** view select the build icon next to `executorch_runner_cm33`.
-{{< tabpane code=false >}}
-{{< tab header="Windows/Linux" >}}
-1. Press `Ctrl+Shift+P`
-2. Type: **CMake: Build**
-{{< /tab >}}
-{{< tab header="macOS" >}}
-1. Press `Cmd+Shift+P`
-2. Type: **CMake: Build**
-{{< /tab >}}
-{{< /tabpane >}}
+
-Watch the build output:
+The build output shows the progress:
```output
[build] Scanning dependencies of target executorch_runner_cm33.elf
@@ -202,50 +212,47 @@ Watch the build output:
[build] Build finished with exit code 0
```
-Verify the build succeeded:
+Verify the memory usage to ensure the firmware fits in the Cortex-M33:
-```bash { output_lines = "2" }
-ls -lh build/executorch_runner_cm33.elf
--rwxr-xr-x 1 user user 601K executorch_runner_cm33.elf
-```
-
-Check memory usage to ensure it fits in the Cortex-M33:
-
-```bash { output_lines = "2-3" }
-arm-none-eabi-size build/executorch_runner_cm33.elf
- text data bss dec hex filename
- 52408 724 50472 103604 19494 executorch_runner_cm33.elf
+```output
+Memory region Used Size Region Size %age Used
+ m_interrupts: 1140 B 1144 B 99.65%
+ m_text: 103476 B 129928 B 79.64%
+ m_data: 61984 B 108 KB 56.05%
```
-The total RAM usage (data + bss) is approximately 51KB, well within the 108KB limit.
+The text section uses approximately 80% of the 128KB ITCM, and data uses approximately 56% of the 108KB DTCM.
-## Troubleshooting
+You now have everything you need to deploy the `.elf` binary on your NXP board.
-**ARM toolchain not found:**
+{{% notice Troubleshooting %}}
+**SDK patches not applied:**
-Add the toolchain to your PATH:
+If you see a BUS FAULT during `load_method` or vtable corruption errors, the GOT linker script patch has not been applied. Run:
```bash
-export PATH=/opt/arm-gnu-toolchain-14.2.rel1-x86_64-arm-none-eabi/bin:$PATH
+./patches/apply_patches.sh
```
-**Cannot find ExecuTorch libraries:**
+Then rebuild the project.
-Verify the libraries were copied correctly:
+**Region `m_text` overflowed:**
-```bash
-ls executorch/lib/libexecutorch*.a
-```
+The 128KB ITCM is nearly full. Verify that `CMakeLists.txt` links `libquantized_ops_lib_selective.a` (not the full `libquantized_ops_lib.a`). The selective library registers only the two operators needed for NPU delegation.
-If missing, re-copy from the Docker container.
+**`resolve_operator` error for `quantized_decomposed::*`:**
-**Region RAM overflowed:**
+The quantized operator kernels are not linked. Verify that `CMakeLists.txt` links `libquantized_ops_lib_selective.a` with `--whole-archive` and that `libquantized_kernels.a` and `libkernels_util_all_deps.a` are also listed.
-Edit `CMakeLists.txt` and reduce the memory pool sizes:
+{{% /notice %}}
-```cmake
-ET_ARM_BAREMETAL_METHOD_ALLOCATOR_POOL_SIZE=0x2000 # 8KB
-ET_ARM_BAREMETAL_SCRATCH_TEMP_ALLOCATOR_POOL_SIZE=0x1000 # 4KB
-```
+## What you've learned and what's next
+
+In this section you've:
+
+- Set up MCUXpresso for VS Code with the SDK and Arm toolchain
+- Cloned the executor_runner project with prebuilt ExecuTorch libraries
+- Applied critical SDK patches for GOT initialization and NPU logging
+- Built the Cortex-M33 firmware and verified it fits within memory constraints
-Then rebuild with `F7`.
\ No newline at end of file
+With the firmware binary built and its memory usage verified, you're ready to deploy it to the FRDM i.MX 93 and run your first inference.
\ No newline at end of file
diff --git a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/_index.md b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/_index.md
index 58a799b2ba..2c4c653c0b 100644
--- a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/_index.md
+++ b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/_index.md
@@ -1,27 +1,24 @@
---
-title: Observing Ethos-U on a Physical Device, Built on Arm
+title: Deploy ExecuTorch firmware on NXP FRDM i.MX 93 for Ethos-U65 acceleration
-draft: true
-cascade:
- draft: true
-
minutes_to_complete: 120
-who_is_this_for: This is an introductory topic for developers and data scientists new to Tiny Machine Learning (TinyML), who want to observe ExecuTorch performance on a physical device.
+who_is_this_for: This is an introductory topic for developers and data scientists new to TinyML who want to observe ExecuTorch performance on a physical device.
learning_objectives:
- - Identify suitable physical Arm-based devices for TinyML applications.
- - Optionally, configure physical embedded devices.
- - Deploy a TinyML ExecuTorch model to NXP's FRDM i.MX 93 applicaiton processor (board).
-
+ - Bring up a custom ExecuTorch `executor_runner` firmware on the FRDM i.MX 93 Cortex-M33 using Linux RemoteProc
+ - Compile an ExecuTorch `.pte` model for Ethos-U65 and run inference with NPU acceleration
+ - Understand how heterogeneous Arm systems split responsibilities across application cores, microcontrollers, and NPUs
prerequisites:
- - Purchase of a NXP [FRDM i.MX 93](https://www.nxp.com/design/design-center/development-boards-and-designs/frdm-i-mx-93-development-board:FRDM-IMX93) board.
- - A USB Mini-B to USB Type-A cable, or a USB Mini-B to USB Type-C cable.
- - Basic knowledge of Machine Learning concepts.
- - A computer running Linux or macOS.
- - VS Code
+ - An NXP [FRDM i.MX 93](https://www.nxp.com/design/design-center/development-boards-and-designs/frdm-i-mx-93-development-board:FRDM-IMX93) development board
+ - A USB Mini-B to USB Type-A cable, or a USB Mini-B to USB Type-C cable
+ - Completion of [Use Linux on an NXP FRDM i.MX 93 board](/learning-paths/embedded-and-microcontrollers/linux-nxp-board/) (Linux setup, login access, and file transfer)
+ - Basic knowledge of Machine Learning concepts
+ - A host computer to compile ExecuTorch libraries
-author: Waheed Brown, Fidel Makatia Omusilibwa
+author:
+- Waheed Brown
+- Fidel Makatia Omusilibwa
### Tags
skilllevels: Introductory
diff --git a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/build-nxp.png b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/build-nxp.png
new file mode 100644
index 0000000000..4d89176aaa
Binary files /dev/null and b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/build-nxp.png differ
diff --git a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/import-project.png b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/import-project.png
new file mode 100644
index 0000000000..8baefef046
Binary files /dev/null and b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/import-project.png differ
diff --git a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/imx-93-application-processor-soc.png b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/imx-93-application-processor-soc.png
deleted file mode 100644
index 838d47f6d5..0000000000
Binary files a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/imx-93-application-processor-soc.png and /dev/null differ
diff --git a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/increase-swap-space-to-4-gb.jpg b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/increase-swap-space-to-4-gb.jpg
deleted file mode 100644
index bd36c08bac..0000000000
Binary files a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/increase-swap-space-to-4-gb.jpg and /dev/null differ
diff --git a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/increase-the-memory-limit-to-12-gb.jpg b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/increase-the-memory-limit-to-12-gb.jpg
deleted file mode 100644
index 441c20636b..0000000000
Binary files a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/increase-the-memory-limit-to-12-gb.jpg and /dev/null differ
diff --git a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/mcuxpresso-installer.png b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/mcuxpresso-installer.png
new file mode 100644
index 0000000000..630f8f62a1
Binary files /dev/null and b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/mcuxpresso-installer.png differ
diff --git a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/nxp-frdm-imx-93-board-soc-highlighted.png b/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/nxp-frdm-imx-93-board-soc-highlighted.png
deleted file mode 100644
index b50ace3a21..0000000000
Binary files a/content/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/nxp-frdm-imx-93-board-soc-highlighted.png and /dev/null differ
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/_index.md b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/_index.md
new file mode 100644
index 0000000000..0f859e9710
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/_index.md
@@ -0,0 +1,49 @@
+---
+title: Find CPU Cycle Hotspots with Arm Performix
+
+draft: true
+cascade:
+ draft: true
+
+minutes_to_complete: 30
+
+who_is_this_for: Cloud Engineers looking to optimize their workload running on a Linux-based Arm system.
+
+learning_objectives:
+ - Run the CPU Cycle Hotspot recipe in Arm Performix
+ - Identify which functions in your program use the most CPU cycles, so you can target the best candidates for optimization.
+
+prerequisites:
+ - Access to Arm Performix
+ - Basic understand on C++
+
+author: Kieran Hejmadi
+
+### Tags
+skilllevels: Introductory
+subjects: Performance and Architecture
+armips:
+ - Neoverse
+tools_software_languages:
+ - Arm Performix
+ - C++
+ - Runbook
+operatingsystems:
+ - Linux
+
+
+further_reading:
+ - resource:
+ title: Flame Graphs
+ link: https://www.brendangregg.com/flamegraphs.html
+ type: blog
+
+
+
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1 # _index.md always has weight of 1 to order correctly
+layout: "learningpathall" # All files under learning paths have this same wrapper
+learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/_next-steps.md b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/_next-steps.md
new file mode 100644
index 0000000000..727b395ddd
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/_next-steps.md
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+# FIXED, DO NOT MODIFY THIS FILE
+# ================================================================================
+weight: 21 # The weight controls the order of the pages. _index.md always has weight 1.
+title: "Next Steps" # Always the same, html page title.
+layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/comparison-time.jpg b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/comparison-time.jpg
new file mode 100644
index 0000000000..1bff267336
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/comparison-time.jpg differ
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/comparison.jpg b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/comparison.jpg
new file mode 100644
index 0000000000..2788aabb90
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/comparison.jpg differ
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/flame-graph-comparison.jpg b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/flame-graph-comparison.jpg
new file mode 100644
index 0000000000..7c145158eb
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/flame-graph-comparison.jpg differ
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/flame-graph-example.jpg b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/flame-graph-example.jpg
new file mode 100644
index 0000000000..00ce3856e7
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/flame-graph-example.jpg differ
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/green_mandelbrot-1-thread.jpg b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/green_mandelbrot-1-thread.jpg
new file mode 100644
index 0000000000..60484a1718
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/green_mandelbrot-1-thread.jpg differ
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/hotspot-config.jpg b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/hotspot-config.jpg
new file mode 100644
index 0000000000..dab941e333
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/hotspot-config.jpg differ
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-1.md b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-1.md
new file mode 100644
index 0000000000..86779b2816
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-1.md
@@ -0,0 +1,27 @@
+---
+title: Background Information
+weight: 2
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Overview
+
+A flame graph is a visualization built from many sampled call stacks that shows where a program spends CPU time. In a performance investigation it is often the first thing to generate when it is not yet clear which parts of the codebase are affecting execution time. Instead of guessing where the bottleneck is, you take a quick sample of real execution and use the resulting graph to identify the hottest code paths that deserve deeper analysis.
+
+## Example Flame Graph
+
+Take a look at the example flame graph below.
+
+
+
+The x-axis represents the relative number of samples attributed to code paths, ordered alphabetically, **not** a timeline. A wider block means that function appeared in more samples and therefore consumed more of the measured resource, typically CPU time. The y-axis represents call stack depth: frames at the bottom are closer to the root of execution, such as a thread entry point, and frames above them are functions called by the frames below. A common workflow is to start with the widest blocks, then move upward through the stack to understand which callees dominate that hot path.
+
+Each sample captures a snapshot of the current call stack. Many samples are then aggregated by grouping identical stacks and summing their counts; this merging step is what makes flame graphs compact and readable. Reliable stack walking matters, and frame pointers are a common mechanism used to reconstruct the function call hierarchy consistently. When frame pointers are present, it is easier to unwind through nested calls and produce accurate stacks that merge cleanly into stable blocks.
+
+This learning path is not a detailed explanation of flame graphs. If you would like to learn more, read [this blog](https://www.brendangregg.com/flamegraphs.html) by their creator, Brendan Gregg.
+
+## Tooling options
+
+On Linux, flame graphs are commonly generated from samples collected with `perf`. The `perf` tool periodically interrupts the running program and records a stack trace; the collected stacks are then converted into a folded format and rendered as the graph. Sampling frequency is important: if the frequency is too low you may miss short-lived hotspots, and if it is too high you may introduce overhead or skew the results. To make the output informative, compile with debug symbols and preserve frame pointers so stacks resolve to meaningful function names and unwind reliably. A typical build uses `-g` and `-fno-omit-frame-pointer`.
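+
+As an example, the manual perf-based workflow typically looks like this (using the FlameGraph scripts from Brendan Gregg's repository; script paths are illustrative):
+
+```bash
+# Build with debug symbols and frame pointers so stacks unwind reliably
+g++ -O2 -g -fno-omit-frame-pointer -o mandelbrot main.cpp
+
+# Sample at 99 Hz with call-graph capture while the program runs
+perf record -F 99 -g -- ./mandelbrot
+
+# Fold identical stacks and render an interactive SVG
+perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg
+```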
+
+Arm has also developed a tool that simplifies this workflow through the CPU Cycle Hotspot recipe in Arm Performix, making it easier to configure collection, run captures, and explore the resulting call hierarchies without manually stitching together the individual steps. This is the tooling solution used in this learning path.
\ No newline at end of file
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-2.md b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-2.md
new file mode 100644
index 0000000000..1e35444afb
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-2.md
@@ -0,0 +1,44 @@
+---
+title: Setup
+weight: 3
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Setup
+
+This learning path uses a hands-on worked example to make sampling-based profiling and flame graphs practical. You’ll build a C++11 program that generates a fractal bitmap by computing the Mandelbrot set, then mapping each pixel’s iteration count to a pixel value. You’ll have the full source code, so you can rebuild the program, profile it, and connect what you see in the flame graph back to the exact functions and loops responsible for the runtime.
+
+A fractal is a pattern that shows detail at many scales, often with self-similar structure. Fractals are usually generated by repeatedly applying a simple mathematical rule. In the Mandelbrot set, each pixel corresponds to a complex number, which is iterated through a basic recurrence. How quickly the value “escapes” (or whether it stays bounded) determines the pixel’s color and produces the familiar Mandelbrot image.
+
+You don't need to understand the Mandelbrot algorithm in detail to follow this learning path; it serves primarily as a convenient, compute-heavy workload for profiling. If you'd like to learn more, see the [Wikipedia](https://en.wikipedia.org/wiki/Mandelbrot_set) page.
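+
+For intuition, the per-pixel computation can be sketched like this (an illustrative C++ fragment, not the repository's code):
+
+```cpp
+// For a pixel mapped to the complex constant c = (re, im), iterate
+// z = z*z + c from z = 0 and count the steps until |z| escapes past 2
+// (comparing |z|^2 against 4 avoids a square root).
+int escapeCount(double re, double im, int maxIterations) {
+    double zr = 0.0, zi = 0.0;
+    int n = 0;
+    while (n < maxIterations && zr * zr + zi * zi <= 4.0) {
+        double tmp = zr * zr - zi * zi + re;  // real part of z*z + c
+        zi = 2.0 * zr * zi + im;              // imaginary part of z*z + c
+        zr = tmp;
+        ++n;
+    }
+    return n;
+}
+```
+
+The iteration count `n` is what gets mapped to a pixel value.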
+
+
+## Connect to Target
+
+If this is your first time setting up Arm Performix, refer to the [installation guide](https://learn.arm.com/install-guides/atp). This learning path uses an AWS Graviton3 metal instance (`m7g.metal`) with 64 Neoverse V1 cores as the target. From the host machine, test the connection to the remote server by navigating to `Targets` -> `Test Connection`. You should see a successful connection, as shown below.
+
+.
+
+## Build Application on Remote Server
+
+Next, connect to the remote server, for example using SSH or Visual Studio Code, and clone the Mandelbrot repository. It is available under the [Arm Education License](https://github.com/arm-university/Mandelbrot-Example?tab=License-1-ov-file) for teaching and learning. Create a new directory where you will store and build this example, then run the commands below.
+
+```bash
+git clone https://github.com/arm-university/Mandelbrot-Example.git
+cd Mandelbrot-Example && mkdir images builds
+git checkout single-thread
+```
+
+Install a C++ compiler using your operating system's package manager, for example on an RPM-based distribution:
+
+```bash
+sudo dnf update && sudo dnf install g++ gcc
+```
+
+Build the application.
+
+```bash
+./build.sh
+```
\ No newline at end of file
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-3.md b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-3.md
new file mode 100644
index 0000000000..022d6c5c51
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-3.md
@@ -0,0 +1,56 @@
+---
+title: Assess Baseline Performance
+weight: 4
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Run CPU Cycle Hotspot Recipe
+
+As shown in the `main.cpp` file below, the program generates a 1920×1080 bitmap image of the fractal. To identify performance bottlenecks, run the CPU Cycle Hotspot recipe in Arm Performix (APX). APX uses sampling to estimate where the CPU spends most of its time, allowing it to highlight the hottest functions. This is especially useful in larger applications where it isn't obvious ahead of time which functions will dominate runtime.
+
+**Please note**: replace the first string argument in the `myplot.draw()` call with the absolute path to your images folder and rebuild the application. Otherwise, the image is written relative to the directory the binary is run from, which is `/tmp/atperf/tools/atperf-agent`. As the name suggests, this folder is periodically deleted.
+
+```cpp
+#include "Mandelbrot.h"
+#include <iostream>
+
+using namespace std;
+
+int main(){
+
+ Mandelbrot::Mandelbrot myplot(1920, 1080);
+ myplot.draw("./images/green.bmp", Mandelbrot::Mandelbrot::GREEN);
+
+ return 0;
+}
+```
+
+Open APX on the host machine and select the `CPU Cycle Hotspot` recipe. If this is the first time running the recipe on this target machine, you may need to select the install tools button.
+
+
+
+Next, configure the recipe. Choose to launch a new process; APX automatically starts collecting metrics when the program starts and stops when the program exits.
+
+Provide an absolute path to the recently built binary, `mandelbrot`.
+
+Finally, use the default sampling rate of `Normal`. If your application is a short-running program, consider a higher sample rate, at the cost of more data to store and process.
+
+
+
+## Analyse Results
+
+A flame graph is generated. The default colour mode labels the hottest functions, those sampled most frequently while using the CPU, in the darkest shade. Here you can see that the `__complex_abs__` function appears in ~65% of samples and in turn calls the `__hypot` symbol in `libm.so`.
+
+
+
+
+To dig deeper, you can map lines of source code to the sampled functions. To do this, right-click a specific function and select **View Source Code**. At the time of writing (ATP Engine 0.44.0), you may need to copy the source code onto your host machine.
+
+
+
+Finally, looking in our images directory we can see the bitmap fractal.
+
+
+
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-4.md b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-4.md
new file mode 100644
index 0000000000..3ea0dcaf94
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-4.md
@@ -0,0 +1,88 @@
+---
+title: Optimize
+weight: 5
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+Now we can leverage the insights surfaced by Arm Performix to focus the optimizations on the hottest functions. Looking at the source code, we understand that the hypotenuse function, `__hypot`, is invoked by the `Mandelbrot::getIterations` function to calculate the absolute value of a complex number. One option to consider is an optimized version of `libm`.
+
+Looking at the `Mandelbrot::getIterations` function, there are some obvious ways to optimize.
+
+```cpp
+ while (iterations < MAX_ITERATIONS){
+ z = (z*z) + c;
+ if (abs(z) > THRESHOLD){
+ break;
+ }
+ iterations++;
+ }
+```
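As a related aside (not a change this Learning Path makes), the `abs(z)` comparison can be rewritten to avoid the square root, and therefore the `__hypot` call, by comparing squared magnitudes with `std::norm`. The `THRESHOLD` value of 2.0 below is an assumption for illustration; check `Mandelbrot.h` for the real constant.

```cpp
#include <cassert>
#include <complex>

// Compare |z|^2 against THRESHOLD^2 instead of |z| against THRESHOLD.
// std::norm(z) returns re^2 + im^2, so no sqrt/hypot call is needed.
int getIterationsNoHypot(std::complex<double> c, int maxIterations) {
    const double THRESHOLD = 2.0; // assumed value; see Mandelbrot.h
    std::complex<double> z(0.0, 0.0);
    int iterations = 0;
    while (iterations < maxIterations) {
        z = z * z + c;
        if (std::norm(z) > THRESHOLD * THRESHOLD) break;
        ++iterations;
    }
    return iterations;
}
```

Because the escape test is mathematically equivalent, the generated image is unchanged while the hot loop no longer calls into `libm`.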
+
+### Optimization 1 - Limiting Loop Boundary
+
+
+We can see that the number of absolute-value calculations is limited by the loop boundary, `MAX_ITERATIONS`. A first optimization is to reduce `MAX_ITERATIONS`, which is defined as a static const integer with value 1024 in the `Mandelbrot.h` header. We could halve it to 512 and assess the perceived image quality of our fractal.
+
+```cpp
+public:
+...
+ static const int MAX_ITERATIONS = (1<<10);
+...
+```
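A sketch of the edited definition (the bit-shift form mirrors the original; `(1 << 9)` is 512):

```cpp
#include <cassert>

// Halved loop boundary: (1 << 9) == 512, down from (1 << 10) == 1024.
static const int MAX_ITERATIONS = (1 << 9);
```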
+
+On the remote server, reduce `MAX_ITERATIONS` in `Mandelbrot.h`, change the output file name string in `main.cpp` so the new image does not overwrite the previous one, and rebuild the binary with the following command.
+
+```bash
+./build.sh
+```
+
+Next, click the refresh icon in the top right to rerun the recipe, then select comparison mode to view differences between the runs. Navigating to the **Run Details** tab, we observe a reduction in run duration from 1m 0s to 0m 32s, almost proportional to the reduction in `MAX_ITERATIONS`. However, we need to check whether the tradeoff between image quality and runtime was worth it.
+
+Looking at the change in image quality, there is negligible difference in perceived image quality when halving `MAX_ITERATIONS`.
+
+
+
+### Optimization 2 - Parallelising Hot Function
+
+Fortunately, our loop does not contain any loop-carried dependencies, where the result of one iteration depends on a previous iteration. As such, we can parallelize our hot function to run on multiple threads if our CPU has multiple cores.
+
+The repository contains a parallel version in the main branch.
+
+```bash
+git checkout main
+```
+
+This branch parallelizes the `Mandelbrot::draw` function, an earlier function in the call stack that eventually calls `__hypot`.
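The repository's exact implementation may differ, but a minimal sketch of row-based parallelization with `std::thread` looks like this (the function names and coordinate mapping here are illustrative assumptions, not the repository's code):

```cpp
#include <cassert>
#include <complex>
#include <thread>
#include <vector>

// Simplified stand-in for Mandelbrot::getIterations.
static int getIterations(std::complex<double> c, int maxIterations) {
    std::complex<double> z(0.0, 0.0);
    int iterations = 0;
    while (iterations < maxIterations) {
        z = z * z + c;
        if (std::abs(z) > 2.0) break;
        ++iterations;
    }
    return iterations;
}

// Each thread computes an interleaved set of rows. No locking is needed
// because every pixel is written by exactly one thread.
static void drawParallel(std::vector<int>& pixels, int width, int height,
                         int maxIterations, int numThreads) {
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&pixels, width, height, maxIterations,
                              numThreads, t] {
            for (int y = t; y < height; y += numThreads) {
                for (int x = 0; x < width; ++x) {
                    // Map the pixel to a point in the complex plane.
                    std::complex<double> c(4.0 * x / width - 2.0,
                                           4.0 * y / height - 2.0);
                    pixels[y * width + x] = getIterations(c, maxIterations);
                }
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

Because pixels are independent, running with 1 thread or many threads produces identical output; only the wall-clock time changes.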
+
+Build the example. This creates a binary, `./builds/mandelbrot-parallel`, which takes a numerical command line argument to set the number of threads.
+
+```bash
+./build.sh
+```
+
+From Arm Performix running on the host, rerun the recipe with the new binary.
+
+To assess the change, we can compare with a previous run. Looking under the `Run Details` tab, we can see the execution time has reduced further from 0m 32s to 7s with 32 threads.
+
+
+
+The overall distribution of samples has not changed significantly, but with 64 threads the share of samples landing in the `Mandelbrot::draw` function has reduced by 7 percentage points. This suggests that if we want to improve the execution time further, additional optimizations of the `Mandelbrot::draw` function will yield the greatest benefit.
+
+
+
+**Please Note:** The total run duration includes the tooling setup and data analysis, not just the runtime of the application. Using a command line tool such as `time`, we observe that the application itself now runs in ~1 s, almost a 100x improvement in runtime!
+
+### (Optional Challenge) Additional optimizations
+
+You may have noticed our build script uses the `-O0` flag, which disables compiler optimizations. You can experiment with higher optimization levels, loop boundary sizes, and thread counts. See our learning path introducing [basic compiler flags](https://learn.arm.com/learning-paths/servers-and-cloud-computing/cplusplus_compilers_flags/) for more information. Additionally, you may wish to look at vectorized libraries that could replace the hypotenuse function in `libm`, such as the [Arm Performance Libraries](https://developer.arm.com/documentation/101004/2601/Arm-Performance-Libraries-Math-Functions/Arm-Performance-Libraries-Vector-Math-Functions--Accuracy-Table).
+
+
+## Summary
+
+In this learning path, we reduced the runtime of the Mandelbrot example by focusing on the hottest code paths—cutting execution time from around 1 minute to ~1 second through targeted optimization and parallelization. While this example is relatively simple and the optimizations are more obvious, the same principle applies to real-world workloads: optimize what matters most first, based on measurement.
+
+The cpu_hotspot recipe is designed to quickly identify an application’s hottest (most CPU-time-dominant) functions, giving you a clear, evidence-based starting point for performance work. By surfacing where execution time is actually being spent, it helps ensure any optimizations are targeted at the parts of the code most likely to deliver the largest performance gains, rather than relying on guesswork.
+
+This is often one of the first profiling steps you’ll run when assessing an application’s performance characteristics—especially to determine which functions dominate runtime and should be prioritized. Once hotspots are identified, you can follow up with deeper, function-specific analysis, such as memory investigations or top-down studies, and even build microbenchmarks around hot functions to explore lower-level bottlenecks and uncover additional optimization opportunities.
\ No newline at end of file
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/install-tools.jpg b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/install-tools.jpg
new file mode 100644
index 0000000000..421f6cce2f
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/install-tools.jpg differ
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/plot-1-thead-512-iterations.jpg b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/plot-1-thead-512-iterations.jpg
new file mode 100644
index 0000000000..3787e8c4a0
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/plot-1-thead-512-iterations.jpg differ
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/plot-1-thread-max-iterations.jpg b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/plot-1-thread-max-iterations.jpg
new file mode 100644
index 0000000000..fd9e98d5da
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/plot-1-thread-max-iterations.jpg differ
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/runtime-difference.jpg b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/runtime-difference.jpg
new file mode 100644
index 0000000000..8a2529c62b
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/runtime-difference.jpg differ
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/single-thread-flame-graph.jpg b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/single-thread-flame-graph.jpg
new file mode 100644
index 0000000000..e4ae59cf38
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/single-thread-flame-graph.jpg differ
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/successful-connection.jpg b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/successful-connection.jpg
new file mode 100644
index 0000000000..311417a01e
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/successful-connection.jpg differ
diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/view-with-source-code.jpg b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/view-with-source-code.jpg
new file mode 100644
index 0000000000..dccc52d0da
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/view-with-source-code.jpg differ
diff --git a/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/activegate-installation.md b/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/activegate-installation.md
index 02be2776cf..7b36db9b37 100644
--- a/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/activegate-installation.md
+++ b/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/activegate-installation.md
@@ -18,7 +18,7 @@ At the end of the installation, ActiveGate will be:
* Installed and running as a system service
* Listening on port **9999** for Dynatrace communication
* Connected to your Dynatrace SaaS environment
-* Verified on **Arm64 (Aarch64)** architecture
+* Verified on **Arm64 (`aarch64`)** architecture
## Verify OneAgent installation
@@ -51,11 +51,12 @@ This is your Dynatrace SaaS environment URL.
## Navigate to the ActiveGate deployment page
-From the Dynatrace dashboard:
+From the Dynatrace main dashboard:
-- Select Deployment status
-- Choose ActiveGate
-- Install ActiveGate
+* Select **Search** on the upper left and search for **Deployment**.
+* Select **Deployment status**.
+* Choose **ActiveGate**.
+* Select **Install ActiveGate**.

@@ -65,8 +66,9 @@ Dynatrace will generate an installation command specifically for your environmen
On the installer configuration page:
-- Platform → Linux
-- Architecture → ARM64
+* Platform -> Linux
+* Architecture -> ARM64
+* Select **Generate token** to create an authentication token

@@ -95,24 +97,29 @@ wget -O Dynatrace-ActiveGate-Linux-arm.sh \
**Verify signature:**
```console
-wget https://ca.dynatrace.com/dt-root.cert.pem ; ( echo 'Content-Type: multipart/signed; protocol="application/x-pkcs7-signature"; micalg="sha-256"; boundary="--SIGNED-INSTALLER"'; echo ; echo ; echo '----SIGNED-INSTALLER' ; cat Dynatrace-ActiveGate-Linux-arm64-1.331.24.20260210-044521.sh ) | openssl cms -verify -CAfile dt-root.cert.pem > /dev/null
+wget https://ca.dynatrace.com/dt-root.cert.pem
+( echo 'Content-Type: multipart/signed; protocol="application/x-pkcs7-signature"; micalg="sha-256"; boundary="--SIGNED-INSTALLER"'; echo; echo; echo '----SIGNED-INSTALLER'; cat Dynatrace-ActiveGate-Linux-arm.sh ) | openssl cms -verify -CAfile dt-root.cert.pem > /dev/null
```
+

**Install ActiveGate as the privileged user:**
+Copy the command under **Install ActiveGate as the privileged user** from your dashboard and prepend `sudo` to launch the install:
+
```console
-sudo /bin/bash Dynatrace-ActiveGate-Linux-arm64-1.331.24.20260210-044521.sh
+sudo /bin/bash Dynatrace-ActiveGate-Linux-arm.sh
```
The installer automatically performs the following tasks:
-- Downloads ActiveGate components
-- Installs the ActiveGate service
-- Configures communication with Dynatrace SaaS
-- Starts the ActiveGate service
+* Downloads ActiveGate components
+* Installs the ActiveGate service
+* Configures communication with Dynatrace SaaS
+* Starts the ActiveGate service
+
+The output is similar to:
-The output is similar to:
```output
2026-03-12 05:59:21 UTC Starting Dynatrace ActiveGate AutoUpdater...
2026-03-12 05:59:21 UTC Checking if Dynatrace ActiveGate AutoUpdater is running ...
@@ -133,12 +140,14 @@ sudo systemctl status dynatracegateway
```
The output is similar to:
+
```output
● dynatracegateway.service - Dynatrace ActiveGate service
Loaded: loaded (/etc/systemd/system/dynatracegateway.service; enabled; preset: enabled)
Active: active (running) since Thu 2026-03-12 05:59:07 UTC; 1min 7s ago
Process: 20280 ExecStart=/opt/dynatrace/gateway/dynatracegateway start (code=exited, status=0/SUCCESS)
```
+
This confirms that ActiveGate started successfully.
## Verify the ActiveGate communication port
@@ -152,6 +161,7 @@ sudo ss -tulnp | grep 9999
```
The output is similar to:
+
```console
tcp LISTEN 0 50 *:9999 *:* users:(("java",pid=20319,fd=403))
```
@@ -168,9 +178,9 @@ Deployment Status → ActiveGates
You should see your ActiveGate instance listed with:
-- Host name
-- Version
-- Status: Connected
+* Host name
+* Version
+* Status: Connected

@@ -178,7 +188,7 @@ You should see your ActiveGate instance listed with:
## Test application monitoring with Nginx
-To validate that Dynatrace is collecting monitoring data correctly, deploy a simple web server on the virtual machine. Dynatrace OneAgent will automatically detect and monitor the process.
+To validate that Dynatrace is collecting monitoring data correctly, deploy a simple web server on the virtual machine. Dynatrace OneAgent automatically detects and monitors the process.
### Install Nginx
@@ -188,7 +198,8 @@ Update the package index and install the Nginx web server.
sudo apt update
sudo apt install -y nginx
```
-## Check the Nginx service status.
+
+## Check the Nginx service status
```console
sudo systemctl status nginx
@@ -218,20 +229,22 @@ Processes
You should see a process similar to:
-- nginx
-- Dynatrace automatically begins collecting metrics such as:
-- CPU usage
-- memory consumption
-- network activity
-- request throughput
+* `nginx`
+
+Dynatrace automatically begins collecting metrics such as:
+
+* CPU usage
+* memory consumption
+* network activity
+* request throughput

-## What you've accomplished
+## What you've accomplished
You've successfully installed Dynatrace ActiveGate on your Azure Ubuntu Arm64 virtual machine. Your environment now includes:
-- Dynatrace OneAgent performing host monitoring
-- ActiveGate routing monitors traffic securely
-- Communication with Dynatrace SaaS through port 9999
-- Full compatibility with Arm64-based Cobalt 100 processors
+* Dynatrace OneAgent performing host monitoring
+* ActiveGate routing monitoring traffic securely
+* Communication with Dynatrace SaaS through port 9999
+* Full compatibility with Arm64-based Cobalt 100 processors
diff --git a/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/background.md b/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/background.md
index 42f64494e5..bc683c9fac 100644
--- a/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/background.md
+++ b/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/background.md
@@ -20,7 +20,7 @@ Dynatrace automatically maps dependencies between services, hosts, containers, a
There are three main components of Dynatrace:
-- **Dynatrace OneAgent:** a lightweight monitoring agent installed on hosts that automatically collects metrics, logs, and traces from applications and infrastructure. Learn more in the [Dynatrace OneAgent documentation](https://docs.dynatrace.com/docs/ingest-from/dynatrace-oneagent).
+- **Dynatrace OneAgent:** a lightweight monitoring agent installed on hosts that automatically collects metrics, logs, and traces from applications and infrastructure. Learn more in the [Dynatrace OneAgent documentation](https://docs.dynatrace.com/docs/ingest-from/dynatrace-oneagent).
- **Dynatrace ActiveGate:** a secure gateway component that routes monitoring traffic, enables cloud integrations, and provides additional monitoring capabilities such as Kubernetes monitoring and synthetic monitoring. Learn more in the [Dynatrace ActiveGate documentation](https://docs.dynatrace.com/docs/ingest-from/dynatrace-activegate).
diff --git a/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/firewall.md b/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/firewall.md
index 9d4b90aa85..496c191b1d 100644
--- a/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/firewall.md
+++ b/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/firewall.md
@@ -10,7 +10,7 @@ layout: learningpathall
To allow external traffic on port **9999** for Dynatrace ActiveGate running on an Azure virtual machine, open the port in the Network Security Group (NSG) attached to the virtual machine's network interface or subnet.
-{{% notice Note %}} For more information about Azure setup, see [Getting started with Microsoft Azure Platform](/learning-paths/servers-and-cloud-computing/csp/azure/).{{% /notice %}}
+{{% notice Note %}}For more information about Azure setup, see [Getting started with Microsoft Azure Platform](/learning-paths/servers-and-cloud-computing/csp/azure/).{{% /notice %}}
## Create a firewall rule in Azure
@@ -36,8 +36,7 @@ Configure it using the following details:
- **Destination port ranges:** **9999**
- **Protocol:** TCP
- **Action:** Allow
-- **Priority:** 1000
-- **Name:** dynatrace-activegate
+- **Name:** allow-tcp-9999
After filling in the details, select **Add** to save the rule.
diff --git a/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/instance.md b/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/instance.md
index 1da75b2837..d944e6561d 100644
--- a/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/instance.md
+++ b/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/instance.md
@@ -22,7 +22,7 @@ While the steps to create this instance are included here for convenience, you c
## Create an Arm-based Azure virtual machine
-Creating a virtual machine based on Azure Cobalt 100 is no different to creating any other virtual machine in Azure. To create an Azure virtual machine:
+Creating a virtual machine based on Azure Cobalt 100 is no different from creating any other virtual machine in Azure. To create an Azure virtual machine:
- Launch the Azure portal and navigate to **Virtual Machines**.
- Select **Create**, and select **Virtual Machine** from the drop-down list.
@@ -63,7 +63,7 @@ Your virtual machine should be ready and running in a few minutes. You can SSH i

-{{% notice Note %}}To learn more about Arm-based virtual machine in Azure, see “Getting Started with Microsoft Azure” in [Get started with Arm-based cloud instances](/learning-paths/servers-and-cloud-computing/csp/azure).{{% /notice %}}
+{{% notice Note %}}To learn more about Arm-based virtual machines in Azure, see "Getting Started with Microsoft Azure" in [Get started with Arm-based cloud instances](/learning-paths/servers-and-cloud-computing/csp/azure/).{{% /notice %}}
## What you've accomplished and what's next
diff --git a/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/native-intstallation.md b/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/native-intstallation.md
index a070f5365e..e20406e84f 100644
--- a/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/native-intstallation.md
+++ b/content/learning-paths/servers-and-cloud-computing/dynatrace-azure/native-intstallation.md
@@ -13,9 +13,9 @@ To install Dynatrace OneAgent on an Azure Ubuntu 24.04 LTS Arm64 virtual machine
At the end of the installation, Dynatrace is:
* Installed and running as a host monitoring agent
-* Connected to the Dynatrace SaaS environment
+* Connected to your Dynatrace SaaS environment
* Monitoring system processes and services automatically
-* Verified on Arm64 (aarch64) architecture
+* Verified on Arm64 (`aarch64`) architecture
## Update the system and install required tools
@@ -34,21 +34,26 @@ Confirm that the virtual machine is running on the Arm64 architecture.
uname -m
```
-output is similar to:
+The output is similar to:
+
```output
aarch64
```
+
This confirms the system is using the Arm64 architecture required for Cobalt 100 processors.
## Create your Dynatrace trial environment
-Fill in the required information:
+In a web browser, go to [Dynatrace](https://dynatrace.com) and select **Try it free** followed by **Start trial**.
+
+Enter your email and then complete the requested fields:
-- First name
-- Last name
-- Work email address
-- Company name
-- Country
+* Create and supply a password for your trial account
+* First name
+* Last name
+* Work email address
+* Company name
+* Country
After submitting the form, Dynatrace creates a new SaaS monitoring environment for you.
@@ -56,17 +61,16 @@ This process usually takes 1–2 minutes.
## Access your Dynatrace environment
-After the environment is created, you will receive an email with a link similar to:
+After the environment is created, your browser shows a button named **Launch Dynatrace**. Select it to open your Dynatrace environment.
-```console
-https://.live.dynatrace.com
-```
+Make a note of the environment ID assigned to your account because it appears in your dashboard URL.
**Example:**
```text
-https://qzo72404.live.dynatrace.com
+https://mbp77458.apps.dynatrace.com/ui/apps/dynatrace.launcher/getting-started
```
+
The Environment ID uniquely identifies your Dynatrace tenant and is required for agent installation.

@@ -75,26 +79,26 @@ The Environment ID uniquely identifies your Dynatrace tenant and is required for
From the Dynatrace dashboard:
-- Select Deploy Dynatrace
-- Choose OneAgent
-- Select Linux
+* Choose **OneAgent**
+* Choose **Setup**

This page generates the installation command tailored for your environment.
-## Select ARM Architecture
+## Select Arm64 architecture
+
+On the installer page, confirm these selections:
-In the installer page:
+* Platform -> Linux
+* Architecture -> ARM64
+* Monitoring mode -> Full-stack monitoring
-- Cloud platform → Linux
-- Select architecture → ARM64
-- Select monitoring mode:
- >Full-stack monitoring
+Then select **Generate token** to create an authentication token.

-## Copy OneAgent Installer Command
+## Copy OneAgent installer command
Dynatrace generates an installer command that includes your environment ID and API token.:
@@ -106,82 +110,81 @@ wget -O Dynatrace-OneAgent-Linux-arm.sh \
**Example:**
-```text
+```console
wget -O Dynatrace-OneAgent-Linux-arm.sh \
"https://qzo72404.live.dynatrace.com/api/v1/deployment/installer/agent/unix/default/latest?arch=arm" \
--header="Authorization: Api-Token DT_API_TOKEN"
```
-- The API token allows secure access to the Dynatrace installer.
+
+The API token allows secure access to the Dynatrace installer.
Run this command on the virtual machine to download the installer.
-**Verify signature**
+## Verify installer signature
For security, verify the installer signature using Dynatrace’s root certificate.
```console
-wget https://ca.dynatrace.com/dt-root.cert.pem ; ( echo 'Content-Type: multipart/signed; protocol="application/x-pkcs7-signature"; micalg="sha-256"; boundary="--SIGNED-INSTALLER"'; echo ; echo ; echo '----SIGNED-INSTALLER' ; cat Dynatrace-OneAgent-Linux-x86-1.331.49.20260227-104933.sh ) | openssl cms -verify -CAfile dt-root.cert.pem > /dev/null
+wget https://ca.dynatrace.com/dt-root.cert.pem
+( echo 'Content-Type: multipart/signed; protocol="application/x-pkcs7-signature"; micalg="sha-256"; boundary="--SIGNED-INSTALLER"'; echo; echo; echo '----SIGNED-INSTALLER'; cat Dynatrace-OneAgent-Linux-arm.sh ) | openssl cms -verify -CAfile dt-root.cert.pem > /dev/null
```
-Run it on the VM.
+
+This command validates the downloaded installer before you run it.

## Install OneAgent as the privileged user
-Run:
+Copy the command under **Install OneAgent as the privileged user** from your Dynatrace dashboard and prepend `sudo` to run the install.
```console
-sudo /bin/sh Dynatrace-OneAgent-Linux-x86-1.331.49.20260227-104933.sh --set-monitoring-mode=fullstack --set-app-log-content-access=true
+sudo /bin/sh Dynatrace-OneAgent-Linux-arm.sh --set-monitoring-mode=fullstack --set-app-log-content-access=true
```
-The output is similar to:
+The output is similar to:
+
```output
-2026-03-12 05:59:21 UTC Starting Dynatrace ActiveGate AutoUpdater...
-2026-03-12 05:59:21 UTC Checking if Dynatrace ActiveGate AutoUpdater is running ...
-2026-03-12 05:59:21 UTC Dynatrace ActiveGate AutoUpdater is running.
-2026-03-12 05:59:21 UTC Cleaning autobackup...
-2026-03-12 05:59:21 UTC Removing old installation log files...
-2026-03-12 05:59:21 UTC
-2026-03-12 05:59:21 UTC --------------------------------------------------------------
-2026-03-12 05:59:21 UTC Installation finished successfully.
+Starting Dynatrace OneAgent installer...
+Installing OneAgent...
+Setting agent configuration...
+Installation finished successfully.
```
The installer performs several tasks automatically:
-- Downloads monitoring components
-- Configures kernel instrumentation
-- Installs the OneAgent system service
-- Registers the host with your Dynatrace environment
+* Downloads monitoring components
+* Configures kernel instrumentation
+* Installs the OneAgent system service
+* Registers the host with your Dynatrace environment
-## Verify OneAgent Service
+## Verify OneAgent service
Check that the Dynatrace monitoring service is running.
-This confirms the monitoring agent started successfully.
```console
sudo systemctl status oneagent
```
-The output is similar to:
+The output is similar to:
+
```output
-● dynatracegateway.service - Dynatrace ActiveGate service
- Loaded: loaded (/etc/systemd/system/dynatracegateway.service; enabled; preset: enabled)
- Active: active (running) since Thu 2026-03-12 05:59:07 UTC; 1min 7s ago
- Process: 20280 ExecStart=/opt/dynatrace/gateway/dynatracegateway start (code=exited, status=0/SUCCESS)
- Main PID: 20316 (dynatracegatewa)
+● oneagent.service - Dynatrace OneAgent
+ Loaded: loaded (/etc/systemd/system/oneagent.service; enabled; preset: enabled)
+ Active: active (running)
```
This confirms the monitoring agent started successfully.
-## Verify Dynatrace Processes
+## Verify Dynatrace processes
-This confirms the monitoring agent started successfully.
+You can also check the OneAgent processes from the terminal.
```console
ps aux | grep oneagent
```
-The output is similar to:
+The output is similar to:
+
```output
dtuser 17754 0.0 0.0 307872 4388 ? Ssl 05:48 0:00 /opt/dynatrace/oneagent/agent/lib64/oneagentwatchdog -bg -config=/opt/dynatrace/oneagent/agent/conf/watchdog.conf
dtuser 17761 0.2 0.3 1183000 59136 ? Sl 05:48 0:06 oneagentos -Dcom.compuware.apm.WatchDogTimeout=900 -watchdog.restart_file_location=/var/lib/dynatrace/oneagent/agent/watchdog/watchdog_restart_file -Dcom.compuware.apm.WatchDogPipe=/var/lib/dynatrace/oneagent/agent/watchdog/oneagentos_pipe_17754
@@ -193,7 +196,7 @@ azureus+ 23847 0.0 0.0 9988 2772 pts/0 S+ 06:33 0:00 grep --color=
This confirms the monitoring agent started successfully.
-## Confirm Host Detection in Dynatrace
+## Confirm host detection in Dynatrace
Return to the Dynatrace web interface.
@@ -206,7 +209,7 @@ Infrastructure & Operations
You should see:
-```outout
+```output
Host name:
OS: Linux
Architecture: ARM64
@@ -215,7 +218,7 @@ Monitoring mode: Full Stack

-## Check Automatic Process Discovery
+## Check automatic process discovery
Dynatrace automatically discovers running applications and services.
@@ -227,10 +230,10 @@ Hosts → Processes
Dynatrace identifies services such as:
-- system processes
-- web servers
-- databases
-- container runtimes
+* system processes
+* web servers
+* databases
+* container runtimes

@@ -238,10 +241,10 @@ Dynatrace identifies services such as:
You've successfully installed Dynatrace OneAgent on your Azure Ubuntu Arm64 virtual machine. Your installation includes:
-- Dynatrace OneAgent is installed and running as a system service
-- Automatic startup enabled through systemd
-- Secure connection to the Dynatrace SaaS platform
-- Full-stack monitoring of system resources and processes
-- Arm64-native monitoring on Azure Cobalt 100 processors
+* Dynatrace OneAgent installed and running as a system service
+* Automatic startup enabled through systemd
+* Secure connection to the Dynatrace SaaS platform
+* Full-stack monitoring of system resources and processes
+* Arm64-native monitoring on Azure Cobalt 100 processors
Next, you'll install Dynatrace ActiveGate to enable additional capabilities such as Kubernetes monitoring, secure data routing, and extension support.
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/1-setup.md b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/1-setup.md
new file mode 100644
index 0000000000..37e3c67f12
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/1-setup.md
@@ -0,0 +1,71 @@
+---
+title: Set up the target environment and compile the application
+weight: 2
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+To analyze performance bottlenecks, you need an environment and a sample application to profile. In this section, you configure an Arm Performix connection and build a Mandelbrot set generator.
+
+A Mandelbrot set generator is a classic computer science application used to test computational performance. It calculates a famous mathematical fractal by performing intense, repeated mathematical operations (often floating-point) for every pixel in a large image. Because the math for each pixel is independent of the others, it is a highly parallelizable workload that is perfect for demonstrating CPU optimizations like vectorization and loop unrolling.
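For intuition, the per-pixel escape-time computation looks roughly like this (a simplified sketch; the sample application's real implementation lives in the repository's `Mandelbrot` class):

```cpp
#include <cassert>
#include <complex>

// Escape-time iteration for a single pixel: count how many iterations of
// z = z^2 + c it takes for |z| to exceed 2. Points inside the set never
// escape, so they consume the full iteration budget.
int escapeTime(std::complex<double> c, int maxIterations) {
    std::complex<double> z(0.0, 0.0);
    int iterations = 0;
    while (iterations < maxIterations && std::abs(z) <= 2.0) {
        z = z * z + c;
        ++iterations;
    }
    return iterations;
}
```

Because each call depends only on its own coordinate `c`, pixels can be computed in any order, or concurrently, which is what makes this workload so parallelizable.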
+
+## Before you begin
+
+Make sure Arm Performix is installed on your host machine. The host machine is your local computer where the Arm Performix GUI runs, and it can be a Windows, macOS, or Linux machine. The target machine is the Linux server where your application is compiled and run.
+
+If you do not have Arm Performix installed, see the [Arm Performix install guide](/install-guides/atp/).
+
+From the host machine, open the Arm Performix application and navigate to the **Targets** tab. Set up an SSH connection to the target that runs the workload, and test the connection. For the examples in this guide, you connect to an Arm Neoverse-based server.
+
+The Arm Performix collection agent requires Python and `binutils` to run on the target machine.
+
+Connect to your target machine using SSH and install these required OS packages.
+
+For Ubuntu and other Debian-based distributions, run the following command:
+
+```bash
+sudo apt-get install python3 python3-venv binutils
+```
+
+## Build the sample application on the target machine
+
+Download the sample application, which is a Mandelbrot set generator provided under the [Arm Education License](https://github.com/arm-university/Mandelbrot-Example?tab=License-1-ov-file). Create a directory to store and build the example, then run the following commands:
+
+```bash
+cd $HOME
+git clone https://github.com/arm-university/Mandelbrot-Example.git
+cd Mandelbrot-Example && mkdir images builds
+```
+
+Install a C++ compiler using your operating system's package manager. For Ubuntu and other Debian-based distributions, run the following command:
+
+```bash
+sudo apt install build-essential
+```
+
+Run the provided setup script to build the application:
+
+```bash
+./build.sh
+```
+
+When the build completes, a binary named `mandelbrot-parallel` is created in the `./builds` directory.
+
+The application requires one argument: the number of threads to use. Run this new executable with 4 threads:
+
+```bash
+./builds/mandelbrot-parallel 4
+```
+
+The application generates a bitmap image file in your `./images` directory that looks similar to the following fractal:
+
+
+
+## What you've accomplished and what's next
+
+In this section:
+- You set up the target machine and established an SSH connection.
+- You built the Mandelbrot sample application.
+
+Next, you will use the CPU Microarchitecture recipe to identify performance bottlenecks in the application.
\ No newline at end of file
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/2-run-cpu-uarch.md b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/2-run-cpu-uarch.md
new file mode 100644
index 0000000000..f40ee8c9b4
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/2-run-cpu-uarch.md
@@ -0,0 +1,76 @@
+---
+title: Identify application bottlenecks with the CPU Microarchitecture recipe
+weight: 3
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Run the CPU Microarchitecture recipe
+
+To identify performance bottlenecks, run the CPU Microarchitecture recipe in Arm Performix. Arm Performix uses microarchitectural sampling to show which instruction pipeline stages dominate program latency, and then highlights ways to improve those bottlenecks.
+
+Start by reviewing the code in `main.cpp`. The program generates a 1920×1080 bitmap image of the fractal.
+
+```cpp
+#include "Mandelbrot.h"
+#include <iostream>
+#include <string>
+
+using namespace std;
+
+int main(int argc, char* argv[]){
+
+ const int NUM_THREADS = std::stoi(argv[1]);
+ std::cout << "Number of Threads = " << NUM_THREADS << std::endl;
+
+ Mandelbrot::Mandelbrot myplot(1920, 1080, NUM_THREADS);
+ myplot.draw("/home/ec2-user/Mandelbrot-final/Mandelbrot-Example/images/Green-Parallel-512.bmp", Mandelbrot::Mandelbrot::GREEN);
+
+ return 0;
+}
+```
+
+When Arm Performix launches the executable on the target machine, it does so from a temporary agent directory, `/tmp/atperf/tools/atperf-agent`. If your code uses a relative path to save the image, the image is written to that temporary folder and might be deleted.
+
+To prevent this, edit the `myplot.draw()` line in `main.cpp` to use the absolute path to your project's image folder (for example, `/home/ubuntu/Mandelbrot-Example/images/Green-Parallel-512.bmp`), and then rebuild the application.
+
+In the Arm Performix application on your host machine, select the **CPU Microarchitecture** recipe.
+
+
+
+Select the target you configured in the setup section. If this is your first run on this target, you might need to select **Install Tools** to copy the collection tools to the target. After the tools are installed, the target shows as ready.
+
+Next, for the **Workload type**, select **Launch a new process**.
+
+Enter the absolute path to your executable in the **Workload** field. For example, `/home/ubuntu/Mandelbrot-Example/builds/mandelbrot-parallel`. Make sure to add the number of threads argument.
+
+{{% notice Note %}}
+Use the full path to your executable because the **Workload** field does not currently support shell-style path expansion.
+{{% /notice %}}
+
+Before starting the analysis, you can customize the configuration. For instance, you can set a time limit for the workload or choose specific metrics to investigate. You can also adjust the sampling rate (High, Normal, or Low) to balance collection overhead against sampling granularity. Because this Mandelbrot example is a native C++ application, you can ignore the **Collect managed code stacks** toggle, which is used for Java or .NET workloads.
+
+When your configuration is ready, select **Run Recipe** to launch the workload and collect the performance data.
+
+## View the run results
+
+Arm Performix generates a high-level instruction pipeline view, highlighting where the most time is spent.
+
+
+
+In this breakdown, Backend Stalls dominate the samples. Within that category, work is split between Load Operations and integer and floating-point operations.
+There is no measured SIMD activity, even though this workload is highly parallelizable.
+
+The **Insights** panel highlights ALU contention as a likely improvement opportunity:
+
+
+
+To inspect executed instruction types in more detail, use the Instruction Mix recipe in the next step.
+
+## What you've accomplished and what's next
+
+In this section:
+- You ran the CPU Microarchitecture recipe on the Mandelbrot application.
+- You identified that the application spends most of its time in Backend Stalls without using SIMD operations.
+
+Next, you will run the Instruction Mix recipe to confirm where optimization opportunities exist and implement vectorization.
\ No newline at end of file
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/3-instruction-mix.md b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/3-instruction-mix.md
new file mode 100644
index 0000000000..bb49049e18
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/3-instruction-mix.md
@@ -0,0 +1,135 @@
+---
+title: Analyze SIMD utilization with the Instruction Mix recipe
+weight: 4
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Run the Instruction Mix recipe
+
+The previous CPU Microarchitecture analysis showed that the sample application used no single instruction, multiple data (SIMD) operations, which points to an optimization opportunity. Run the Instruction Mix recipe to learn more. The Instruction Mix launch panel is similar to CPU Microarchitecture, but it does not include options to choose metrics. Again, enter the full path to the workload.
+
+Select **Dynamic** for the **Analysis Mode**.
+
+
+
+The results below confirm a high number of integer and floating-point operations, with no SIMD operations. The **Insights** panel suggests vectorization as a path forward, lists possible root causes, and links to related Learning Paths.
+
+
+
+## Vectorize the application
+
+To address the lack of SIMD operations, you can vectorize the application's most intensive functions. For the Mandelbrot application, `Mandelbrot::draw` and its inner `Mandelbrot::getIterations` function consume most of the runtime. A vectorized version is available in the [instruction-mix branch](https://github.com/arm-university/Mandelbrot-Example/tree/instruction-mix). This branch uses Neon operations, which run on any Neoverse system. Your system might also support alternatives such as SVE or SVE2, which can be used instead.
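+
+The vectorization idea can be sketched in plain C++ (illustrative only; it is not the Neon intrinsics code on the branch, and the function name `escape_times4` is hypothetical): iterate several pixels in lockstep and mask off lanes that have already escaped. Hand-written Neon or SVE code maps each lane to a SIMD register element.
+
+```cpp
+#include <cassert>
+
+// Iterate four Mandelbrot pixels in lockstep. Lanes that have escaped
+// (|z| > 2) stop updating; the loop continues until all lanes finish or
+// max_iter is reached. In real SIMD code, the inner lane loop becomes a
+// single vector instruction per operation.
+void escape_times4(const double cr[4], const double ci[4],
+                   int max_iter, int out[4]) {
+    double zr[4] = {0, 0, 0, 0}, zi[4] = {0, 0, 0, 0};
+    bool active[4] = {true, true, true, true};
+    for (int lane = 0; lane < 4; ++lane) out[lane] = max_iter;
+    for (int i = 0; i < max_iter; ++i) {
+        for (int lane = 0; lane < 4; ++lane) {   // one "vector" step
+            if (!active[lane]) continue;
+            double r  = zr[lane] * zr[lane] - zi[lane] * zi[lane] + cr[lane];
+            double im = 2.0 * zr[lane] * zi[lane] + ci[lane];
+            zr[lane] = r;
+            zi[lane] = im;
+            if (r * r + im * im > 4.0) {         // escaped: |z| > 2
+                out[lane] = i;
+                active[lane] = false;
+            }
+        }
+    }
+}
+```
+
+This batching is what trades scalar integer and floating-point instructions for a smaller number of SIMD instructions in the recipe results.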
+
+Connect to your target machine using SSH and navigate to your project directory. Because you modified `main.cpp` earlier, you must stash your changes before switching to the `instruction-mix` branch. Then, rebuild the application:
+
+```bash
+cd $HOME/Mandelbrot-Example
+git stash
+git checkout instruction-mix
+./build.sh
+```
+
+After you rebuild the application and run the Instruction Mix recipe again, integer and floating-point operations are greatly reduced and replaced by a smaller set of SIMD instructions.
+
+
+
+## Assess the performance improvements
+
+Because you are running multiple experiments, give each run a meaningful nickname to keep results organized.
+
+
+Use the **Compare** feature at the top right of an entry in the **Runs** view to select another run of the same recipe for comparison.
+
+
+
+This selection box lets you choose any run of the same recipe type. The ⇄ arrows swap which run is treated as the baseline and which is current.
+
+After you select two runs, Arm Performix overlays them so you can review category changes in one view.
+
+
+Compared to the baseline, floating-point operations, branch operations, and some integer operations have been traded for loads, stores, and SIMD operations.
+Execution time also improves significantly, making this run nearly four times faster.
+
+```bash { command_line="root@localhost | 2-6" }
+time builds/mandelbrot-parallel-no-simd 1
+Number of Threads = 1
+
+real 0m31.326s
+user 0m31.279s
+sys 0m0.011s
+```
+
+```bash { command_line="root@localhost | 2-6" }
+time builds/mandelbrot-parallel 1
+Number of Threads = 1
+
+real 0m8.362s
+user 0m8.331s
+sys 0m0.016s
+```
+
+## Compare the CPU Microarchitecture results
+
+The CPU Microarchitecture recipe also supports a **Compare** view that shows percentage-point changes in each stage and instruction type.
+
+
+You can now see that Load and Store operations account for about 70% of execution time. **Insights** offers several explanations because multiple issues can contribute to the root cause.
+
+```
+The CPU spends a larger share of cycles stalled in the backend, meaning execution or memory resources cannot complete work fast enough. This is a cycle-based measure (percentage of stalled cycles).
+
+POSSIBLE CAUSES
+
+- Slow memory access, for example, L2 cache misses or Dynamic Random-Access Memory (DRAM) misses
+- Contention in execution pipelines, for example, the Arithmetic Logic Unit (ALU) or load/store units
+- Poor data locality
+- Excessive branching
+- Instruction dependencies that create pipeline bubbles
+```
+
+## Apply compiler optimizations for loop unrolling
+
+To address the new load and store bottlenecks, add optimization flags to the compiler to enable more aggressive loop unrolling. Edit the `build.sh` script to include these flags in the `CXXFLAGS` array:
+
+```bash
+ # build.sh
+ CXXFLAGS=(
+ --std=c++11
+ -O3
+ -mcpu=neoverse-n1+crc+crypto
+ -ffast-math
+ -funroll-loops
+ -flto
+ -DNDEBUG
+ )
+```
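+
+Conceptually, `-funroll-loops` replicates a loop body so the CPU executes fewer branches and sees more independent work per iteration. A hand-unrolled sketch (illustrative; the compiler performs this transformation automatically, and `sum_unrolled` is a hypothetical example, not code from the sample application):
+
+```cpp
+#include <cassert>
+
+// Sum an array with the loop unrolled by four. The four accumulators are
+// independent, so the CPU can keep multiple floating-point units busy
+// instead of serializing on a single running sum.
+double sum_unrolled(const double* a, int n) {   // assumes n % 4 == 0
+    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
+    for (int i = 0; i < n; i += 4) {
+        s0 += a[i];
+        s1 += a[i + 1];
+        s2 += a[i + 2];
+        s3 += a[i + 3];
+    }
+    return s0 + s1 + s2 + s3;
+}
+```
+
+Fewer loop branches and more instruction-level parallelism is what relieves the load/store pressure measured in the previous run.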
+
+After saving the file, run `./build.sh` to compile the application with the new flags.
+
+Runtime improves again, with an additional 11x speedup over the SIMD build that used the default compiler flags.
+
+
+```bash { command_line="root@localhost | 2-6" }
+time ./builds/mandelbrot-parallel 1
+Number of Threads = 1
+
+real 0m0.743s
+user 0m0.724s
+sys 0m0.014s
+```
+
+Another CPU Microarchitecture measurement shows that Load and Store bottlenecks are almost eliminated. SIMD floating-point operations now dominate execution, which indicates the application is better tuned to feed floating-point execution units.
+
+
+The program still generates the same output, and runtime drops from 31 s to less than 1 s, an overall speedup of more than 40x.
+
+
+
+## What you've accomplished and what's next
+
+In this section:
+- You used the Instruction Mix recipe to confirm a lack of SIMD operations.
+- You vectorized the sample application and verified the shift toward SIMD execution.
+- You applied compiler loop unrolling to relieve backend load/store bottlenecks, achieving over 40x speedup.
+
+You are now ready to analyze and optimize your own native C/C++ applications on Arm Neoverse using Arm Performix. Review the next steps to continue your learning journey.
\ No newline at end of file
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/_index.md b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/_index.md
new file mode 100644
index 0000000000..41e4be3053
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/_index.md
@@ -0,0 +1,61 @@
+---
+title: Optimize application performance using Arm Performix CPU microarchitecture analysis
+
+draft: true
+cascade:
+ draft: true
+
+minutes_to_complete: 60
+
+who_is_this_for: This is an introductory topic for software developers who want to learn performance analysis methodologies for Linux applications on Arm Neoverse-based servers.
+
+learning_objectives:
+ - Identify CPU pipeline bottlenecks using the Arm Performix CPU Microarchitecture recipe
+ - Analyze instruction types and SIMD utilization using the Instruction Mix recipe
+ - Optimize application performance using vectorization and compiler flags
+ - Compare performance profiles to measure execution improvements
+
+prerequisites:
+ - An Arm Neoverse-based server running Linux. A bare-metal or cloud bare-metal instance is best because it exposes more counters.
+
+author:
+- Brendan Long
+- Kieran Hejmadi
+
+### Tags
+skilllevels: Introductory
+subjects: Performance and Architecture
+armips:
+ - Neoverse
+tools_software_languages:
+ - Arm Performix
+ - C
+ - Runbook
+
+operatingsystems:
+ - Linux
+
+further_reading:
+ - resource:
+ title: "Find CPU Cycle Hotspots with Arm Performix"
+ link: /learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/
+ type: documentation
+ - resource:
+ title: "Port Code to Arm Scalable Vector Extension (SVE)"
+ link: /learning-paths/servers-and-cloud-computing/sve/
+ type: documentation
+ - resource:
+ title: "Arm Neoverse N1: Core Performance Analysis Methodology"
+ link: https://armkeil.blob.core.windows.net/developer/Files/pdf/white-paper/neoverse-n1-core-performance-v2.pdf
+ type: documentation
+ - resource:
+ title: "Arm Neoverse N1 PMU Guide"
+ link: https://developer.arm.com/documentation/PJDOC-466751330-547673/r4p1/
+ type: documentation
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1 # _index.md always has weight of 1 to order correctly
+layout: "learningpathall" # All files under learning paths have this same wrapper
+learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/_next-steps.md b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/_next-steps.md
new file mode 100644
index 0000000000..c3db0de5a2
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/_next-steps.md
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+# FIXED, DO NOT MODIFY THIS FILE
+# ================================================================================
+weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
+title: "Next Steps" # Always the same, html page title.
+layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/compare-with-box.webp b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/compare-with-box.webp
new file mode 100644
index 0000000000..836873d2a6
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/compare-with-box.webp differ
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/cpu-uarch-config.webp b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/cpu-uarch-config.webp
new file mode 100644
index 0000000000..c2f4892e71
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/cpu-uarch-config.webp differ
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/cpu-uarch-insights-original.webp b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/cpu-uarch-insights-original.webp
new file mode 100644
index 0000000000..611081506a
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/cpu-uarch-insights-original.webp differ
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/cpu-uarch-insights.webp b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/cpu-uarch-insights.webp
new file mode 100644
index 0000000000..ab159ecc64
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/cpu-uarch-insights.webp differ
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/cpu-uarch-results.webp b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/cpu-uarch-results.webp
new file mode 100644
index 0000000000..36d568dded
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/cpu-uarch-results.webp differ
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/cpu-uarch-simd-results-diff.webp b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/cpu-uarch-simd-results-diff.webp
new file mode 100644
index 0000000000..023f400bb0
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/cpu-uarch-simd-results-diff.webp differ
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/green-parallel-512.webp b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/green-parallel-512.webp
new file mode 100644
index 0000000000..a26550d53d
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/green-parallel-512.webp differ
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/high-simd-utilization.webp b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/high-simd-utilization.webp
new file mode 100644
index 0000000000..300688c470
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/high-simd-utilization.webp differ
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/instruction-mix-config.webp b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/instruction-mix-config.webp
new file mode 100644
index 0000000000..01a7374816
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/instruction-mix-config.webp differ
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/instruction-mix-diff-results.webp b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/instruction-mix-diff-results.webp
new file mode 100644
index 0000000000..db2039d1b0
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/instruction-mix-diff-results.webp differ
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/instruction-mix-results.webp b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/instruction-mix-results.webp
new file mode 100644
index 0000000000..0fc0f8b088
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/instruction-mix-results.webp differ
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/instruction-mix-simd-results.webp b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/instruction-mix-simd-results.webp
new file mode 100644
index 0000000000..f1cb0cb412
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/instruction-mix-simd-results.webp differ
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/performance-improvement.webp b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/performance-improvement.webp
new file mode 100644
index 0000000000..a2fedd125e
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/performance-improvement.webp differ
diff --git a/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/rename-run.webp b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/rename-run.webp
new file mode 100644
index 0000000000..f222b9a053
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/performix-microarchitecture/rename-run.webp differ