Commit 46612e1

Merge pull request #2661 from ArmDeveloperEcosystem/main
Prod update with minor fixes for SME2 LiteRT LP
2 parents 4999e98 + 9c75a4b commit 46612e1

5 files changed

Lines changed: 18 additions & 17 deletions

content/learning-paths/mobile-graphics-and-gaming/litert-sme/1-litert-kleidiai-sme2.md

Lines changed: 5 additions & 5 deletions
@@ -8,7 +8,7 @@ layout: learningpathall

## Inside the LiteRT software stack

-LiteRT (Lightweight Runtime, formerly TensorFlow Lite) is a runtime for on-device AI on Arm platforms. The default CPU acceleration library used by LiteRT is XNNPACK.
+LiteRT (Lite Runtime, formerly TensorFlow Lite) is a runtime for on-device AI. The default CPU acceleration library used by LiteRT is XNNPACK.

XNNPACK is an open-source library that provides highly optimized implementations of neural-network operators. It continuously integrates the KleidiAI library to use new CPU features such as Scalable Matrix Extension 2 (SME2).

@@ -25,13 +25,13 @@ To understand how KleidiAI SME2 micro-kernels work in LiteRT, think about a Lite
### LiteRT → XNNPACK workflow

![Diagram showing the workflow for a fully connected operator in LiteRT using XNNPACK. The diagram depicts the flow from LiteRT to XNNPACK, highlighting the use of NEON instructions for matrix multiplication and weight packing on Arm platforms. The technical environment emphasizes operator traversal, hardware detection, and parallel computation. alt-text #center](./litert-xnnpack-workflow.png "LiteRT, XNNPACK workflow")
-A fully connected operator multiplies two matrices: the input activations (LHS) and the weights (RHS).
+For batch sizes greater than 1, a fully connected operator performs a matrix multiplication between the input activations (LHS) and the weights (RHS).

When LiteRT loads a model, it reads the operators and builds a computation graph. If you select the CPU as the accelerator, LiteRT uses XNNPACK by default.

-XNNPACK scans the computation graph and looks for operators it can optimize. It packs the weight matrix to prepare for efficient computation. On Arm platforms, XNNPACK uses NEON instructions to speed up this packing and the matrix multiplication.
+XNNPACK scans the computation graph and looks for operators it can optimize. XNNPACK also checks the hardware compatibility and chooses the best available micro-kernel. Then, it packs the weight matrix to prepare for efficient computation. On Arm platforms, XNNPACK uses NEON instructions to speed up this packing.

-At runtime, XNNPACK checks the hardware and chooses the best available micro-kernel. During inference, it splits the matrices into smaller tiles and runs the multiplications in parallel across multiple threads, using NEON instructions for faster processing.
+During model inference, it splits the matrices into smaller tiles and runs the multiplications in parallel across multiple threads, using NEON instructions for faster processing.

### LiteRT → XNNPACK → KleidiAI workflow

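To make the operation in the hunk above concrete, here is a minimal NumPy sketch of the matrix product a fully connected operator performs, with hypothetical shapes; it illustrates the math and the tiling idea only, not XNNPACK's actual kernels:

```python
import numpy as np

# Hypothetical shapes: a batch of 4 inputs, 128 input features,
# 256 output features.
activations = np.random.rand(4, 128).astype(np.float32)  # LHS
weights = np.random.rand(128, 256).astype(np.float32)    # RHS

# With batch size > 1, the fully connected operator is this product:
output = activations @ weights  # shape (4, 256)

# The tiling described above: split the output into column tiles and
# compute each independently (XNNPACK distributes tiles to threads).
tile = 64
tiled = np.empty_like(output)
for j in range(0, weights.shape[1], tile):
    tiled[:, j:j + tile] = activations @ weights[:, j:j + tile]

assert np.allclose(output, tiled)
```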
@@ -41,7 +41,7 @@ When KleidiAI and SME2 are enabled at build time, the KleidiAI SME2 micro-kernel

During the model loading stage, when XNNPACK optimizes the subgraph, it checks the operator’s data type to determine whether a KleidiAI implementation is available. If KleidiAI supports it, XNNPACK bypasses its own default implementation. As a result, RHS packing is performed using the KleidiAI SME packing micro-kernel. Because KleidiAI typically requires packing of the LHS, a flag is also set during this stage.

-During model inference, the LHS packing micro-kernel is invoked. After the LHS is packed, XNNPACK performs the matrix multiplication. At this point, the KleidiAI SME micro-kernel is used to compute the matrix product.
+During model inference, the LHS packing micro-kernel is invoked. After the LHS is packed, XNNPACK performs the matrix multiplication. At this point, the KleidiAI SME micro-kernel is used to compute the matrix.

## What you've accomplished and what's next

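The load-time versus inference-time split in the hunk above can be summarized in a short sketch. This is illustrative pseudocode in Python, not the real XNNPACK or KleidiAI API; the packing functions are placeholders:

```python
import numpy as np

def pack_rhs(weights):
    # Placeholder for the KleidiAI SME RHS-packing micro-kernel,
    # which reorders weights into a kernel-friendly layout.
    return np.ascontiguousarray(weights)

def pack_lhs(activations):
    # Placeholder for the LHS-packing micro-kernel invoked at
    # inference time on the KleidiAI path.
    return np.ascontiguousarray(activations)

class FullyConnected:
    def __init__(self, weights, kleidiai_supported):
        # Model-load time: the RHS is packed once, and a flag records
        # that the LHS must also be packed later.
        self.needs_lhs_packing = kleidiai_supported
        self.packed_rhs = pack_rhs(weights) if kleidiai_supported else weights

    def __call__(self, activations):
        # Inference time: pack the LHS, then run the matrix multiply
        # (the step the KleidiAI SME micro-kernel computes).
        lhs = pack_lhs(activations) if self.needs_lhs_packing else activations
        return lhs @ self.packed_rhs

op = FullyConnected(np.random.rand(128, 256).astype(np.float32), True)
y = op(np.random.rand(4, 128).astype(np.float32))  # shape (4, 256)
```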
content/learning-paths/mobile-graphics-and-gaming/litert-sme/3-build-model.md renamed to content/learning-paths/mobile-graphics-and-gaming/litert-sme/2-build-model.md

Lines changed: 3 additions & 5 deletions
@@ -8,7 +8,7 @@ layout: learningpathall

## KleidiAI SME2 support in LiteRT

-LiteRT uses XNNPACK as its default CPU backend. KleidiAI micro-kernels are integrated through XNNPACK in LiteRT. Only a subset of KleidiAI Scalable Matrix Extension (SME and SME2) micro-kernels has been integrated into XNNPACK. These micro-kernels support operators using the following data types and quantization configurations in the LiteRT model. Other operators use XNNPACK's default implementation during inference.
+LiteRT uses XNNPACK as its default CPU backend. KleidiAI micro-kernels are integrated through XNNPACK in LiteRT. Only a subset of KleidiAI SME2 micro-kernels has been integrated into XNNPACK. These micro-kernels support operators using the following data types and quantization configurations in the LiteRT model. Other operators use XNNPACK's default implementation during inference.

### Supported operator configurations

@@ -34,8 +34,8 @@ LiteRT uses XNNPACK as its default CPU backend. KleidiAI micro-kernels are integ

| Activations | Weights | Output |
| ---------------------------- | ----------------------------------------------------- | ---------------------------- |
-| FP32 | FP32, pointwise (kernel size is 1) | FP32 |
-| FP32 | FP16, pointwise (kernel size is 1) | FP32 |
+| FP32 | FP32 | FP32 |
+| FP32 | FP16 | FP32 |
| FP32 | Per-channel or per-tensor symmetric INT8 quantization | FP32 |
| Asymmetric INT8 quantization | Per-channel or per-tensor symmetric INT8 quantization | Asymmetric INT8 quantization |

@@ -101,8 +101,6 @@ adb shell chmod +x /data/local/tmp/fc_fp32.tflite

You can also optimize this Keras model using post-training quantization to create a LiteRT model that suits your requirements.

----
-
## Post-training quantization options

**Post-training FP16 quantization**

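For the FP16 option named above, the standard TensorFlow Lite converter flow looks like this; the toy single-layer model and output file name are illustrative stand-ins, not the exact model built in this Learning Path:

```python
import tensorflow as tf

# Illustrative stand-in for the Keras model built in this section:
# a single fully connected (Dense) layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128,)),
    tf.keras.layers.Dense(256),
])

# Post-training FP16 quantization: weights are stored as FP16 while
# activations stay FP32, matching the FP32 / FP16 / FP32 table row.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

with open("fc_fp16.tflite", "wb") as f:
    f.write(tflite_model)
```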
content/learning-paths/mobile-graphics-and-gaming/litert-sme/2-build-tool.md renamed to content/learning-paths/mobile-graphics-and-gaming/litert-sme/3-build-tool.md

Lines changed: 4 additions & 4 deletions
@@ -12,7 +12,7 @@ LiteRT provides a standalone performance measurement utility called `benchmark_m

In this section, you will build two versions of the benchmark tool:
- With KleidiAI and Scalable Matrix Extension version 2 (SME2) enabled, which uses Arm-optimized micro-kernels
-- Without KleidiAI and SME2, which provides baseline performance using NEON or SVE2 fallback
+- Without KleidiAI and SME2, which provides baseline performance using NEON micro-kernels

This comparison demonstrates the performance gains provided by SME2 acceleration.

@@ -23,9 +23,9 @@ cd $WORKSPACE
git clone https://github.com/google-ai-edge/LiteRT.git
```

-Because LiteRT integrates KleidiAI through XNNPACK (an open-source library providing highly optimized neural-network operators), you must build LiteRT from source to enable SME2 micro-kernels.
+Because LiteRT integrates KleidiAI through XNNPACK, you must build LiteRT from source to enable SME2 micro-kernels.

-Next, set up your Android build environment using Docker on your Linux development machine. Google provides a Dockerfile that installs the toolchain needed for TensorFlow Lite (TFLite)/LiteRT Android builds.
+Next, set up your Android build environment using Docker on your Linux development machine. Google provides a Dockerfile that installs the toolchain needed for LiteRT Android builds.

Download the Dockerfile:

@@ -129,7 +129,7 @@ ${XNNPACK_OPTIONS} "${BENCHMARK_TOOL_PATH}" \
--repo_env=HERMETIC_PYTHON_VERSION=3.12
```

-This build of the `benchmark_model` disables all SME2 micro-kernels and forces fallback to XNNPACK's NEON or SVE2 kernels.
+This build of the `benchmark_model` disables all SME2 micro-kernels and forces fallback to XNNPACK's NEON micro-kernels.

You can then use Android Debug Bridge (ADB) to push the benchmark tool to your Android device:

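Once both builds are on the device, the comparison can be scripted. A hedged sketch: the binary names and on-device paths are hypothetical, while `--graph` and `--num_threads` are standard `benchmark_model` flags:

```python
import subprocess

# Hypothetical on-device paths for the two builds described above.
MODEL = "/data/local/tmp/fc_fp32.tflite"
BINARIES = {
    "KleidiAI + SME2": "/data/local/tmp/benchmark_model_sme2",
    "NEON baseline": "/data/local/tmp/benchmark_model_neon",
}

for label, binary in BINARIES.items():
    cmd = ["adb", "shell", binary, f"--graph={MODEL}", "--num_threads=1"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(f"== {label} ==")
    print(result.stdout)
```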
content/learning-paths/mobile-graphics-and-gaming/litert-sme/4-benchmark.md

Lines changed: 5 additions & 2 deletions
@@ -1,3 +1,4 @@
+---
title: Benchmark the LiteRT model
weight: 5
### FIXED, DO NOT MODIFY
@@ -210,11 +211,13 @@ For other operators supported by KleidiAI, the per-operator profiling node types
| Fully Connected | Fully Connected (NC, QP8, F32, QC4W) | Fully Connected (NC, QD8, F32, QC4W) |
| Fully Connected / Conv2D (Pointwise) | Fully Connected (NC, QP8, F32, QC8W) | Fully Connected (NC, QD8, F32, QC8W) |
| Fully Connected / Conv2D (Pointwise) | Fully Connected (NC, PQS8, QC8W) | Fully Connected (NC, QS8, QC8W) |
+| Conv2D | Convolution (NHWC, PF32) | Convolution (NHWC, F32) |
+| Conv2D | Convolution (NHWC, PF16) | Convolution (NHWC, F16) |
+| Conv2D | Convolution (NHWC, PQS8, QS8, QC8W) | Convolution (NHWC, QC8) |
+| TransposeConv | Deconvolution (NHWC, PQS8, QS8, QC8W) | Deconvolution (NC, QS8, QC8W) |
| Batch Matrix Multiply | Batch Matrix Multiply (NC, PF32) | Batch Matrix Multiply (NC, F32) |
| Batch Matrix Multiply | Batch Matrix Multiply (NC, PF16) | Batch Matrix Multiply (NC, F16) |
| Batch Matrix Multiply | Batch Matrix Multiply (NC, QP8, F32, QC8W) | Batch Matrix Multiply (NC, QD8, F32, QC8W) |
-| Conv2D | Convolution (NHWC, PQS8, QS8, QC8W) | Convolution (NHWC, QC8) |
-| TransposeConv | Deconvolution (NHWC, PQS8, QS8, QC8W) | Deconvolution (NC, QS8, QC8W) |

The letter “P” in the node type indicates a KleidiAI implementation.

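Because the KleidiAI node types differ from the defaults only by that “P” tag, profiling output can be scanned mechanically. A small sketch grounded in the table above (the marker strings are taken from its rows):

```python
# KleidiAI node types in the table carry a "P" in the data-type tag:
# PF32, PF16, PQS8, or QP8.
KLEIDIAI_MARKERS = ("PF32", "PF16", "PQS8", "QP8")

def is_kleidiai_node(node_type: str) -> bool:
    """Return True if a profiling node type indicates a KleidiAI kernel."""
    return any(marker in node_type for marker in KLEIDIAI_MARKERS)

# Examples taken from the table:
assert is_kleidiai_node("Fully Connected (NC, QP8, F32, QC4W)")
assert is_kleidiai_node("Convolution (NHWC, PF32)")
assert not is_kleidiai_node("Batch Matrix Multiply (NC, F32)")
```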
content/learning-paths/mobile-graphics-and-gaming/litert-sme/_index.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
---
title: Accelerate LiteRT Models on Android with KleidiAI and SME2

-minutes_to_complete: 30
+minutes_to_complete: 45

who_is_this_for: This is an advanced topic for developers looking to leverage Arm's Scalable Matrix Extension 2 (SME2) instructions to accelerate LiteRT model inference on Android.
