content/learning-paths/mobile-graphics-and-gaming/litert-sme/1-litert-kleidiai-sme2.md (+5 -5)
@@ -8,7 +8,7 @@ layout: learningpathall
## Inside the LiteRT software stack
-LiteRT (Lightweight Runtime, formerly TensorFlow Lite) is a runtime for on-device AI on Arm platforms. The default CPU acceleration library used by LiteRT is XNNPACK.
+LiteRT (Lite Runtime, formerly TensorFlow Lite) is a runtime for on-device AI. The default CPU acceleration library used by LiteRT is XNNPACK.
XNNPACK is an open-source library that provides highly optimized implementations of neural-network operators. It continuously integrates the KleidiAI library to use new CPU features such as Scalable Matrix Extension 2 (SME2).
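
The stack this hunk describes is easiest to see from the application side. Below is a minimal C++ sketch, not part of the learning path itself, that loads a model and applies the XNNPACK delegate explicitly using the standard TensorFlow Lite C++ and XNNPACK delegate APIs; the model path and thread count are placeholders:

```cpp
#include <memory>

#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  // Load the model; LiteRT reads the operators and builds the graph here.
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  if (!model) return 1;

  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter) return 1;

  // Apply the XNNPACK delegate explicitly (recent LiteRT builds also
  // apply it by default when the CPU is the selected accelerator).
  TfLiteXNNPackDelegateOptions opts = TfLiteXNNPackDelegateOptionsDefault();
  opts.num_threads = 4;  // matrix tiles are spread across these threads
  TfLiteDelegate* xnnpack = TfLiteXNNPackDelegateCreate(&opts);
  if (interpreter->ModifyGraphWithDelegate(xnnpack) != kTfLiteOk) return 1;

  if (interpreter->AllocateTensors() != kTfLiteOk) return 1;
  // ... fill input tensors ...
  if (interpreter->Invoke() != kTfLiteOk) return 1;
  // ... read output tensors ...

  TfLiteXNNPackDelegateDelete(xnnpack);
  return 0;
}
```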
@@ -25,13 +25,13 @@ To understand how KleidiAI SME2 micro-kernels work in LiteRT, think about a Lite
### LiteRT → XNNPACK workflow
-A fully connected operator multiplies two matrices: the input activations (LHS) and the weights (RHS).
+For batch sizes greater than 1, a fully connected operator performs a matrix multiplication between the input activations (LHS) and the weights (RHS).
When LiteRT loads a model, it reads the operators and builds a computation graph. If you select the CPU as the accelerator, LiteRT uses XNNPACK by default.
-XNNPACK scans the computation graph and looks for operators it can optimize. It packs the weight matrix to prepare for efficient computation. On Arm platforms, XNNPACK uses NEON instructions to speed up this packing and the matrix multiplication.
+XNNPACK scans the computation graph and looks for operators it can optimize. XNNPACK also checks the hardware compatibility and chooses the best available micro-kernel. Then, it packs the weight matrix to prepare for efficient computation. On Arm platforms, XNNPACK uses NEON instructions to speed up this packing.
-At runtime, XNNPACK checks the hardware and chooses the best available micro-kernel. During inference, it splits the matrices into smaller tiles and runs the multiplications in parallel across multiple threads, using NEON instructions for faster processing.
+During model inference, it splits the matrices into smaller tiles and runs the multiplications in parallel across multiple threads, using NEON instructions for faster processing.
### LiteRT → XNNPACK → KleidiAI workflow
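
For readers of this hunk, the LHS/RHS terminology maps onto a plain reference implementation. The sketch below is illustrative only, with no packing, tiling, or NEON; it shows the arithmetic a fully connected operator performs for batch sizes greater than 1, and XNNPACK's micro-kernels compute the same result on packed, tiled operands across multiple threads:

```cpp
#include <cstddef>
#include <vector>

// Reference fully connected operator for batch > 1:
//   out[b][n] = sum_k lhs[b][k] * rhs[n][k] + bias[n]
// lhs: input activations (batch x K), rhs: weights (N x K).
std::vector<float> FullyConnected(const std::vector<float>& lhs,
                                  const std::vector<float>& rhs,
                                  const std::vector<float>& bias,
                                  std::size_t batch, std::size_t K,
                                  std::size_t N) {
  std::vector<float> out(batch * N);
  for (std::size_t b = 0; b < batch; ++b) {
    for (std::size_t n = 0; n < N; ++n) {
      float acc = bias[n];
      for (std::size_t k = 0; k < K; ++k) {
        acc += lhs[b * K + k] * rhs[n * K + k];
      }
      out[b * N + n] = acc;
    }
  }
  return out;
}
```

This triple loop is what gets replaced at runtime: XNNPACK partitions the output into tiles, assigns the tiles to worker threads, and computes each one with a vectorized micro-kernel.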
@@ -41,7 +41,7 @@ When KleidiAI and SME2 are enabled at build time, the KleidiAI SME2 micro-kernel
During the model loading stage, when XNNPACK optimizes the subgraph, it checks the operator’s data type to determine whether a KleidiAI implementation is available. If KleidiAI supports it, XNNPACK bypasses its own default implementation. As a result, RHS packing is performed using the KleidiAI SME packing micro-kernel. Because KleidiAI typically requires packing of the LHS, a flag is also set during this stage.
-During model inference, the LHS packing micro-kernel is invoked. After the LHS is packed, XNNPACK performs the matrix multiplication. At this point, the KleidiAI SME micro-kernel is used to compute the matrix product.
+During model inference, the LHS packing micro-kernel is invoked. After the LHS is packed, XNNPACK performs the matrix multiplication. At this point, the KleidiAI SME micro-kernel is used to compute the matrix.
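
The load-time versus inference-time split in this hunk can be sketched as two stages. The toy code below is a schematic only: the function names and the transpose-as-packing layout are invented for clarity, and real KleidiAI packing formats are tile-based and SME-specific (bias handling is also omitted):

```cpp
#include <cstddef>
#include <vector>

// Load time: "pack" the N x K RHS weights. Here packing is just a
// transpose into K x N order so the inner loop below walks both
// operands contiguously; real packing kernels produce hardware-
// friendly tiled layouts.
std::vector<float> PackRhs(const std::vector<float>& w, std::size_t K,
                           std::size_t N) {
  std::vector<float> packed(K * N);
  for (std::size_t n = 0; n < N; ++n)
    for (std::size_t k = 0; k < K; ++k)
      packed[k * N + n] = w[n * K + k];
  return packed;
}

// Inference time: pack the LHS (a plain copy in this toy version, the
// stand-in for the LHS packing micro-kernel), then multiply using only
// the packed buffers.
std::vector<float> RunFullyConnected(const std::vector<float>& lhs,
                                     const std::vector<float>& packed_rhs,
                                     std::size_t batch, std::size_t K,
                                     std::size_t N) {
  std::vector<float> packed_lhs = lhs;
  std::vector<float> out(batch * N, 0.0f);
  for (std::size_t b = 0; b < batch; ++b)
    for (std::size_t k = 0; k < K; ++k)        // k outer: streams RHS rows
      for (std::size_t n = 0; n < N; ++n)
        out[b * N + n] += packed_lhs[b * K + k] * packed_rhs[k * N + n];
  return out;
}
```

The point of the split is that `PackRhs` runs once, at model loading, while the LHS must be re-packed for every inference because the activations change.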
content/learning-paths/mobile-graphics-and-gaming/litert-sme/2-build-model.md (+3 -5)
@@ -8,7 +8,7 @@ layout: learningpathall
## KleidiAI SME2 support in LiteRT
-LiteRT uses XNNPACK as its default CPU backend. KleidiAI micro-kernels are integrated through XNNPACK in LiteRT. Only a subset of KleidiAI Scalable Matrix Extension (SME and SME2) micro-kernels has been integrated into XNNPACK. These micro-kernels support operators using the following data types and quantization configurations in the LiteRT model. Other operators use XNNPACK's default implementation during inference.
+LiteRT uses XNNPACK as its default CPU backend. KleidiAI micro-kernels are integrated through XNNPACK in LiteRT. Only a subset of KleidiAI SME2 micro-kernels has been integrated into XNNPACK. These micro-kernels support operators using the following data types and quantization configurations in the LiteRT model. Other operators use XNNPACK's default implementation during inference.
### Supported operator configurations
@@ -34,8 +34,8 @@ LiteRT uses XNNPACK as its default CPU backend. KleidiAI micro-kernels are integ
-Because LiteRT integrates KleidiAI through XNNPACK (an open-source library providing highly optimized neural-network operators), you must build LiteRT from source to enable SME2 micro-kernels.
+Because LiteRT integrates KleidiAI through XNNPACK, you must build LiteRT from source to enable SME2 micro-kernels.
-Next, set up your Android build environment using Docker on your Linux development machine. Google provides a Dockerfile that installs the toolchain needed for TensorFlow Lite (TFLite)/LiteRT Android builds.
+Next, set up your Android build environment using Docker on your Linux development machine. Google provides a Dockerfile that installs the toolchain needed for LiteRT Android builds.
content/learning-paths/mobile-graphics-and-gaming/litert-sme/_index.md (+1 -1)
@@ -1,7 +1,7 @@
---
title: Accelerate LiteRT Models on Android with KleidiAI and SME2
-minutes_to_complete: 30
+minutes_to_complete: 45
who_is_this_for: This is an advanced topic for developers looking to leverage Arm's Scalable Matrix Extension 2 (SME2) instructions to accelerate LiteRT model inference on Android.