docs/build.md: 7 additions & 1 deletion
@@ -599,7 +599,13 @@ If KleidiAI is enabled, the output will contain a line similar to:
```
load_tensors: CPU_KLEIDIAI model buffer size = 3474.00 MiB
```
-KleidiAI's microkernels implement optimized tensor operations using Arm CPU features such as dotprod, int8mm and SME. llama.cpp selects the most efficient kernel based on runtime CPU feature detection. However, on platforms that support SME, you must manually enable SME microkernels by setting the environment variable `GGML_KLEIDIAI_SME=1`.
+KleidiAI’s microkernels implement optimized tensor operations using Arm CPU features such as dotprod, int8mm, SVE, and SME. llama.cpp selects the most efficient kernels at runtime based on detected CPU capabilities.
+
+On CPUs that support SME, SME microkernels are enabled automatically using runtime detection.
+The environment variable `GGML_KLEIDIAI_SME` can be used to control SME behavior:
+- Not set: enable SME automatically if supported and detected.
+- `0`: disable SME.
+- `<n>` > 0: enable SME and assume `<n>` available SME units (override auto detection).
+If SME is not supported by the CPU, SME microkernels are always disabled.
Depending on your build target, other higher-priority backends may be enabled by default. To ensure the CPU backend is used, you must disable the higher-priority backends either at compile time, e.g. `-DGGML_METAL=OFF`, or at runtime using the command line option `--device none`.
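
The `GGML_KLEIDIAI_SME` rules introduced by this change can be read as a small decision table. The following is a minimal sketch of that table, not the code llama.cpp actually runs: the function name `resolve_sme_units` and its parameters are hypothetical, and because the text does not say how negative or non-numeric values are handled, the sketch simply rejects them.

```python
from typing import Optional


def resolve_sme_units(env_value: Optional[str],
                      cpu_has_sme: bool,
                      detected_units: int) -> int:
    """Return how many SME units to assume; 0 means SME microkernels are off.

    Hypothetical helper that mirrors the documented rules only.
    """
    # If the CPU does not support SME, SME microkernels are always disabled,
    # regardless of what the environment variable says.
    if not cpu_has_sme:
        return 0

    # Variable not set: enable SME automatically from runtime detection.
    if env_value is None:
        return detected_units

    # Assumption of this sketch: malformed or negative values are rejected.
    # int() raises ValueError for non-numeric input.
    n = int(env_value)
    if n < 0:
        raise ValueError("GGML_KLEIDIAI_SME must be 0 or a positive integer")

    # 0 disables SME; n > 0 overrides the detected unit count.
    return n
```

In a real program the first argument would typically come from `os.environ.get("GGML_KLEIDIAI_SME")`, which returns `None` when the variable is not set. Note that, under the documented rules, a positive override has no effect on a CPU without SME.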