Releases: Thireus/llama.cpp
b9503
Merge branch 'ggml-org:master' into master
CUDA builds have been downgraded to CUDA 13.1 due to an NVIDIA bug that affects certain quantized models when binaries are compiled with CUDA 13.2.
- This issue only impacts llama.cpp binaries built with CUDA 13.2 (i.e. previous releases).
- Your installed CUDA 13.2 drivers are not affected — no downgrade is needed.
- NVIDIA is currently working on a fix.
Recommended workaround: Use binaries compiled with CUDA 12.8 or CUDA 13.1 for now.
Note: ik_llama.cpp is also affected by this issue.
Read more about it here: Thireus/GGUF-Tool-Suite#71
For reference, CUDA 12.8 supports the Maxwell (5.0) through Hopper (9.0) microarchitectures, while CUDA 13.1 supports Turing (7.5) through Blackwell (12.1).
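To decide which binary to grab, you can check your GPU's compute capability (e.g. with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`) and compare it against the ranges above. The sketch below is only illustrative — the function name is made up, and the ranges are copied from the release notes:

```python
# Illustrative helper (not part of llama.cpp): given a GPU's compute
# capability, list which of the two recommended CUDA toolkit builds
# cover it, per the support ranges stated in the release notes.
def supported_cuda_builds(compute_cap: float) -> list[str]:
    builds = []
    if 5.0 <= compute_cap <= 9.0:   # Maxwell (5.0) through Hopper (9.0)
        builds.append("CUDA 12.8")
    if 7.5 <= compute_cap <= 12.1:  # Turing (7.5) through Blackwell (12.1)
        builds.append("CUDA 13.1")
    return builds

print(supported_cuda_builds(8.6))  # e.g. an RTX 30-series GPU
```

A GPU in the overlapping 7.5–9.0 range can use either build; outside that overlap only one of the two applies.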
macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Apple Silicon (arm64, KleidiAI enabled)
- macOS Intel (x64)
- iOS XCFramework
Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12.8) - CUDA 12.8 DLLs
- Windows x64 (CUDA 13.1) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b9496
Merge branch 'ggml-org:master' into master
b9483
Merge branch 'ggml-org:master' into master
b9480
Merge branch 'ggml-org:master' into master
b9476
Merge branch 'ggml-org:master' into master
b9468
Merge branch 'ggml-org:master' into master
b9466
Merge branch 'ggml-org:master' into master
b9449
Merge branch 'ggml-org:master' into master
b9443
Merge branch 'ggml-org:master' into master
b9435
Merge branch 'ggml-org:master' into master