Releases · huydt84/llama.cpp

26 Jul 03:02

c7f3169

b5994 Latest


ggml-cpu : disable GGML_NNPA by default due to instability (#14880)

* docs: update s390x document for sentencepiece

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit e086c5e3a7ab3463d8e0906efcfa39352db0a48d)

* docs: update huggingface links + reword

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 8410b085ea8c46e22be38266147a1e94757ef108)

* ggml-cpu: disable ggml-nnpa compile flag by default

fixes #14877

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 412f4c7c88894b8f55846b4719c76892a23cfe09)

* docs: update s390x build docs to reflect nnpa disable

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit c1eeae1d0c2edc74ab9fbeff2707b0d357cf0b4d)

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip

sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6

373 MB 2025-07-26T03:02:34Z
llama-b5994-bin-macos-arm64.zip

sha256:76514b0695c4e120e01e3c4e94bd8b1324cc3ccc39b0b7abfe0367eb43c06cd9

10.6 MB 2025-07-26T03:02:44Z
llama-b5994-bin-macos-x64.zip

sha256:f70d28dfb70634a15591d831583f87773f39042a27afe9ba2f8a5d13d4187593

27.2 MB 2025-07-26T03:02:45Z
llama-b5994-bin-ubuntu-vulkan-x64.zip

sha256:fe559a3ebcb26bca54d59306abcbb1aaa6d9c633787a8ab6d4fe9c3086a39302

20.9 MB 2025-07-26T03:02:47Z
llama-b5994-bin-ubuntu-x64.zip

sha256:9a6dd46d6bbe13aae50513cc25aafefba74b61dd1e842f30a5f4f4ccdf6cc80b

12.5 MB 2025-07-26T03:02:48Z
llama-b5994-bin-win-cpu-arm64.zip

sha256:0411e8c15f37899ee80711b8e917ebabcf9c95a1f34d684908e415f0545801f2

10.9 MB 2025-07-26T03:02:49Z
llama-b5994-bin-win-cpu-x64.zip

sha256:4d3f6b1eb52aa098289a91ec2bd2d81696e9585141ee83b63356dc368858cb16

13.7 MB 2025-07-26T03:02:50Z
llama-b5994-bin-win-cuda-12.4-x64.zip

sha256:3b0228ed7cdcfdba366a1f9928e316325f30b8dbaa748a5eed6aa5f1f27de328

129 MB 2025-07-26T03:02:51Z
llama-b5994-bin-win-hip-radeon-x64.zip

sha256:bd9db95e0bc415d833da59841a7822fc4bc26fb4dcde5b354715963eb62a51b8

299 MB 2025-07-26T03:02:55Z
llama-b5994-bin-win-opencl-adreno-arm64.zip

sha256:3c783e0c94b01e86126604ec9ec5d6d272df67d392c17cc5ef53fee3568be40e

11.2 MB 2025-07-26T03:03:05Z
Source code (zip)

2025-07-25T17:09:03Z
Source code (tar.gz)

2025-07-25T17:09:03Z
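Each asset above is listed with a `sha256:` digest. To check a download against it, hash the file locally and compare the two values. The following is a minimal sketch using only Python's standard `hashlib`. The helper names `sha256_of_file` and `verify` are hypothetical and are not part of llama.cpp or any GitHub tooling.

```python
import hashlib

# Digest listed for llama-b5994-bin-ubuntu-x64.zip in the asset list above.
EXPECTED_SHA256 = "9a6dd46d6bbe13aae50513cc25aafefba74b61dd1e842f30a5f4f4ccdf6cc80b"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file's digest with a published one.

    Accepts the digest with or without the "sha256:" prefix shown on the
    release page, in any letter case.
    """
    expected = expected.strip().lower().removeprefix("sha256:")
    return sha256_of_file(path) == expected
```

For example, `verify("llama-b5994-bin-ubuntu-x64.zip", EXPECTED_SHA256)` should return `True` for an intact download. Reading in chunks keeps memory use flat even for the larger archives, such as the 373 MB CUDA runtime zip. `str.removeprefix` requires Python 3.9 or later.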

18 Jul 09:15

github-actions

b5930

f9a31ee

b5930

CUDA: set_rows + cpy.cu refactor (#14712)

Assets 15

17 Jul 16:23

github-actions

b5921

086cf81

b5921

llama : fix parallel processing for lfm2 (#14705)

Assets 15

02 Jul 05:22

github-actions

b5797

de56944

b5797

ci : disable fast-math for Metal GHA CI (#14478)

* ci : disable fast-math for Metal GHA CI

ggml-ci

* cont : remove -g flag

ggml-ci

Assets 15

30 Jun 23:55

github-actions

b5787

0a5a3b5

b5787

Add Conv2d for CPU (#14388)

* Conv2D: Add CPU version

* Half decent

* Tiled approach for F32

* remove file

* Fix tests

* Support F16 operations

* add assert about size

* Review: further formatting fixes, add assert and use CPU version of fp32->fp16

Assets 15

16 Jun 13:11

github-actions

b5682

ad590be

b5682

model : add NeoBERT (#14164)

* convert neobert model to gguf

* add inference graph

* fix flake8 lint

* followed reviewer suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* follow reviewers suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* override NeoBERT feed-forward length

---------

Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Assets 15

16 Jun 10:58

github-actions

b5679

3ba0d84

b5679

ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206)

Assets 15

16 Jun 01:46

github-actions

b5674

d7da8dc

b5674

model : Add support for Arcee AI's upcoming AFM model (#14185)

* Add Arcee AFM support

* Add draft update code

* Fix linter and update URL, may still not be final

* Update src/llama-model.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Remove accidental blank line

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

Assets 15

15 Jun 13:05

github-actions

b5669

5fce5f9

b5669

kv-cache : fix use-after-move of defrag info (#14189)

ggml-ci

Assets 15

13 Jun 08:12

github-actions

b5650

09cf2c7

b5650

cmake : Improve build-info.cpp generation (#14156)

* cmake: Simplify build-info.cpp generation

The rebuild of build-info.cpp is still triggered when .git/index changes.

* cmake: generate build-info.cpp in build dir

Assets 15
