Skip to content

Releases: huydt84/llama.cpp

b5994

26 Jul 03:02
c7f3169

Choose a tag to compare

ggml-cpu : disable GGML_NNPA by default due to instability (#14880)

* docs: update s390x document for sentencepiece

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit e086c5e3a7ab3463d8e0906efcfa39352db0a48d)

* docs: update huggingface links + reword

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 8410b085ea8c46e22be38266147a1e94757ef108)

* ggml-cpu: disable ggml-nnpa compile flag by default

fixes #14877

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 412f4c7c88894b8f55846b4719c76892a23cfe09)

* docs: update s390x build docs to reflect nnpa disable

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit c1eeae1d0c2edc74ab9fbeff2707b0d357cf0b4d)

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

b5930

18 Jul 09:15
f9a31ee

Choose a tag to compare

CUDA: set_rows + cpy.cu refactor (#14712)

b5921

17 Jul 16:23
086cf81

Choose a tag to compare

llama : fix parallel processing for lfm2 (#14705)

b5797

02 Jul 05:22
de56944

Choose a tag to compare

ci : disable fast-math for Metal GHA CI (#14478)

* ci : disable fast-math for Metal GHA CI

ggml-ci

* cont : remove -g flag

ggml-ci

b5787

30 Jun 23:55
0a5a3b5

Choose a tag to compare

Add Conv2d for CPU (#14388)

* Conv2D: Add CPU version

* Half decent

* Tiled approach for F32

* remove file

* Fix tests

* Support F16 operations

* add assert about size

* Review: further formatting fixes, add assert and use CPU version of fp32->fp16

b5682

16 Jun 13:11
ad590be

Choose a tag to compare

model : add NeoBERT (#14164)

* convert neobert model to gguf

* add inference graph

* fix flake8 lint

* followed reviewer suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* follow reviewers suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* override NeoBERT feed-forward length

---------

Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

b5679

16 Jun 10:58
3ba0d84

Choose a tag to compare

ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206)

b5674

16 Jun 01:46
d7da8dc

Choose a tag to compare

model : Add support for Arcee AI's upcoming AFM model (#14185)

* Add Arcee AFM support

* Add draft update code

* Fix linter and update URL, may still not be final

* Update src/llama-model.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Remote accidental blank line

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

b5669

15 Jun 13:05
5fce5f9

Choose a tag to compare

kv-cache : fix use-after-move of defrag info (#14189)

ggml-ci

b5650

13 Jun 08:12
09cf2c7

Choose a tag to compare

cmake : Improve build-info.cpp generation (#14156)

* cmake: Simplify build-info.cpp generation

The rebuild of build-info.cpp still gets triggered when .git/index gets
changes.

* cmake: generate build-info.cpp in build dir