Releases · gabriellarson/llama.cpp

15 Sep 07:56

b8e09f0

b6475 Latest

model : add grok-2 support (#15539)

* add grok-2 support

* type fix

* type fix

* type fix

* "fix" vocab for invalid sequences

* fix expert tensor mapping and spaces in vocab

* add chat template

* fix norm tensor mapping

* rename layer_out_norm to ffn_post_norm

* ensure ffn_post_norm is mapped

* fix experts merging

* remove erroneous FFN_GATE entry

* concatenate split tensors and add more metadata

* process all expert layers and try cat instead of hstack

* add support for community BPE vocab

* fix expert feed forward length and ffn_down concat

* commit this too

* add ffn_up/gate/down, unsure if sequence is right

* add ffn_gate/down/up to tensor names

* correct residual moe (still not working)

* mess--

* fix embedding scale being applied twice

* add built in chat template

* change beta fast for grok if default value

* remove spm vocab in favor of community bpe vocab

* change attention temp length metadata type to integer

* update attention temp length metadata

* remove comment

* replace M_SQRT2 with std::sqrt(2)

* add yarn metadata, move defaults to hparams

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-09-15T07:57:00Z

llama-b6475-bin-macos-arm64.zip
sha256:31f668374d55be950f1faa495ddb2003538fa90955b66dcfafab95a127f9a5c4
11.2 MB 2025-09-15T07:57:17Z

llama-b6475-bin-macos-x64.zip
sha256:e1b93248708746723ac8500e6591853b98098544e7b41e0cffeeeca245805a9a
29.3 MB 2025-09-15T07:57:18Z

llama-b6475-bin-ubuntu-vulkan-x64.zip
sha256:68ace9a569342bd5e5fece9d6d78f8a6e33b04d90aff54d164bfbc547f00af60
26.1 MB 2025-09-15T07:57:20Z

llama-b6475-bin-ubuntu-x64.zip
sha256:7e3fe670624968d8e7028d86018d2673d8abbb9689fc9962a8b1dc9251711b7f
13.2 MB 2025-09-15T07:57:22Z

llama-b6475-bin-win-cpu-arm64.zip
sha256:f6d9a9ae71eaec197ebd383f7cc847afe932f0846ef47095bf07a0d9547ac159
11.4 MB 2025-09-15T07:57:23Z

llama-b6475-bin-win-cpu-x64.zip
sha256:9936268da99c9fc7f4b9c8f7eef9b11a1d1574643b932d985e81de4387c82dd6
14.4 MB 2025-09-15T07:57:24Z

llama-b6475-bin-win-cuda-12.4-x64.zip
sha256:4e5b8f06bd120e07d5af279c5f6ab9f4a4849cae87992bba54602752e35fae3d
147 MB 2025-09-15T07:57:26Z

llama-b6475-bin-win-hip-radeon-x64.zip
sha256:55e37f4a90850e99775886ffcf8db96e5c8aa2f79c41fb37ea727493446f6c9f
310 MB 2025-09-15T07:57:32Z

llama-b6475-bin-win-opencl-adreno-arm64.zip
sha256:4d78f3d031bb961e0268b1e560eba94389a3f17b005c59c585bc433973d022be
11.8 MB 2025-09-15T07:57:45Z

Source code (zip)
2025-09-14T21:00:59Z

Source code (tar.gz)
2025-09-14T21:00:59Z
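
Each binary asset above is published with a `sha256:` digest, so a download can be checked before it is unpacked. The following is a minimal sketch of one way to do that with Python's standard library. The helper names `sha256_of_file` and `matches_published_digest` are hypothetical and not part of llama.cpp or GitHub tooling; the sketch assumes the asset has already been saved to a local path.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives are never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published):
    """Compare a local file against a digest string from the release page.

    `published` may carry the "sha256:" prefix used in the asset list.
    """
    expected = published.strip().lower()
    if expected.startswith("sha256:"):
        expected = expected[len("sha256:"):]
    return sha256_of_file(path) == expected
```

For example, `matches_published_digest("llama-b6475-bin-ubuntu-x64.zip", "sha256:7e3fe670624968d8e7028d86018d2673d8abbb9689fc9962a8b1dc9251711b7f")` should return `True` only if the local file is byte-identical to the published asset; the local file name here is an assumption about where the download was saved.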

05 Aug 10:01

github-actions

b6090

ee3a9fc

b6090

context : fix index overflow on huge outputs (#15080)

* context : fix overflow when re-ordering huge outputs

* context : fix logits size overflow for huge batches
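
The two fixes above concern sizes and indices that grow past what a narrow integer type can hold. The sketch below illustrates that general class of bug only; it is not the actual patch. The values `n_vocab` and `n_outputs` are hypothetical, and `wrap_int32` is a hypothetical helper that emulates the two's-complement wraparound commonly observed for 32-bit signed integers (in C++, signed overflow is formally undefined behavior).

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Emulate storing an integer in a signed 32-bit variable with wraparound."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def logits_elements(n_vocab, n_outputs):
    """Exact number of logit values for a batch (Python ints do not overflow)."""
    return n_vocab * n_outputs


# Hypothetical sizes chosen only to push the product past INT32_MAX.
n_vocab = 128_000
n_outputs = 20_000

exact = logits_elements(n_vocab, n_outputs)
as_int32 = wrap_int32(exact)

print(exact, exact > INT32_MAX)  # 2560000000 True
print(as_int32)                  # -1734967296
```

The usual remedy for this kind of problem is to carry sizes and offsets in a 64-bit type, so that the product is computed without truncation.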

Assets 15

04 Aug 13:22

github-actions

b6082

5aa1105

b6082

vulkan: fix build when using glslang that does not support coopmat2 (…

Assets 15

03 Aug 16:50

github-actions

b6077

83bc2f2

b6077

model : add text-only support for Kimi-VL (and find special tokens in…

Assets 15

03 Aug 05:17

github-actions

b6075

5c0eb5e

b6075

opencl: fix adreno compiler detection logic (#15029)

Assets 15

27 Jul 09:27

github-actions

b6001

f1a4e72

b6001

vulkan: skip empty set_rows to avoid invalid API usage (#14860)

Assets 15

27 Jul 06:37

github-actions

b5998

446595b

b5998

Docs: add instructions for adding backends (#14889)

Assets 15

26 Jul 01:53

github-actions

b5994

c7f3169

b5994

ggml-cpu : disable GGML_NNPA by default due to instability (#14880)

* docs: update s390x document for sentencepiece

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit e086c5e3a7ab3463d8e0906efcfa39352db0a48d)

* docs: update huggingface links + reword

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 8410b085ea8c46e22be38266147a1e94757ef108)

* ggml-cpu: disable ggml-nnpa compile flag by default

fixes #14877

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 412f4c7c88894b8f55846b4719c76892a23cfe09)

* docs: update s390x build docs to reflect nnpa disable

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit c1eeae1d0c2edc74ab9fbeff2707b0d357cf0b4d)

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

Assets 15

15 Jul 23:04

github-actions

b5902

4a4f426

b5902

model : add Kimi-K2 support (#14654)

* Kimi-K2 conversion

* add Kimi_K2  pre type

* Kimi-K2

* Kimi-K2 unicode

* Kimi-K2

* LLAMA_MAX_EXPERTS 384

* fix vocab iteration

* regex space fix

* add kimi-k2 to pre_computed_hashes

* Updated with kimi-k2 get_vocab_base_pre hash

* fix whitespaces

* fix flake errors

* remove more unicode.cpp whitespaces

* change set_vocab() flow

* add moonshotai-Kimi-K2.jinja to /models/templates/

* update moonshotai-Kimi-K2.jinja

* add kimi-k2 chat template

* add kimi-k2

* update NotImplementedError

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* except Exception

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* LLM_CHAT_TEMPLATE_KIMI_K2 if(add_ass){}

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Assets 15

12 Jul 18:51

github-actions

b5884

c31e606

b5884

tests : cover lfm2 cases in test_ssm_conv (#14651)

Assets 15
