Releases: l3utterfly/llama.cpp
b4219
sycl : Reroute permuted mul_mats through oneMKL (#10408)
This PR fixes the failing MUL_MAT tests for the sycl backend.
b4200
ci : faster CUDA toolkit installation method and use ccache (#10537)
* ci : faster CUDA toolkit installation method and use ccache
* remove fetch-depth
* only pack CUDA runtime on master
b4098
vulkan: Optimize some mat-vec mul quant shaders (#10296)
Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.
Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.
Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit twiddling instructions.
b4033
ggml : fix arch check in bf16_to_fp32 (#10164)
b3982
sync : ggml
b3902
cmake : do not build common library by default when standalone (#9804)
Layla v3.3.0
llama.cpp used in the Layla v3.3.0 release
Layla v3.2.0
Merge branch 'master' into layla-build
Layla v3.0.0
server : update readme about token probs (#4777)
* updated server readme to reflect the gg/server-token-probs-4088 commit: added an explanation for the API's completion result, which now includes `completion_probabilities`, plus a JSON schema that shows the type/structure of `completion_probabilities`
* simplified the `completion_probabilities` JSON schema so it is easier to understand what the structure of `completion_probabilities` looks like
* minor : fix trailing whitespace
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Layla v2.0.0
Merge branch 'master' into layla-build