Releases: l3utterfly/llama.cpp
b4219
sycl : Reroute permuted mul_mats through oneMKL (#10408)
This PR fixes the failing MUL_MAT tests for the sycl backend.
b4200
ci : faster CUDA toolkit installation method and use ccache (#10537)
* ci : faster CUDA toolkit installation method and use ccache
* remove fetch-depth
* only pack CUDA runtime on master
b4098
vulkan: Optimize some mat-vec mul quant shaders (#10296)
Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.
Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.
Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit twiddling instructions.
b4033
ggml : fix arch check in bf16_to_fp32 (#10164)
b3982
sync : ggml
b3902
cmake : do not build common library by default when standalone (#9804)
Layla v3.3.0
llama.cpp used in the Layla v3.3.0 release
Layla v3.2.0
Merge branch 'master' into layla-build
Layla v3.0.0
server : update readme about token probs (#4777)
* updated server readme to reflect the gg/server-token-probs-4088 commit: added an explanation for the API's completion result, which now includes `completion_probabilities`, plus a JSON schema that shows the type/structure of `completion_probabilities`
* simplified the `completion_probabilities` JSON schema so it is easier to understand what the structure of `completion_probabilities` looks like
* minor : fix trailing whitespace
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Layla v2.0.0
Merge branch 'master' into layla-build