Pull requests: ggml-org/llama.cpp

Pull requests list

server : refactor "use checkpoint" logic [labels: examples, server]
#22114 opened Apr 19, 2026 by ggerganov (Member)
ggml: kleidi, cpu: beginnings of macOS cluster scheduling [labels: ggml]
#22113 opened Apr 19, 2026 by mediouni-m (Contributor)
CUDA: PoC for repacking mxfp4 [labels: ggml, Nvidia GPU]
#22112 opened Apr 19, 2026 by am17an (Contributor), Draft
rpc: rdma: default max_inline_data to 0 for portability [labels: ggml]
#22111 opened Apr 19, 2026 by Laitaps
unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… [labels: testing]
#22110 opened Apr 19, 2026 by Kabir08 (Contributor)
chat : add MiniMax M2 specialized tool-call handler [labels: testing]
#22106 opened Apr 19, 2026 by doctorjei
[Speculative decoding] feat: add DFlash support [labels: examples, model, python, server]
#22105 opened Apr 19, 2026 by ruixiang63, Draft
feat: Support sarashina2.2-vision-3b model [labels: examples, python]
#22103 opened Apr 19, 2026 by samuraieng
fix: GLM-DSA crash in llama-tokenize when using vocab_only
#22102 opened Apr 19, 2026 by ssam18 (Contributor)
[WebGPU] Implement async tensor api and event api [labels: devops, ggml, WebGPU]
#22099 opened Apr 18, 2026 by nikhilJain17 (Contributor), Draft
[SYCL] Add Zero-Copy path with Cache Flushing for Intel UMA (Lunar Lake/Meteor Lake) [labels: ggml, SYCL]
#22098 opened Apr 18, 2026 by i-Charlys
hip: bypass memory pool for flash attention f16 temp buffers [labels: ggml, Nvidia GPU]
#22094 opened Apr 18, 2026 by TheTom, Draft
[WIP]hexagon: hmx opt phase2 [labels: ggml, Hexagon]
#22086 opened Apr 18, 2026 by chraac (Contributor), Draft
[SYCL] Update oneapi 2025.3.3, Seperate SYCL build, release Ubuntu 24 package. [labels: devops, documentation, SYCL]
#22078 opened Apr 18, 2026 by NeoZhangJianyu (Contributor)
common/autoparser : allow space after tool call [labels: testing]
#22073 opened Apr 18, 2026 by aldehir (Contributor)
sycl: Battlemage (BMG) optimizations — AOT, Q5_K reorder, PAD stride fix, new ops, oneMKL routing [labels: ggml, SYCL]
#22066 opened Apr 17, 2026 by aicss-genai
Extend LoRA hotswapping support [labels: examples, python, server]
#22061 opened Apr 17, 2026 by skiz
GGML: Allow static build with dynamic loaded backends [labels: ggml]
#22059 opened Apr 17, 2026 by ervanalb
2 tasks done
spec: save the dynamic/static ngram cache file
#22055 opened Apr 17, 2026 by petersid2022
quant: handle shared-KV layer tensors in imatrix-dependent quantization [labels: testing]
#22054 opened Apr 17, 2026 by ajfonthemove
3 tasks