Pull requests: ggml-org/llama.cpp
Pull requests list
server : improve cache reuse diagnostics for SWA and hybrid models
#21693
opened Apr 9, 2026 by
1oridevs
debug: functionality to dump full tensors and compare
examples
python
python script changes
#21691
opened Apr 9, 2026 by
pwilkin
Member
common: mark --split-mode tensor as experimental
#21684
opened Apr 9, 2026 by
JohannesGaessler
Contributor
Bug-Fix sets an upper VRAM limit for cached ggml_cuda graphs to prevent VRAM memory leaks
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#21673
opened Apr 9, 2026 by
kmorennv
webui: Fix messages rendering for "Show raw output"
examples
server/webui
server
#21672
opened Apr 9, 2026 by
allozaur
Contributor
common : fix when loading a cached HF models with unavailable API
#21670
opened Apr 9, 2026 by
angt
Member
ggml-webgpu: support non-square subgroup matrix configs for Intel GPUs
ggml
changes relating to the ggml tensor library for machine learning
WebGPU
#21669
opened Apr 9, 2026 by
SharmaRithik
webui: Static build output improvements
examples
server/webui
server
#21667
opened Apr 9, 2026 by
allozaur
Contributor
CUDA: fuse muls
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#21665
opened Apr 9, 2026 by
am17an
Contributor
Convert: Fix NemotronH Config Parsing
python
python script changes
#21664
opened Apr 9, 2026 by
anavp-nvidia
Contributor
common: load parser in common_chat_parser_params constructor
#21659
opened Apr 9, 2026 by
sacredvoid
Contributor
docker: add OCI image labels for version and build date
devops
improvements to build systems and github actions
#21653
opened Apr 9, 2026 by
ssam18
Contributor
Prevent the sum of the dequantized activation in q8_1 from overflowing
#21652
opened Apr 9, 2026 by
bartowski1182
Contributor
ci: add android arm64 build and release
#21647
opened Apr 8, 2026 by
ykhrustalev
Contributor
convert : force f16 or f32 on step3-vl conv weights
#21646
opened Apr 8, 2026 by
CISC
Member
ggml-webgpu: Update register tiling matmul to use f32 accumulation
#21644
opened Apr 8, 2026 by
reeselevine
Contributor
fix(openvino): define PartialShape bounds for tensors
#21637
opened Apr 8, 2026 by
thedanhoffman
Contributor
(Performance; ggml-cpu) Optimized x86 and generic cpu q1_0 dot (follow up)
#21636
opened Apr 8, 2026 by
pl752
Contributor