Merged
256 commits
ba2ff79
ggml: update comments for backends which have no memory to report (#2…
taronaeo Mar 6, 2026
d48e876
ggml-cuda: add mem check for fusion (#19916)
am17an Mar 6, 2026
ba2fd11
cpu: skip redudant ROPE cache updates (#20149)
max-krasnyansky Mar 6, 2026
e68f2fb
server : preserve anthropic thinking blocks in conversion (#20120)
T0mSIlver Mar 6, 2026
34df42f
hexagon: add f32 ssm_conv op (#20122)
tboinovski1 Mar 6, 2026
566059a
Autoparser - complete refactoring of parser architecture (#18675)
pwilkin Mar 6, 2026
7463687
Add @pwilkin to CODEOWNERS for autoparser code (#20174)
pwilkin Mar 6, 2026
649f064
quants : Add memsets and other fixes for IQ quants (#19861)
bartowski1182 Mar 6, 2026
2f2923f
Autoparser: add optional argument reshuffle capability (#20171)
pwilkin Mar 6, 2026
c024d85
Autoparser: True streaming (#20177)
pwilkin Mar 7, 2026
6fce5c6
opencl: add l2_norm (#20160)
lhez Mar 7, 2026
c5a7788
ggml: add GATED_DELTA_NET op (#19504)
am17an Mar 7, 2026
213c4a0
[SYCL] supprt Flash Attention for fp32/fp16/Q4/Q5/Q8 (#20190)
arthw Mar 8, 2026
ff52ee9
server : correct index on finish in OAI completion streams (#20226)
decahedron1 Mar 8, 2026
b283f6d
Revert to OAI-compatible args (#20213)
pwilkin Mar 8, 2026
a950479
readme : update infra list (#20212)
Defilan Mar 8, 2026
a976ff0
llama: end-to-end tests (#19802)
JohannesGaessler Mar 8, 2026
cd18a50
vulkan: Fix data races in coopmat1 mul_mat(_id) (#20084)
jeffbolznv Mar 8, 2026
d088d5b
ggml-vulkan: Add ELU op support (#20183)
GiantPrince Mar 8, 2026
62b8143
Fix structured outputs (#20223)
pwilkin Mar 8, 2026
9b24886
Fix compile bug (#20203)
pwilkin Mar 8, 2026
451ef08
common : gracefully handle incomplete output (#20191)
aldehir Mar 8, 2026
35bee03
graph : remove redundant scale_w parameter (#20235)
CISC Mar 8, 2026
d417bc4
server : do not create checkpoints right after mtmd chunks (#20232)
ggerganov Mar 8, 2026
97c64fb
PEG parser for LFM2 (#20251)
pwilkin Mar 9, 2026
ae87863
llama-bench: introduce `-hf` and `-hff` flags & use `--mmap 1` by def…
taronaeo Mar 9, 2026
5f4cdac
cuda : display total and free VRAM capacity during device initializat…
tehsiuhuang Mar 9, 2026
b2f460b
vulkan: skip zero size tensors in backend copies (#20233)
0cc4m Mar 9, 2026
0beb8db
ggml-vulkan: add SGN operator, auto-generate Vulkan.csv and ops.md (#…
bertaye Mar 9, 2026
e2763a6
contributing: limit open PRs for new contributors to 1 (#20036)
am17an Mar 9, 2026
b518195
llama-quant : left-align tensor names in output (#20117)
ddh0 Mar 9, 2026
e8bbc73
ggml-cuda: disable gdn for musa (#20278)
am17an Mar 9, 2026
107d599
server : add kill switch when server is stuck (#20277)
ggerganov Mar 9, 2026
43e1cbd
models : fix assert in mamba2 graph (#20270)
ggerganov Mar 9, 2026
f76565d
common: map developer role to system (#20215)
pwilkin Mar 9, 2026
d6e1556
server : fix off-by-1 in server_tokens::size_up_to_pos() (#20279)
ggerganov Mar 9, 2026
344ee2a
server : warn swa-full is not supported for non-SWA models (#20291)
ggerganov Mar 9, 2026
ed0007a
metal : add upscale (#20284)
ggerganov Mar 9, 2026
96cfc49
server : fix checkpoints n_tokens calculation (#20287)
ggerganov Mar 9, 2026
e22cd0a
metal : extend mul_mv_ext to BF16, Q2_K, Q3_K (#20250)
arkavo-com Mar 9, 2026
23fbfcb
server: Parse port numbers from MCP server URLs in CORS proxy (#20208)
eapache Mar 9, 2026
59db9a3
llama: dynamic head_dim and n_rot for SWA (#20301)
ngxson Mar 9, 2026
0842b9b
model: fix step3.5 n_rot (#20318)
ngxson Mar 9, 2026
c96f608
common: consolidate PEG string parsers (#20263)
aldehir Mar 9, 2026
1dab5f5
llama-quant : fail early on missing imatrix, refactor type selection,…
ddh0 Mar 10, 2026
1a5631b
metal: handle command buffer failures gracefully in synchronize (#20306)
JulianPscheid Mar 10, 2026
af237f3
ggml-cpu: add RVV repack GEMM and GEMV for quantization types (#19121)
taimur-10x Mar 10, 2026
0cd4f47
kleidiai : support for concurrent sme and neon kernel execution (#20070)
chaxu01 Mar 10, 2026
ec947d2
common : fix incorrect uses of stoul (#20313)
CISC Mar 10, 2026
a7b3dee
server : make 2 checkpoints near the end of the prompt (#20288)
ggerganov Mar 10, 2026
1274fbe
models : fix assert in mamba2 (cont) (#20335)
ggerganov Mar 10, 2026
0f1e9d1
docs: update CPU backend ops to mark POOL_1D as supported (#20304)
a3894281 Mar 10, 2026
8d880ac
examples : fix empty items in json_schema_to_grammar.py [no ci] (#19968)
RayXu14 Mar 10, 2026
6c770d1
Reduce level of content parser warning message to avoid log spam on n…
pwilkin Mar 10, 2026
aa2d278
ggml webgpu: faster normal quant and some k-quant matrix operations, …
reeselevine Mar 10, 2026
90b2731
ggml : bump RPC version (#20330)
ggerganov Mar 10, 2026
10e5b14
llama-quant : correct `n_attention_wv` usage (#20357)
ddh0 Mar 10, 2026
4d99d45
model : qwen3vl reranker text support (#20332)
ViniciosLugli Mar 10, 2026
b2e1427
fix for failed UT case: ACC, L2_NORM, UPSCALE, fused_glu, unary (#20283)
arthw Mar 11, 2026
0cec84f
fix op rope, add rope_back (#20293)
arthw Mar 11, 2026
4f2f0a1
vendor : update miniaudio to 0.11.25 (#20209)
cabelo Mar 11, 2026
e1a3999
vendor : update cpp-httplib to 0.37.0 (#20207)
cabelo Mar 11, 2026
00de615
Fix agentic mcp image single model (#20339)
ServeurpersoCom Mar 11, 2026
9ef7523
cuda/hip: fix loop unrolling in ssm-conv (#20369)
IMbackK Mar 11, 2026
5f91b1d
ggml-cuda: gdn use shared mem for HIP (#20366)
IMbackK Mar 11, 2026
acb7c79
common/parser: handle reasoning budget (#20297)
pwilkin Mar 11, 2026
b5fe455
common/parser: use nlohmann::ordered_json to preserve parameter order…
aldehir Mar 11, 2026
182acfe
ci: disable coopmat on ubuntu-24-cmake-vulkan job (#20294)
0cc4m Mar 11, 2026
ecac98e
[SYCL] Update SYCL.md for binary package for Windows (#20401)
arthw Mar 11, 2026
c363256
metal : add env var to trigger graph capture (#20398)
ggerganov Mar 11, 2026
b541241
metal : fix q5_k mul_mv register spill (#20399)
ggerganov Mar 11, 2026
bd1ec81
compare-llama-bench: check remotes as well (#20406)
am17an Mar 11, 2026
76ea1c1
metal : fix capture_compute counter logic (#20410)
ggerganov Mar 11, 2026
eaf1d79
llama : add support for Nemotron 3 Super (#20411)
danbev Mar 11, 2026
3ca19b0
benches : add nemotron super (#20420)
ggerganov Mar 11, 2026
5eae9cb
ggml : add NVFP4 quantization type support (#19769)
richarddd Mar 11, 2026
f90bd1d
llama : whitespace cleanup (#20422)
CISC Mar 11, 2026
d28961d
llama : enable chunked fused GDN path (#20340)
ggerganov Mar 11, 2026
f2ab047
ggml-webgpu: Add supports for `GGML_OP_REPEAT` (#20230)
yomaytk Mar 11, 2026
4a748b8
common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up (…
ddh0 Mar 11, 2026
1eea6a2
graph : add optional scale parameter to build_lora_mm [no ci] (#20427)
richarddd Mar 11, 2026
fdb1764
model : add support for Phi4ForCausalLMV (#20168)
dranger003 Mar 11, 2026
a8304b4
common/parser: add GigaChatV3/3.1 models support (#19931)
Mishusha Mar 12, 2026
d63aa39
hip: compile debug builds with -O2 on hip to avoid a compiler bug (#2…
IMbackK Mar 12, 2026
3d9ab22
opencl: add cumsum op (#18981)
shaofeiqi Mar 12, 2026
0516e04
opencl: use larger workgroup size for get_rows (#20316)
lhez Mar 12, 2026
5866e3b
vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large mode…
rillomas Mar 12, 2026
aa429cf
vulkan: fix OOB check in flash_attn_mask_opt (#20296)
jeffbolznv Mar 12, 2026
246ffc4
vulkan: fix l2_norm epsilon handling (#20350)
jeffbolznv Mar 12, 2026
4cc6eb1
ci: Setup self-hosted CI for Intel Linux Vulkan backend (#20154)
rillomas Mar 12, 2026
e4cff09
metal : avoid divisions in bin kernel (#20426)
ggerganov Mar 12, 2026
0503996
ggml-virtgpu: Fix some build commands (#20341)
yomaytk Mar 12, 2026
de19015
New conversations now auto-select the first loaded model (#20403)
ServeurpersoCom Mar 12, 2026
40c550d
vulkan: fix SSM_CONV PP scaling with large ubatch sizes (#20379)
ProgenyAlpha Mar 12, 2026
c3e3f9e
convert : better mtp check and fix return [no ci] (#20419)
CISC Mar 12, 2026
deee238
vulkan: add GATED_DELTA_NET op support (#20334)
ProgenyAlpha Mar 12, 2026
0a10c34
grammar: Fix grammar root symbol check (#19761)
AsbjornOlling Mar 12, 2026
6de1bc6
common : update completion executables list [no ci] (#19934)
danbev Mar 12, 2026
128142f
test-backend-ops: allow loading tests from file and parsing model ope…
0cc4m Mar 12, 2026
0e81041
tests : use `reasoning` instead of `reasoning_budget` in server tests…
pwilkin Mar 12, 2026
557fe2d
vendor : update cpp-httplib to 0.37.1 (#20390)
cabelo Mar 12, 2026
57819b8
llama : disable graph reuse with pipeline parallelism (#20463)
ggerganov Mar 12, 2026
983df14
convert : fix/suppress pyright errors (#20442)
danbev Mar 13, 2026
73c9eb8
metal : fix l2 norm scale (#20493)
ggerganov Mar 13, 2026
2948e60
general: CONTRIBUTING.md - guidelines for quantization schemes (#19762)
pwilkin Mar 13, 2026
8f974d2
mtmd : rename mtmd_get_audio_bitrate to mtmd_get_audio_sample_rate (#…
danbev Mar 13, 2026
b5e1212
ggml : fix typo gmml (#20512)
angt Mar 13, 2026
fbaa95b
ggml-cpu: add RVV vec dot kernels for quantization types (#18859)
rehan-10xengineer Mar 13, 2026
d7ba99c
server: reset counter related to kill-switch on client error (#20513)
SoftwareRenderer Mar 13, 2026
f17b3be
llama : fix pooling assertion crash in chunked GDN detection path (#2…
ZeroV0LT Mar 13, 2026
1430c35
common/parser: gracefully handle undetected tool parser, print error …
pwilkin Mar 13, 2026
e30f1fd
graph : remove redundant GDN state transposes (#20443)
ggerganov Mar 13, 2026
463b6a9
tools : enable kvu in perplexity for hellaswag, winogrande, multiple-…
angt Mar 13, 2026
3b43950
opencl: fix l2_norm (#20480)
lhez Mar 14, 2026
5a32a9b
Fix data race in CUDA's "cpy" kernel (influences GGML's DUP, CONT ope…
Exile333 Mar 14, 2026
77e20cc
vendor : update cpp-httplib to 0.37.2 (#20484)
angt Mar 14, 2026
9789c4e
ggml : add OpenVINO backend (#15307)
wine99 Mar 14, 2026
f2c0dfb
Use fp32 in cuBLAS V100 to avoid overflows, env variables to override…
wallentri88 Mar 14, 2026
d0b79aa
ggml : add native AVX512-FP16 support for F16 operations (#20529)
angt Mar 14, 2026
0024a69
scripts : update get-hellaswag.sh and get-winogrande.sh (#20542)
angt Mar 14, 2026
0685848
scripts : remove get-wikitext-103.sh (#20543)
angt Mar 14, 2026
710878a
webui: restore code preview iframe origin isolation (#20477)
Chedrian07 Mar 14, 2026
a93c0ef
add op gated_delta_net (#20455)
arthw Mar 14, 2026
94d0262
mtmd: add llama-mtmd-debug binary (#20508)
ngxson Mar 14, 2026
9f774e4
ci : reduce webgpu tests timeout to 900s (#20538)
ggerganov Mar 14, 2026
609ea50
hexagon: Q4_0 and MXFP4 repack fixes (#20527)
max-krasnyansky Mar 14, 2026
3a6f059
ci : try to optimize some jobs (#20521)
netrunnereve Mar 14, 2026
fc350fd
docker : force Python 3.13 in Vulkan container (#20530)
gguillemas Mar 14, 2026
b476895
ci : move self-hosted workflows to separate files (#20540)
ggerganov Mar 14, 2026
b30a5fd
metal : add FA specialization for HSK = 320, HSV = 256 (#20549)
ggerganov Mar 14, 2026
d23355a
model : wire up Qwen3.5/Qwen3.5MoE tensors for NVFP4 support (#20506)
michaelw9999 Mar 14, 2026
6b10a82
kv-cache : fix reading llama_kv_cell_ext during state read (#20273)
sprayandwipe Mar 15, 2026
1a3d8ed
vulkan: use graphics queue on AMD (#20551)
0cc4m Mar 15, 2026
617db24
cuda : add RDNA4-specific MMVQ parameter table for bs=1 decode (#19478)
JoursBleu Mar 15, 2026
b9da444
ggml : guard against sumq2 being 0 in IQ4_NL (#20460)
bartowski1182 Mar 15, 2026
89d0aec
convert : support contiguous method on lora tensors (#20489)
CISC Mar 15, 2026
9cd4ebc
ci : split build.yml + server.yml (#20546)
ggerganov Mar 15, 2026
cf45437
codeowners : use teams (#20526)
CISC Mar 15, 2026
5596464
fix: prevent nullptr dereference (#20552)
doraeric Mar 15, 2026
8b7d340
ggml/hip: fix APU compatibility - soft error handling for hipMemAdvi…
moonshadow-25 Mar 15, 2026
07c6a59
vendor : update cpp-httplib to 0.38.0 (#20578)
angt Mar 15, 2026
ceef6b5
ggml: avoid creating CUDA context during device init (#20595)
ServeurpersoCom Mar 15, 2026
ae40cd2
CUDA: limit number of FA stream-k CUDA blocks (#20586)
JohannesGaessler Mar 15, 2026
b91d7df
ci : only save openvino caches on github-hosted master (#20593)
CISC Mar 15, 2026
ebbf544
sycl : fix for untransposed GDA recurrent state (#20583)
CISC Mar 15, 2026
88915cb
server : fix wait in test_cancel_requests() test (#20601)
ggerganov Mar 15, 2026
9e2e219
tools/cli: fix disable reasoning (#20606)
pwilkin Mar 15, 2026
34818ea
CUDA: GDN hide memory latency (#20537)
am17an Mar 16, 2026
d393649
common : fix iterator::end() dereference (#20445)
rillomas Mar 16, 2026
079e5a4
convert : support mixed-precision ModelOpt models with per-tensor NVF…
richarddd Mar 16, 2026
de8f01c
model : wire up Nemotron-H tensors for NVFP4 support (#20561)
CISC Mar 16, 2026
46dba9f
vulkan: fix flash attention dot product precision (#20589)
0cc4m Mar 16, 2026
d8c331c
webui: use date in more human readable exported filename (#19939)
woof-dog Mar 16, 2026
d65c4f2
Fix model selector locked to first loaded model with multiple models …
ServeurpersoCom Mar 16, 2026
67a2209
webui: Add MCP CORS Proxy detection logic & UI (#20167)
allozaur Mar 16, 2026
3c8521c
llama-graph: replace cont with reshape for alpha in qwen35 (#20640)
am17an Mar 16, 2026
dddca02
webui: add model information dialog to router mode (#20600)
ServeurpersoCom Mar 16, 2026
f6da02c
ggml : extend im2col f16 (ggml/1434)
David366AI Mar 15, 2026
c0ccbd1
ggml : try fix arm build (whisper/0)
ggerganov Mar 16, 2026
f47a246
sync : ggml
ggerganov Mar 16, 2026
1bbec6a
jinja : add capability check for object args (#20612)
aldehir Mar 16, 2026
0ed9929
ci : update labeler (#20629)
CISC Mar 16, 2026
cf21cdf
kleidiai: add data type check to get_tensor_traits (#20639)
martin-klacer-arm Mar 16, 2026
55e8702
tests : write to binary buffer to avoid newline translation in jinja …
CISC Mar 16, 2026
9b342d0
benches : add Nemotron 3 Nano on DGX Spark (#20652)
ggerganov Mar 16, 2026
45172df
ci : disable AMX jobs (#20654)
ggerganov Mar 16, 2026
d34ff7e
model: mistral small 4 support (#20649)
ngxson Mar 16, 2026
2e4a6ed
tools/server: support refusal content for Responses API (#20285)
pwilkin Mar 17, 2026
b6c83aa
[SYCL] ehance UPSCALE to support all UT cases (#20637)
arthw Mar 17, 2026
740a447
vulkan: allow graphics queue only through env var (#20599)
0cc4m Mar 17, 2026
6276706
kleidiai : fix MUL_MAT support for batched (3D) inputs (#20620)
jabr Mar 17, 2026
8cc2d81
server : fix ctx checkpoint invalidation (#20671)
ggerganov Mar 17, 2026
3a5cb62
vulkan: async and event fixes (#20518)
0cc4m Mar 17, 2026
ab0bb93
ci : bump ccache [no ci] (#20679)
CISC Mar 17, 2026
054d8b0
ggml-cpu: fix RVV checks in quants and repacking (#20682)
taimur-10x Mar 17, 2026
d2ecd2d
common/parser: add `--skip-chat-parsing` to force a pure content pars…
pwilkin Mar 17, 2026
ee4801e
ggml-blas: set mkl threads from thread context (#20602)
kannon92 Mar 17, 2026
892e3c3
vulkan: disable mmvq on Intel Windows driver (#20672)
0cc4m Mar 17, 2026
cf23ee2
hexagon: add neg, exp, sigmoid, softplus ops, cont, repeat ops (#20701)
srikris-sridhar Mar 17, 2026
a69d54f
context : fix graph not resetting when control vector changes (#20381)
iimez Mar 18, 2026
7533a7d
HIP : ignore return of hipMemAdvise [no ci] (#20696)
IMbackK Mar 18, 2026
7ab321d
webui: Fix duplicated messages on q param (#20715)
allozaur Mar 18, 2026
fe00a84
tests: enable kv_unified to prevent cuda oom error on rtx 2060 (#20645)
taronaeo Mar 18, 2026
5e8910a
common : rework gpt-oss parser (#20393)
aldehir Mar 18, 2026
f4049ad
tests : fix test-jinja-py Windows failures by bypassing command-line …
rillomas Mar 18, 2026
312cf03
llama : re-enable manual LoRA adapter free (#19983)
PopFlamingo Mar 18, 2026
48e6123
webui: improve tooltip wording for attachment requirements (#20688)
julien-c Mar 18, 2026
79187f2
ggml : restore ggml_type_sizef() to aboid major version bump (ggml/1441)
ggerganov Mar 16, 2026
b08f732
ggml : bump version to 0.9.8 (ggml/1442)
ggerganov Mar 16, 2026
4efd326
sync : ggml
ggerganov Mar 18, 2026
78d550b
ggml-cpu/x86: fix unused changemask warning in repack (#20692)
mrshaw01 Mar 18, 2026
8ced5f4
Move to no timeout for WaitAny in graph submission to avoid deadlocks…
reeselevine Mar 18, 2026
5744d7e
Rebuild index.html.gz (#20724)
crsawyer Mar 18, 2026
d13d60a
gguf-py : cleaner way to get the first key (#20727)
CISC Mar 18, 2026
6729d49
model : add control vector support where missing (#20653)
GreyWorks Mar 18, 2026
07ba6d2
CANN: support flash attention for head dim not multiple of 16, fix AL…
noemotiovon Mar 19, 2026
ea01d19
ggml-webgpu: Add supports for `DIAG` and `TRI` (#20664)
yomaytk Mar 19, 2026
509a31d
ggml-webgpu: Update the `RMS_NORM` preprocessor and add `L2_NORM` (#2…
yomaytk Mar 19, 2026
7f2cbd9
CANN: handle in-place ROPE on non-contiguous f32 tensors (#20274)
noemotiovon Mar 19, 2026
c014c3f
docs: add information about openvino in the docker page (#20743)
kannon92 Mar 19, 2026
8113977
vocab : assert array size of scores and toktypes (#20737)
CISC Mar 19, 2026
3fee84e
cmake : fix build warning when kleidiai is enabled (#20457)
chaxu01 Mar 19, 2026
07feeaa
vulkan: dequantize iq4_xs 4 at a time (#20657)
netrunnereve Mar 19, 2026
1b9bbaa
common : fix gpt-oss content removal (#20745)
aldehir Mar 19, 2026
b486c17
convert : support is_causal hyperparameter (#20746)
Bing-su Mar 19, 2026
512bba6
webui: Improve model parsing logic + add unit tests (#20749)
allozaur Mar 19, 2026
cd708db
WebUI: Persist the on/off state of the MCP servers for new conversati…
ServeurpersoCom Mar 19, 2026
1e64534
mtmd: add clip_graph::build_mm() (#20751)
ngxson Mar 19, 2026
4065c1a
Server becomes the source of truth for sampling parameter defaults (#…
ServeurpersoCom Mar 19, 2026
f071ce6
ci : add action for finding duplicate issues (#20756)
ggerganov Mar 19, 2026
922b90e
common : add LLAMA_ARG_SPEC_TYPE (#20744)
ddh0 Mar 19, 2026
c125883
ggml webgpu: ops support for qwen3.5 (SET, TRI_SOLVE, SSM_CONV, GATED…
reeselevine Mar 19, 2026
5e54d51
common/parser: add proper reasoning tag prefill reading (#20424)
pwilkin Mar 19, 2026
b49d8b8
ci : add hip quality check (#20430)
IMbackK Mar 19, 2026
74c42ee
hexagon: add Matrix Extensions (HMX) for Hexagon NPU backend (#20693)
njsyw1997 Mar 19, 2026
900efd5
ci : clarify gh command for viewing issues (#20766)
ggerganov Mar 19, 2026
76f2dc7
chat : handle tool calls with no required args in TAG_WITH_TAGGED for…
jpohhhh Mar 19, 2026
26c9ce1
server: Add cached_tokens info to oaicompat responses (#19361)
percontation Mar 19, 2026
3408072
hip: Avoid compiler bug in RDNA code generation during debug builds o…
Exile333 Mar 19, 2026
6c72646
ci : improve action for duplicate issue (#20772)
ggerganov Mar 19, 2026
a0bbcdd
ggml: guard KleidiAI DOWNLOAD_EXTRACT_TIMESTAMP for cmake < 3.24 (#20…
sundaram123krishnan Mar 19, 2026
b739738
docs: Update server README to reflect PR #20297 (#20560)
Tomeamis Mar 19, 2026
c1b9116
server: fix router mode deadlock on child crash and TOCTOU race in mo…
BenRacicot Mar 19, 2026
c46583b
common/parser : fix out_of_range crash in throw path (#20424 regressi…
jpohhhh Mar 20, 2026
21c8045
jinja : fix heap OOB read in value equality comparison (#20782)
retr0reg Mar 20, 2026
464fd0e
ai : update find-related action (#20790)
ggerganov Mar 20, 2026
6d99b44
docs : fix Metal backend op support status in ops.md (#20779)
seyoungjeong Mar 20, 2026
1af9dab
CANN: add BF16 support for core operators (#20152)
hipudding Mar 20, 2026
ab9d4c3
server : improve mtmd ctx checkpoints (#20726)
ggerganov Mar 20, 2026
3adbef7
model: assert nextn_predict_layers to prevent underflow (#20783)
retr0reg Mar 20, 2026
dc65924
context: zero output buffer on allocation (#20781)
retr0reg Mar 20, 2026
e06c3ab
vulkan: change gated_delta_net to shard a column across a subgroup (#…
jeffbolznv Mar 20, 2026
fb78ad2
server: (doc) clarify in-scope and out-scope features (#20794)
ngxson Mar 20, 2026
58c81f7
model : fix Granite Hybrid type check for 7B.A1B (#20795)
victor-villar Mar 20, 2026
b31b30f
ai : do not run bash commands in the prompt (#20810)
ggerganov Mar 20, 2026
149b249
common : fix typo in debug log ('extracft' -> 'extract') (#20807)
jpohhhh Mar 20, 2026
4cb7e0b
ai : limit runtime of the agent (#20816)
ggerganov Mar 20, 2026
e6ec21e
ggml-cpu: add always_inline to tinyBLAS_PPC accumulator saves (#20791)
shalinib-ibm Mar 20, 2026
b1c70e2
common/parser: fix nasty bug causing subtle corruption of generation …
pwilkin Mar 20, 2026
cea560f
Add shader count for Intel Arc Pro B60 (#20818)
TheBlueMatt Mar 21, 2026
29b28a9
ci : switch from pyright to ty (#20826)
CISC Mar 21, 2026
eac9c6e
Convert: Make NVFP4 and MXFP4 HF conversions say NVFP4/MXFP4 instead …
michaelw9999 Mar 21, 2026
2bcdddd
fix(rpc): prevent division by zero in deserialize_tensor (#20712)
y198nt Mar 21, 2026
568aec8
docs : explicit about banning accounts that violates policy (#19593)
ngxson Mar 21, 2026
212f452
context : use n_embd_out for pooled embedding extraction (#20840)
extfs Mar 21, 2026
990e4d9
common/grammar: fix grammar parsing issues to prevent stack overflow …
aagit Mar 21, 2026
3306dba
misc : prefer ggml-org models in docs and examples (#20827)
ddh0 Mar 21, 2026
ccb87fa
[CUDA] Increase number of output elements per-thread block if the K-d…
gaugarg-nv Mar 22, 2026
db9d8aa
ggml-cuda: native bf16 flash attention for vec kernel (#20525)
eous Mar 22, 2026
7770e70
Merge branch 'layla-build' into merge
l3utterfly Mar 22, 2026
138 changes: 138 additions & 0 deletions .devops/openvino.Dockerfile
@@ -0,0 +1,138 @@
ARG OPENVINO_VERSION_MAJOR=2026.0
ARG OPENVINO_VERSION_FULL=2026.0.0.20965.c6d6a13a886
ARG UBUNTU_VERSION=24.04

# Optional proxy build arguments - empty by default
ARG http_proxy=
ARG https_proxy=

## Build Image
FROM ubuntu:${UBUNTU_VERSION} AS build

# Pass proxy args to build stage
ARG http_proxy
ARG https_proxy

RUN apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates \
gnupg \
wget \
git \
cmake \
ninja-build \
build-essential \
libtbb12 \
libssl-dev \
ocl-icd-opencl-dev \
opencl-headers \
opencl-clhpp-headers \
intel-opencl-icd && \
rm -rf /var/lib/apt/lists/*

# Install OpenVINO for Ubuntu 24.04
ARG OPENVINO_VERSION_MAJOR
ARG OPENVINO_VERSION_FULL
RUN mkdir -p /opt/intel && \
wget https://storage.openvinotoolkit.org/repositories/openvino/packages/${OPENVINO_VERSION_MAJOR}/linux/openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz && \
tar -xf openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz && \
mv openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64 /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} && \
cd /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} && \
echo "Y" | ./install_dependencies/install_openvino_dependencies.sh && \
cd - && \
ln -s /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} /opt/intel/openvino

ENV OpenVINO_DIR=/opt/intel/openvino

WORKDIR /app

COPY . .

# Build Stage
RUN bash -c "source ${OpenVINO_DIR}/setupvars.sh && \
cmake -B build/ReleaseOV -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_OPENVINO=ON && \
cmake --build build/ReleaseOV -j$(nproc)"

# Copy all necessary libraries
RUN mkdir -p /app/lib && \
find build/ReleaseOV -name '*.so*' -exec cp {} /app/lib \; && \
find ${OpenVINO_DIR}/runtime/lib/intel64 -name '*.so*' -exec cp -P {} /app/lib \; 2>/dev/null || \
find ${OpenVINO_DIR}/lib/intel64 -name '*.so*' -exec cp -P {} /app/lib \;

# Create runtime directories and copy binaries
RUN mkdir -p /app/full \
&& cp build/ReleaseOV/bin/* /app/full/ \
&& cp *.py /app/full \
&& cp -r gguf-py /app/full \
&& cp -r requirements /app/full \
&& cp requirements.txt /app/full \
&& cp .devops/tools.sh /app/full/tools.sh

## Base Runtime Image
FROM ubuntu:${UBUNTU_VERSION} AS base

# Pass proxy args to runtime stage
ARG http_proxy
ARG https_proxy

RUN apt-get update \
&& apt-get install -y libgomp1 libtbb12 curl\
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
&& find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
&& find /var/cache -type f -delete

COPY --from=build /app/lib/ /app/

### Full (all binaries)
FROM base AS full

ARG http_proxy
ARG https_proxy

COPY --from=build /app/full /app/

WORKDIR /app

RUN apt-get update && \
apt-get install -y --no-install-recommends \
git \
python3 \
python3-venv \
python3-pip && \
python3 -m venv /ov-venv && \
/ov-venv/bin/pip install --no-cache-dir --upgrade pip setuptools wheel && \
/ov-venv/bin/pip install --no-cache-dir -r requirements.txt && \
apt-get autoremove -y && \
apt-get clean && \
rm -rf /tmp/* /var/tmp/* && \
find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete && \
find /var/cache -type f -delete

ENTRYPOINT ["/bin/bash", "-c", "source /ov-venv/bin/activate && exec /app/tools.sh \"$@\"", "--"]


### Light, CLI only
FROM base AS light

COPY --from=build /app/full/llama-cli /app/

WORKDIR /app

ENTRYPOINT [ "/app/llama-cli" ]

### Server, Server only
FROM base AS server

ENV LLAMA_ARG_HOST=0.0.0.0

COPY --from=build /app/full/llama-server /app/

WORKDIR /app

HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]

ENTRYPOINT [ "/app/llama-server" ]
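The two version ARGs at the top of the Dockerfile are combined into the archive URL passed to `wget`. A minimal sketch of that composition, assuming a hypothetical helper named `openvino_url` (it is not part of the PR), may help when overriding the versions with `--build-arg`:

```shell
#!/bin/sh
# Hypothetical helper (not in the PR): rebuild the archive URL that the
# Dockerfile's wget step derives from OPENVINO_VERSION_MAJOR and
# OPENVINO_VERSION_FULL.
openvino_url() {
    major=$1
    full=$2
    echo "https://storage.openvinotoolkit.org/repositories/openvino/packages/${major}/linux/openvino_toolkit_ubuntu24_${full}_x86_64.tgz"
}

# Defaults taken from the ARG lines of the Dockerfile.
openvino_url 2026.0 2026.0.0.20965.c6d6a13a886

# Possible build invocations, left as comments because they need Docker and
# network access. The image tags are placeholders, not names used by the PR.
#   docker build -f .devops/openvino.Dockerfile --target server -t llama-openvino:server .
#   docker build -f .devops/openvino.Dockerfile --target light  -t llama-openvino:light .
```

Each `--target` value corresponds to one of the named stages (`full`, `light`, `server`) declared with `FROM base AS ...`.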
5 changes: 3 additions & 2 deletions .devops/vulkan.Dockerfile
@@ -53,10 +53,11 @@ RUN apt-get update \
     && apt-get install -y \
     build-essential \
     git \
-    python3 \
-    python3-dev \
+    python3.13 \
+    python3.13-dev \
     python3-pip \
     python3-wheel \
+    && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.13 100 \
     && pip install --break-system-packages --upgrade setuptools \
     && pip install --break-system-packages -r requirements.txt \
     && apt autoremove -y \
25 changes: 25 additions & 0 deletions .github/actions/linux-setup-openvino/action.yml
@@ -0,0 +1,25 @@
name: "Linux - Setup OpenVINO Toolkit"
description: "Setup OpenVINO Toolkit for Linux"
inputs:
  path:
    description: "Installation path"
    required: true
  version_major:
    description: "OpenVINO major version (e.g., 2025.3)"
    required: true
  version_full:
    description: "OpenVINO full version (e.g., 2025.3.0.19807.44526285f24)"
    required: true

runs:
  using: "composite"
  steps:
    - name: Setup OpenVINO Toolkit
      id: setup
      uses: ./.github/actions/unarchive-tar
      with:
        url: https://storage.openvinotoolkit.org/repositories/openvino/packages/${{ inputs.version_major }}/linux/openvino_toolkit_ubuntu24_${{ inputs.version_full }}_x86_64.tgz
        path: ${{ inputs.path }}
        type: z
        strip: 1
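The `unarchive-tar` action itself is not shown in this diff. The sketch below assumes that `type: z` selects gzip decompression and `strip: 1` drops the archive's top-level directory, as `tar --strip-components=1` would. The function name `unpack_stripped` and the demo archive are hypothetical, and the archive is built locally so the example needs no network access:

```shell
#!/bin/sh
# Hypothetical stand-in for the unarchive-tar step (its real implementation
# is not part of this diff): extract a gzip tarball into "$2" while dropping
# the top-level directory of the archive.
unpack_stripped() {
    archive=$1
    dest=$2
    mkdir -p "$dest"
    tar -xzf "$archive" --strip-components=1 -C "$dest"
}

# Build a small archive that mimics the toolkit layout (one top-level folder).
work=$(mktemp -d)
mkdir -p "$work/src/openvino_toolkit_demo/runtime"
echo "demo" > "$work/src/openvino_toolkit_demo/runtime/version.txt"
tar -czf "$work/demo.tgz" -C "$work/src" openvino_toolkit_demo

# After extraction, runtime/ sits directly under the destination path.
unpack_stripped "$work/demo.tgz" "$work/out"
cat "$work/out/runtime/version.txt"
```

Under these assumptions, the directory given as `path` would directly contain the toolkit contents rather than a versioned subfolder.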

17 changes: 17 additions & 0 deletions .github/labeler.yml
@@ -104,3 +104,20 @@ OpenCL:
         - any-glob-to-any-file:
             - ggml/include/ggml-opencl.h
             - ggml/src/ggml-opencl/**
+            - docs/backend/OPENCL.md
+Hexagon:
+    - changed-files:
+        - any-glob-to-any-file:
+            - ggml/include/ggml-hexagon.h
+            - ggml/src/ggml-hexagon/**
+WebGPU:
+    - changed-files:
+        - any-glob-to-any-file:
+            - ggml/include/ggml-webgpu.h
+            - ggml/src/ggml-webgpu/**
+OpenVINO:
+    - changed-files:
+        - any-glob-to-any-file:
+            - ggml/include/ggml-openvino.h
+            - ggml/src/ggml-openvino/**
+            - docs/backend/OPENVINO.md
87 changes: 87 additions & 0 deletions .github/workflows/ai-issues.yml
@@ -0,0 +1,87 @@
name: AI review (issues)

on:
  issues:
    types: [opened]

jobs:
  find-related:
    if: github.event.action == 'opened'
    runs-on: [self-hosted, opencode]

    permissions:
      contents: read
      issues: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
        with:
          fetch-depth: 1

      - name: Find related
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          OPENCODE_PERMISSION: |
            {
              "bash": {
                "*": "deny",
                "gh issue*": "allow",
                "gh search issues*": "allow"
              },
              "webfetch": "deny"
            }
        run: |
          rm AGENTS.md
          rm CLAUDE.md

          timeout 5m opencode run -m llama.cpp-dgx/ai-review-issues-find-similar --thinking "A new issue has been created:

          Issue number: ${{ github.event.issue.number }}

          Lookup the contents of the issue using the following 'gh' command:

          gh issue view ${{ github.event.issue.number }} --json title,body,url,number

          Next, perform the following task and then post a SINGLE comment (if needed).

          ---

          TASK : FIND RELATED ISSUES

          Using the 'gh' CLI tool, search through existing issues on Github.
          Find related or similar issues to the newly created one and list them.
          Do not list the new issue itself (it is #${{ github.event.issue.number }}).

          Consider:
          1. Similar titles or descriptions
          2. Same error messages or symptoms
          3. Related functionality or components
          4. Similar feature requests

          ---

          POSTING YOUR COMMENT:

          Based on your findings, post a SINGLE comment on issue #${{ github.event.issue.number }}. Build the comment as follows:

          - If no related issues were found, do NOT comment at all.
          - If related issues were found, include a section listing them with links using the following format:

          [comment]
          This issue might be similar or related to the following issue(s):

          - #[related_issue_number]: [brief description of how they are related]
          - #[related_issue_number]: [brief description of how they are related]
          ...

          _This comment was auto-generated locally using **$GA_ENGINE** on **$GA_MACHINE**_
          [/comment]

          Remember:
          - Do not include the comment tags in your actual comment.
          - Post at most ONE comment combining all findings.
          - If you didn't find issues that are related enough, post nothing.
          - You have access only to the 'gh' CLI tool - don't try to use other tools.
          - If the output from a tool call is too long, try to limit down the search.
          "
57 changes: 57 additions & 0 deletions .github/workflows/build-3rd-party.yml
@@ -0,0 +1,57 @@
name: CI (3rd-party)

on:
  workflow_dispatch: # allows manual triggering
  push:
    branches:
      - master
    paths: [
      '.github/workflows/build-3rd-party.yml',
      '**/CMakeLists.txt',
      '**/.cmake',
      '**/*.h',
      '**/*.hpp',
      '**/*.c',
      '**/*.cpp'
    ]

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
  cancel-in-progress: true

env:
  GGML_NLOOP: 3
  GGML_N_THREADS: 1
  LLAMA_LOG_COLORS: 1
  LLAMA_LOG_PREFIX: 1
  LLAMA_LOG_TIMESTAMPS: 1

jobs:
  ubuntu-24-llguidance:
    runs-on: ${{ 'ubuntu-24.04-arm' || 'ubuntu-24.04' }}

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v6

      - name: Dependencies
        id: depends
        run: |
          sudo apt-get update
          sudo apt-get install build-essential libssl-dev

      - name: Build
        id: cmake_build
        run: |
          cmake -B build \
            -DLLAMA_FATAL_WARNINGS=ON \
            -DLLAMA_LLGUIDANCE=ON
          cmake --build build --config Release -j $(nproc)

      - name: Test
        id: cmake_test
        run: |
          cd build
          ctest -L main --verbose --timeout 900
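The `concurrency.group` expression above uses the `a && b || c` idiom of GitHub Actions expressions. A minimal sketch of how it resolves follows, with a hypothetical function `concurrency_group` and made-up sample values. It assumes `github.head_ref` is only populated for pull-request events, so for the `push` and `workflow_dispatch` triggers in this workflow the group would likely fall back to the unique run ID:

```shell
#!/bin/sh
# Hypothetical emulation of:
#   ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
# Arguments: workflow name, head_ref, ref, run_id (the values below are made up).
concurrency_group() {
    workflow=$1
    head_ref=$2
    ref=$3
    run_id=$4
    if [ -n "$head_ref" ] && [ -n "$ref" ]; then
        # Pull-request style event: runs for the same ref share one group.
        echo "$workflow-$ref"
    else
        # No head_ref: every run gets its own group.
        echo "$workflow-$run_id"
    fi
}

concurrency_group "CI (3rd-party)" "" "refs/heads/master" "1001"
concurrency_group "CI (3rd-party)" "feature" "refs/pull/42/merge" "1002"
```

Under that assumption, `cancel-in-progress: true` would rarely cancel anything here, because each push to `master` ends up in a group of its own.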
