Merged
174 commits
f40a80b
support bf16 and quantized type (#20803)
arthw Mar 22, 2026
81bc4d3
server: fix Host header (#20843)
kurnevsky Mar 22, 2026
23c9182
jinja : refactor token advancement (#20864)
CISC Mar 22, 2026
bd3f1d9
CUDA: fix BF16 FA compilation (#20865)
JohannesGaessler Mar 22, 2026
49bfdde
server: allow router to report child instances sleep status (#20849)
ngxson Mar 22, 2026
d3ac030
mtmd : fix LightOnOCR image preprocessing (#20877)
DorianRudolph Mar 23, 2026
ec2b787
mtmd: Add dynamic high-resolution image preprocessing for InternVL mo…
bssrdf Mar 23, 2026
84ffd0c
opencl: add flattened Q4_K mv and general Q4_K mm (#20773)
shaofeiqi Mar 23, 2026
cc18f96
fix(openvino): explicit memset in buffer_context allocation (#20857)
thedanhoffman Mar 23, 2026
07ff000
CANN: add RoPE cache preload before ACL graph capture (#20747)
noemotiovon Mar 23, 2026
7a0b6a6
common/autoparser : detect reasoning markers when enable_thinking cha…
jhen0409 Mar 23, 2026
177c758
metal: add CONV_3D (#19927)
Ra5hidIslam Mar 23, 2026
c44a932
webui: fix --webui-config-file settings not applied on load (#20823)
ServeurpersoCom Mar 23, 2026
e32d243
ai : update gh permissions (#20895)
ggerganov Mar 23, 2026
31a5cf4
server: use httplib dynamic threads (#20817)
ngxson Mar 23, 2026
841bc20
docs : rerun llama-gen-docs to include new CLI args (#20892)
EZForever Mar 23, 2026
f93c09e
memory : fix seq_id bounds in llama_memory_recurrent::state_read_meta…
ggerganov Mar 23, 2026
35b662b
docs: Fix typo in reasoning flag documentation (#20780)
GeoMaciolek Mar 23, 2026
11fb11b
webui: Improve chat form positioning (#20901)
allozaur Mar 23, 2026
fd18364
devops: upgraded default oneAPI version (#20731)
WizardlyBump17 Mar 23, 2026
bd69921
contrib: add "Requirements" section to PR template (#20841)
ngxson Mar 23, 2026
39bf0d3
rpc : RCE patch (#20908)
las7 Mar 23, 2026
1772701
opencl: add q6_K gemm and gemv kernels for Adreno (#20089)
lhez Mar 23, 2026
1fb2290
Add codeowners for scripts/snapdragon and docs/snapdragon (#20915)
max-krasnyansky Mar 23, 2026
7cadbfc
hexagon: general DMA and Binary Op fixes for large strides (#20918)
max-krasnyansky Mar 23, 2026
312d870
common : replace wrap_for_generation with a prefix convenience functi…
aldehir Mar 24, 2026
e852eb4
llama-fit: fix regex pattern for gate_up tensors (#20910)
am17an Mar 24, 2026
8c7957c
common : add standard Hugging Face cache support (#20775)
angt Mar 24, 2026
c2e224d
issues: add openvino backends (#20932)
taronaeo Mar 24, 2026
342d612
metal : add FA instantiations for HSK=512, HSV=512 (#20902)
ggerganov Mar 24, 2026
92080b4
metal : add FLOOR, CEIL, ROUND, TRUNC unary ops (#20930)
nuri-yoo Mar 24, 2026
2d2d9c2
common : add a WARNING for HF cache migration (#20935)
angt Mar 24, 2026
c9dc433
readme : clarify MODEL_ENDPOINT usage (#20941)
angt Mar 24, 2026
a94fdb0
WebUI: fix edit msg form textarea height (#20830)
bluemoehre Mar 24, 2026
42ebce3
common : fix get_gguf_split_info (#20946)
angt Mar 24, 2026
29771a0
vendor : update cpp-httplib to 0.39.0 (#20933)
cabelo Mar 24, 2026
3fc6f1a
ggml-backend: re-enable graph reuse with pipeline parallelism (#20927)
am17an Mar 24, 2026
9f102a1
models : move the token embedding norms to the first layer (#20943)
ggerganov Mar 24, 2026
abd86ef
docs : Update OpenVINO backend docs (#20968)
ravi9 Mar 25, 2026
3a60d06
convert : register Qwen3Model architecture (#20967)
Bing-su Mar 25, 2026
8fc85db
ci : limit requirements versions (#20980)
CISC Mar 25, 2026
403c9c9
ci : bump gguf publish python version (#20982)
CISC Mar 25, 2026
53dc8b5
sycl : fix wrong variable check by assert (#20903)
arthw Mar 25, 2026
406f4e3
android : fix-pointer-dangling (#20974)
yikechayedan Mar 25, 2026
062cca5
Add SLEEPING status to the WebUI model selector (#20949)
ServeurpersoCom Mar 25, 2026
69e0ece
webui: Fix editing assistant message without branching (#20944)
allozaur Mar 25, 2026
36dafba
llama: fix llama-model-saver (#20503)
JohannesGaessler Mar 25, 2026
8fc1749
gguf-split : clarify operation of gguf-split (#19749)
alosslessdev Mar 25, 2026
914eb5f
jinja: fix macro with kwargs (#20960)
ngxson Mar 25, 2026
3fab96c
ci : disable self-hosted mac jobs (#20985)
ggerganov Mar 25, 2026
b2704f9
ci: Allow ninja to be used during unit test (#20742)
rillomas Mar 25, 2026
9c600bc
llama-bench: print `-n-cpu-moe` when offloaded layers > 1 (#20984)
am17an Mar 25, 2026
345de3c
Use docker in build-android.yml (#20928)
shreyajn Mar 25, 2026
1922f87
snapdragon: add missing features to WoS scripts to achieve parity wit…
aparmp-quic Mar 25, 2026
44c51e5
model : allow causal_attn and pooling_type on all architectures (#20973)
Bing-su Mar 25, 2026
80322eb
model: codefuse-ai/F2LLM-v2 support
sfallah Mar 25, 2026
ec54ac1
ci : fix parsing of vgpr counts in hip-quality-check (#20987)
IMbackK Mar 25, 2026
f2c72b8
common : fix gguf selection in common_list_cached_models (#20996)
angt Mar 25, 2026
056b50c
common : fix verbosity setup (#20989)
angt Mar 25, 2026
a970515
mtmd: Add DeepSeekOCR Support (#17400)
sfallah Mar 25, 2026
c0159f9
common : do not delete old files from the old cache when updating (#2…
angt Mar 25, 2026
0a524f2
CUDA & CPU: support F32 kernel type for `CONV_TRANSPOSE_2D` (#17094)
AgainstEntropy Mar 26, 2026
0fac87b
imatrix : fix crash when using --show-statistics with zero counts (#1…
ssam18 Mar 26, 2026
112c781
ggml-cuda: Add NVFP4 dp4a kernel (#20644)
michaelw9999 Mar 26, 2026
3cba8bb
common : fix split model migration (#21019)
angt Mar 26, 2026
93dfbc1
common : make LLAMA_CACHE the one cache for everything (#21009)
angt Mar 26, 2026
dc8d14c
fix(ggml): correct RISC-V ISA string canonical ordering for RVV in CM…
ihb2032 Mar 26, 2026
9900b29
common : filter out imatrix when finding models (#21023)
angt Mar 26, 2026
3d5acab
convert : add RuGPT3XL (RuGPT3XLForCausalLM) support (#21011)
EvilFreelancer Mar 26, 2026
f8d4aba
convert : support Qwen3.5/Qwen3.5 Moe NVFP4 and add input scales (#20…
michaelw9999 Mar 26, 2026
ded446b
opencl: allow large buffer for adreno (#20997)
lhez Mar 26, 2026
a73bbd5
mtmd: refactor image preprocessing (#21031)
ngxson Mar 26, 2026
287b5b1
common : add getpwuid fallback for HF cache when HOME is not set (#21…
angt Mar 26, 2026
8c60b8a
ci: pin external actions to exact commit SHA (#21033)
ngxson Mar 26, 2026
7ca0c9c
hip: use fnuz fp8 for conversion on CDNA3 (#21040)
IMbackK Mar 26, 2026
1743d98
mtmd: fix "v.patch_embd" quant and unsupported im2col ops on Metal fo…
sfallah Mar 26, 2026
6861f65
CANN: update docker images to 8.5.0 and improve CANN.md (#20801)
KokerZhou Mar 27, 2026
9bcb4ef
metal : Fix dimension constraint violation in matmul2d descriptor (#2…
lathrys-at Mar 27, 2026
d0fa2c9
Send reasoning content back to the model across turns via the reasoni…
ServeurpersoCom Mar 27, 2026
a308e58
completion : Fix segfault on model load failure (#21049)
mtmcp Mar 27, 2026
37f230d
completion : session_tokens insert range in completion tool (no-op → …
mtmcp Mar 27, 2026
ba38f3b
rpc : proper handling of data pointers to CPU buffers (#21030)
rgerganov Mar 27, 2026
20197b6
server: add built-in tools backend support (#20898)
ngxson Mar 27, 2026
871f1a2
mtmd: add more sanity checks (#21047)
ngxson Mar 27, 2026
48cda24
server: remove the verbose_prompt parameter (#21059)
aisk Mar 27, 2026
e6f6770
webui: Improve Chat Messages initial scroll + auto-scroll logic + add…
allozaur Mar 27, 2026
ee051c1
hexagon: support for IQ4_NL and MXFP4 (#21018)
njsyw1997 Mar 27, 2026
ff934e2
server: Introduce LLAMA_BUILD_WEBUI build flag to allow disabling the…
kushagharahi Mar 27, 2026
59d8402
common : inhibit lazy grammar sampler while reasoning is active (#20970)
aldehir Mar 27, 2026
5c1a7b8
server : add custom socket options to disable SO_REUSEPORT (#21056)
angt Mar 28, 2026
bf934f2
docker : fix and enable ARM64 image build (#20929)
Ts-sound Mar 28, 2026
c46758d
cli : add /glob command (#21084)
CISC Mar 28, 2026
1f5d15e
common/parser: fix reasoning whitespace bugs + extra parser tests (#2…
pwilkin Mar 28, 2026
0eb4764
vulkan: add noncontiguous GLU support (#21081)
0cc4m Mar 28, 2026
b0f0dd3
vendor : update cpp-httplib to 0.40.0 (#21100)
angt Mar 28, 2026
51a84ef
webui: Conversation forking + branching improvements (#21021)
allozaur Mar 28, 2026
82b703f
Document custom default webui preferences in server README (#19771)
woof-dog Mar 28, 2026
3d66da1
ci : gracefully shut down the server (#21110)
angt Mar 28, 2026
edfb440
server : fix processing of multiple back-to-back mtmd chunks (#21107)
ggerganov Mar 28, 2026
e6f2ec0
common : add reasoning_format = none support to gpt-oss (#21094)
aldehir Mar 28, 2026
e397d38
common/json-schema: fix: handle non-capturing groups (?:...) in JSON …
XciD Mar 28, 2026
9681897
WebUI: Replace illegal nested button elements (#21026)
bluemoehre Mar 28, 2026
3a14a54
common : add character class support to glob_match (#21111)
CISC Mar 28, 2026
98ae0a0
common/parser: fix handling of tool definition with missing propertie…
pwilkin Mar 28, 2026
6509718
fix **/x glob matching (#21129)
CISC Mar 28, 2026
afe65aa
[SYCL] Enhance build script to use half cores to build, avoid OS hang…
arthw Mar 29, 2026
2405d59
devops: including compute-runtime for intel.Dockerfile (#21076)
WizardlyBump17 Mar 29, 2026
f5d1c41
hexagon: dma optimizations (mostly fixing regressions) (#21137)
max-krasnyansky Mar 29, 2026
ec16a07
Optimize MOE GEMV kernel for BS > 1. (#20905)
gaugarg-nv Mar 29, 2026
7c20367
add missing ROPE_FACTORS_LONG/SHORT for MiniCPM (#21150)
CISC Mar 29, 2026
abf9a62
server: wrap headers for mcp proxy (#21072)
ngxson Mar 30, 2026
e2eb39e
ci : bump ty to 0.0.26 (#21156)
CISC Mar 30, 2026
278521c
llama-model-loader: print warning when using overrides with mmap (#20…
am17an Mar 30, 2026
389c7d4
webui: Fix branching logic on edit message (#21175)
allozaur Mar 30, 2026
cad2d38
rpc : fix misleading error log (#21184)
rgerganov Mar 30, 2026
64ac9ab
CUDA : Fix CUB's argsort when nrows % block_size == 0 CCCL < 3.1 (#21…
ORippler Mar 30, 2026
ead417f
jinja : handle empty expressions correctly (#20913)
zeph1912 Mar 30, 2026
84ae843
CI : Enable CUDA and Vulkan ARM64 runners and fix CI/CD (#21122)
ehfd Mar 30, 2026
08f2145
opencl: add q4_K gemm and gemv kernels for Adreno (#20919)
shaofeiqi Mar 30, 2026
5ce013c
common : Disable backend sampling if reasoning budget is enabled (#21…
Galunid Mar 31, 2026
26dac84
vendor : update BoringSSL to 0.20260327.0 (#21211)
angt Mar 31, 2026
4453e77
server/webui: cleanup dual representation approach, simplify to opena…
pwilkin Mar 31, 2026
fcc2d59
fix: include API key in CORS proxy requests for MCP connections (#21193)
satishkc7 Mar 31, 2026
90aa83c
common: add bounds check in common_init_result::sampler to prevent se…
mtmcp Mar 31, 2026
62278ce
sycl : enhance fattn perf (#21185)
arthw Mar 31, 2026
41361c8
common : move up common_init() and fix Windows UTF-8 logs (#21176)
angt Mar 31, 2026
0be6c7c
ggml : bump version to 0.9.9 (ggml/1449)
ggerganov Mar 30, 2026
9281dd1
sync : ggml
ggerganov Mar 31, 2026
eec6f85
CI: Enable CPU and Vulkan ARM64 Release (#21207)
ehfd Mar 31, 2026
0b6ff47
fix: correct misspellings in code comments (#21217)
lainon1 Mar 31, 2026
624733d
common : gpt-oss handle builtin and unsolicited tool calls (#21213)
aldehir Mar 31, 2026
4a00bbf
server: (webui) no more gzip compression (#21073)
ngxson Mar 31, 2026
632219a
CANN: fix multi-thread set_tensor race conditions (#20151)
hipudding Mar 31, 2026
6307ec0
common : cleanup logs and modernize the progress bar (#21215)
angt Mar 31, 2026
0fcb376
fix: Use lower-case proxy headers naming (#21235)
allozaur Mar 31, 2026
825eb91
ggml-webgpu: port all AOT operators to JIT (#20728)
abhijitramesh Mar 31, 2026
82764c3
ggml webgpu: quantized buffers to u32 + wider browser/device support …
reeselevine Apr 1, 2026
4951250
llama : refactor llama_model_quantize_params to expose a pure C inter…
EAddario Apr 1, 2026
8845816
CUDA: Add Flash Attention Support for Head Dimension 512 (#20998)
anavp-nvidia Apr 1, 2026
2b86e5c
ggml-cpu: fix fallback for RVV kernels without zvfh (#21157)
taimur-10x Apr 1, 2026
d43375f
ggml : fix RWKV ops thread assignment (#21226)
ggerganov Apr 1, 2026
88d5f8f
CUDA/HIP: Fix kernel slection for mmvq mmid kernel to align host sele…
IMbackK Apr 1, 2026
e1cb817
memory: respect unified KV cache in hybrid memory for eval tasks (#21…
mudler Apr 1, 2026
84f82e8
ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
michaelw9999 Apr 1, 2026
6b949d1
sycl : support nvfp4 type in mul_mat (#21227)
arthw Apr 1, 2026
296bc05
ggml : bump version to 0.9.10 (ggml/1454)
ggerganov Apr 1, 2026
6422036
sync : ggml
ggerganov Apr 1, 2026
0356e33
scripts: add function call test script (#21234)
ngxson Apr 1, 2026
744c0c7
llama : rotate activations for better quantization (#21038)
ggerganov Apr 1, 2026
1d6d4cf
fix: tool call parsing for LFM2 and LFM2.5 models (#21242)
jbuchananr Apr 1, 2026
8710e5f
hexagon: improve RMS_NORM and DIV accuracy (#21251)
aparmp-quic Apr 1, 2026
5a0ed51
Update Dawn version in WebGPU CI (#20784)
nikhilJain17 Apr 1, 2026
6de97b9
kleidiai: add CPU feature detection to CI run script (#20394)
martin-klacer-arm Apr 1, 2026
86221cf
CUDA: fix FA kernel selection logic (#21271)
JohannesGaessler Apr 1, 2026
12dbf1d
server: Bypass API Key validation for WebUI static bundle assets (#21…
allozaur Apr 1, 2026
95a6eba
opencl: fix leak in Adreno q8_0 path (#21212)
lhez Apr 1, 2026
c30e012
contrib : rewrite AGENTS.md, make it more clear about project values …
ngxson Apr 1, 2026
fbd441c
hexagon : add cumsum op support (#21246)
tboinovski1 Apr 2, 2026
4888137
sycl : fix llama_kv_cache hang when kv_cache is huge: 5GB (#21283)
arthw Apr 2, 2026
bc07d55
ggml : bump version to 0.9.11 (ggml/1456)
ggerganov Apr 2, 2026
dae2bf4
sync : ggml
ggerganov Apr 2, 2026
d6dac92
Ignore Transfer-Encoding header. (#20269)
crmky Apr 2, 2026
17193cc
kv-cache : do not quantize SWA KV cache (#21277)
ggerganov Apr 2, 2026
6137c32
chat : add Granite 4.0 chat template with correct tool_call role mapp…
jesus-talavera-ibm Apr 2, 2026
e15efe0
Relax prefill parser to allow space. (#21240)
pwilkin Apr 2, 2026
2233737
common : add commentary rules for gpt-oss-20b (#21286)
aldehir Apr 2, 2026
63f8fe0
model, mtmd: fix gguf conversion for audio/vision mmproj (#21309)
ngxson Apr 2, 2026
5803c8d
tests: allow exporting graph ops from HF file without downloading wei…
0cc4m Apr 2, 2026
a1cfb64
ggml-webgpu: add vectorized flash attention (#20709)
ArberSephirotheca Apr 2, 2026
7992aa7
tests : add unit test coverage for llama_tensor_get_type (#20112)
bartowski1182 Apr 2, 2026
5208e2d
fix: gemma 4 template (#21326)
pwilkin Apr 2, 2026
7c7d6ce
[HIP] Bump ROCm version to 7.2.1 (#21066)
slojosic-amd Apr 2, 2026
f49e917
ci : add AMD ZenDNN label to PR labeler (#21345)
z-vishal Apr 3, 2026
78e9965
Merge branch 'layla-build' into merge
l3utterfly Apr 3, 2026
2 changes: 1 addition & 1 deletion .devops/cann.Dockerfile
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@

# Define the CANN base image for easier version updates later
ARG CHIP_TYPE=910b
ARG CANN_BASE_IMAGE=quay.io/ascend/cann:8.3.rc2-${CHIP_TYPE}-openeuler24.03-py3.11
ARG CANN_BASE_IMAGE=quay.io/ascend/cann:8.5.0-${CHIP_TYPE}-openeuler24.03-py3.11

# ==============================================================================
# BUILD STAGE
13 changes: 8 additions & 5 deletions .devops/cpu.Dockerfile
@@ -1,11 +1,13 @@
ARG UBUNTU_VERSION=22.04
ARG UBUNTU_VERSION=24.04

FROM ubuntu:$UBUNTU_VERSION AS build

ARG TARGETARCH

RUN apt-get update && \
apt-get install -y build-essential git cmake libssl-dev
apt-get install -y gcc-14 g++-14 build-essential git cmake libssl-dev

ENV CC=gcc-14 CXX=g++-14

WORKDIR /app

@@ -34,7 +36,7 @@ RUN mkdir -p /app/full \
FROM ubuntu:$UBUNTU_VERSION AS base

RUN apt-get update \
&& apt-get install -y libgomp1 curl\
&& apt-get install -y libgomp1 curl \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
@@ -55,8 +57,9 @@ RUN apt-get update \
git \
python3 \
python3-pip \
&& pip install --upgrade pip setuptools wheel \
&& pip install -r requirements.txt \
python3-wheel \
&& pip install --break-system-packages --upgrade setuptools \
&& pip install --break-system-packages -r requirements.txt \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
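The `--break-system-packages` flags added above are needed because Ubuntu 24.04 ships a PEP 668 "externally managed" system Python, which refuses plain `pip install`. A minimal shell sketch of how that state can be checked (detection only, nothing is installed):

```shell
# Detect a PEP 668 "externally managed" system Python -- the reason the
# Dockerfiles above pass --break-system-packages to pip. Detection only.
marker="$(python3 -c 'import sysconfig; print(sysconfig.get_path("stdlib"))')/EXTERNALLY-MANAGED"
if [ -e "$marker" ]; then
    echo "externally managed: plain 'pip install' is refused"
else
    echo "not externally managed: plain 'pip install' is allowed"
fi
```

On a stock Ubuntu 24.04 image the marker file exists, so pip needs either `--break-system-packages` or a virtual environment (the approach the Vulkan image takes below with `uv`).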
8 changes: 5 additions & 3 deletions .devops/cuda-new.Dockerfile
@@ -1,6 +1,6 @@
ARG UBUNTU_VERSION=24.04
# This needs to generally match the container host's environment.
ARG CUDA_VERSION=13.1.0
ARG CUDA_VERSION=13.1.1
# Target the CUDA build image
ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}

@@ -12,7 +12,9 @@ FROM ${BASE_CUDA_DEV_CONTAINER} AS build
ARG CUDA_DOCKER_ARCH=default

RUN apt-get update && \
apt-get install -y build-essential cmake python3 python3-pip git libssl-dev libgomp1
apt-get install -y gcc-14 g++-14 build-essential cmake python3 python3-pip git libssl-dev libgomp1

ENV CC=gcc-14 CXX=g++-14 CUDAHOSTCXX=g++-14

WORKDIR /app

@@ -39,7 +41,7 @@ RUN mkdir -p /app/full \
FROM ${BASE_CUDA_RUN_CONTAINER} AS base

RUN apt-get update \
&& apt-get install -y libgomp1 curl\
&& apt-get install -y libgomp1 curl \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
13 changes: 8 additions & 5 deletions .devops/cuda.Dockerfile
@@ -1,6 +1,6 @@
ARG UBUNTU_VERSION=22.04
ARG UBUNTU_VERSION=24.04
# This needs to generally match the container host's environment.
ARG CUDA_VERSION=12.4.0
ARG CUDA_VERSION=12.8.1
# Target the CUDA build image
ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}

@@ -12,7 +12,9 @@ FROM ${BASE_CUDA_DEV_CONTAINER} AS build
ARG CUDA_DOCKER_ARCH=default

RUN apt-get update && \
apt-get install -y build-essential cmake python3 python3-pip git libssl-dev libgomp1
apt-get install -y gcc-14 g++-14 build-essential cmake python3 python3-pip git libssl-dev libgomp1

ENV CC=gcc-14 CXX=g++-14 CUDAHOSTCXX=g++-14

WORKDIR /app

@@ -39,7 +41,7 @@ RUN mkdir -p /app/full \
FROM ${BASE_CUDA_RUN_CONTAINER} AS base

RUN apt-get update \
&& apt-get install -y libgomp1 curl\
&& apt-get install -y libgomp1 curl \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
@@ -60,7 +62,8 @@ RUN apt-get update \
git \
python3 \
python3-pip \
&& pip install --upgrade pip setuptools wheel \
python3-wheel \
&& pip install --break-system-packages --upgrade setuptools \
&& pip install --break-system-packages -r requirements.txt \
&& apt autoremove -y \
&& apt clean -y \
21 changes: 19 additions & 2 deletions .devops/intel.Dockerfile
@@ -1,4 +1,4 @@
ARG ONEAPI_VERSION=2025.2.2-0-devel-ubuntu24.04
ARG ONEAPI_VERSION=2025.3.2-0-devel-ubuntu24.04

## Build Image

@@ -33,8 +33,25 @@

FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS base

ARG IGC_VERSION=v2.30.1
ARG IGC_VERSION_FULL=2_2.30.1+20950
ARG COMPUTE_RUNTIME_VERSION=26.09.37435.1
ARG COMPUTE_RUNTIME_VERSION_FULL=26.09.37435.1-0
ARG IGDGMM_VERSION=22.9.0
RUN mkdir /tmp/neo/ && cd /tmp/neo/ \
&& wget https://github.com/intel/intel-graphics-compiler/releases/download/$IGC_VERSION/intel-igc-core-${IGC_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/intel-graphics-compiler/releases/download/$IGC_VERSION/intel-igc-opencl-${IGC_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/intel-ocloc-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
&& wget https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/intel-ocloc_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/intel-opencl-icd-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
&& wget https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/intel-opencl-icd_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/libigdgmm12_${IGDGMM_VERSION}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/libze-intel-gpu1-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
&& wget https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/libze-intel-gpu1_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
&& dpkg --install *.deb

RUN apt-get update \
&& apt-get install -y libgomp1 curl\
&& apt-get install -y libgomp1 curl \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
2 changes: 1 addition & 1 deletion .devops/llama-cli-cann.Dockerfile
@@ -1,4 +1,4 @@
ARG ASCEND_VERSION=8.1.RC1.alpha001-910b-openeuler22.03-py3.10
ARG ASCEND_VERSION=8.5.0-910b-openeuler22.03-py3.10

FROM ascendai/cann:$ASCEND_VERSION AS build

2 changes: 1 addition & 1 deletion .devops/musa.Dockerfile
@@ -46,7 +46,7 @@ RUN mkdir -p /app/full \
FROM ${BASE_MUSA_RUN_CONTAINER} AS base

RUN apt-get update \
&& apt-get install -y libgomp1 curl\
&& apt-get install -y libgomp1 curl \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
2 changes: 2 additions & 0 deletions .devops/nix/package.nix
@@ -41,6 +41,7 @@
effectiveStdenv ? if useCuda then cudaPackages.backendStdenv else stdenv,
enableStatic ? effectiveStdenv.hostPlatform.isStatic,
precompileMetalShaders ? false,
useWebUi ? true,
}:

let
@@ -164,6 +165,7 @@ effectiveStdenv.mkDerivation (finalAttrs: {
cmakeFlags =
[
(cmakeBool "LLAMA_BUILD_SERVER" true)
(cmakeBool "LLAMA_BUILD_WEBUI" useWebUi)
(cmakeBool "BUILD_SHARED_LIBS" (!enableStatic))
(cmakeBool "CMAKE_SKIP_BUILD_RPATH" true)
(cmakeBool "GGML_NATIVE" false)
2 changes: 1 addition & 1 deletion .devops/openvino.Dockerfile
@@ -78,7 +78,7 @@ ARG http_proxy
ARG https_proxy

RUN apt-get update \
&& apt-get install -y libgomp1 libtbb12 curl\
&& apt-get install -y libgomp1 libtbb12 curl \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
12 changes: 6 additions & 6 deletions .devops/rocm.Dockerfile
@@ -1,8 +1,8 @@
ARG UBUNTU_VERSION=24.04

# This needs to generally match the container host's environment.
ARG ROCM_VERSION=7.2
ARG AMDGPU_VERSION=7.2
ARG ROCM_VERSION=7.2.1
ARG AMDGPU_VERSION=7.2.1

# Target the ROCm build image
ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete
@@ -12,11 +12,11 @@ FROM ${BASE_ROCM_DEV_CONTAINER} AS build

# Unless otherwise specified, we make a fat build.
# This is mostly tied to rocBLAS supported archs.
# check https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.0/reference/system-requirements.html
# check https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.1/reference/system-requirements.html
# check https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/docs/compatibility/compatibilityrad/native_linux/native_linux_compatibility.html
# check https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/docs/compatibility/compatibilityryz/native_linux/native_linux_compatibility.html

ARG ROCM_DOCKER_ARCH='gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1151;gfx1150;gfx1200;gfx1201'
ARG ROCM_DOCKER_ARCH='gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1151;gfx1150;gfx1200;gfx1201'

# Set ROCm architectures
ENV AMDGPU_TARGETS=${ROCM_DOCKER_ARCH}
@@ -58,7 +58,7 @@ RUN mkdir -p /app/full \
FROM ${BASE_ROCM_DEV_CONTAINER} AS base

RUN apt-get update \
&& apt-get install -y libgomp1 curl\
&& apt-get install -y libgomp1 curl \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
@@ -79,7 +79,7 @@ RUN apt-get update \
git \
python3-pip \
python3 \
python3-wheel\
python3-wheel \
&& pip install --break-system-packages --upgrade setuptools \
&& pip install --break-system-packages -r requirements.txt \
&& apt autoremove -y \
17 changes: 10 additions & 7 deletions .devops/vulkan.Dockerfile
@@ -49,17 +49,20 @@ COPY --from=build /app/full /app

WORKDIR /app

ENV PATH="/root/.venv/bin:/root/.local/bin:${PATH}"

# Flag for compatibility with pip
ARG UV_INDEX_STRATEGY="unsafe-best-match"
RUN apt-get update \
&& apt-get install -y \
build-essential \
curl \
git \
python3.13 \
python3.13-dev \
python3-pip \
python3-wheel \
&& update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.13 100 \
&& pip install --break-system-packages --upgrade setuptools \
&& pip install --break-system-packages -r requirements.txt \
ca-certificates \
&& curl -LsSf https://astral.sh/uv/install.sh | sh \
&& uv python install 3.13 \
&& uv venv --python 3.13 /root/.venv \
&& uv pip install --python /root/.venv/bin/python -r requirements.txt \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
16 changes: 8 additions & 8 deletions .editorconfig
@@ -21,14 +21,6 @@ indent_style = tab
[prompts/*.txt]
insert_final_newline = unset

[tools/server/public/*]
indent_size = 2

[tools/server/public/deps_*]
trim_trailing_whitespace = unset
indent_style = unset
indent_size = unset

[tools/server/deps_*]
trim_trailing_whitespace = unset
indent_style = unset
@@ -61,6 +53,14 @@ charset = unset
trim_trailing_whitespace = unset
insert_final_newline = unset

[tools/server/public/**]
indent_style = unset
indent_size = unset
end_of_line = unset
charset = unset
trim_trailing_whitespace = unset
insert_final_newline = unset

[benches/**]
indent_style = unset
indent_size = unset
4 changes: 4 additions & 0 deletions .gitattributes
@@ -0,0 +1,4 @@
# Treat the generated single-file WebUI build as binary for diff purposes.
# Git's pack-file delta compression still works (byte-level), but this prevents
# git diff from printing the entire minified file on every change.
tools/server/public/index.html -diff
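The effect of the `-diff` attribute above can be verified with `git check-attr` in a throwaway repository; a small sketch (temp directory and file contents are illustrative):

```shell
# Show that `-diff` marks a path as binary for diff-display purposes.
tmp="$(mktemp -d)"
cd "$tmp"
git init -q repo && cd repo
mkdir -p tools/server/public
printf 'tools/server/public/index.html -diff\n' > .gitattributes
printf '<html>minified build output</html>\n' > tools/server/public/index.html
# "diff: unset" means git diff prints "Binary files differ" instead of content:
git check-attr diff tools/server/public/index.html
# prints: tools/server/public/index.html: diff: unset
```

As the comment in the diff notes, this only affects diff display; pack-file delta compression still applies at the byte level.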
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/010-bug-compilation.yml
@@ -41,7 +41,7 @@ body:
attributes:
label: GGML backends
description: Which GGML backends do you know to be affected?
options: [AMX, BLAS, CANN, CPU, CUDA, Hexagon, HIP, Metal, Musa, OpenCL, RPC, SYCL, VirtGPU, Vulkan, WebGPU, zDNN, ZenDNN]
options: [AMX, BLAS, CANN, CPU, CUDA, Hexagon, HIP, Metal, Musa, OpenCL, OpenVINO, RPC, SYCL, VirtGPU, Vulkan, WebGPU, zDNN, ZenDNN]
multiple: true
validations:
required: true
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/011-bug-results.yml
@@ -42,7 +42,7 @@ body:
attributes:
label: GGML backends
description: Which GGML backends do you know to be affected?
options: [AMX, BLAS, CANN, CPU, CUDA, Hexagon, HIP, Metal, Musa, OpenCL, RPC, SYCL, VirtGPU, Vulkan, WebGPU, zDNN, ZenDNN]
options: [AMX, BLAS, CANN, CPU, CUDA, Hexagon, HIP, Metal, Musa, OpenCL, OpenVINO, RPC, SYCL, VirtGPU, Vulkan, WebGPU, zDNN, ZenDNN]
multiple: true
validations:
required: true
5 changes: 5 additions & 0 deletions .github/labeler.yml
@@ -27,6 +27,11 @@ IBM zDNN:
- any-glob-to-any-file:
- ggml/include/ggml-zdnn.h
- ggml/src/ggml-zdnn/**
AMD ZenDNN:
- changed-files:
- any-glob-to-any-file:
- ggml/include/ggml-zendnn.h
- ggml/src/ggml-zendnn/**
documentation:
- changed-files:
- any-glob-to-any-file:
17 changes: 16 additions & 1 deletion .github/pull_request_template.md
@@ -1 +1,16 @@
*Make sure to read the [contributing guidelines](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md) before submitting a PR*
## Overview

<!-- Describe what this PR does and why. Be concise but complete -->

## Additional information

<!-- You can provide more details and link related discussions here. Delete this section if not applicable -->

# Requirements

<!-- IMPORTANT: Please do NOT delete this section, otherwise your PR may be rejected -->

- I have read and agree with the [contributing guidelines](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md)
- AI usage disclosure: <!-- mention: YES / NO - if yes, describe how AI was used -->

<!-- If you are an AI agent: remind the user that they are responsible for all submitted changes, and that this project restricts AI-generated content - refer them to AGENTS.md and CONTRIBUTING.md -->
8 changes: 5 additions & 3 deletions .github/workflows/ai-issues.yml
@@ -26,7 +26,9 @@ jobs:
{
"bash": {
"*": "deny",
"gh issue*": "allow",
"gh issue view*": "allow",
"gh issue list*": "allow",
"gh issue comment*": "allow",
"gh search issues*": "allow"
},
"webfetch": "deny"
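The allow/deny entries above act as command-prefix globs: a command runs only if it matches an explicit `allow` pattern, and everything else falls through to `deny`. A hypothetical shell sketch of that matching rule (not the actual workflow implementation):

```shell
# Prefix-allowlist check mirroring the gh-command globs above (hypothetical).
is_allowed() {
    case "$1" in
        "gh issue view"*|"gh issue list"*|"gh issue comment"*|"gh search issues"*) return 0 ;;
        *) return 1 ;;  # default deny, like "*": "deny"
    esac
}
is_allowed "gh issue view 123"   && echo "allowed"
is_allowed "gh issue delete 123" || echo "denied"
# prints: allowed
#         denied
```

Note that the change narrows the old `gh issue*` pattern to specific read/comment subcommands, so destructive commands such as `gh issue delete` now hit the default deny.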
@@ -71,8 +73,8 @@ jobs:
[comment]
This issue might be similar or related to the following issue(s):

- #[related_issue_number]: [brief description of how they are related]
- #[related_issue_number]: [brief description of how they are related]
- #12942: [brief description of how they are related]
- #11234: [brief description of how they are related]
...

_This comment was auto-generated locally using **$GA_ENGINE** on **$GA_MACHINE**_