Releases: Thireus/llama.cpp
b9503
Merge branch 'ggml-org:master' into master
CUDA builds have been downgraded to CUDA 13.1 due to an NVIDIA bug that affects certain quantized models when binaries are compiled with CUDA 13.2.
- This issue only impacts llama.cpp binaries built with CUDA 13.2 (i.e. previous releases).
- Your installed CUDA 13.2 drivers are not affected — no downgrade is needed.
- NVIDIA is currently working on a fix.
Recommended workaround: Use binaries compiled with CUDA 12.8 or CUDA 13.1 for now.
Note: ik_llama.cpp is also affected by this issue.
Read more about it here: Thireus/GGUF-Tool-Suite#71
For reference, CUDA 12.8 supports the Maxwell (5.0) through Hopper (9.0) microarchitectures, while CUDA 13.1 supports Turing (7.5) through Blackwell (12.1).
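To decide which binary to grab, you can check your GPU's compute capability (e.g. with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`) and compare it against the ranges above. The sketch below is only illustrative — the function name is made up, and the ranges are copied from the release notes:

```python
# Illustrative helper (not part of llama.cpp): given a GPU's compute
# capability, list which of the two recommended CUDA toolkit builds
# cover it, per the support ranges stated in the release notes.
def supported_cuda_builds(compute_cap: float) -> list[str]:
    builds = []
    if 5.0 <= compute_cap <= 9.0:   # Maxwell (5.0) through Hopper (9.0)
        builds.append("CUDA 12.8")
    if 7.5 <= compute_cap <= 12.1:  # Turing (7.5) through Blackwell (12.1)
        builds.append("CUDA 13.1")
    return builds

print(supported_cuda_builds(8.6))  # e.g. an RTX 30-series GPU
```

A GPU in the overlapping 7.5–9.0 range can use either build; outside that overlap only one of the two applies.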
macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Apple Silicon (arm64, KleidiAI enabled)
- macOS Intel (x64)
- iOS XCFramework
Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12.8) - CUDA 12.8 DLLs
- Windows x64 (CUDA 13.1) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b9496
Merge branch 'ggml-org:master' into master
b9483
Merge branch 'ggml-org:master' into master
b9480
Merge branch 'ggml-org:master' into master
b9476
Merge branch 'ggml-org:master' into master
b9468
Merge branch 'ggml-org:master' into master
b9466
Merge branch 'ggml-org:master' into master
b9449
Merge branch 'ggml-org:master' into master
b9443
Merge branch 'ggml-org:master' into master
b9435
Merge branch 'ggml-org:master' into master