
fix(deps): update module github.com/ollama/ollama to v0.17.1 [security] #92

Open
renovate[bot] wants to merge 1 commit into main from renovate/go-github.com-ollama-ollama-vulnerability

Conversation

renovate Bot commented May 8, 2026

ℹ️ Note

This PR body was truncated due to platform limits.

This PR contains the following updates:

Package Change
github.com/ollama/ollama v0.3.6 -> v0.17.1
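
Versions such as v0.3.6 and v0.17.1 have to be compared numerically, because a plain string comparison would rank v0.3.6 above v0.17.1. The sketch below is illustrative only: parse_version and is_older are hypothetical helpers, they accept only plain vMAJOR.MINOR.PATCH tags (no pre-release suffixes such as -rc0), and they are not how Renovate or the Go toolchain compares versions.

```python
def parse_version(tag):
    """Parse a plain 'vMAJOR.MINOR.PATCH' tag into a tuple of ints (hypothetical helper)."""
    if not tag.startswith("v"):
        raise ValueError(f"expected a leading 'v': {tag!r}")
    parts = tag[1:].split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unsupported version format: {tag!r}")
    return tuple(int(p) for p in parts)


def is_older(tag, fixed_tag):
    """Return True if 'tag' sorts strictly before 'fixed_tag' numerically."""
    return parse_version(tag) < parse_version(fixed_tag)
```

With these helpers, is_older("v0.3.6", "v0.17.1") is true, whereas the raw strings compare the other way round.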

Ollama contains a heap out-of-bounds read vulnerability in the GGUF model loader

CVE-2026-7482 / GHSA-x8qc-fggm-mpqg

More information

Details

Ollama before 0.17.1 contains a heap out-of-bounds read vulnerability in the GGUF model loader. The /api/create endpoint accepts an attacker-supplied GGUF file in which the declared tensor offset and size exceed the file's actual length; during quantization in fs/ggml/gguf.go and server/quantization.go (WriteTo()), the server reads past the allocated heap buffer. The leaked memory contents may include environment variables, API keys, system prompts, and concurrent users' conversation data, and can be exfiltrated by uploading the resulting model artifact through the /api/push endpoint to an attacker-controlled registry. The /api/create and /api/push endpoints have no authentication in the upstream distribution. Default deployments bind to 127.0.0.1, but the documented OLLAMA_HOST=0.0.0.0 configuration is widely used in practice (large public-internet exposure observed).
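
The class of bug described above comes down to trusting offsets and sizes declared in a file header. A minimal sketch of the kind of check that rejects such files follows. It is not the upstream fix in fs/ggml/gguf.go; TensorInfo, validate_tensor_bounds, and the simplified field layout are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TensorInfo:
    """Hypothetical, simplified view of one tensor entry declared in a model file header."""
    name: str
    offset: int  # byte offset of the tensor data, relative to the start of the data section
    size: int    # declared byte length of the tensor data


def validate_tensor_bounds(tensors, data_start, file_size):
    """Raise ValueError if any declared tensor range falls outside the file."""
    if data_start < 0 or data_start > file_size:
        raise ValueError("data section starts outside the file")
    available = file_size - data_start
    for t in tensors:
        if t.offset < 0 or t.size < 0:
            raise ValueError(f"tensor {t.name!r}: negative offset or size")
        # Compare against the remaining space instead of computing offset + size,
        # which is the overflow-safe form in languages with fixed-width integers.
        if t.offset > available or t.size > available - t.offset:
            raise ValueError(f"tensor {t.name!r}: declared range exceeds file length")
```

The point of the sketch is ordering: the declared ranges are checked against the real file length before any tensor data is read.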

Severity

  • CVSS Score: 8.8 / 10 (High)
  • Vector String: CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:N/VA:H/SC:N/SI:N/SA:N/AU:Y/R:A/V:D/RE:L/U:Red
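
The vector string above packs each metric into a KEY:VALUE pair separated by slashes. As a reading aid, the sketch below splits it into a dictionary. parse_cvss_vector is a hypothetical helper, not part of any advisory tooling, and it does not validate metric names or values against the CVSS specification.

```python
def parse_cvss_vector(vector):
    """Split a 'CVSS:<version>/K:V/...' string into (version, {metric: value})."""
    prefix, *parts = vector.split("/")
    scheme, sep, version = prefix.partition(":")
    if scheme != "CVSS" or not sep or not version:
        raise ValueError(f"not a CVSS vector: {vector!r}")
    metrics = {}
    for part in parts:
        key, sep, value = part.partition(":")
        if not sep or not key or not value:
            raise ValueError(f"malformed metric: {part!r}")
        metrics[key] = value
    return version, metrics


VECTOR = ("CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:N/VA:H/"
          "SC:N/SI:N/SA:N/AU:Y/R:A/V:D/RE:L/U:Red")
version, metrics = parse_cvss_vector(VECTOR)
```

For this advisory, the parsed metrics include AV = N (network attack vector), PR = N (no privileges required), and VC = H (high confidentiality impact on the vulnerable system).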

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


Release Notes

ollama/ollama (github.com/ollama/ollama)

v0.17.1

Compare Source

What's Changed

  • Nemotron architecture support in Ollama's engine
  • MLX engine now has improved memory usage
  • Ollama's app will now allow models that support tools to use web search capabilities
  • Improved LFM2 and LFM2.5 models in Ollama's engine
  • ollama create will no longer default to affine quantization for unquantized models when using the MLX engine
  • Added configuration for disabling automatic update downloading

Full Changelog: ollama/ollama@v0.17.0...v0.17.1

v0.17.0

Compare Source

OpenClaw

OpenClaw can now be installed and configured automatically via Ollama, making it the easiest way to get up and running with OpenClaw with open models like Kimi-K2.5, GLM-5, and Minimax-M2.5.

Get started

ollama launch openclaw

Web search in OpenClaw

When using cloud models, web search is enabled, allowing OpenClaw to search the internet.


What's Changed

  • Improved tokenizer performance
  • Ollama's macOS and Windows apps will now default to a context length based on available VRAM

New Contributors

Full Changelog: ollama/ollama@v0.16.3...v0.17.0

v0.16.3

Compare Source

What's Changed

  • New ollama launch cline added for the Cline CLI
  • ollama launch <integration> will now always show the model picker
  • Added Gemma 3, Llama and Qwen 3 architectures to MLX runner

New Contributors

Full Changelog: ollama/ollama@v0.16.2...v0.16.3

v0.16.2

Compare Source

What's Changed

  • ollama launch claude now supports searching the web when using :cloud models
  • Fixed rendering issue when running ollama in PowerShell
  • New setting in Ollama's app makes it easier to disable cloud models for sensitive and private tasks where data cannot leave your computer. For Linux or when running ollama serve manually, set OLLAMA_NO_CLOUD=1.
  • Fixed issue where experimental image generation models would not run in 0.16.0 and 0.16.1

Full Changelog: ollama/ollama@v0.16.1...v0.16.2-rc0

v0.16.1

Compare Source

What's Changed

  • Installing Ollama via the curl install script on macOS will now only prompt for your password if it's required
  • Installing Ollama via the irm install script on Windows will now show progress
  • Image generation models will now respect the OLLAMA_LOAD_TIMEOUT variable

Full Changelog: ollama/ollama@v0.16.0...v0.16.1

v0.16.0

Compare Source

New models

  • GLM-5: A strong reasoning and agentic model from Z.ai with 744B total parameters (40B active), built for complex systems engineering and long-horizon tasks.
  • MiniMax-M2.5: a new state-of-the-art large language model designed for real-world productivity and coding tasks.

New ollama

The new ollama command makes it easy to launch your favorite apps with models using Ollama


What's Changed

  • Launch Pi with ollama launch pi
  • Improvements to Ollama's MLX runner to support GLM-4.7-Flash
  • Ctrl+G will now allow for editing text prompts in a text editor when running a model

Full Changelog: ollama/ollama@v0.15.6...v0.16.0

v0.15.6

Compare Source

What's Changed

  • Fixed context limits when running ollama launch droid
  • ollama launch will now download missing models instead of erroring
  • Fixed bug where ollama launch claude would cause context compaction when providing images

Full Changelog: ollama/ollama@v0.15.5...v0.15.6

v0.15.5

Compare Source

New models

  • Qwen3-Coder-Next: a coding-focused language model from Alibaba's Qwen team, optimized for agentic coding workflows and local development.
  • GLM-OCR: GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture.

Improvements to ollama launch

  • ollama launch can now be provided arguments, for example ollama launch claude -- --resume
  • ollama launch will now run subagents when using ollama launch claude
  • Ollama will now set context limits for a set of models when using ollama launch opencode

What's Changed

  • Sub-agent support for ollama launch for planning, deep research, and similar tasks
  • ollama signin will now open a browser window to make signing in easier
  • Ollama will now default to the following context lengths based on VRAM:
    • < 24 GiB VRAM: 4,096 context
    • 24-48 GiB VRAM: 32,768 context
    • >= 48 GiB VRAM: 262,144 context
  • GLM-4.7-Flash support on Ollama's experimental MLX engine
  • ollama signin will now open the browser to the connect page
  • Fixed off-by-one error when using num_predict in the API
  • Fixed issue where tokens from a previous sequence would be returned when hitting num_predict
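
The VRAM-based context-length defaults listed in this release can be read as a simple threshold lookup. The sketch below only restates the three tiers from the release notes; default_context_length is a hypothetical name and this is not Ollama's actual implementation, which may measure available VRAM differently.

```python
def default_context_length(vram_gib):
    """Map available VRAM (in GiB) to the default context length from the release notes."""
    if vram_gib < 0:
        raise ValueError("VRAM cannot be negative")
    if vram_gib < 24:
        return 4096
    if vram_gib < 48:
        return 32768
    return 262144
```

Note that the boundary values fall into the higher tier: exactly 24 GiB gives 32,768 and exactly 48 GiB gives 262,144, following the ">= 48 GiB" wording in the list.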

New Contributors

Full Changelog: ollama/ollama@v0.15.4...v0.15.5

v0.15.4

Compare Source

What's Changed

  • ollama launch openclaw will now enter the standard OpenClaw onboarding flow if this has not yet been completed.

Full Changelog: ollama/ollama@v0.15.3...v0.15.4

v0.15.3

Compare Source

What's Changed

  • Renamed ollama launch clawdbot to ollama launch openclaw to reflect the project's new name
  • Improved tool calling for Ministral models
  • docs: add clawdbot by @​ParthSareen in #​13925
  • cmd/config: Use envconfig.Host() for base API in launch config packages by @​gabe-l-hart in #​13937
  • ollama launch will now use the value of OLLAMA_HOST when running it

New Contributors

Full Changelog: ollama/ollama@v0.15.2...v0.15.3

v0.15.2

Compare Source


What's Changed

  • New ollama launch clawdbot command for launching Clawdbot using Ollama models

Full Changelog: ollama/ollama@v0.15.1...v0.15.2

v0.15.1

Compare Source

What's Changed

  • GLM-4.7-Flash performance and correctness improvements, fixing repetitive answers and tool calling quality
  • Fixed performance issues on macOS and arm64 Linux
  • Fixed issue where ollama launch would not detect claude and would incorrectly update opencode configurations

New Contributors

Full Changelog: ollama/ollama@v0.15.0...v0.15.1

v0.15.0

Compare Source


ollama launch

A new ollama launch command to use Ollama's models with Claude Code, Codex, OpenCode, and Droid without separate configuration.

What's Changed

  • New ollama launch command for Claude Code, Codex, OpenCode, and Droid
  • Fixed issue where creating multi-line strings with """ would not work when using ollama run
  • Ctrl+J and Shift+Enter now work for inserting newlines in ollama run
  • Reduced memory usage for GLM-4.7-Flash models

v0.14.3

Compare Source

  • Z-Image Turbo: 6 billion parameter text-to-image model from Alibaba’s Tongyi Lab. It generates high-quality photorealistic images.
  • Flux.2 Klein: Black Forest Labs’ fastest image-generation models to date.

New models

  • GLM-4.7-Flash: As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.
  • LFM2.5-1.2B-Thinking: LFM2.5 is a new family of hybrid models designed for on-device deployment.

What's Changed

  • Fixed issue where Ollama's macOS app would interrupt system shutdown
  • Fixed ollama create and ollama show commands for experimental models
  • The /api/generate API can now be used for image generation
  • Fixed minor issues in Nemotron-3-Nano tool parsing
  • Fixed issue where removing an image generation model would cause it to first load
  • Fixed issue where ollama rm would only stop the first model in the list if it were running

Full Changelog: ollama/ollama@v0.14.2...v0.14.3

v0.14.2

Compare Source

New models

  • TranslateGemma: A new collection of open translation models built on Gemma 3, helping people communicate across 55 languages.

What's Changed

  • Shift + Enter (or Ctrl + j) will now enter a newline in Ollama's CLI
  • Improve /v1/responses API to better conform to the OpenResponses specification

New Contributors

Full Changelog: ollama/ollama@v0.14.1...v0.14.2

v0.14.1

Compare Source

Image generation models (experimental)

Experimental image generation models are available for macOS and Linux (CUDA) in Ollama:

Available models
ollama run x/z-image-turbo

Note: x is a username on ollama.com where experimental models are uploaded

More models coming soon:

  1. Qwen-Image-2512
  2. Qwen-Image-Edit-2511
  3. GLM-Image

What's Changed

  • fix macOS auto-update signature verification failure

New Contributors

Full Changelog: ollama/ollama@v0.14.0...v0.14.1

v0.14.0

Compare Source

What's Changed

  • ollama run --experimental CLI will now open a new Ollama CLI that includes an agent loop and the bash tool
  • Anthropic API compatibility: support for the /v1/messages API
  • A new REQUIRES command for the Modelfile allows declaring which version of Ollama is required for the model
  • For older models, Ollama will avoid an integer underflow on low VRAM systems during memory estimation
  • More accurate VRAM measurements for AMD iGPUs
  • Ollama's app will now highlight swift source code
  • An error will now return when embeddings return NaN or -Inf
  • Ollama's Linux install bundle files now use zst compression
  • New experimental support for image generation models, powered by MLX

New Contributors

Full Changelog: ollama/ollama@v0.13.5...v0.14.0-rc2

v0.13.5

Compare Source

New Models

  • FunctionGemma: a specialized version of Google's Gemma 3 270M model, fine-tuned explicitly for function calling.

What's Changed

  • bert architecture models now run on Ollama's engine
  • Added built-in renderer & tool parsing capabilities for DeepSeek-V3.1
  • Fixed issue where nested properties in tools may not have been rendered properly

New Contributors

Full Changelog: ollama/ollama@v0.13.4...v0.13.5

v0.13.4

Compare Source

New Models

  • Nemotron 3 Nano: A new Standard for Efficient, Open, and Intelligent Agentic Models
  • Olmo 3 and Olmo 3.1: A series of Open language models designed to enable the science of language models. These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets.

What's Changed

  • Enable Flash Attention automatically for models by default
  • Fixed handling of long contexts with Gemma 3 models
  • Fixed issue that would occur with Gemma 3 QAT models or other models imported with the Gemma 3 architecture

New Contributors

Full Changelog: ollama/ollama@v0.13.3...v0.13.4-rc0

v0.13.3

Compare Source

New models

  • Devstral-Small-2: 24B model that excels at using tools to explore codebases, edit multiple files, and power software engineering agents.
  • rnj-1: Rnj-1 is a family of 8B parameter open-weight, dense models trained from scratch by Essential AI, optimized for code and STEM with capabilities on par with SOTA open-weight models.
  • nomic-embed-text-v2: nomic-embed-text-v2-moe is a multilingual MoE text embedding model that excels at multilingual retrieval.

What's Changed

  • Improved truncation logic when using /api/embed and /v1/embeddings
  • Extend Gemma 3 architecture to support rnj-1 model
  • Fix error that would occur when running qwen2.5vl with image input

Full Changelog: ollama/ollama@v0.13.2...v0.13.3

v0.13.2

Compare Source

New models

  • Qwen3-Next: The first installment in the Qwen3-Next series with strong performance in terms of both parameter efficiency and inference speed.

What's Changed

  • Flash attention is now enabled by default for vision models such as mistral-3, gemma3, qwen3-vl and more. This improves memory utilization and performance when providing images as input.
  • Fixed GPU detection on multi-GPU CUDA machines
  • Fixed issue where deepseek-v3.1 would always think even when thinking is disabled in Ollama's app

New Contributors

Full Changelog: ollama/ollama@v0.13.1...v0.13.2

v0.13.1

Compare Source

New models

  • Ministral-3: The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware.
  • Mistral-Large-3: A general-purpose multimodal mixture-of-experts model for production-grade tasks and enterprise workloads.

What's Changed

  • nomic-embed-text will now use Ollama's engine by default
  • Tool calling support for cogito-v2.1
  • Fixed issues with CUDA VRAM discovery
  • Fixed link to docs in Ollama's app
  • Fixed issue where models would be evicted on CPU-only systems
  • Ollama will now better render errors instead of showing Unmarshal: errors
  • Fixed issue where CUDA GPUs would fail to be detected with older GPUs
  • Added thinking and tool parsing for cogito-v2.1

New Contributors

Full Changelog: ollama/ollama@v0.13.0...v0.13.1

v0.13.0

Compare Source

New models

  • DeepSeek-OCR: DeepSeek-OCR uses optical 2D mapping to compress long contexts, achieving high OCR precision with reduced vision tokens and demonstrating practical value in document processing.
  • Cogito-V2.1: instruction tuned generative models, currently the best open-weight LLM by a US company

DeepSeek-OCR

DeepSeek-OCR is now available on Ollama. Example inputs:

ollama run deepseek-ocr "/path/to/image\n<|grounding|>Given the layout of the image."
ollama run deepseek-ocr "/path/to/image\nFree OCR."
ollama run deepseek-ocr "/path/to/image\nParse the figure."
ollama run deepseek-ocr "/path/to/image\nExtract the text in the image."
ollama run deepseek-ocr "/path/to/image\n<|grounding|>Convert the document to markdown."

New bench tool

Ollama's GitHub repo now includes a bench tool that can be used to test model performance. For the time being this is a separate tool that can be built in the Ollama GitHub repository:

First, install Go. Then from the root of the Ollama repository run:

go run ./cmd/bench -model gpt-oss:20b

For more information see the tool's documentation

What's Changed

  • DeepSeek-OCR is now supported
  • DeepSeek-V3.1 architecture is now supported in Ollama's engine
  • Fixed performance issues that arose in Ollama 0.12.11 on CUDA
  • Fixed issue where Linux install packages were missing required Vulkan libraries
  • Improved CPU and memory detection while in containers/cgroups
  • Improved VRAM information detection for AMD GPUs
  • Improved KV cache performance to no longer require defragmentation

New Contributors

Full Changelog: ollama/ollama@v0.12.11...v0.13.0

v0.12.11

Compare Source

Logprobs

Ollama's API and OpenAI-compatible API now support log probabilities. Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. This is useful for different use cases:

  1. Classification tasks
  2. Retrieval (Q&A) evaluation
  3. Autocomplete
  4. Token highlighting and outputting bytes
  5. Calculating perplexity

To enable Logprobs, provide "logprobs": true to Ollama's API:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?",
  "logprobs": true
}'

When log probabilities are requested, response chunks will now include a "logprobs" field with the token, log probability and raw bytes (for partial unicode).

{
  "model": "gemma3",
  "created_at": "2025-11-14T22:17:56.598562Z",
  "response": "Okay",
  "done": false,
  "logprobs": [
    {
      "token": "Okay",
      "logprob": -1.3434503078460693,
      "bytes": [
        79,
        107,
        97,
        121
      ]
    }
  ]
}
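
As a rough illustration of how a client might consume this field, the sketch below decodes the raw bytes, converts a log probability into a probability, and computes perplexity over a list of tokens. The helper names are hypothetical, the chunk is an abridged copy of the example above, and the code assumes the values are natural logarithms.

```python
import json
import math

# Abridged copy of the response chunk shown above.
CHUNK = json.loads("""
{
  "model": "gemma3",
  "response": "Okay",
  "done": false,
  "logprobs": [
    {"token": "Okay", "logprob": -1.3434503078460693, "bytes": [79, 107, 97, 121]}
  ]
}
""")


def token_text(entry):
    """Decode a token's raw bytes; incomplete UTF-8 sequences become U+FFFD."""
    return bytes(entry["bytes"]).decode("utf-8", errors="replace")


def token_probability(entry):
    """Convert a log probability to a probability (assumes natural log)."""
    return math.exp(entry["logprob"])


def perplexity(entries):
    """exp of the negative mean log probability over the given tokens."""
    if not entries:
        raise ValueError("need at least one token")
    return math.exp(-sum(e["logprob"] for e in entries) / len(entries))
```

For the single token in this chunk, token_probability gives roughly 0.26. In a streaming client, the entries from every chunk would be collected before calling perplexity.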
top_logprobs

When setting "top_logprobs", a number of most-likely tokens are also provided, making it possible to introspect alternative tokens. Below is an example request.

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?",
  "logprobs": true,
  "top_logprobs": 3
}'

This will generate a stream of response chunks with the following fields:

{
  "model": "gemma3",
  "created_at": "2025-11-14T22:26:10.466324Z",
  "response": "The",
  "done": false,
  "logprobs": [
    {
      "token": "The",
      "logprob": -0.8361086845397949,
      "bytes": [
        84,
        104,
        101
      ],
      "top_logprobs": [
        {
          "token": "The",
          "logprob": -0.8361086845397949,
          "bytes": [
            84,
            104,
            101
          ]
        },
        {
          "token": "Okay",
          "logprob": -1.2590975761413574,
          "bytes": [
            79,
            107,
            97,
            121
          ]
        },
        {
          "token": "That",
          "logprob": -1.2686877250671387,
          "bytes": [
            84,
            104,
            97,
            116
          ]
        }
      ]
    }
  ]
}
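
One way to use the alternatives is to rank them by probability and look at the gap between the top two, which gives a rough sense of how confident the model was at that position. The sketch below uses the three alternatives from the example above; rank_alternatives and top_margin are hypothetical helpers, and the conversion assumes natural logarithms.

```python
import math

# Alternatives copied from the "top_logprobs" array in the example above.
TOP_LOGPROBS = [
    {"token": "The", "logprob": -0.8361086845397949},
    {"token": "Okay", "logprob": -1.2590975761413574},
    {"token": "That", "logprob": -1.2686877250671387},
]


def rank_alternatives(top_logprobs):
    """Return (token, probability) pairs, most likely first."""
    pairs = [(alt["token"], math.exp(alt["logprob"])) for alt in top_logprobs]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def top_margin(top_logprobs):
    """Probability gap between the best and second-best alternative."""
    ranked = rank_alternatives(top_logprobs)
    if len(ranked) < 2:
        raise ValueError("need at least two alternatives")
    return ranked[0][1] - ranked[1][1]
```

A small margin means the runner-up token was almost as likely as the one that was emitted.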
Special thanks

Thank you @​baptistejamin for adding Logprobs to Ollama's API.

Vulkan support (opt-in)

Ollama 0.12.11 includes support for Vulkan acceleration. Vulkan brings support for a broad range of GPUs from AMD, Intel, and iGPUs. Vulkan support is not yet enabled by default, and requires opting in by running Ollama with a custom environment variable:

OLLAMA_VULKAN=1 ollama serve

In PowerShell, use:

$env:OLLAMA_VULKAN="1"
ollama serve

For issues or feedback on using Vulkan with Ollama, create an issue labelled Vulkan and make sure to include server logs where possible to aid in debugging.

What's Changed

  • Ollama's API and the OpenAI-compatible API now support Logprobs
  • Ollama's new app now supports WebP images
  • Improved rendering performance in Ollama's new app, especially when rendering code
  • The "required" field in tool definitions will now be omitted if not specified
  • Fixed issue where "tool_call_id" would be omitted when using the OpenAI-compatible API.
  • Fixed issue where ollama create would import data from both consolidated.safetensors and other safetensor files.
  • Ollama will now prefer dedicated GPUs over iGPUs when scheduling models
  • Vulkan can now be enabled by setting OLLAMA_VULKAN=1. For example: OLLAMA_VULKAN=1 ollama serve

New Contributors

Full Changelog: ollama/ollama@v0.12.10...v0.12.11

v0.12.10

Compare Source

ollama run now works with embedding models

ollama run can now run embedding models to generate vector embeddings from text:

ollama run embeddinggemma "Hello world"

Content can also be provided to ollama run via standard input:

echo "Hello world" | ollama run embeddinggemma

What's Changed

  • Fixed errors when running qwen3-vl:235b and qwen3-vl:235b-instruct
  • Enable flash attention for Vulkan (currently needs to be built from source)
  • Add Vulkan memory detection for Intel GPU using DXGI+PDH
  • Ollama will now return tool call IDs from the /api/chat API
  • Fixed hanging due to CPU discovery
  • Ollama will now show login instructions when switching to a cloud model in interactive mode
  • Fix reading stale VRAM data
  • ollama run now works with embedding models

New Contributors

Full Changelog: ollama/ollama@v0.12.9...v0.12.10

v0.12.9

Compare Source

What's Changed

  • Fix performance regression on CPU-only systems

Full Changelog: ollama/ollama@v0.12.8...v0.12.9

v0.12.8

Compare Source


What's Changed

  • qwen3-vl performance improvements, including flash attention support by default
  • qwen3-vl will now output less leading whitespace in the response when thinking
  • Fixed issue where deepseek-v3.1 thinking could not be disabled in Ollama's new app
  • Fixed issue where qwen3-vl would fail to interpret images with transparent backgrounds
  • Ollama will now stop running a model before removing it via ollama rm
  • Fixed issue where prompt processing would be slower on Ollama's engine
  • Ignore unsupported iGPUs when doing device discovery on Windows

New Contributors

Full Changelog: ollama/ollama@v0.12.7...v0.12.8

v0.12.7

Compare Source


New models

  • Qwen3-VL: Qwen3-VL is now available in all parameter sizes ranging from 2B to 235B
  • MiniMax-M2: a 230 Billion parameter model built for coding & agentic workflows available on Ollama's cloud

Add files and adjust thinking levels in Ollama's new app

Ollama's new app now includes a way to add one or many files when prompting the model.

For better responses, thinking levels can now be adjusted for the gpt-oss models.

New API documentation

New API documentation is available for Ollama's API: https://docs.ollama.com/api


What's Changed

  • Model load failures now include more information on Windows
  • Fixed embedding results being incorrect when running embeddinggemma
  • Fixed gemma3n on Vulkan backend
  • Increased time allocated for ROCm to discover devices
  • Fixed truncation error when generating embeddings
  • Fixed request status code when running cloud models
  • The OpenAI-compatible /v1/embeddings endpoint now supports encoding_format parameter
  • Ollama will now parse tool calls that don't conform to {"name": name, "arguments": args} (thanks @​rick-github!)
  • Fixed prompt processing reporting in the llama runner
  • Increase speed when scheduling models
  • Fixed issue where FROM <model> would not inherit RENDERER or PARSER commands

New Contributors

Full Changelog: ollama/ollama@v0.12.6...v0.12.7

v0.12.6

Compare Source

What's Changed

  • Ollama's app now supports searching when running DeepSeek-V3.1, Qwen3 and other models that support tool calling.
  • Flash attention is now enabled by default for Gemma 3, improving performance and memory utilization
  • Fixed issue where Ollama would hang while generating responses
  • Fixed issue where qwen3-coder would act in raw mode when using /api/generate or ollama run qwen3-coder <prompt>
  • Fixed qwen3-embedding providing invalid results
  • Ollama will now evict models correctly when num_gpu is set
  • Fixed issue where tool_index with a value of 0 would not be sent to the model

Experimental Vulkan Support

Experimental support for Vulkan is now available when you build locally from source. This enables additional GPUs from AMD and Intel that are not currently supported by Ollama. To build locally, install the Vulkan SDK and set VULKAN_SDK in your environment, then follow the developer instructions. In a future release, Vulkan support will be included in the binary release as well. Please file issues if you run into any problems.

New Contributors

Full Changelog: ollama/ollama@v0.12.5...v0.12.6

v0.12.5

Compare Source

What's Changed

  • Thinking models now support structured outputs when using the /api/chat API
  • Ollama's app will now wait until Ollama is running to allow for a conversation to be started
  • Fixed issue where "think": false would show an error instead of being silently ignored
  • Fixed deepseek-r1 output issues
  • macOS 12 Monterey and macOS 13 Ventura are no longer supported
  • AMD gfx900 and gfx906 (MI50, MI60, etc) GPUs are no longer supported via ROCm. We're working to support these GPUs via Vulkan in a future release.

New Contributors

Full Changelog: ollama/ollama@v0.12.4...v0.12.5-rc0

v0.12.4

Compare Source

What's Changed

  • Flash attention is now enabled by default for Qwen 3 and Qwen 3 Coder
  • Fixed minor memory estimation issues when scheduling models on NVIDIA GPUs
  • Fixed an issue where keep_alive in the API would accept different values for the /api/chat and /api/generate endpoints
  • Fixed tool calling rendering with qwen3-coder
  • More reliable and accurate VRAM detection
  • OLLAMA_FLASH_ATTENTION can now be overridden to 0 for models that have flash attention enabled by default
  • macOS 12 Monterey and macOS 13 Ventura are no longer supported
  • Fixed crash where templates were not correctly defined
  • Fix memory calculations on NVIDIA iGPUs
  • AMD gfx900 and gfx906 (MI50, MI60, etc) GPUs are no longer supported via ROCm. We're working to support these GPUs via Vulkan in a future release.

New Contributors

Full Changelog: ollama/ollama@v0.12.3...v0.12.4-rc3

v0.12.3

Compare Source

New models

  • DeepSeek-V3.1-Terminus: DeepSeek-V3.1-Terminus is a hybrid model that supports both thinking mode and non-thinking mode. It delivers more stable & reliable outputs across benchmarks compared to the previous version:

    Run on Ollama's cloud:

    ollama run deepseek-v3.1:671b-cloud
    

    Run locally (requires 500GB+ of VRAM)

    ollama run deepseek-v3.1
    
  • Kimi-K2-Instruct-0905: Kimi K2-Instruct-0905 is the latest, most capable version of Kimi K2. It is a state-of-the-art mixture-of-experts (MoE) language model, featuring 32 billion activated parameters and a total of 1 trillion parameters.

    ollama run kimi-k2:1t-cloud
    

What's Changed

  • Fixed issue where tool calls provided as stringified JSON would not be parsed correctly
  • ollama push will now provide a URL to follow to sign in
  • Fixed issues where qwen3-coder would output unicode characters incorrectly
  • Fix issue where loading a model with /load would crash

New Contributors

Full Changelog: ollama/ollama@v0.12.2...v0.12.3

v0.12.2

Compare Source

Web search


A new web search API is now available in Ollama. Ollama provides a generous free tier of web searches for individuals to use, and higher rate limits are available via Ollama’s cloud. This web search capability can augment models with the latest information from the web to reduce hallucinations and improve accuracy.

What's Changed

  • Models with Qwen3's architecture including MoE now run in Ollama's new engine
  • Fixed issue where built-in tools for gpt-oss were not being rendered correctly
  • Support multi-regex pretokenizers in Ollama's new engine
  • Ollama's new engine can now load tensors by matching a prefix or suffix

Full Changelog: ollama/ollama@v0.12.1...v0.12.2

v0.12.1

Compare Source

New models

What's Changed

  • Qwen3-Coder now supports tool calling
  • Ollama's app will no longer show "connection lost" in error when connecting to cloud models
  • Fixed issue where Gemma3 QAT models would not output correct tokens
  • Fix issue where & characters in Qwen3-Coder would not be parsed correctly when function calling
  • Fixed issues where ollama signin would not work properly on Linux

Full Changelog: ollama/ollama@v0.12.0...v0.12.1

v0.12.0

Compare Source

Cloud models


Cloud models are now available in preview, allowing you to run a group of larger models with fast, datacenter-grade hardware.

To run a cloud model, use:

ollama run qwen3-coder:480b-cloud

What's Changed

  • Models with the Bert architecture now run on Ollama's engine
  • Models with the Qwen 3 architecture now run on Ollama's engine
  • Fix issue where older NVIDIA GPUs would not be detected if newer drivers were installed
  • Fixed issue where models would not be imported correctly with ollama create
  • Ollama will skip parsing the initial <think> if provided in the prompt for /api/generate by @​rick-github

New Contributors

Full Changelog: ollama/ollama@v0.11.11...v0.12.0

v0.11.11

Compare Source

What's Changed

  • Support for CUDA 13
  • Improved memory usage when using gpt-oss in Ollama's app
  • Better scrolling in Ollama's app when submitting long prompts
  • Cmd +/- will now zoom and shrink text in Ollama's app
  • Assistant messages can now be copied in Ollama's app
  • Fixed error that would occur when attempting to import safetensor files by @​rick-github in #​12176
  • Improved memory estimates for hybrid and recurrent models by @​gabe-l-hart in #​12186
  • Fixed error that would occur when batch size was greater than context length
  • Flash attention & KV cache quantization validation fixes by @​jessegross in #​12231
  • Add dimensions field to embed requests by @​mxyng in #​12242
  • Enable new memory estimates in Ollama's new engine by default by @​jessegross in #​12252
  • Ollama will no longer load split vision models in the Ollama engine by @​jessegross in #​12241

New Contributors

Full Changelog: ollama/ollama@v0.11.10...v0.11.11

v0.11.10

Compare Source

New models

  • EmbeddingGemma: a new open embedding model that delivers best-in-class performance for its size

What's Changed

  • Support for EmbeddingGemma

Full Changelog: ollama/ollama@v0.11.9...v0.11.10

v0.11.9

Compare Source

What's Changed

  • Improved performance via overlapping GPU and CPU computations
  • Fixed issues where unrecognized AMD GPU would cause an error
  • Reduce crashes due to unhandled errors in some Mac and Linux installations of Ollama

New Contributors

Full Changelog: ollama/ollama@v0.11.8...v0.11.9-rc0

v0.11.8

Compare Source

What's Changed

  • gpt-oss now has flash attention enabled by default for systems that support it
  • Improved load times for gpt-oss

Full Changelog: ollama/ollama@v0.11.7...v0.11.8

v0.11.7

Compare Source

DeepSeek-V3.1

DeepSeek-V3.1 is now available to run via Ollama.

This model supports hybrid thinking, meaning thinking can be enabled or disabled by setting think in Ollama's API:

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-v3.1",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ],
  "think": true
}'

In Ollama's CLI, thinking can be enabled or disabled by running the /set think or /set nothink commands.

Turbo (in preview)

DeepSeek-V3.1 has over 671B parameters, and so a large amount of VRAM is required to run it. Ollama's Turbo mode (in preview) provides access to powerful hardware in the cloud you can use to run the model.

Turbo via Ollama's app

  1. Download Ollama for macOS or Windows
  2. Select deepseek-v3.1:671b from the model selector
  3. Enable Turbo

Turbo via Ollama's CLI and libraries

  1. Create an account on ollama.com/signup
  2. Follow the docs for Ollama's CLI to authenticate your Ollama installation
  3. Run the following:

OLLAMA_HOST=ollama.com ollama run deepseek-v3.1

For instructions on using Turbo with Ollama's Python and JavaScript libraries, see the docs

What's Changed

  • Fixed issue where multiple models would not be loaded on CPU-only systems
  • Ollama will now work with models that skip outputting the initial <think> tag (e.g. DeepSeek-V3.1)
  • Fixed issue where text would be emitted when there is no opening <think> tag from a model
  • Fixed issue where tool calls containing { or } would not be parsed correctly

New Contributors

Full Changelog: ollama/ollama@v0.11.6...v0.11.7

v0.11.6

Compare Source

What's Changed

  • Ollama's app will now switch between chats faster
  • Improved layout of messages in Ollama's app
  • Fixed issue where command prompt would show when Ollama's app detected an old version of Ollama running
  • Improved performance when using flash attention
  • Fixed boundary case when encoding text using BPE

Full Changelog: ollama/ollama@v0.11.5...v0.11.6

v0.11.5

Compare Source

What's Changed

  • Performance improvements for the gpt-oss models
  • New memory management: this release of Ollama includes improved memory management for scheduling models on GPUs, leading to better VRAM utilization, better model performance, and fewer out-of-memory errors. These new memory estimations can be enabled with OLLAMA_NEW_ESTIMATES=1 ollama serve and will soon be enabled by default.
  • Improved multi-GPU scheduling and reduced VRAM allocation when using more than 2 GPUs
  • Ollama's new app will now remember default selections for default model, Turbo and Web Search between restarts
  • Fix error when parsing bad harmony tool calls
  • OLLAMA_FLASH_ATTENTION=1 will also enable flash attention for pure-CPU models
  • Fixed OpenAI-compatible API not supporting reasoning_effort
  • Reduced size of installation on Windows and Linux

New Contributors

Full Changelog: <https://github.com/ollama/ollama/compare/v0.11.4.

Note

PR body was truncated to here.


Configuration

📅 Schedule: (UTC)

  • Branch creation
    • ""
  • Automerge
    • At any time (no schedule defined)

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.


renovate Bot commented May 8, 2026

ℹ️ Artifact update notice

File name: go.mod

In order to perform the update(s) described in the table above, Renovate ran the go get command, which resulted in the following additional change(s):

  • 6 additional dependencies were updated
  • The go directive was updated for compatibility reasons

Details:

Package Change
go 1.23.0 -> 1.24.1
golang.org/x/crypto v0.26.0 -> v0.43.0
golang.org/x/exp v0.0.0-20240808152545-0cdaa3abc0fa -> v0.0.0-20250218142911-aa4b98e5adaa
golang.org/x/sync v0.8.0 -> v0.17.0
golang.org/x/net v0.27.0 -> v0.46.0
golang.org/x/sys v0.23.0 -> v0.37.0
golang.org/x/text v0.17.0 -> v0.30.0
