# Feature Request: Unload Model from Memory on Agent Switch for Local Providers

## Summary
When using local model providers (such as `dmr` or `ollama`) with multi-agent configurations, switching between agents that use different models can cause significant memory pressure because the previously loaded model is not evicted from the inference engine's memory. This feature request proposes a mechanism to automatically unload the current model when switching to a different agent that uses a different model.
## Problem
Docker Agent supports multi-agent architectures where different agents can be configured with different models. When using local providers like `dmr` (Docker Model Runner), each model is loaded into GPU/CPU memory when first used. If two agents use different models, both models end up resident in memory simultaneously — even though only one is active at any given time.
On machines with limited VRAM or RAM (common for local inference setups), this leads to:
- Memory pressure: two or more large models competing for the same memory pool.
- Degraded performance: thrashing, swapping, or OOM conditions.
- Unnecessary resource usage: the idle model occupies memory that the active model could use.
DMR already exposes an API endpoint to explicitly unload a model from memory. That endpoint exists precisely so consumers can free memory when a model is no longer needed, but docker-agent currently has no way to trigger it automatically when switching between agents.
## Proposed Solution

### New `provider_opts` key: `unload_on_switch`

Add a boolean `provider_opts` key that tells docker-agent to call the engine's unload API before loading a different model:
```yaml
models:
  local-large:
    provider: dmr
    model: ai/qwen3:14B
    provider_opts:
      unload_on_switch: true   # unload this model from memory when switching away
  local-small:
    provider: dmr
    model: ai/qwen3:0.6B
    provider_opts:
      unload_on_switch: true
```
When `unload_on_switch: true` is set on a model, docker-agent will call the engine's unload endpoint before activating a different model on the same provider.
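The switch-time logic could look roughly like the sketch below. The class name, method names, and the in-memory record of unload calls are all illustrative (not the docker-agent API); a real implementation would issue the HTTP request to the engine's unload endpoint where the comment indicates.

```python
class AgentRuntime:
    """Illustrative sketch: track the active model per provider and unload
    it before activating a different model on the same provider."""

    def __init__(self, models):
        # models: name -> {"provider": str, "model": str, "provider_opts": dict}
        self.models = models
        self.active = {}    # provider name -> currently loaded model id
        self.unloaded = []  # record of unload calls (stand-in for the HTTP request)

    def activate(self, model_name):
        cfg = self.models[model_name]
        provider, model = cfg["provider"], cfg["model"]
        current = self.active.get(provider)
        if current is not None and current != model:
            prev_cfg = next(c for c in self.models.values()
                            if c["provider"] == provider and c["model"] == current)
            if prev_cfg.get("provider_opts", {}).get("unload_on_switch"):
                # a real implementation would POST to the engine's unload
                # endpoint here before initializing the new model
                self.unloaded.append((provider, current))
        self.active[provider] = model


rt = AgentRuntime({
    "local-large": {"provider": "dmr", "model": "ai/qwen3:14B",
                    "provider_opts": {"unload_on_switch": True}},
    "local-small": {"provider": "dmr", "model": "ai/qwen3:0.6B",
                    "provider_opts": {"unload_on_switch": True}},
})
rt.activate("local-large")
rt.activate("local-small")  # triggers unload of ai/qwen3:14B
```

Note that switching between agents that share the same model on the same provider would trigger no unload, which is the desired behavior.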
### New `provider` key: `unload_api`

To make the mechanism generic and reusable with other local-inference engines (e.g. Ollama), expose the unload endpoint as a configurable field on the provider configuration:
```yaml
providers:
  my-local-runner:
    provider: dmr
    base_url: http://localhost:12434/engines/llama.cpp/v1
    unload_api: /engines/unload   # POST to {base_url_root}{unload_api} to unload a model
  my-ollama:
    provider: ollama
    base_url: http://localhost:11434/v1
    unload_api: /api/delete       # Ollama's equivalent endpoint
```
When `unload_api` is set on a provider, and the active agent switches to an agent using a different model on the same provider, docker-agent will issue a `POST` (or `DELETE`, configurable) request to that endpoint before initializing the new model.
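Since `base_url` typically includes an engine-specific path prefix while `unload_api` is resolved against the server root (`{base_url_root}{unload_api}` in the config comment above), deriving the unload URL might be sketched as follows; `unload_url` is a hypothetical helper, not an existing function:

```python
from urllib.parse import urlsplit

def unload_url(base_url: str, unload_api: str) -> str:
    """Build the unload endpoint URL from the provider's base_url.

    {base_url_root} is the scheme + host(:port) of base_url with any path
    stripped, so unload_api is resolved against the server root rather
    than the OpenAI-compatible /v1 prefix.
    """
    parts = urlsplit(base_url)
    return f"{parts.scheme}://{parts.netloc}{unload_api}"
```

For example, `unload_url("http://localhost:12434/engines/llama.cpp/v1", "/engines/unload")` yields `http://localhost:12434/engines/unload`, dropping the `/engines/llama.cpp/v1` path from the base URL.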
### Interaction with `keep_alive`
DMR already supports `keep_alive` via `provider_opts` (values like `"0"` to unload immediately, `"-1"` to keep forever). The `unload_on_switch` option is complementary: `keep_alive` controls TTL-based eviction, while `unload_on_switch` triggers an explicit eviction at agent-switch time regardless of the TTL.
Setting `keep_alive: "0"` currently unloads the model after each request, which is too aggressive for normal use. `unload_on_switch` would provide a middle ground: keep the model loaded during a session with a given agent, but release it when the runtime moves to a different agent with a different model.
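The two options can be combined into a single eviction decision. The sketch below is one possible semantics under the assumptions described in this section (the function and its parameters are illustrative, not an existing API):

```python
def should_unload_now(provider_opts: dict, switching_away: bool) -> bool:
    """Decide whether to evict a model right now.

    keep_alive "0"   -> TTL-based: evict after every request (too aggressive).
    unload_on_switch -> explicit: evict only when the runtime moves to a
                        different model on the same provider.
    """
    if provider_opts.get("keep_alive") == "0":
        return True  # unload immediately after each request
    if switching_away and provider_opts.get("unload_on_switch"):
        return True  # explicit eviction at agent-switch time
    return False
```

With `unload_on_switch: true` and no `keep_alive: "0"`, the model survives request boundaries within a session but is released on an agent switch, which is the middle ground described above.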