Unload Model from Memory on Agent Switch for Local Providers #2636

@k33g

Feature Request: Unload Model from Memory on Agent Switch for Local Providers

Summary

When using local model providers (such as dmr or ollama) with multi-agent configurations, switching between agents that use different models can cause significant memory pressure because the previously loaded model is not evicted from the inference engine's memory. This feature request proposes a mechanism to automatically unload the current model when switching to a different agent that uses a different model.

Problem

docker-agent supports multi-agent architectures where different agents can be configured with different models. When using local providers like dmr (Docker Model Runner), each model is loaded into GPU/CPU memory when first used. If two agents use different models, both models end up resident in memory simultaneously, even though only one is active at any given time.

On machines with limited VRAM or RAM (common for local inference setups), this leads to:

  • Memory pressure: two or more large models competing for the same memory pool.
  • Degraded performance: thrashing, swapping, or OOM conditions.
  • Unnecessary resource usage: the idle model occupies memory that the active model could use.

DMR already exposes an API endpoint to explicitly unload a model from memory:

POST /engines/unload

This endpoint exists precisely to allow consumers to free memory when a model is no longer needed, but docker-agent currently has no way to trigger it automatically when switching between agents.
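For illustration, a client-side unload call could look like the sketch below. Only the POST /engines/unload path comes from the description above; the JSON payload naming the model to evict is an assumption and would need to be checked against the DMR API reference:

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Hypothetical payload; the real DMR unload request body may differ.
	body := bytes.NewBufferString(`{"models": ["ai/qwen3:14B"]}`)
	resp, err := http.Post("http://localhost:12434/engines/unload", "application/json", body)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("unload status:", resp.Status)
}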

Proposed Solution

New provider_opts key: unload_on_switch

Add a boolean provider_opts key that tells docker-agent to call the engine's unload API before loading a different model.

models:
  local-large:
    provider: dmr
    model: ai/qwen3:14B
    provider_opts:
      unload_on_switch: true   # unload this model from memory when switching away

  local-small:
    provider: dmr
    model: ai/qwen3:0.6B
    provider_opts:
      unload_on_switch: true

When unload_on_switch: true is set on a model, docker-agent will call the engine's unload endpoint before activating a different model on the same provider.
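A rough sketch of the check this implies at switch time is shown below; every type and helper name here is hypothetical and only meant to make the intended behaviour concrete:

package agent

import "fmt"

// ModelConfig and onAgentSwitch are illustrative names, not docker-agent's real API.
type ModelConfig struct {
	Provider     string
	Model        string
	ProviderOpts map[string]any
}

// onAgentSwitch evicts the outgoing model before the incoming agent's model
// is activated, but only when both agents share the same provider, the models
// differ, and the outgoing model opted in via unload_on_switch.
func onAgentSwitch(current, next ModelConfig, unload func(model string) error) error {
	wantsUnload, _ := current.ProviderOpts["unload_on_switch"].(bool)
	if wantsUnload && current.Provider == next.Provider && current.Model != next.Model {
		if err := unload(current.Model); err != nil {
			return fmt.Errorf("unload %s: %w", current.Model, err)
		}
	}
	return nil
}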

New provider key: unload_api

To make the mechanism generic and reusable with other local-inference engines (e.g. Ollama), expose the unload endpoint as a configurable field on the provider configuration:

providers:
  my-local-runner:
    provider: dmr
    base_url: http://localhost:12434/engines/llama.cpp/v1
    unload_api: /engines/unload   # POST to {base_url_root}{unload_api} to unload a model

  my-ollama:
    provider: ollama
    base_url: http://localhost:11434/v1
    unload_api: /api/delete       # Ollama's equivalent endpoint

When unload_api is set on a provider, and the active agent switches to an agent using a different model on the same provider, docker-agent will issue a POST (or DELETE, configurable) request to that endpoint before initializing the new model.
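A sketch of how that request could be assembled from the provider configuration follows. Deriving the base URL root as scheme plus host, the UnloadMethod field, and the request payload are all assumptions made for illustration:

package providers

import (
	"bytes"
	"fmt"
	"net/http"
	"net/url"
)

// ProviderConfig mirrors the proposed provider keys; field names are illustrative.
type ProviderConfig struct {
	BaseURL      string // e.g. http://localhost:12434/engines/llama.cpp/v1
	UnloadAPI    string // e.g. /engines/unload
	UnloadMethod string // "POST" (default) or "DELETE"
}

// unloadModel issues {UnloadMethod} {base_url_root}{unload_api} to evict a model.
func unloadModel(cfg ProviderConfig, model string) error {
	if cfg.UnloadAPI == "" {
		return nil // provider did not opt in to explicit unloading
	}
	u, err := url.Parse(cfg.BaseURL)
	if err != nil {
		return err
	}
	root := u.Scheme + "://" + u.Host // assumed meaning of {base_url_root}
	method := cfg.UnloadMethod
	if method == "" {
		method = http.MethodPost
	}
	// Payload shape is illustrative; engines differ in what they expect.
	body := bytes.NewBufferString(fmt.Sprintf(`{"model": %q}`, model))
	req, err := http.NewRequest(method, root+cfg.UnloadAPI, body)
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("unload %s: unexpected status %s", model, resp.Status)
	}
	return nil
}

One design consideration: it may be preferable to treat the unload call as best-effort (log and continue on failure) so that a failed eviction never blocks activating the next agent's model.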

Interaction with keep_alive

DMR already supports keep_alive via provider_opts (values like "0" to unload immediately, "-1" to keep forever). The unload_on_switch option is complementary: keep_alive controls TTL-based eviction, while unload_on_switch triggers an explicit eviction at agent-switch time regardless of the TTL.

Setting keep_alive: "0" currently unloads the model after each request, which is too aggressive for normal use. unload_on_switch would provide a middle ground: keep the model loaded during a session with a given agent, but release it when the runtime moves to a different agent with a different model.
