
![Prometheus report showing traffic](images/prometheus-graph_load.webp)

## Available metrics

Docker exposes metrics in Prometheus format. This section describes the available metrics and their meaning.

> [!WARNING]
>
> The available metrics and the names of those metrics are in active
> development and may change at any time.

### Metric types

Docker metrics use the following Prometheus metric types:

- **Counter**: A cumulative metric that only increases, or resets to zero when the daemon restarts. Counters track values such as the total number of events or requests.
- **Gauge**: A metric that can go up or down. Gauges track values such as current memory usage or the number of running containers.
- **Histogram**: A metric that samples observations, such as operation durations, and counts them in buckets with predefined upper bounds. Each histogram exposes three kinds of time series:
  - `<basename>_bucket{le="<upper_bound>"}`: Cumulative counters of observations less than or equal to each upper bound, including an `le="+Inf"` bucket that counts every observation
  - `<basename>_sum`: Total sum of all observed values
  - `<basename>_count`: Total number of observations

From a histogram you can calculate averages, percentiles, and rates. For example, the average duration over the last five minutes is `rate(<basename>_sum[5m]) / rate(<basename>_count[5m])`.
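The average-duration expression works because both `rate()` calls divide by the same window, so the window length cancels. Here is a minimal Python sketch of that arithmetic. It assumes you already have two scrapes of the same series taken `interval_seconds` apart, and all function names are hypothetical. It is a simplified two-sample version: Prometheus's `rate()` also uses every sample in the window and extrapolates to the window edges.

```python
from typing import Optional


def increase(prev: float, curr: float) -> float:
    """Growth of a counter between two scrapes.

    Counters only go up, so a smaller current value is treated as a reset
    (for example, a daemon restart) followed by counting up from zero.
    """
    return curr if curr < prev else curr - prev


def per_second_rate(prev: float, curr: float, interval_seconds: float) -> float:
    """Average per-second rate of a counter between two scrapes."""
    return increase(prev, curr) / interval_seconds


def average_duration(prev_sum: float, prev_count: float,
                     curr_sum: float, curr_count: float) -> Optional[float]:
    """Mean observed value between two scrapes of a histogram.

    Mirrors rate(x_sum[w]) / rate(x_count[w]): both rates divide by the
    same window w, so only the increases of _sum and _count matter.
    """
    observations = increase(prev_count, curr_count)
    if observations == 0:
        return None  # nothing was observed between the two scrapes
    return increase(prev_sum, curr_sum) / observations


if __name__ == "__main__":
    # Hypothetical scrape values of a *_seconds histogram, 60 seconds apart.
    print(average_duration(10.0, 100, 12.5, 110))  # mean seconds per operation
    print(per_second_rate(100, 110, 60))           # operations per second
```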

### Engine metrics

These metrics provide information about the Docker Engine's operation and resource usage.

| Metric | Type | Description |
| ------------------------------------------- | --------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `engine_daemon_container_actions_seconds` | Histogram | Time taken to process container operations (start, stop, create, etc.). Labels indicate the action type. |
| `engine_daemon_container_states_containers` | Gauge | Number of containers currently in each state (running, paused, stopped). Labels indicate the state. |
| `engine_daemon_engine_cpus_cpus` | Gauge | Number of CPUs available on the host system. |
| `engine_daemon_engine_info` | Gauge | Static information about the Docker Engine. Always set to 1. Labels provide version, architecture, and other engine details. |
| `engine_daemon_engine_memory_bytes` | Gauge | Total memory available on the host system in bytes. |
| `engine_daemon_events_subscribers_total` | Gauge | Number of current subscribers to Docker events. |
| `engine_daemon_events_total` | Counter | Total number of events processed by the daemon. Labels indicate the event action and type. |
| `engine_daemon_health_checks_failed_total` | Counter | Total number of health checks that have failed. |
| `engine_daemon_health_checks_total` | Counter | Total number of health checks performed. |
| `engine_daemon_host_info_functions_seconds` | Histogram | Time taken to gather host information. |
| `engine_daemon_network_actions_seconds` | Histogram | Time taken to process network operations (create, connect, disconnect, etc.). Labels indicate the action type. |
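The daemon serves these metrics as plain text in the Prometheus exposition format. For a quick check without a Prometheus server, the following Python sketch parses that text and collects `engine_daemon_container_states_containers` by state. It assumes the state label is named `state`, and the function names are hypothetical. It handles only the common line shape (`name{label="value"} value`), ignores timestamps, and does not decode escaped label values. For anything beyond ad-hoc inspection, prefer an official parser such as `text_string_to_metric_families` in the `prometheus_client` Python package.

```python
import re
from typing import Dict, Iterator, Tuple

# One sample line: metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces; backslash-escaped characters are allowed.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> Iterator[Tuple[str, Dict[str, str], float]]:
    """Yield (metric name, labels, value) for each sample line in the text."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE metadata
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        # Label values are kept as written; escape sequences are not decoded.
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)  # float() also accepts "+Inf" and "NaN"


def container_states(text: str) -> Dict[str, float]:
    """Map each container state label value to its current count."""
    return {
        labels.get("state", ""): value
        for name, labels, value in parse_samples(text)
        if name == "engine_daemon_container_states_containers"
    }


if __name__ == "__main__":
    # Illustrative input only; real output contains many more metrics.
    sample = """\
# TYPE engine_daemon_container_states_containers gauge
engine_daemon_container_states_containers{state="paused"} 0
engine_daemon_container_states_containers{state="running"} 2
engine_daemon_container_states_containers{state="stopped"} 5
engine_daemon_engine_cpus_cpus 8
"""
    print(container_states(sample))
```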

### Swarm metrics

These metrics are only available when the Docker Engine is running in Swarm mode.

| Metric | Type | Description |
| ------------------------------------------------ | --------- | ----------------------------------------------------------------------------------------------- |
| `swarm_dispatcher_scheduling_delay_seconds` | Histogram | Time from task creation to scheduling decision. Measures scheduler performance. |
| `swarm_manager_configs_total` | Gauge | Total number of configs in the swarm cluster. |
| `swarm_manager_leader` | Gauge | Indicates if this node is the swarm manager leader (1) or not (0). |
| `swarm_manager_networks_total` | Gauge | Total number of networks in the swarm cluster. |
| `swarm_manager_nodes` | Gauge | Number of nodes in the swarm cluster. Labels indicate node state (ready, down, etc.). |
| `swarm_manager_secrets_total` | Gauge | Total number of secrets in the swarm cluster. |
| `swarm_manager_services_total` | Gauge | Total number of services in the swarm cluster. |
| `swarm_manager_tasks_total` | Gauge | Total number of tasks in the swarm cluster. Labels indicate task state (running, failed, etc.). |
| `swarm_node_manager` | Gauge | Indicates if this node is a swarm manager (1) or worker (0). |
| `swarm_raft_snapshot_latency_seconds` | Histogram | Time taken to create and restore Raft snapshots. |
| `swarm_raft_transaction_latency_seconds` | Histogram | Time taken to commit Raft transactions. Measures consensus performance. |
| `swarm_store_batch_latency_seconds` | Histogram | Time taken for batch operations in the swarm store. |
| `swarm_store_lookup_latency_seconds` | Histogram | Time taken for lookup operations in the swarm store. |
| `swarm_store_memory_store_lock_duration_seconds` | Histogram | Duration of lock acquisitions in the memory store. |
| `swarm_store_read_tx_latency_seconds` | Histogram | Time taken for read transactions in the swarm store. |
| `swarm_store_write_tx_latency_seconds` | Histogram | Time taken for write transactions in the swarm store. |
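Because `swarm_node_manager` and `swarm_manager_leader` are per-node 0/1 gauges, a monitoring job can scrape each node and check that the managers agree on a single leader. The Python sketch below assumes the scraped values are already collected into a dictionary keyed by node name. The node names, the dictionary layout, and the function names are all hypothetical, and a metric missing from a node is treated as not set.

```python
from typing import Dict, List, NamedTuple


class SwarmRoles(NamedTuple):
    managers: List[str]
    leaders: List[str]


def summarize_roles(scrapes: Dict[str, Dict[str, float]]) -> SwarmRoles:
    """Group nodes by role.

    `scrapes` maps a node name to the {metric name: value} pairs scraped
    from that node. A missing metric never compares equal to 1.
    """
    managers = sorted(n for n, m in scrapes.items() if m.get("swarm_node_manager") == 1)
    leaders = sorted(n for n, m in scrapes.items() if m.get("swarm_manager_leader") == 1)
    return SwarmRoles(managers, leaders)


def has_single_leader(roles: SwarmRoles) -> bool:
    """True when exactly one manager reports itself as the leader.

    Zero leaders (for example, during an election) or several leaders
    (nodes that disagree) are both worth investigating.
    """
    return len(roles.leaders) == 1 and roles.leaders[0] in roles.managers


if __name__ == "__main__":
    # Hypothetical node names and values, for illustration only.
    scrapes = {
        "manager-1": {"swarm_node_manager": 1, "swarm_manager_leader": 1},
        "manager-2": {"swarm_node_manager": 1, "swarm_manager_leader": 0},
        "worker-1": {"swarm_node_manager": 0},
    }
    roles = summarize_roles(scrapes)
    print(roles, has_single_leader(roles))
```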

### Using histogram metrics

The daemon exposes each histogram metric listed above (the ones whose names end in `_seconds`) as three kinds of time series:

- `<metric_name>_bucket`: Cumulative counters for each configured bucket
- `<metric_name>_sum`: Total sum of all observed values
- `<metric_name>_count`: Total count of observations

For example, `engine_daemon_container_actions_seconds` produces:

- `engine_daemon_container_actions_seconds_bucket{action="start",le="0.005"}`: Count of start actions taking ≤5ms
- `engine_daemon_container_actions_seconds_bucket{action="start",le="0.01"}`: Count of start actions taking ≤10ms
- `engine_daemon_container_actions_seconds_sum{action="start"}`: Total time, in seconds, spent on start actions
- `engine_daemon_container_actions_seconds_count{action="start"}`: Total number of start actions

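Use these series to calculate percentiles, averages, and rates in your Prometheus queries. For example, `histogram_quantile(0.95, sum by (le, action) (rate(engine_daemon_container_actions_seconds_bucket[5m])))` estimates the 95th-percentile duration of each container action over the last five minutes.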
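PromQL's `histogram_quantile()` estimates a percentile from the `_bucket` series by assuming observations are spread evenly within each bucket and interpolating linearly. If the percentile falls in the `+Inf` bucket, it returns the largest finite upper bound. The Python sketch below reproduces that interpolation in simplified form, to show why the result depends on the bucket bounds. It takes cumulative `(le, count)` pairs, which you would normally derive from bucket increases over a time window. The function name is hypothetical, and Prometheus's own implementation handles more edge cases.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs.

    `buckets` must be sorted by upper bound and end with (math.inf, total).
    The lowest bucket is assumed to start at 0.
    """
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, so no quantile
    rank = q * total  # how many observations lie at or below the quantile
    lower, below = 0.0, 0.0  # bound and cumulative count of the previous bucket
    for upper, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper):
                # Open-ended bucket: report the largest finite upper bound.
                return lower
            in_bucket = cumulative - below  # > 0, since below < rank <= cumulative
            # Linear interpolation inside the bucket that contains the rank.
            return lower + (upper - lower) * (rank - below) / in_bucket
        lower, below = upper, cumulative
    return lower  # not reached: rank < total, so the +Inf bucket always matches


if __name__ == "__main__":
    # Hypothetical cumulative counts for buckets at 5 ms, 10 ms, 25 ms, 50 ms.
    buckets = [(0.005, 2), (0.01, 6), (0.025, 9), (0.05, 10), (math.inf, 10)]
    print(histogram_quantile(0.5, buckets))
```

Because the estimate interpolates inside a single bucket, its precision is limited by how widely the bucket bounds are spaced.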

## Next steps

The example provided here shows how to run Prometheus as a container on your