diff --git a/docs/04-For Operators/05-monitoring.md b/docs/04-For Operators/05-monitoring.md
index ca456ec..0c990da 100644
--- a/docs/04-For Operators/05-monitoring.md
+++ b/docs/04-For Operators/05-monitoring.md
@@ -13,7 +13,7 @@ sidebar_position: 5
 ## Logging

 Logs are being collected by
-[Promtail](https://grafana.com/docs/loki/latest/send-data/promtail/) and pushed
+[Grafana Alloy](https://grafana.com/docs/alloy/latest/) and pushed
 to a [Loki](https://grafana.com/docs/loki/latest/) instance running in the
 control plane. Loki is deployed in [monolithic
 mode](https://grafana.com/docs/loki/latest/setup/install/helm/install-monolithic/)
@@ -22,11 +22,43 @@ configuration parameters for the control plane in the control plane's
 [logging](https://github.com/metal-stack/metal-roles/blob/master/control-plane/roles/logging/README.md)
 role.

-In the partitions, Promtail is deployed inside a systemd-managed Docker
-container. Configuration parameters can be found in the partition's
-[promtail](https://github.com/metal-stack/metal-roles/blob/master/partition/roles/promtail/README.md)
-role. Which hosts Promtail collects from can be configured via the
-`prometheus_promtail_targets` variable.
+In the partitions, Alloy can be deployed inside a systemd-managed Docker
+container on management servers and switches. Configuration parameters can be found in the partition's
+[alloy](https://github.com/metal-stack/metal-roles/blob/master/partition/roles/alloy/README.md)
+role.
+
+### Partition Log Sources
+
+Alloy is configured through snippets that define what logs are collected. The following snippets are typically used:
+
+| Host type | Snippet | Description | Key labels |
+| ---------------------- | -------------- | ------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------- |
+| Leaves, spines, exits | `journal` | Collects logs from the systemd journal; auto-discovers both volatile (`/run/log/journal`) and persistent (`/var/log/journal`) storage | `job=systemd-journal`, `unit`, `level` |
+| Management servers | `journal-file` | Collects logs from the persistent systemd journal at a configurable path; supports migrating cursor position from promtail | `job=systemd-journal`, `unit`, `level` |
+| Hosts without journald | `syslog` | Tails `/var/log/syslog` | `job=syslog` |
+| Hosts running Docker | `docker` | Collects logs from all Docker containers via the Docker socket | `job=docker`, `container` |
+
+All log entries carry the `host` and `partition` labels regardless of snippet, which makes it easy to filter logs in Grafana Explore by host or partition.
+
+### Querying Logs in Grafana
+
+Logs can be explored in Grafana using the **Explore** view with the Loki data source. Useful label filters:
+
+- `{partition=""}` — all logs from a partition
+- `{host=""}` — all logs from a specific host
+- `{job="docker", container=""}` — logs from a specific Docker container
+- `{job="systemd-journal", unit=".service"}` — logs from a specific systemd unit
+- `{job="systemd-journal", level="error"}` — error-level journal entries across all units
+
+:::note Migrating from promtail
+
+The `promtail` role is deprecated and replaced by the `alloy` role. Refer to the
+[Migration from promtail](https://github.com/metal-stack/metal-roles/blob/master/partition/roles/alloy/README.md#migration-from-promtail)
+section of the partition alloy role's README and the
+[Migration from promtail](https://github.com/metal-stack/metal-roles/blob/master/control-plane/roles/logging/README.md#migration-from-promtail)
+section of the control-plane logging role's README for step-by-step instructions.
+
+:::

 ## Monitoring

diff --git a/docs/04-For Operators/monitoring-stack.svg b/docs/04-For Operators/monitoring-stack.svg
index 9ece989..b661a9f 100644
--- a/docs/04-For Operators/monitoring-stack.svg
+++ b/docs/04-For Operators/monitoring-stack.svg
@@ -1 +1 @@
-
[monitoring-stack.svg, old version; diagram labels in source order: Management Servers, Promtail, Prometheus, node_exporter, ipmi_exporter, blackbox_exporter, Exporters, Switches, Promtail, Exporters, node_exporter, sonic_exporter, blackbox_exporter, Machines, BMC, Metal Partition, GCS, shoot-states, shoot-details, shoot-customizations, shoot-cluster, gardener-overview, alertmanager, sonic-exporter, rethinkdb, metal-api, machine-capacity, Gardener Dashboards, Grafana Dashboards, Metal Control Plane, Promtail, filesystem, Loki, Exporters, gardener-metrics-exporter, metal-metrics-exporter, event-exporter, rethinkdb-exporter, ServiceMonitors, gardener-metrics-exporter, ipam-db, masterdata-api, masterdata-db, metal-db, rethinkdb-exporter, metal-metrics-exporter, metal-api, prometheus-operator, kube-prometheus, node_exporter, blackbox_exporter, prometheus-adapter, Grafana, kube-state-metrics, Prometheus, alertmanager, Thanos]
\ No newline at end of file
+
[monitoring-stack.svg, new version; diagram labels in source order: Management Servers, Alloy, Prometheus, node_exporter, ipmi_exporter, blackbox_exporter, Exporters, Switches, Alloy, Exporters, node_exporter, sonic_exporter, blackbox_exporter, Machines, BMC, Metal Partition, GCS, shoot-states, shoot-details, shoot-customizations, shoot-cluster, gardener-overview, alertmanager, sonic-exporter, rethinkdb, metal-api, machine-capacity, Gardener Dashboards, Grafana Dashboards, Metal Control Plane, Alloy, filesystem, Loki, Exporters, gardener-metrics-exporter, metal-metrics-exporter, event-exporter, rethinkdb-exporter, ServiceMonitors, gardener-metrics-exporter, ipam-db, masterdata-api, masterdata-db, metal-db, rethinkdb-exporter, metal-metrics-exporter, metal-api, prometheus-operator, kube-prometheus, node_exporter, blackbox_exporter, prometheus-adapter, Grafana, kube-state-metrics, Prometheus, alertmanager, Thanos]