Commit 0cb7fb1

STAC-22541: Derived state monitors
1 parent fee4282 commit 0cb7fb1

3 files changed

Lines changed: 49 additions & 13 deletions

SUMMARY.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -32,6 +32,7 @@
3232
* [Troubleshooting](use/alerting/notifications/troubleshooting.md)
3333
* [Customize](dynamic/customize-alerting.md)
3434
* [Add a monitor using the CLI](use/alerting/k8s-add-monitors-cli.md)
35+
* [Derived State monitor](use/alerting/k8s-derived-state-monitors.md)
3536
* [Override monitor arguments](use/alerting/k8s-override-monitor-arguments.md)
3637
* [Write a remediation guide](use/alerting/k8s-write-remediation-guide.md)
3738

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
1+
---
2+
description: SUSE Observability
3+
---
4+
5+
# Derived State Monitors
6+
7+
## Overview
8+
9+
When your observability use case includes a set of logical components (business components) that have no monitors of their own but are impacted when any of their technical dependencies have issues,
10+
you can propagate that health state to them using the `derived-state-monitor` function.
11+
Starting from a group of components selected by `componentTypes`, the function discovers the most critical health state among their dependencies and lets the starting components, at the topmost layer, inherit it.
12+
The discovery process considers only components whose health state comes from observations (for example, based on metrics). It excludes components whose health state is itself derived, such as a logical component that depends on another logical component, but keeps traversing through them.
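
The discovery described above can be read as a graph traversal. The Python below is only a sketch under stated assumptions, not the implementation of `derived-state-monitor`: the `Component` class, its `observed_state` and `depends_on` fields, the `derive_state` function, and the severity ordering are hypothetical names made up for illustration. It also assumes the traversal continues past components that have an observed state, which the text above does not specify.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed ordering from least to most critical (illustrative only).
SEVERITY = {"CLEAR": 0, "DEVIATING": 1, "CRITICAL": 2}


@dataclass
class Component:
    name: str
    # State set by an observation-based monitor, or None when the component
    # has no such monitor (for example, a logical component).
    observed_state: Optional[str] = None
    depends_on: list["Component"] = field(default_factory=list)


def derive_state(start: Component) -> Optional[tuple[str, str]]:
    """Return (state, cause_name) for the most critical observed state
    among the dependencies of `start`, or None if none is found."""
    worst: Optional[tuple[str, str]] = None
    seen = {start.name}
    stack = list(start.depends_on)
    while stack:
        comp = stack.pop()
        if comp.name in seen:
            continue
        seen.add(comp.name)
        if comp.observed_state is not None:
            if worst is None or SEVERITY[comp.observed_state] > SEVERITY[worst[0]]:
                worst = (comp.observed_state, comp.name)
        # Components without an observed state contribute nothing themselves,
        # but the traversal still continues through them.
        # Assumption: it also continues past components with an observed state.
        stack.extend(comp.depends_on)
    return worst


# A Deployment depends on a ReplicaSet (no observed state), which depends on a Pod.
pod = Component("pod-a", observed_state="CRITICAL")
replicaset = Component("replicaset-a", depends_on=[pod])
deployment = Component("deployment-a", depends_on=[replicaset])
print(derive_state(deployment))
```

In this sketch the ReplicaSet is skipped as a source of state but still traversed, so the Deployment picks up the Pod's state, and the Pod's name is what a remediation hint could report as the cause.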
13+
14+
## Derived Health State Monitor example
15+
16+
A monitor implemented with the `derived-state-monitor` function looks like the example below. It can be added by following the guide at [Add a threshold monitor to components using the CLI](/use/alerting/k8s-add-monitors-cli.md):
17+
18+
```
19+
- _type: "Monitor"
20+
  name: "Aggregated health state of a Deployment, StatefulSet, ReplicaSet and DaemonSet"
21+
  tags:
22+
    - deployments
23+
    - replicasets
24+
    - statefulsets
25+
    - daemonsets
26+
    - derived
27+
    - propagated
28+
  identifier: "urn:custom:monitor:..."
29+
  status: "DISABLED"
30+
  description: "Description"
31+
  function: {{ get "urn:stackpack:common:monitor-function:derived-state-monitor" }}
32+
  arguments:
33+
    componentTypes: "deployment, replicaset, statefulset, daemonset"
34+
  intervalSeconds: 30
35+
  remediationHint: "Investigate component [{{ causeName }}](/#/components/{{ causeComponentUrnForUrl }}) as it is causing the workload to be unhealthy."
36+
```
37+
* The function takes a single argument, `componentTypes`, which lists the component types as a single comma-separated string.
38+
* The function offers two values for use in the remediation guide: `causeName`, the name of the component the state is propagated from, and `causeComponentUrnForUrl`, which can be used to build a link to that component.
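
To make the two points above concrete, here is a small Python sketch of how a comma-separated `componentTypes` string could be split and how the `{{ causeName }}` and `{{ causeComponentUrnForUrl }}` placeholders could be filled in. It is an illustration only: `parse_component_types` and `render_hint` are hypothetical helpers, the regex substitution is a stand-in for whatever template engine the product actually uses, and the sample values are invented.

```python
import re


def parse_component_types(raw: str) -> list[str]:
    """Split a comma-separated string into trimmed, non-empty type names."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def render_hint(template: str, values: dict[str, str]) -> str:
    """Replace each `{{ name }}` placeholder with its value.
    Unknown placeholders are left untouched."""
    def substitute(match: re.Match) -> str:
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


types = parse_component_types("deployment, replicaset, statefulset, daemonset")
hint = render_hint(
    "Investigate component [{{ causeName }}](/#/components/{{ causeComponentUrnForUrl }})",
    # Hypothetical values; the real ones are supplied by the monitor function.
    {"causeName": "pod-a", "causeComponentUrnForUrl": "example-urn"},
)
print(types)
print(hint)
```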

use/alerting/kubernetes-monitors.md

Lines changed: 10 additions & 13 deletions
@@ -144,22 +144,19 @@ Cluster doesn't have any health itself. But a cluster is build from few componen
144144
- all nodes
145145
and then takes the most critical health state.
146146

147-
### Aggregated health state of a DaemonSet
147+
### Derived Workloads health state (Deployment, DaemonSet, ReplicaSet, StatefulSet)
148148

149-
The monitor aggregates states of all children Pods and then returns the most critical health state.
149+
The monitor aggregates the states of all child Pods and then returns the most critical health state.
150+
It uses a monitor function that traverses component dependencies to find the most critical health state that is based on observations rather than on other derived states.
151+
For example, the health state of a Deployment is derived from its Pods, because their health states are based on actual data. It does not take into account the derived state that
152+
a ReplicaSet it depends on might have. This monitor function helps propagate the health state from technical components to logical components, and it only applies a derived state if the component does not already have a health state based on observations.
153+
Therefore, if you want to activate this monitor, the `Deployment desired replicas`, `Daemonset desired replicas`, `Replicaset desired replicas` and `Statefulset desired replicas` monitors should be disabled.
154+
If your use case requires health states on logical components that have no monitors of their own, you can implement a monitor such as
155+
```
156+
157+
```
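
The rule stated in this section, that a derived state is applied only when a component has no observation-based health state, can be sketched in Python as below. This is an illustration, not the product's implementation: `effective_state` and the state strings are hypothetical. Under this reading, a workload that still has an active `desired replicas` monitor keeps that monitor's state and the derived state is not applied, which would explain why those monitors should be disabled.

```python
from typing import Optional


def effective_state(observed: Optional[str], derived: Optional[str]) -> Optional[str]:
    """Prefer the observation-based state; fall back to the derived one."""
    return observed if observed is not None else derived


# With an observation-based monitor active, the derived state is ignored.
print(effective_state("CLEAR", "CRITICAL"))
# Without one, the derived state is what the component shows.
print(effective_state(None, "CRITICAL"))
```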
150158

151-
### Aggregated health state of a Deployment
152159

153-
The monitor aggregates states of all children ReplicaSets and then returns the most critical health state. ReplicaSets have
154-
the similar Monitor, so eventually this one aggregates health states of all children ReplicaSets and Pods.
155-
156-
### Aggregated health state of a ReplicaSet
157-
158-
The monitor aggregates states of all children Pods and then returns the most critical health state.
159-
160-
### Aggregated health state of a StatefulSet
161-
162-
The monitor aggregates states of all children Pods and then returns the most critical health state.
163160

164161
## See also
165162
