Commit f433d44

[features] update healthcheck guide
1 parent 945a7b6 commit f433d44

1 file changed

+149
-106
lines changed

docs/features/health.md

Lines changed: 149 additions & 106 deletions
@@ -1,127 +1,170 @@
1-
# Healthcheck
1+
# Healthcheck 💔
22

3-
The app provides a healthcheck endpoint at `GET /health` that you can use to check the status of the app. The endpoint returns a JSON response. The response of the healthcheck differs for Local and Cloud/Private modes. Additionally, you can receive health information via the `system:ping` webhook.
4-
5-
In all modes, a response code of `200` indicates normal app operation.
6-
7-
The health information includes:
3+
SMSGate provides healthcheck endpoints for monitoring the app. The health information includes:
84

95
* **releaseId**: A unique identifier for the app release.
106
* **version**: The app version.
117
* **status**: The overall app status.
128
* **pass**: The app is running normally.
139
* **warn**: The app is running with some issues.
1410
* **fail**: The app is not running normally.
15-
* **checks**: A list of health checks performed by the app.
11+
* **checks**: A list of health checks performed by the app; the available checks depend on the server mode.
1612
* **description**: A description of the health check.
1713
* **observedUnit**: The unit of the observed value.
1814
* **observedValue**: The value observed by the health check.
1915
* **status**: The status of the health check.
2016

21-
## Local Mode
17+
!!! note "📈 Status Calculation"
18+
The overall health status is calculated as follows:
2219

23-
In Local mode, the healthcheck endpoint provides information about the device and the application.
20+
- **Default Status**: `pass`
21+
- **Status Levels**:
22+
- `pass` (0) - All checks passing
23+
- `warn` (1) - At least one warning
24+
- `fail` (2) - At least one failure
25+
- **Overall Status**: Determined by highest severity level across all checks
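
The status calculation above can be sketched in a few lines. This is a minimal illustration, not the app's actual implementation: the function name `overall_status` and the `SEVERITY` table are hypothetical, and the input is assumed to be the `checks` object from a healthcheck response.

```python
from typing import Mapping

# Severity levels as listed above: pass (0), warn (1), fail (2).
SEVERITY = {"pass": 0, "warn": 1, "fail": 2}


def overall_status(checks: Mapping[str, Mapping[str, object]]) -> str:
    """Return the highest-severity status found across all checks."""
    worst = "pass"  # default status, also used when there are no checks
    for check in checks.values():
        status = str(check["status"])
        if SEVERITY[status] > SEVERITY[worst]:
            worst = status
    return worst
```

With this sketch, a single `warn` among otherwise passing checks yields `warn`, and any `fail` yields `fail` regardless of the other checks.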
2426

25-
Example response:
27+
## Server Health ☁️
2628

27-
```json
28-
{
29-
"checks": {
30-
"messages:failed": {
31-
"description": "Failed messages for last hour",
32-
"observedUnit": "messages",
33-
"observedValue": 0,
34-
"status": "pass"
35-
},
36-
"connection:status": {
37-
"description": "Internet connection status",
38-
"observedUnit": "boolean",
39-
"observedValue": 1,
40-
"status": "pass"
41-
},
42-
"connection:transport": {
43-
"description": "Network transport type",
44-
"observedUnit": "flags",
45-
"observedValue": 4,
46-
"status": "pass"
47-
},
48-
"connection:cellular": {
49-
"description": "Cellular network type",
50-
"observedUnit": "index",
51-
"observedValue": 0,
52-
"status": "pass"
53-
},
54-
"battery:level": {
55-
"description": "Battery level in percent",
56-
"observedUnit": "percent",
57-
"observedValue": 94,
58-
"status": "pass"
59-
},
60-
"battery:charging": {
61-
"description": "Is the phone charging?",
62-
"observedUnit": "flags",
63-
"observedValue": 4,
64-
"status": "pass"
65-
}
66-
},
67-
"releaseId": 1,
68-
"status": "pass",
69-
"version": "1.0.0"
70-
}
71-
```
72-
73-
### Available health checks
74-
75-
* **messages:failed**: The number of failed messages for the last hour. `warn` when there is at least one failed message and `fail` when all messages during the last hour have failed.
76-
* **connection:status**: The status of the internet connection. `fail` when the Internet connection is not available.
77-
* **connection:transport**: The transport type of the network connection. When the device is connected to multiple networks, only a single value is provided:
78-
* 0: None
79-
* 1: Unknown
80-
* 2: Cellular
81-
* 4: WiFi
82-
* 8: Ethernet
83-
* **connection:cellular**: The cellular network type. Available only if `connection:transport` has flag `2: Cellular`, otherwise `0: None`.
84-
* 0: None
85-
* 1: Unknown
86-
* 2: Mobile2G
87-
* 3: Mobile3G
88-
* 4: Mobile4G
89-
* 5: Mobile5G
90-
* **battery:level**: The battery level in percent. `warn` when less than 25% and `fail` when less than 10%.
91-
* **battery:charging**: The status of charging as bit flags. For example, if the device is charging via USB, the value will be `1 + 4 = 5`.
92-
* 0: Not charging
93-
* 1: Charging
94-
* 2: AC charger connected
95-
* 4: USB charger connected
96-
97-
## Cloud Mode
98-
99-
The health endpoint in cloud mode provides information about the server, not devices.
100-
101-
If you need to receive device health information in Cloud or Private modes, you can use the `system:ping` webhook.
102-
103-
Example response:
104-
105-
```json
106-
{
107-
"status": "pass",
108-
"version": "v1.17.0",
109-
"releaseId": 932,
110-
"checks": {
111-
"db:ping": {
112-
"description": "Failed sequential pings count",
113-
"observedValue": 0,
114-
"status": "pass"
115-
}
116-
}
117-
}
118-
```
29+
=== "📱 Local Server Mode"
11930

120-
The only provided health check is `db:ping`. It checks the database connectivity and counts failed sequential pings.
31+
In Local mode, the healthcheck endpoint provides information about the device and the application.
12132

122-
## Webhooks
33+
Example response:
12334

124-
In any mode, you can utilize the `system:ping` webhook to receive health information about the devices. The webhook payload will be the same as the device's healthcheck response.
35+
```json
36+
{
37+
"checks": {
38+
"messages:failed": {
39+
"description": "Failed messages for last hour",
40+
"observedUnit": "messages",
41+
"observedValue": 0,
42+
"status": "pass"
43+
},
44+
"connection:status": {
45+
"description": "Internet connection status",
46+
"observedUnit": "boolean",
47+
"observedValue": 1,
48+
"status": "pass"
49+
},
50+
"connection:transport": {
51+
"description": "Network transport type",
52+
"observedUnit": "flags",
53+
"observedValue": 4,
54+
"status": "pass"
55+
},
56+
"connection:cellular": {
57+
"description": "Cellular network type",
58+
"observedUnit": "index",
59+
"observedValue": 0,
60+
"status": "pass"
61+
},
62+
"battery:level": {
63+
"description": "Battery level in percent",
64+
"observedUnit": "percent",
65+
"observedValue": 94,
66+
"status": "pass"
67+
},
68+
"battery:charging": {
69+
"description": "Is the phone charging?",
70+
"observedUnit": "flags",
71+
"observedValue": 4,
72+
"status": "pass"
73+
}
74+
},
75+
"releaseId": 1,
76+
"status": "pass",
77+
"version": "1.0.0"
78+
}
79+
```
80+
81+
Available health checks:
82+
83+
- **messages:failed**: The number of failed messages for the last hour. `warn` when there is at least one failed message and `fail` when all messages during the last hour have failed.
84+
- **connection:status**: The status of the internet connection. `fail` when the Internet connection is not available.
85+
- **connection:transport**: The transport type of the network connection. When the device is connected to multiple networks, only a single value is provided:
86+
* 0: None
87+
* 1: Unknown
88+
* 2: Cellular
89+
* 4: WiFi
90+
* 8: Ethernet
91+
- **connection:cellular**: The cellular network type. Available only if `connection:transport` has flag `2: Cellular`, otherwise `0: None`.
92+
* 0: None
93+
* 1: Unknown
94+
* 2: Mobile2G
95+
* 3: Mobile3G
96+
* 4: Mobile4G
97+
* 5: Mobile5G
98+
- **battery:level**: The battery level in percent. `warn` when less than 25% and `fail` when less than 10%.
99+
- **battery:charging**: The status of charging as bit flags. For example, if the device is charging via USB, the value will be `1 + 4 = 5`.
100+
* 0: Not charging
101+
* 1: Charging
102+
* 2: AC charger connected
103+
* 4: USB charger connected
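
As an illustration of how a client might interpret these values, the sketch below decodes the `battery:charging` bit flags and applies the documented `battery:level` thresholds. The helper names `decode_charging` and `battery_level_status` are hypothetical and not part of the app; only the flag values and thresholds come from the list above.

```python
from typing import List

# Bit flags for battery:charging, as listed above.
CHARGING_FLAGS = {
    1: "Charging",
    2: "AC charger connected",
    4: "USB charger connected",
}


def decode_charging(value: int) -> List[str]:
    """Expand a battery:charging observedValue into flag names."""
    if value == 0:
        return ["Not charging"]
    return [name for bit, name in CHARGING_FLAGS.items() if value & bit]


def battery_level_status(percent: int) -> str:
    """Apply the documented thresholds: fail below 10%, warn below 25%."""
    if percent < 10:
        return "fail"
    if percent < 25:
        return "warn"
    return "pass"
```

For example, `decode_charging(5)` expands the USB charging case from the list (`1 + 4 = 5`) into the "Charging" and "USB charger connected" flags.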
104+
105+
=== "⚙️ Cloud and Private Server Modes"
106+
107+
The SMSGate server provides Kubernetes-compatible health check endpoints for monitoring service health. The system implements three dedicated endpoints following Kubernetes best practices, along with a legacy endpoint for backward compatibility with existing clients.
108+
109+
For **Kubernetes deployments**, use the following endpoints:
110+
111+
- **🔄 Liveness Probe**: `GET /health/live`
112+
- **Purpose**: Determine if the application is running correctly
113+
- **Response Format**:
114+
```json
115+
{
116+
"status": "pass",
117+
"version": "1.33.0",
118+
"releaseId": "1234",
119+
"checks": {
120+
"system:goroutines": {
121+
"description": "Number of goroutines",
122+
"observedUnit": "goroutines",
123+
"observedValue": 15,
124+
"status": "pass"
125+
},
126+
"system:memory": {
127+
"description": "Memory usage",
128+
"observedUnit": "MiB",
129+
"observedValue": 45,
130+
"status": "pass"
131+
}
132+
}
133+
}
134+
```
135+
- **🚦 Readiness Probe**: `GET /health/ready`
136+
- **Purpose**: Determine if the application is ready to accept traffic
137+
- **Response Format**:
138+
```json
139+
{
140+
"status": "pass",
141+
"version": "1.33.0",
142+
"releaseId": "1234",
143+
"checks": {
144+
"db:ping": {
145+
"description": "Database ping",
146+
"observedUnit": "failed pings",
147+
"observedValue": 0,
148+
"status": "pass"
149+
}
150+
}
151+
}
152+
```
153+
- **🚀 Startup Probe**: `GET /health/startup`
154+
- **Purpose**: Determine if the application has completed its startup sequence
155+
- **Response Format**: Same as readiness probe
156+
157+
!!! important "Migration Note"
158+
The legacy endpoint `GET /health` is maintained for backward compatibility with existing clients. It uses the same logic as the readiness probe.
159+
160+
!!! tip
161+
Use the `system:ping` webhook to monitor device health, as the server probes only monitor the application server itself.
162+
163+
## Device Health 📱
164+
165+
The `system:ping` webhook provides device health information across all deployment modes.
166+
167+
In Cloud/Private server deployments, the webhook provides device-level health information while the server probes monitor the application server itself. This separation ensures administrators can monitor both the application infrastructure and the connected devices.
125168

126169
Example payload:
127170

@@ -181,6 +224,6 @@ Example payload:
181224

182225
The webhook allows you to monitor the health of your devices in real-time, providing valuable information about message delivery, connectivity, and battery status.
183226

184-
## Links
227+
## See Also 📚
185228

186229
* [Webhooks Guide](./webhooks.md)
