
Commit 39dd756

ysyneu and claude authored
docs(on-call): fix 37 documentation drift findings from audit (#49)
* docs(on-call): fix documentation drift from audit findings

  Applies 37 findings from the on-call module audit (6 high, 21 medium, 10 low) against current source code in fc-event / fc-oncall / fc-foundation-app / flashduty-app. Both zh/ and en/ pages updated in lockstep.

  Key fixes:
  - Incident detail now documents all 7 tabs (previously listed 5)
  - Escalation rule: delay window (aggr_window) and notification template fields
  - Channel creation wizard: 3-step flow (was 1-step)
  - On-call schedule: override rules as first-class type, layer effective window, role model, day-mask defaults, personal-preference prerequisites
  - Alert management: stale timeline tab / close button removed; lifecycle record types, field dimensions, filter/view settings documented
  - Noise reduction: inhibit source corrected (active alert, not incident); quick silence default 1h; aggregation limits
  - Reference variables: full AlertEvent template attribute list
  - Filter conditions: operators and built-in attribute keys enumerated
  - Custom fields: field_name regex and length constraints
  - Personal settings: private-deployment channel gating note

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(on-call): remove internal-only details from review feedback

  Addresses review feedback on PR #49:
  - Trim alert lifecycle record table to only actively-emitted feed types (a_new, a_update, a_comm, a_merge, a_m_silence, a_m_inhibit); drop deprecated or never-emitted types (a_ack/a_unack, a_snooze/a_wake, a_trans, a_close, a_m_flapping)
  - Remove intelligent grouping similarity threshold (i_score_threshold) from noise-reduction and outlier-incidents; not exposed in the web UI
  - Remove internal `frequent` outlier classification (not surfaced in UI)
  - Drop browser localStorage storage-key mentions from alert-management; implementation detail that users cannot act on

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 10e25df commit 39dd756

26 files changed

Lines changed: 725 additions & 135 deletions

en/on-call/advanced/reference-variables.mdx

Lines changed: 18 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -57,16 +57,30 @@ Use `[TPL]` as prefix and `{{}}` to reference variables (can reference both labe
5757
| `[TPL]{{.EventSeverity}} / Host Down` | `{"EventSeverity": "Warning"}` | Warning / Host Down |
5858

5959
<Note>
60-
`${}` syntax and `{{}}` syntax behave differently when a label is missing: `${}` returns `<no value>`, while `{{}}` returns an empty string.
60+
`${}` syntax and `{{}}` syntax have two key differences:
61+
- **Data scope**: `${name}` reads **only** from `Labels` and cannot reference attribute fields; `{{}}` uses the whole `*AlertEvent` as the data source, so you can access every exported attribute as well as `Labels`.
62+
- **Missing-value behavior**: when a label is missing, `${}` returns `<no value>` while `{{}}` returns an empty string.
6163
</Note>
6264

6365
#### Supported Attributes
6466

67+
When you use `{{}}` syntax, the template data object is the alert event (`AlertEvent`) itself. The main fields you can reference are listed below:
68+
6569
| Field | Type | Description |
6670
| --- | --- | --- |
67-
| Title | string | Title |
68-
| Description | string | Description |
69-
| EventSeverity | string | Severity |
71+
| `Title` | string | Alert title |
72+
| `Description` | string | Alert description |
73+
| `EventSeverity` | string | Severity (Critical / Warning / Info) |
74+
| `EventStatus` | string | Event status (Critical / Warning / Info / Ok, where Ok indicates recovery) |
75+
| `AlertKey` | string | The alert's unique key, used to merge events from the same series into a single alert |
76+
| `TitleRule` | string | The title-generation rule supplied at report time |
77+
| `IntegrationType` | string | Integration type (such as `prometheus` or `zabbix.v5`) |
78+
| `IntegrationName` | string | Integration name |
79+
| `Labels` | map | Label key-value set; access a specific label with `{{.Labels.xxx}}` |
80+
81+
<Tip>
82+
Besides the title, **the alert description and any label value also support reference variables**: in an alert pipeline, when you use the `Reset Description` (resetDescription) or `Reset Labels` (resetLabels) action, any content prefixed with `[TPL]` goes through the same variable substitution.
83+
</Tip>
7084

7185
## FAQ
7286

en/on-call/channel/create-edit.mdx

Lines changed: 60 additions & 11 deletions
@@ -33,10 +33,11 @@ Proper planning can significantly improve operational efficiency.
3333

3434
## Creating a Channel
3535

36+
Go to **Channels** → **Create Channel**. The wizard has three steps, and **steps 2 and 3 are skippable**; you can fill them in later from the channel details page.
37+
38+
### Step 1: Channel information
39+
3640
<Steps>
37-
<Step title="Enter Creation Page">
38-
Go to **Channels** → **Create Channel**
39-
</Step>
4041
<Step title="Fill in Basic Information">
4142
Enter **Channel Name**, preferably named after business type or team
4243
</Step>
@@ -64,15 +65,42 @@ Proper planning can significantly improve operational efficiency.
6465
<Step title="Enable External Reporting (Optional)">
6566
When enabled, external personnel can submit incident tickets via a shareable external reporting link without logging in. The system generates a shareable link that you can distribute to external personnel. Disabling external reporting immediately invalidates all previously shared links; re-enabling generates a new link.
6667
</Step>
67-
<Step title="Complete Creation">
68-
Click **Next** to complete creation
69-
</Step>
7068
</Steps>
7169

70+
### Step 2: Configure an escalation rule (skippable)
71+
72+
Set up a default escalation rule for the new channel to decide who gets notified and how when an incident fires.
73+
74+
- **Notification recipients**: choose from a [schedule](/en/on-call/configuration/schedule), a team, or an individual
75+
- **Notification template**: required — select an enabled [notification template](/en/on-call/configuration/templates)
76+
- **Delay window**: 0 – 3600 seconds, default 0; no notification is sent if the incident is auto- or manually closed during the delay window
77+
- **IM group chat channel** (optional): simultaneously push to Feishu/Lark, Dingtalk, WeCom, Slack, Microsoft Teams, and similar group chats
78+
79+
If you click **Skip**, you can add it later under **Channel Details → Escalation Rules**. See [Configure Escalation Rules](/en/on-call/channel/escalation-rule) for details.
80+
81+
### Step 3: Ingest alert events (skippable)
82+
83+
Select one or more alert integrations (such as Zabbix, Prometheus, Alibaba Cloud Monitor) to attach to the current channel, and the system generates the corresponding Webhook addresses.
84+
85+
If you click **Skip**, you can add them later under **Channel Details → Configuration → Integrate Data → Dedicated Integrations**. See [Integrate Alerts](/en/on-call/channel/integrate-data) for details.
86+
7287
<Note>
73-
You can skip [escalation rules](/en/on-call/channel/escalation-rule) or [integration configuration](/en/on-call/channel/integrate-data) during creation and manage them later in the channel.
88+
If you skip steps 2 and 3, only the channel itself is created; you can add or adjust escalation rules and integrations at any time from the channel details page.
7489
</Note>
7590

91+
## Channel Overview
92+
93+
On the channel details page, the **Statistics** module at the top shows 4 cards by default, based on **the last 7 days** with comparison to the previous period:
94+
95+
| Card | Meaning |
96+
| :--- | :--- |
97+
| **MTTA** | Mean time to acknowledge — average elapsed time from incident trigger to acknowledgment |
98+
| **MTTR** | Mean time to resolve — average elapsed time from incident trigger to close |
99+
| **Incidents** | Total number of incidents triggered in the last 7 days |
100+
| **Alert Groups** | Number of alert groups merged into the same incident via alert grouping in the last 7 days |
101+
102+
Statistics cards can be collapsed. Open the **Metric Analysis** page for richer trends and dimension drill-downs.
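
The MTTA and MTTR definitions in the table above are both averages of an elapsed time measured from the incident trigger. A minimal sketch of that arithmetic follows; `mean_elapsed` is a hypothetical helper, not part of Flashduty, and excluding incidents that were never acknowledged or closed is an assumption rather than documented behavior.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def mean_elapsed(
    pairs: Iterable[Tuple[datetime, Optional[datetime]]],
) -> Optional[timedelta]:
    """Average of (end - start) over all pairs that have an end time.

    For MTTA, `end` is the acknowledgment time; for MTTR, the close time.
    Pairs with end=None are skipped (assumption, see lead-in).
    """
    durations = [end - start for start, end in pairs if end is not None]
    if not durations:
        return None  # nothing to average in this period
    return sum(durations, timedelta()) / len(durations)
```

Feeding the same function (trigger, acknowledged) pairs gives an MTTA-style figure, and (trigger, closed) pairs an MTTR-style figure.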
103+
76104
## Configuring Core Capabilities
77105

78106
After creating a channel, go to the **Configuration** tab on the details page to complete the following configurations. The Configuration tab uses a sidebar menu organized into functional groups:
@@ -196,6 +224,16 @@ Each channel can be associated with up to 3 links. Create links through the **In
196224

197225
- Click the **Star** on channel cards to bookmark frequently used channels
198226
- Quickly locate target channels through **Team Filter** or **My Favorites**
227+
- Use the **Sort by** dropdown menu to order the list by the following fields:
228+
229+
| Sort key | Description |
230+
| :--- | :--- |
231+
| **Creation time** | Sort by channel creation time |
232+
| **Latest incident time** | Sort by the most recent incident time within the channel |
233+
| **Custom order** | Use the order you set by drag-and-drop below (applies globally) |
234+
| **Update time** | Sort by the most recent channel configuration update time |
235+
| **Channel name** | Sort alphabetically by name |
236+
199237
- Use the **Sort** function in the upper right corner to enter sort mode, drag and drop channel cards to adjust display order; the sort order takes effect for all users
200238

201239
### Changing Configuration
@@ -209,10 +247,17 @@ Go to Channel Details, then navigate to **Configuration** → **Settings** to mo
209247
- Access level (public or private)
210248

211249
**Advanced Settings** (Configuration → Settings → Advanced Settings):
212-
- Close with alerts toggle
213-
- Auto-resolve timeout (toggle, window timing start, timeout duration)
214-
- Outlier incident detection
215-
- External reporting (when enabled, generates a shareable link for external personnel to submit incident tickets without logging in)
250+
251+
| Setting | Description | Plan availability |
252+
| :--- | :--- | :--- |
253+
| Auto-resolve timeout | Toggle, window timing start, timeout duration | All plans |
254+
| Close with alerts | Automatically close the incident once all associated alerts recover | Contact the Flashduty team to enable on request |
255+
| Outlier incident detection | Identify and flag unusual new incidents | Professional plan and above |
256+
| External reporting | When enabled, generates a shareable link for external personnel to submit incident tickets without logging in | Professional plan and above |
257+
258+
<Note>
259+
On the free and standard plans, the **Outlier incident detection** and **External reporting** toggles are hidden; the **Close with alerts** toggle is hidden by default — contact the Flashduty team to enable it.
260+
</Note>
216261

217262
### Disabling and Deleting
218263

@@ -250,6 +295,10 @@ See [Search and View Incidents](/en/on-call/incident/search-view-incident) for d
250295
<Accordion title="Is incident data still available after deleting a channel?" icon="circle-question">
251296
No, channel configuration (integrations, escalation rules, etc.) will be permanently deleted and cannot be recovered.
252297
</Accordion>
298+
299+
<Accordion title="How many channels can I create on the free plan?" icon="circle-question">
300+
The free plan supports creating **1** channel only. Once a channel exists, the **Create Channel** button is disabled and prompts you to upgrade. To create more channels, upgrade to the standard plan or above. [Learn about subscription plans](https://flashcat.cloud/flashduty/price/)
301+
</Accordion>
253302
</AccordionGroup>
254303

255304
## Related Topics

en/on-call/channel/escalation-rule.mdx

Lines changed: 26 additions & 2 deletions
@@ -18,7 +18,7 @@ src="https://download.flashcat.cloud/flashduty/video/escalate-rule.mp4"
1818

1919
## Configuration Elements
2020

21-
An escalation rule contains four core elements. The system matches rules from top to bottom, **stopping after the first successful match**.
21+
An escalation rule contains six core elements. The system matches rules from top to bottom, **stopping after the first successful match**.
2222

2323
You can enable or disable individual escalation rules. Disabled rules are skipped during matching and will not trigger notifications. You can also copy an escalation rule to the current channel or another channel to quickly reuse existing configurations.
2424

@@ -80,7 +80,25 @@ Determines how users are reached.
8080
</Tab>
8181
</Tabs>
8282

83-
### 4. Escalation Rules
83+
### 4. Delay Window
84+
85+
Reserve a waiting period before the first notification is sent, to filter out incidents caused by transient flapping.
86+
87+
- Range: 0 – 3600 seconds, default `0` (disabled, notify immediately)
88+
- During the delay window, if the incident is **auto-closed** or **manually closed**, the system does not send a notification
89+
90+
<Tip>
91+
For monitors that often self-heal (such as transient flaps or brief network timeouts), setting a reasonable delay window can significantly reduce unnecessary interruptions.
92+
</Tip>
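
The delay-window rule above can be sketched as a small decision function. This is an illustration only, not Flashduty's implementation: `should_notify` and its parameters are hypothetical names, and treating a close that lands exactly on the window boundary as "too late to suppress" is an assumption. Only the 0 to 3600 second range and the close-during-window rule come from the text.

```python
from typing import Optional

MAX_DELAY_SECONDS = 3600  # upper bound of the documented range


def should_notify(delay_seconds: int, closed_after_seconds: Optional[float]) -> bool:
    """Return True if the first notification would be sent.

    delay_seconds: configured delay window; 0 means notify immediately.
    closed_after_seconds: seconds from trigger to close (auto or manual),
        or None if the incident is still open when the window ends.
    """
    if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError("delay window must be between 0 and 3600 seconds")
    if delay_seconds == 0:
        return True  # delay disabled: nothing can suppress the notification
    if closed_after_seconds is None:
        return True  # incident survived the whole window
    # Closed inside the window: suppress. Boundary handling is an assumption.
    return closed_after_seconds >= delay_seconds
```

With a 60-second window, an incident that closes after 30 seconds produces no notification, while one that is still open at 60 seconds does.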
93+
94+
### 5. Notification Template
95+
96+
Every escalation rule **must** select a notification template, which determines the message format delivered to each channel.
97+
98+
- Templates must be created and enabled in advance under [Notification Templates](/en/on-call/configuration/templates)
99+
- You can pick different templates for different escalation rules to format messages for specific scenarios
100+
101+
### 6. Escalation Rules
84102

85103
This is the key mechanism for ensuring incident closure. When first-level responders don't respond or complete handling, the system automatically escalates to the next level.
86104

@@ -103,6 +121,12 @@ The minimum escalation timeout is 1 minute.
103121
| Primary/Backup Escalation | Primary on-call no response for 10 min → Escalate to backup on-call |
104122
| Tiered Reporting | Technical staff unresolved for 30 min → Escalate to Tech Lead → Escalate to CTO |
105123

124+
**Level management**:
125+
126+
- **Default first level**: when you create a rule, the first level is generated automatically, with a default **30-minute timeout** before escalation, **no repeated notifications**, and notification methods that **follow personal preferences**
127+
- **Add or remove levels**: you can add or remove levels freely; a rule must keep at least 1 level
128+
- **Reorder**: you can move a level up or down, or insert a new level between any two existing levels
129+
106130
## Best Practices
107131

108132
<AccordionGroup>

en/on-call/channel/noise-reduction.mdx

Lines changed: 15 additions & 6 deletions
@@ -103,7 +103,7 @@ Flashduty On-call provides two grouping modes:
103103
| :------- | :----------------------------------- |
104104
| **Aggregation Window** | Optional toggle. When disabled, new alerts continue merging into the incident until the incident is closed. When enabled, alerts within the window are merged; alerts arriving after the window expires are grouped into a new incident |
105105
| **Window Timing Start** | Only configurable when the aggregation window is enabled. **Incident trigger** (default): Fixed timer starts from incident creation, stops grouping when the window duration is reached. **Alert merges into incident**: Timer resets each time a new alert merges in, the window recalculates from the last merge |
106-
| **Window Duration** | Only configurable when the aggregation window is enabled. Set the duration of the aggregation window |
106+
| **Window Duration** | Only configurable when the aggregation window is enabled. Set the duration of the aggregation window, minimum 1 minute. Rule-based grouping and intelligent grouping share the same cap: 24 hours by default, extendable to 30 days on request (contact the Flashduty team to enable) |
107107
| **Alert Storm Warning** | When merged alert count reaches a configured threshold, the system records an alert storm event in the incident timeline and triggers a warning notification, prompting urgent handling. You can configure up to 5 thresholds, each ranging from 2 to 10,000 |
108108
| **Strict Grouping** | When enabled, empty label values are treated as different; when disabled, empty values are treated as the same (not supported for intelligent grouping) |
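
The difference between the two **Window Timing Start** options in the table above is easiest to see in code. The sketch below is hypothetical: the function name, the `timing_start` string values, and the inclusive boundary comparison are assumptions made for illustration, not Flashduty identifiers or confirmed behavior.

```python
from datetime import datetime, timedelta
from typing import Optional


def merges_into_incident(
    alert_time: datetime,
    incident_created: datetime,
    last_merge: datetime,
    window: Optional[timedelta],
    timing_start: str = "incident_trigger",
) -> bool:
    """Decide whether a new alert still merges into an open incident."""
    if window is None:
        # Aggregation window disabled: keep merging until the incident closes.
        return True
    if timing_start == "incident_trigger":
        start = incident_created  # fixed timer from incident creation
    elif timing_start == "alert_merge":
        start = last_merge  # timer restarts at every merge
    else:
        raise ValueError(f"unknown timing start: {timing_start!r}")
    # Inclusive boundary is an assumption.
    return alert_time - start <= window
```

With a 10-minute window, an alert arriving 15 minutes after incident creation but 5 minutes after the previous merge starts a new incident in the first mode and merges in the second.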
109109

@@ -146,6 +146,16 @@ Flashduty On-call provides two grouping modes:
146146
</Tab>
147147
</Tabs>
148148

149+
### Configuration Limits
150+
151+
To protect grouping performance and stability, the following fields have hard backend caps:
152+
153+
| Grouping mode | Limit | Cap | Description |
154+
| :--- | :--- | :--- | :--- |
155+
| **Rule-based grouping** | Grouping dimensions (equals) | ≤ 5 per group | Number of dimensions per rule in both unified control and fine-grained control |
156+
| **Rule-based grouping** | Fine-grained branches (cases) | ≤ 100 | Total condition branches you can configure under fine-grained control |
157+
| **Intelligent grouping** | Fields used for similarity calculation (i_keys) | ≤ 10 | Default 4: `title`, `description`, `labels.service`, `labels.resource` |
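
If you generate grouping configuration programmatically, a pre-flight check against the caps in the table above can catch rejections early. The sketch below is an assumption-laden illustration: the function, its argument shapes, and the message wording are hypothetical, and only the numeric caps and the default `i_keys` list come from the table.

```python
from typing import List

MAX_EQUALS_PER_GROUP = 5   # grouping dimensions per rule
MAX_CASES = 100            # fine-grained condition branches
MAX_I_KEYS = 10            # fields used for similarity calculation
DEFAULT_I_KEYS = ["title", "description", "labels.service", "labels.resource"]


def check_grouping_limits(cases: List[List[str]], i_keys: List[str]) -> List[str]:
    """Return human-readable violations; an empty list means within limits.

    cases: one list of grouping dimensions per rule or branch (hypothetical shape).
    i_keys: fields used by intelligent grouping.
    """
    problems = []
    if len(cases) > MAX_CASES:
        problems.append(f"{len(cases)} branches exceeds the cap of {MAX_CASES}")
    for index, equals in enumerate(cases):
        if len(equals) > MAX_EQUALS_PER_GROUP:
            problems.append(
                f"branch {index} has {len(equals)} dimensions, cap is {MAX_EQUALS_PER_GROUP}"
            )
    if len(i_keys) > MAX_I_KEYS:
        problems.append(f"{len(i_keys)} similarity fields exceeds the cap of {MAX_I_KEYS}")
    return problems
```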
158+
149159
### Grouping Effect
150160

151161
After setting grouping by **Alert Check Item**, 5 alert notifications are grouped into 1 incident:
@@ -196,7 +206,7 @@ Go to Channel Details → Noise Reduction → **Flapping Detection**:
196206
| :--- | :--- | :--- | :--- |
197207
| **State changes** (max_changes) | Number of alert state changes within the observation window to trigger flapping detection | 4 | 2–100 |
198208
| **Observation window** (in_mins) | Time window for counting state changes | 60 minutes | 1–1440 minutes |
199-
| **Mute duration** (mute_mins) | Duration to mute notifications after flapping is detected (only applies in "Alert Then Silence" mode) | 120 minutes | 0–1440 minutes |
209+
| **Mute duration** (mute_mins) | Duration to mute notifications after flapping is detected (only applies in "Alert Then Silence" mode) | 120 minutes | 30–1440 minutes |
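
As a rough illustration of how the `max_changes` and `in_mins` parameters in the table above interact, the following sketch counts state changes inside the observation window. It is not Flashduty's detector: `is_flapping` is a hypothetical function, and treating "reaches the threshold" as `>=` is an assumption.

```python
from datetime import datetime, timedelta
from typing import List


def is_flapping(
    change_times: List[datetime],
    now: datetime,
    max_changes: int = 4,
    in_mins: int = 60,
) -> bool:
    """Return True if enough state changes fall inside the observation window."""
    if not 2 <= max_changes <= 100:
        raise ValueError("max_changes must be between 2 and 100")
    if not 1 <= in_mins <= 1440:
        raise ValueError("in_mins must be between 1 and 1440")
    window_start = now - timedelta(minutes=in_mins)
    recent = [t for t in change_times if window_start <= t <= now]
    # `>=` is an assumption about how the threshold is applied.
    return len(recent) >= max_changes
```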
200210

201211
<Tip>
202212
"Same incident" refers to incidents with the same Alert Key, typically using the alert ID pushed from the upstream system as a unique identifier.
@@ -259,7 +269,7 @@ Quickly create temporary silence rules based on existing incidents.
259269

260270
- Rule name defaults to "Quick Silence - #short-ID", with the incident title included in the description
261271
- Effective scope is the incident's channel (cannot be changed)
262-
- Default effective for 24 hours, automatically deleted after expiration
272+
- Default effective for 1 hour; you can choose from **30 minutes / 1 hour / 12 hours / 1 day / 1 week / 2 weeks**, and the rule is automatically deleted after expiration
263273
- Conditions default to severity and filtered label matching (automatically excluding numeric, overly long, and special labels)
264274

265275
<Frame caption="Quick Silence Configuration">
@@ -291,7 +301,7 @@ When a root cause alert exists, automatically inhibit related secondary alerts.
291301

292302
### Inhibit Conditions
293303

294-
When a new alert meets conditions, and there's an **active incident** (not closed) meeting conditions within 10 minutes, and both share equal items, the new alert is inhibited.
304+
When a new alert meets the conditions and there is a matching **active alert** (not acknowledged and not recovered) within the last 10 minutes, and both share equal items, the new alert is inhibited.
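
The inhibit check described above can be sketched as follows. Everything named here is hypothetical (the `Alert` dataclass, `is_inhibited`, `equal_keys`), and the sketch assumes the caller has already filtered both the new alert and the candidate source alerts by their respective conditions. Reading "within the last 10 minutes" as "the source alert was triggered at most 10 minutes before the new alert" is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

LOOKBACK = timedelta(minutes=10)


@dataclass
class Alert:
    labels: Dict[str, str]
    triggered_at: datetime
    acknowledged: bool = False
    recovered: bool = False


def is_inhibited(new_alert: Alert, sources: Iterable[Alert], equal_keys: List[str]) -> bool:
    """Return True if any active, recent source alert shares all equal items."""
    for source in sources:
        # "Active" means not acknowledged and not recovered.
        active = not source.acknowledged and not source.recovered
        age = new_alert.triggered_at - source.triggered_at
        recent = timedelta(0) <= age <= LOOKBACK
        # A key missing on the source never counts as equal.
        same = all(
            key in source.labels and source.labels[key] == new_alert.labels.get(key)
            for key in equal_keys
        )
        if active and recent and same:
            return True
    return False
```

Note that the source here is an alert rather than an incident, so an acknowledged or recovered root-cause alert no longer inhibits anything in this sketch.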
295305

296306
| Configuration | Description |
297307
| :--------- | :------------------------------ |
@@ -331,8 +341,7 @@ When a new alert meets conditions, and there's an **active incident** (not close
331341
Up to 5000, mainly to ensure console rendering performance. Due to backend concurrent processing, actual count may slightly exceed this limit.
332342
</Accordion>
333343
<Accordion title="What's the maximum number of events a single alert can be associated with?" icon="circle-question">
334-
- **Rule-based Grouping**: No limit, default maximum grouping window is 24 hours. After 24 hours from alert trigger, new events create new incidents
335-
- **Intelligent Grouping**: No limit, default maximum grouping window is 24 hours, with certain subscription plans supporting extension up to 30 days. After exceeding the window, new events create new incidents
344+
- **Rule-based grouping / Intelligent grouping**: No limit, default maximum aggregation window is 24 hours; rule-based and intelligent grouping share the same cap and can be extended to 30 days on request (contact the Flashduty team to enable). After exceeding the window, new events create new incidents
336345
- If the aggregation window is disabled, alerts continue merging into the existing incident until the incident is closed, with no time limit
337346
</Accordion>
338347
</AccordionGroup>

en/on-call/configuration/custom-fields.mdx

Lines changed: 10 additions & 6 deletions
@@ -64,11 +64,11 @@ Go to Console **Incident Management** → **Custom Fields**
6464
<Step title="Create Field">
6565
Click **Create Custom Field**, enter the following information:
6666

67-
| Configuration | Description |
68-
| :--- | :--- |
69-
| **Field Name** | Identifies the field in API, cannot be modified after creation |
70-
| **Display Name** | Shown on incident details page, can be modified after creation |
71-
| **Field Description** | Helps incident handlers understand and use this field |
67+
| Configuration | Description | Constraints |
68+
| :--- | :--- | :--- |
69+
| **Field Name** | Identifies the field in API, cannot be modified after creation | 1-40 characters; letters, digits, and underscores only, and cannot start with a digit (regex: `^[a-zA-Z_][a-zA-Z0-9_]{0,39}$`) |
70+
| **Display Name** | Shown on incident details page, can be modified after creation | 1-40 characters |
71+
| **Field Description** | Helps incident handlers understand and use this field | Up to 200 characters, optional |
7272
</Step>
7373
<Step title="Select Field Type">
7474
| Type | Description |
@@ -79,7 +79,11 @@ Click **Create Custom Field**, enter the following information:
7979
| **Checkbox** | Checkbox toggle |
8080
</Step>
8181
<Step title="Complete Creation">
82-
Set options and default values as needed, click **Submit** to finish
82+
Set options and default values as needed, click **Submit** to finish.
83+
84+
<Note>
85+
For single-select and multi-select fields, the **default value** must be one of the options currently defined on the field. If you delete or modify an option, update the default value accordingly — otherwise the field cannot be saved.
86+
</Note>
8387
</Step>
8488
</Steps>
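
If you create custom fields through automation, you can check a **Field Name** against the constraint from the table above before submitting. The sketch below reuses the documented regex verbatim; the helper name `is_valid_field_name` is hypothetical, and the server remains the authority on what is accepted.

```python
import re

# Regex copied from the Field Name constraint: 1-40 characters,
# letters, digits, and underscores only, not starting with a digit.
FIELD_NAME_RE = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]{0,39}$")


def is_valid_field_name(name: str) -> bool:
    """Return True if `name` satisfies the documented Field Name rule."""
    # fullmatch is used because `$` alone would also accept a trailing newline.
    return FIELD_NAME_RE.fullmatch(name) is not None
```

For example, `owner_team` passes, while `1owner` (leading digit) and any 41-character name are rejected.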
8589
