Add Monitors and Automation page to DSM Kafka docs #36667
Documents the monitor templates Data Streams Monitoring ships at cluster and topic level, and gives concrete examples of automating responses with webhooks (consumer lag, data-loss risk, broker disk, partition health). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Created https://datadoghq.atlassian.net/browse/DOCS-14396 for docs review.
The automation section previously framed webhooks as the only path. Workflow Automation is the better fit for the "trigger a runbook" patterns described below (Kubernetes, AWS, Slack, Jira actions). Reframe the section to present both options, and flag the broker-disk example as a non-DSM monitor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
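A minimal sketch of the webhook path, for comparison with Workflow Automation: a receiver-side function that maps an alerting monitor to a single runbook action. It assumes the webhook body template was configured as `{"transition": "$ALERT_TRANSITION", "tags": "$TAGS"}` (Datadog webhook template variables) and that each monitor carries a hypothetical `dsm_scenario` tag. The action names are placeholders, not Datadog features.

```python
from typing import Optional

# Hypothetical mapping from a monitor's "dsm_scenario" tag to a runbook action.
# Both the tag key and the action names are illustrative placeholders.
RUNBOOK_ACTIONS = {
    "consumer_lag": "notify_consumer_owners",
    "data_loss_risk": "page_oncall",
    "broker_disk": "reduce_retention_on_candidate_topic",  # non-DSM monitor
    "partition_health": "open_incident_ticket",
}


def parse_tags(raw_tags: str) -> dict:
    """Split a comma-separated "key:value" tag string into a dict."""
    tags = {}
    for tag in raw_tags.split(","):
        tag = tag.strip()
        if ":" in tag:
            key, value = tag.split(":", 1)
            tags[key] = value
    return tags


def route_alert(payload: dict) -> Optional[str]:
    """Return the runbook action for a webhook payload, or None to do nothing.

    Assumes the body template {"transition": "$ALERT_TRANSITION", "tags": "$TAGS"}.
    """
    # Act only when the monitor starts alerting, not on recovery or other transitions.
    if payload.get("transition") != "Triggered":
        return None
    scenario = parse_tags(payload.get("tags", "")).get("dsm_scenario")
    return RUNBOOK_ACTIONS.get(scenario)
```

Exposing `route_alert` over HTTP, or replacing it with a Workflow Automation trigger, is left out. Keeping the routing decision in a small pure function makes it easy to test either way.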
- Add page to left nav under Kafka via `main.en.yaml` (below Setup).
- Convert intro to a bulleted overview linking to in-page sections.
- Reorder topic-level monitor templates table (consumer lag first) and drop default thresholds that may change.
- Reformat per-scenario automation guidance: detection sentence followed by a single "Potential action" line.
- Cover Datadog Workflow Automation alongside webhooks.
- Replace the prose call-out on the Kafka index with a capability bullet that matches the existing list format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
piochelepiotr left a comment
Looks good to me. I see we go pretty in depth on technologies not owned by the DSM team (like webhooks and workflows). The downside is that these docs will go out of date if the webhook or workflow docs get updated. Also, my guess is that most users only want to get paged when an issue happens, not take automatic actions.
- Link "Kafka broker metrics" to the Kafka integration page.
- "Compact a candidate topic" -> "reduce retention on a candidate topic" for the disk-filling-up remediation, since reducing retention is the more immediate capacity recovery action.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
johannbotha left a comment
I like that we're suggesting which monitors to have at each level. As a user, I'd still be confused about which metrics I get with KA and which are free with DD. I suggest adding a metrics section under Kafka that lists all the metrics you get in addition to the free metrics.
Then we can easily link to those in our recommended monitors section.
| Template | Metric | Condition |
|----------|--------|-----------|
| Consumer lag is high for topic | `kafka.estimated_consumer_lag` | Consumer lag exceeds a threshold for a topic and consumer group. |
The important part of this metric is that it estimates lag in seconds, not offsets.
Added a "description" column to explain the impact, so we can highlight that it's in seconds. Want to take a look at all the descriptions? The preview is still WIP, but you should be able to see the code change easily.
Yes, that makes it clearer that this one is in seconds.
Did you also read the other descriptions? I'd love your review if anything is unclear.
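To illustrate why the seconds-based unit matters when setting thresholds, here is a sketch that builds a Datadog metric-alert monitor body for `kafka.estimated_consumer_lag`. The `topic` and `consumer_group` tag keys, the 10-minute window, and the example thresholds are assumptions for illustration, not the shipped template defaults. The resulting JSON could be sent to the Datadog Monitors API (`POST /api/v1/monitor`, authenticated with `DD-API-KEY` and `DD-APPLICATION-KEY` headers).

```python
import json


def consumer_lag_monitor(topic: str, consumer_group: str,
                         critical_seconds: int, warning_seconds: int) -> dict:
    """Build a metric-alert monitor body for estimated consumer lag.

    Thresholds are in SECONDS of lag, not offsets. The tag keys "topic" and
    "consumer_group" are assumed, not confirmed against the shipped template.
    """
    if warning_seconds >= critical_seconds:
        raise ValueError("warning threshold must be below the critical threshold")
    scope = f"topic:{topic},consumer_group:{consumer_group}"
    return {
        "name": f"Consumer lag is high for topic {topic}",
        "type": "metric alert",
        # The number at the end of the query must equal options.thresholds.critical.
        "query": f"avg(last_10m):max:kafka.estimated_consumer_lag{{{scope}}} > {critical_seconds}",
        "message": f"Consumer group {consumer_group} is falling behind on topic {topic}.",
        "options": {"thresholds": {"critical": critical_seconds, "warning": warning_seconds}},
    }


if __name__ == "__main__":
    # Example: alert at 5 minutes of lag, warn at 2 minutes (illustrative values).
    print(json.dumps(consumer_lag_monitor("orders", "billing", 300, 120), indent=2))
```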
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What does this PR do? What is the motivation?
Adds a new page, Monitors and Automation, under `data_streams/kafka/` so customers landing on the Kafka docs can find:

The monitor template details (titles, descriptions, metrics, default thresholds) are taken from the in-product templates defined in `web-ui` (`getClusterRecommendedMonitors` / `getTopicRecommendedMonitors`), so the docs match what users see in the {{< ui >}}Monitors{{< /ui >}} side panel on a cluster or topic page.

Also adds a `weight` field to `setup.md` and the new page so the left nav reads Setup → Monitors and Automation, and adds a pointer to the new page from the Kafka section's `_index.md`.

Merge instructions
Merge readiness:
AI assistance
Drafted with Claude Code. The monitor templates section was rewritten against the source of truth in `web-ui/packages/apps/data-streams/private/runtime/kafka/components/KafkaMonitorsSidePanel/kafka-monitors-side-panel.utils.ts` to make sure metric names, scopes, and the 80% retention threshold match what's actually shipped. Webhook examples were authored with Shelly's input on which "when" conditions to cover.

Additional notes

`hasKafkaBrokerMetrics` is true). The doc notes this requirement inline rather than omitting the template.
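One way to read the 80% retention threshold, assuming it compares estimated consumer lag (in seconds) with the topic's Kafka `retention.ms` setting: once a consumer group is more than 80% of the retention window behind, unread messages are close to being deleted. A sketch of that check, with hypothetical helper names:

```python
def retention_ratio(lag_seconds: float, retention_ms: int) -> float:
    """Fraction of the topic's retention window already covered by consumer lag."""
    # Kafka uses retention.ms = -1 for unlimited retention, so there is no loss window.
    if retention_ms <= 0:
        raise ValueError("retention_ms must be positive")
    return lag_seconds / (retention_ms / 1000)


def at_data_loss_risk(lag_seconds: float, retention_ms: int, threshold: float = 0.8) -> bool:
    """True when lag has reached the (assumed) 80% share of the retention window."""
    return retention_ratio(lag_seconds, retention_ms) >= threshold


if __name__ == "__main__":
    one_week_ms = 7 * 24 * 3600 * 1000
    print(at_data_loss_risk(6 * 3600, one_week_ms))   # 6 hours behind on a 7-day topic
    print(at_data_loss_risk(6 * 24 * 3600, one_week_ms))  # 6 days behind on a 7-day topic
```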