Skip to content

refactor: simplify/reformat existing grafana alerts#142

Merged
amdove merged 25 commits intomainfrom
alert-reformat
Mar 5, 2026
Merged

refactor: simplify/reformat existing grafana alerts#142
amdove merged 25 commits intomainfrom
alert-reformat

Conversation

@amdove
Copy link
Contributor

@amdove amdove commented Feb 25, 2026

Description

  • Standardized alert format across all 15 Grafana alerts with consistent WHERE/DETAILS sections
  • Added tenant_name field to WorkloadConfig and Alloy external_labels for tenant identification in alerts
  • Alerts now use severity emojis (🔴 CRITICAL, 🟡 WARNING) and include runbook/dashboard links

Must be deployed to control rooms and workload clusters. Tenant name will default to compound_name but not until the new update gets applied to the workload cluster. Has already been applied to staging.

Test plan

  • Just test-python-pulumi passes (111 tests)
  • Deploy to staging and verify alert format in Grafana

No new tests added since tenant_name is a cosmetic field with simple fallback logic (tenant_name or compound_name), testing would only verify basic Python language features.

Category of change

  • Bug fix (non-breaking change which fixes an issue)
  • Version upgrade (upgrading the version of a service or product)
  • New feature (non-breaking change which adds functionality)
  • Build: a code change that affects the build system or external dependencies
  • Performance: a code change that improves performance
  • Refactor: a code change that neither fixes a bug nor adds a feature
  • Documentation: documentation changes
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

@amdove amdove changed the title Simplify existing grafana alerts refactor: simplify/reformat existing grafana alerts Feb 25, 2026
@amdove amdove marked this pull request as ready for review March 4, 2026 22:13
@amdove amdove requested a review from a team as a code owner March 4, 2026 22:13
@amdove amdove requested a review from timtalbot March 4, 2026 22:14
@amdove amdove enabled auto-merge March 5, 2026 19:24
Copy link
Contributor

@Lytol Lytol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooooh, so excited for some more sensible alert copy!

I left an observation, but it's more just thinking aloud... feel free to ignore it entirely if it's not useful.

One other note: are we delineating the alert "criticality" (between critical and warning) in any way aside from the summary copy? It feels like there should be a norm for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

observation: Obviously much beyond the scope of this PR, but the HUGE inline string configs are kinda annoying IMO. It could be a nice QOL improvement to move those into their own files and then just read them in at runtime. 🤷

Comment on lines +14 to +16
# Human-readable tenant name for alerts (defaults to compound_name if not set)
tenant_name: "Example Analytics Team"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@amdove amdove added this pull request to the merge queue Mar 5, 2026
Merged via the queue into main with commit cefd8df Mar 5, 2026
3 checks passed
@amdove amdove deleted the alert-reformat branch March 5, 2026 20:13
Copy link
Collaborator

@stevenolen stevenolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what he said ⤴️ . this is fantastic!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants