observability: migrate in-cluster demo stack to AMP/AMG and reconcile Terraform (Phase 7) #20

@ausbru87

Summary

Phase 7. Migrate from the in-cluster demo Prometheus and Grafana stack to the
AWS-native target, Amazon Managed Service for Prometheus (AMP) and Amazon
Managed Grafana (AMG), then codify all new IAM/IRSA and AWS resources in
Terraform. Design: docs/plans/observability-aws-native.md.

Background

A separate workstream builds an in-cluster Prometheus plus Grafana stack for the
demo (reserved host metrics.usgov.coderdemo.io). The production target replaces
it with AMP plus AMG while keeping the dashboards and alert logic. Existing IRSA
roles were partly created imperatively and must be reconciled (see
docs/as-built/80-iac-vs-imperative.md and terraform/secrets-hardening.tf).

Tasks

  • Bridge: configure the in-cluster Prometheus remote_write to AMP and
    dual-run to de-risk before cutting over to the standalone ADOT collector.
  • Re-import the in-cluster Grafana dashboards into AMG; port Alertmanager
    rules to AMG alerts or CloudWatch alarms plus SNS.
  • Decommission the in-cluster Prometheus, Grafana, and Alertmanager once AMG
    is validated.
  • Codify in Terraform: AMP workspace, AMG workspace and SAML, ADOT and
    Fluent Bit IRSA roles, Firehose, S3 bucket and lifecycle, Glue, log group and
    retention, SNS, EventBridge/alarms, and any CMKs. Import imperatively created
    roles before apply.
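
A minimal Terraform sketch of the codify-and-import task above, covering only the AMP workspace and one IRSA role. It is an illustration under stated assumptions, not the contents of the design doc: every resource label, role name, namespace, service account, alias, and variable below is a placeholder, and it assumes Terraform 1.5 or later (for `import` blocks) and that the `AmazonPrometheusRemoteWriteAccess` managed policy is available in the target partition.

```hcl
# Sketch only: every name, variable, and ID below is a placeholder assumption.

variable "oidc_provider_arn" {
  type        = string
  description = "ARN of the EKS cluster's IAM OIDC provider (assumed to exist already)."
}

variable "oidc_provider_url" {
  type        = string
  description = "OIDC issuer host and path, without the https:// prefix."
}

# Resolves to aws, aws-us-gov, etc., so the policy ARN is not hard-coded.
data "aws_partition" "current" {}

# AMP workspace that receives remote_write traffic.
resource "aws_prometheus_workspace" "observability" {
  alias = "coder-demo-observability" # hypothetical alias
}

# Adopt a role that was created imperatively instead of recreating it.
import {
  to = aws_iam_role.adot_collector
  id = "adot-collector" # hypothetical: the existing role's name
}

resource "aws_iam_role" "adot_collector" {
  name = "adot-collector" # must match the imported role's name

  # IRSA trust policy: only the named Kubernetes service account may assume the role.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRoleWithWebIdentity"
      Principal = { Federated = var.oidc_provider_arn }
      Condition = {
        StringEquals = {
          "${var.oidc_provider_url}:sub" = "system:serviceaccount:observability:adot-collector"
          "${var.oidc_provider_url}:aud" = "sts.amazonaws.com"
        }
      }
    }]
  })
}

# Grants the collector permission to remote_write into AMP.
resource "aws_iam_role_policy_attachment" "adot_remote_write" {
  role       = aws_iam_role.adot_collector.name
  policy_arn = "arn:${data.aws_partition.current.partition}:iam::aws:policy/AmazonPrometheusRemoteWriteAccess"
}
```

After an import, the resource arguments need to match the live role (trust policy, path, tags) before the plan shows no drift, so the values above would have to be adjusted to whatever was created imperatively.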

Acceptance criteria

  • AMG fully replaces the in-cluster Grafana for the demo views.
  • terraform plan reports no pending changes with the new observability
    resources under management.

Generated by Coder Agents.

Metadata

Labels: observability (Observability and telemetry)
