
SRE Time: Prometheus & Grafana #112

@adarshm11

Description


Motivation

Metrics help us confirm that the service is running as expected, and alerts let us know when it isn't. They also help with general debugging 😁

Prometheus

  • Create a new file metrics/metrics.go and import the Go Prometheus client (github.com/prometheus/client_golang). You can also use its promauto package to register metrics automatically.
  • Decide on a set of metrics that will help determine the health of the server, for example container uptime, endpoint hits, and API latency.
  • Add a /metrics endpoint to the server that exposes these metrics for Prometheus to scrape.
  • Run the container and make sure the metrics at /metrics are viewable and accurate (e.g. when an endpoint is hit, does the endpoint-hits metric increment accordingly?).
  • Finally, after SCEvents is deployed on the SCE server, add its Docker container name to the Clark Prometheus config. To verify that this works, check the Prometheus query page on one.sce (requires VPN access).

Grafana

  • After these changes are complete and the server is hosted on the SCE server with metrics available via Clark's Prometheus exporter, create a new Grafana dashboard in the monitoring repo under /grafana/provisioning/dashboards. Follow the monitoring instructions for testing and verifying correct behavior. Tip: use an existing service's dashboard as a guide for layout (I recommend goderpad).

Good luck, soldier! You are now starting down the SCE SRE pipeline 🫡

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
