feat: add Prometheus metrics for observability #177

Merged
kacpersaw merged 9 commits into main from kacpersaw/add-prometheus-metrics on Mar 4, 2026

@kacpersaw
Contributor

Summary

Add Prometheus metrics to expose request success/failure counts and error types so operators can monitor and alert on API communication issues.

New metrics

  • coder_logstream_requests_total{status} — tracks success/failure counts
  • coder_logstream_errors_total{type} — tracks errors by type (network)

New flag

  • --metrics-addr / CODER_LOGSTREAM_METRICS_ADDR (default :9100)
    Starts an HTTP server serving /metrics for Prometheus scraping.
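
A minimal sketch of how the flag, its environment fallback, and the `/metrics` route could fit together. It uses only the Go standard library so it runs without dependencies. The precedence (flag, then environment variable, then default), the helper names `resolveMetricsAddr`, `newMetricsMux`, `placeholderHandler`, and `statusFor`, and the use of the `flag` package are assumptions for illustration, not code from this PR. The placeholder handler stands in for the Prometheus HTTP handler.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
)

const (
	metricsAddrEnv     = "CODER_LOGSTREAM_METRICS_ADDR"
	defaultMetricsAddr = ":9100"
)

// resolveMetricsAddr picks the listen address for the metrics server.
// Assumed precedence: --metrics-addr flag, then env var, then default.
func resolveMetricsAddr(args []string, getenv func(string) string) (string, error) {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch
	addr := fs.String("metrics-addr", "", "address for the /metrics HTTP server")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *addr != "" {
		return *addr, nil
	}
	if v := getenv(metricsAddrEnv); v != "" {
		return v, nil
	}
	return defaultMetricsAddr, nil
}

// placeholderHandler stands in for the real Prometheus scrape handler.
func placeholderHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "# placeholder for Prometheus text output")
	})
}

// newMetricsMux registers the scrape endpoint on its own mux.
func newMetricsMux(metrics http.Handler) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/metrics", metrics)
	return mux
}

// statusFor exercises a handler in memory, without binding a port.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	addr, err := resolveMetricsAddr(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid flags:", err)
		os.Exit(1)
	}
	mux := newMetricsMux(placeholderHandler())
	fmt.Println("would listen on", addr)
	fmt.Println("GET /metrics ->", statusFor(mux, "/metrics"))
	// A long-running program would end with: http.ListenAndServe(addr, mux)
}
```

Serving metrics from a dedicated mux keeps the scrape endpoint separate from any other HTTP routes the process may expose.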

Instrumented paths

  • PostLogSource failures
  • ConnectRPC28WithRole / ConnectRPC20 failures
  • Log Send failures
  • All success paths

Resolves #173, partially addresses #146

@kacpersaw kacpersaw requested a review from deansheather March 2, 2026 14:29
…Total

Address review feedback from @deansheather:
- Replace scattered metric calls with an instrumentedClient wrapper
  around agentsdk.Client that records metrics in one place
- Remove the redundant errorsTotal metric (it was always incremented
  alongside the requestsTotal failure count)
- Add method label to requestsTotal for better granularity:
  PostLogSource, ConnectRPC, SendLog
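
The wrapper idea can be sketched roughly as follows. This is not the code from the PR: `logClient` is a hypothetical, simplified interface (the real agentsdk.Client methods have different signatures, and ConnectRPC is left out for brevity), `requestCounter` is a dependency-free stand-in for a Prometheus counter vector, and the status values "success" and "failure" are assumed label values.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// logClient is a hypothetical, simplified view of the instrumented calls.
// The real agentsdk.Client has different method signatures.
type logClient interface {
	PostLogSource(name string) error
	SendLog(line string) error
}

// requestCounter stands in for a counter vector labelled by method and status.
type requestCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newRequestCounter() *requestCounter {
	return &requestCounter{counts: make(map[string]int)}
}

func (c *requestCounter) inc(method, status string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[method+"/"+status]++
}

func (c *requestCounter) get(method, status string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[method+"/"+status]
}

// instrumentedClient wraps a logClient and records the outcome of each call.
type instrumentedClient struct {
	inner    logClient
	requests *requestCounter
}

// record is the single place where metrics are updated.
func (c *instrumentedClient) record(method string, err error) error {
	status := "success"
	if err != nil {
		status = "failure"
	}
	c.requests.inc(method, status)
	return err
}

func (c *instrumentedClient) PostLogSource(name string) error {
	return c.record("PostLogSource", c.inner.PostLogSource(name))
}

func (c *instrumentedClient) SendLog(line string) error {
	return c.record("SendLog", c.inner.SendLog(line))
}

// fakeClient is a test double that rejects empty log lines.
type fakeClient struct{}

func (fakeClient) PostLogSource(string) error { return nil }

func (fakeClient) SendLog(line string) error {
	if line == "" {
		return errors.New("empty log line")
	}
	return nil
}

func main() {
	requests := newRequestCounter()
	var client logClient = &instrumentedClient{inner: fakeClient{}, requests: requests}

	_ = client.PostLogSource("pod-logs")
	_ = client.SendLog("hello")
	_ = client.SendLog("") // rejected by fakeClient, counted as a failure

	fmt.Println("SendLog success:", requests.get("SendLog", "success"))
	fmt.Println("SendLog failure:", requests.get("SendLog", "failure"))
}
```

Because every call goes through `record`, callers never touch the counter directly, and a failure cannot be returned without also being counted.
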
@kacpersaw kacpersaw requested a review from deansheather March 3, 2026 10:18
Address review feedback:
- Replace global prometheus metrics with a metricsCollector struct
  holding a custom prometheus.Registry, passed to all components
- Use requestMethod enum (methodPostLogSource, methodConnectRPC,
  methodSendLog) instead of raw strings
- Tests create fresh metricsCollector instances, avoiding flakes
  from shared global state in parallel test execution
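
A rough illustration of why per-instance collectors help parallel tests, assuming a simplified shape for the types named above. In the PR the metricsCollector holds a custom prometheus.Registry. Here a mutex-guarded map stands in for it so the sketch runs with the standard library alone. The integer-backed enum, the `recordRequest` and `requestCount` helpers, `errDemo`, and the "success"/"failure" status values are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// requestMethod enumerates the instrumented calls so call sites cannot
// pass a misspelled label value. The int backing type is an assumption.
type requestMethod int

const (
	methodPostLogSource requestMethod = iota
	methodConnectRPC
	methodSendLog
)

// String returns the label value used for the method.
func (m requestMethod) String() string {
	switch m {
	case methodPostLogSource:
		return "PostLogSource"
	case methodConnectRPC:
		return "ConnectRPC"
	case methodSendLog:
		return "SendLog"
	default:
		return "unknown"
	}
}

// errDemo is a hypothetical error used only to exercise the failure path.
var errDemo = errors.New("demo failure")

// metricsCollector owns its state instead of relying on package globals.
// The map is a stand-in for counters registered on a custom registry.
type metricsCollector struct {
	mu       sync.Mutex
	requests map[string]int
}

func newMetricsCollector() *metricsCollector {
	return &metricsCollector{requests: make(map[string]int)}
}

// recordRequest counts one call, labelled by method and outcome.
func (m *metricsCollector) recordRequest(method requestMethod, err error) {
	status := "success"
	if err != nil {
		status = "failure"
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.requests[method.String()+"/"+status]++
}

func (m *metricsCollector) requestCount(method requestMethod, status string) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.requests[method.String()+"/"+status]
}

func main() {
	// Each goroutine plays the role of a parallel test with its own collector.
	collectors := []*metricsCollector{newMetricsCollector(), newMetricsCollector()}
	var wg sync.WaitGroup
	for i, c := range collectors {
		wg.Add(1)
		go func(n int, c *metricsCollector) {
			defer wg.Done()
			for j := 0; j <= n; j++ {
				c.recordRequest(methodSendLog, nil)
			}
			c.recordRequest(methodConnectRPC, errDemo)
		}(i, c)
	}
	wg.Wait()
	for i, c := range collectors {
		fmt.Printf("collector %d: SendLog success=%d\n", i, c.requestCount(methodSendLog, "success"))
	}
}
```

With no shared global state, one test's increments cannot leak into another test's assertions, regardless of scheduling order.
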
@kacpersaw kacpersaw requested a review from deansheather March 4, 2026 09:37
@kacpersaw kacpersaw merged commit fbd5355 into main Mar 4, 2026
6 of 8 checks passed

Development

Successfully merging this pull request may close these issues.

Add observability for failed requests
