feat: add Prometheus metrics for observability (#177)
* feat: add Prometheus metrics for observability
Add Prometheus metrics to expose request success/failure counts and
error types so operators can monitor and alert on API communication
issues.
New metrics:
- coder_logstream_requests_total{status} - tracks success/failure counts
- coder_logstream_errors_total{type} - tracks errors by type (network)
New flag:
- --metrics-addr / CODER_LOGSTREAM_METRICS_ADDR (default :9100)
Starts an HTTP server serving /metrics for Prometheus scraping.
Instrumented paths:
- PostLogSource failures
- ConnectRPC28WithRole / ConnectRPC20 failures
- Log Send failures
- All success paths
Resolves #173, partially addresses #146
* feat: update Helm chart for metrics, default to disabled for backward compat
* test: add end-to-end metrics endpoint test
* fix: initialize metric labels so they appear at zero in /metrics
* fix: handle errcheck lint for resp.Body.Close in test
* refactor: move metrics into instrumentedClient wrapper, remove errorsTotal
Address review feedback from @deansheather:
- Replace scattered metric calls with an instrumentedClient wrapper
around agentsdk.Client that records metrics in one place
- Remove the redundant errorsTotal metric (it was always incremented
alongside a requestsTotal failure)
- Add method label to requestsTotal for better granularity:
PostLogSource, ConnectRPC, SendLog
* fix: remove redundant embedded field from selector (staticcheck QF1008)
* refactor: use custom prometheus registry and method enum
Address review feedback:
- Replace global prometheus metrics with a metricsCollector struct
holding a custom prometheus.Registry, passed to all components
- Use requestMethod enum (methodPostLogSource, methodConnectRPC,
methodSendLog) instead of raw strings
- Tests create fresh metricsCollector instances, avoiding flakes
from shared global state in parallel test execution
* fix: goimports formatting in logger_test.go