feat: Prometheus metrics and observability#19

Closed
xernobyl wants to merge 68 commits into main from feat/metrics

xernobyl commented Feb 6, 2026

Summary

Adds optional Prometheus metrics for the Hub API: HTTP request count and duration, webhook job enqueue/delivery/disabled counters, and dropped-event counters. Metrics are exposed on a separate HTTP server that does not require an API key. Also adds a readiness probe (GET /ready) and a metrics reference in the docs.

Changes

Observability

  • internal/observability/metrics.go

    • HubMetrics interface: RecordRequest, RecordEventDropped, RecordWebhookJobsEnqueued, RecordWebhookEnqueueError, RecordWebhookDelivery, RecordWebhookDisabled.
    • OpenTelemetry MeterProvider with Prometheus exporter; cardinality limit 2000.
    • Explicit histogram buckets (0.005, 0.025, 0.1, 0.5, 1, 2.5, 5 s) for http.server.duration and webhook_delivery_duration_seconds.
    • Instrument creation errors checked and returned from NewMeterProvider (fail fast on misconfiguration).
    • Bounded label values via normalizers (event_type, outcome, reason); unknown values become "unknown".
    • When metrics are disabled, nil is passed and all call sites guard with if metrics != nil.
  • internal/api/middleware/metrics.go

    • Custom HTTP middleware: records request count and duration; normalizes route (UUIDs → {id}) and status class (2xx/4xx/5xx).
    • Wraps the main handler so all API traffic (including GET /health) is measured.
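
The route and status normalization described above could look roughly like the following. This is a minimal sketch, not the PR's code: RequestRecorder, statusWriter, printRecorder, and the /v1/records path are hypothetical names chosen for illustration, and the real middleware.Metrics signature may differ. The sketch derives the status class from the first digit, so it also yields 1xx and 3xx, which may not match the PR's classes exactly.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"regexp"
	"time"
)

// uuidPattern matches a canonical 8-4-4-4-12 hexadecimal UUID anywhere in a path.
var uuidPattern = regexp.MustCompile(
	`[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}`)

// normalizeRoute replaces UUIDs with {id} so the route label stays bounded.
func normalizeRoute(path string) string {
	return uuidPattern.ReplaceAllString(path, "{id}")
}

// statusClass collapses a status code into its class, for example 404 -> "4xx".
func statusClass(code int) string {
	if code < 100 || code > 599 {
		return "unknown"
	}
	return fmt.Sprintf("%dxx", code/100)
}

// RequestRecorder is a hypothetical one-method stand-in for HubMetrics.
type RequestRecorder interface {
	RecordRequest(method, route, status string, d time.Duration)
}

// statusWriter remembers the status code written by the wrapped handler.
type statusWriter struct {
	http.ResponseWriter
	status int
}

func (w *statusWriter) WriteHeader(code int) {
	w.status = code
	w.ResponseWriter.WriteHeader(code)
}

// Metrics records one sample per request; a nil recorder leaves next unwrapped.
func Metrics(rec RequestRecorder) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		if rec == nil {
			return next
		}
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			start := time.Now()
			sw := &statusWriter{ResponseWriter: w, status: http.StatusOK}
			next.ServeHTTP(sw, r)
			rec.RecordRequest(r.Method, normalizeRoute(r.URL.Path),
				statusClass(sw.status), time.Since(start))
		})
	}
}

// printRecorder is a hypothetical recorder that prints each sample's labels.
type printRecorder struct{}

func (printRecorder) RecordRequest(method, route, status string, _ time.Duration) {
	fmt.Println(method, route, status)
}

func main() {
	h := Metrics(printRecorder{})(http.NotFoundHandler())
	req := httptest.NewRequest(http.MethodGet,
		"/v1/records/123e4567-e89b-12d3-a456-426614174000", nil)
	h.ServeHTTP(httptest.NewRecorder(), req)
}
```

Collapsing IDs and status codes before they become labels keeps the number of time series independent of traffic, which is the point of the cardinality limit mentioned above.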

Config and wiring

  • internal/config/config.go

    • PROMETHEUS_ENABLED: true for 1/true/yes/on (case-insensitive).
    • PROMETHEUS_EXPORTER_PORT (default 9464).
  • cmd/api/main.go

    • When PROMETHEUS_ENABLED is set to an enabling value (for example 1): creates the MeterProvider and a second HTTP server on the exporter port that serves only GET /metrics; defers shutdown of that server and of the MeterProvider.
    • Passes metrics (or nil) into MessagePublisherManager, WebhookProvider, WebhookSenderImpl, WebhookDispatchWorker, and middleware.Metrics(metrics).
    • Handler order: Metrics(Logging(mainMux)). Metrics is the outermost wrapper, so the recorded duration covers the full request, including logging.
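
As an illustration of the config parsing described above, the two variables might be read as follows. This is a sketch under assumptions: parseEnabled and parsePort are hypothetical helper names, and trimming whitespace and rejecting out-of-range ports are choices made here, not behavior confirmed by the PR.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseEnabled reports whether v is one of 1/true/yes/on, ignoring case.
// Trimming surrounding whitespace is an assumption of this sketch.
func parseEnabled(v string) bool {
	switch strings.ToLower(strings.TrimSpace(v)) {
	case "1", "true", "yes", "on":
		return true
	}
	return false
}

// parsePort returns def when v is empty, and an error when v is not a valid TCP port.
func parsePort(v string, def int) (int, error) {
	v = strings.TrimSpace(v)
	if v == "" {
		return def, nil
	}
	p, err := strconv.Atoi(v)
	if err != nil || p < 1 || p > 65535 {
		return 0, fmt.Errorf("invalid port %q", v)
	}
	return p, nil
}

func main() {
	enabled := parseEnabled(os.Getenv("PROMETHEUS_ENABLED"))
	port, err := parsePort(os.Getenv("PROMETHEUS_EXPORTER_PORT"), 9464)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("metrics enabled=%t port=%d\n", enabled, port)
}
```

Treating every unrecognized value as disabled keeps the default safe: a typo in PROMETHEUS_ENABLED leaves the metrics server off rather than exposing an unauthenticated port.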

Readiness and health

  • internal/api/handlers/health_handler.go
    • Ready handler: pings DB and returns 200 or 503 (for load balancer readiness).
  • GET /ready is registered on the main server (with GET /health); both are covered by the metrics middleware.
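
A readiness handler of the kind described above might be sketched like this. The Pinger interface, the fakeDB type, the probe helper, and the 2-second timeout are assumptions for illustration; the PR does not say which database driver or timeout the real handler uses. *sql.DB from database/sql satisfies Pinger through its PingContext method.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Pinger is the subset of a database handle the readiness check needs.
type Pinger interface {
	PingContext(ctx context.Context) error
}

// Ready returns 200 when the database answers a ping and 503 otherwise.
func Ready(db Pinger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Bound the ping so a hung database cannot stall the probe (assumed timeout).
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()
		if err := db.PingContext(ctx); err != nil {
			http.Error(w, "database unavailable", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}

// errDown is a sample failure used by the demonstration below.
var errDown = errors.New("database down")

// fakeDB is a hypothetical test double that returns a fixed ping result.
type fakeDB struct{ err error }

func (f fakeDB) PingContext(context.Context) error { return f.err }

// probe runs the handler once in memory and returns the status code.
func probe(db Pinger) int {
	rec := httptest.NewRecorder()
	Ready(db)(rec, httptest.NewRequest(http.MethodGet, "/ready", nil))
	return rec.Code
}

func main() {
	fmt.Println(probe(fakeDB{}))             // healthy database
	fmt.Println(probe(fakeDB{err: errDown})) // failing database
}
```

Keeping the database check out of GET /health means a database outage takes the instance out of load balancer rotation without marking the process itself as dead.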

Documentation

  • docs/reference/environment-variables.mdx: PROMETHEUS_ENABLED, PROMETHEUS_EXPORTER_PORT, health/ready endpoints; link to metrics reference.
  • docs/reference/metrics.mdx: Full metrics reference—list of metrics, labels, when recorded, histogram buckets, normalized values, example Prometheus queries.
  • docs/webhooks-todo.md: Observability, dropped-events, and main (observability + readiness) items marked done.

Tests

  • internal/observability/metrics_test.go: Unit tests for normalizeEventType, normalizeOutcome, normalizeDisabledReason.
  • tests/metrics_integration_test.go: Creates MeterProvider, records one sample per metric type, calls the metrics handler, asserts response contains expected metric name stems.
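
For readers unfamiliar with the normalizers these unit tests cover, an allowlist-based version might look like the sketch below. The function name normalizeOutcome comes from the PR, but its body and the allowedOutcomes values are hypothetical; the PR does not list the accepted values.

```go
package main

import "fmt"

// unknownLabel replaces any value outside the allowlist, as the PR describes.
const unknownLabel = "unknown"

// allowedOutcomes is a hypothetical allowlist used only for this illustration.
var allowedOutcomes = map[string]bool{
	"success": true,
	"failure": true,
	"timeout": true,
}

// normalizeOutcome returns v when it is allowlisted and "unknown" otherwise,
// so the outcome label can never take more than len(allowedOutcomes)+1 values.
func normalizeOutcome(v string) string {
	if allowedOutcomes[v] {
		return v
	}
	return unknownLabel
}

func main() {
	for _, v := range []string{"success", "timeout", "teapot", ""} {
		fmt.Printf("%q -> %q\n", v, normalizeOutcome(v))
	}
}
```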

Lint and formatting

  • .golangci.yml: lll (line length) linter enabled; limit configurable under linters.settings.lll.line-length; exclude rules for _test.go and tests/ for lll.
  • tests/integration_test.go: Long lines fixed by extracting URLs into variables before http.NewRequestWithContext (keeps lines under the configured limit).

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| PROMETHEUS_ENABLED | off | Set to 1 to enable the metrics server and custom metrics. |
| PROMETHEUS_EXPORTER_PORT | 9464 | Port for the metrics HTTP server (only GET /metrics). |
  • Metrics endpoint: GET http://<host>:<PROMETHEUS_EXPORTER_PORT>/metrics (no auth).
  • Health: GET /health (liveness), GET /ready (readiness, DB ping).
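
The separate metrics server that serves only GET /metrics could be wired along these lines. This sketch uses only the standard library: placeholderExporter stands in for the real Prometheus handler, and newMetricsMux, newMetricsServer, status, and the ReadHeaderTimeout value are hypothetical. The server is built but not started, so the example terminates.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// newMetricsMux serves exporter at /metrics for GET only; other paths get 404.
func newMetricsMux(exporter http.Handler) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			w.Header().Set("Allow", http.MethodGet)
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		exporter.ServeHTTP(w, r)
	})
	return mux
}

// newMetricsServer builds, but does not start, the second HTTP server.
func newMetricsServer(port int, exporter http.Handler) *http.Server {
	return &http.Server{
		Addr:              fmt.Sprintf(":%d", port),
		Handler:           newMetricsMux(exporter),
		ReadHeaderTimeout: 5 * time.Second, // assumed value
	}
}

// placeholderExporter stands in for the real Prometheus exporter handler.
var placeholderExporter = http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
	fmt.Fprintln(w, "# metrics would be rendered here")
})

// status sends one in-memory request to the metrics mux and returns the status code.
func status(method, path string) int {
	rec := httptest.NewRecorder()
	newMetricsMux(placeholderExporter).ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code
}

func main() {
	srv := newMetricsServer(9464, placeholderExporter)
	fmt.Println("would listen on", srv.Addr)
	fmt.Println(status("GET", "/metrics"), status("POST", "/metrics"), status("GET", "/health"))
}
```

Because the endpoint has no authentication, running it on its own port makes it straightforward to restrict access at the network level while the main API port stays public.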

Testing

make tests          # integration tests (including metrics endpoint)
go test ./internal/observability/... -v  # normalizer unit tests

With PROMETHEUS_ENABLED=1 and the app running:

curl -s http://localhost:9464/metrics | grep -E 'http_server_|events_dropped|webhook_'

Checklist

  • Metrics disabled by default; no impact when PROMETHEUS_ENABLED is unset.
  • All label values bounded (normalizers) to control cardinality.
  • Health and readiness documented; health endpoint covered by metrics.
  • make fmt and make lint pass.

xernobyl and others added 30 commits February 2, 2026 14:26
- Add Event and MessagePublisher interface with MessagePublisherManager
- Add event type enum (internal/datatypes/event_type.go) for feedback_record
  and future webhook event types
- Add placeholder EmailDeliveryService implementing MessagePublisher
- Wire FeedbackRecordsService to publish events on create/update/delete
- Wire message manager in main and integration tests (no webhook provider yet)

Co-authored-by: Cursor <cursoragent@cursor.com>
- Use goose for app schema (migrations/001, 002_webhooks)
- Single 002_webhooks.sql migration with full webhooks table and indexes
- Keep river-migrate target and add to CI after init-db
- AGENTS.md: document river-migrate for webhook delivery

Co-authored-by: Cursor <cursoragent@cursor.com>
Base automatically changed from feat/webhooks to main February 10, 2026 19:21
@xernobyl xernobyl closed this Feb 11, 2026