feat(observability): Prometheus /metrics endpoint by MorganOnCode · Pull Request #43 · MorganOnCode/cardano402

MorganOnCode · 2026-05-15T10:37:49Z

Closes audit #12: the observability gap noted in the 2026-05-15 audit. Sentry covers errors and traces, but there are no latency percentiles, throughput figures, or queue-depth metrics, which blocks SLA validation and capacity planning.

What

Adds `GET /metrics` returning the standard Prometheus text format (v0.0.4), scrape-able by any Prometheus-compatible system.

Default metrics (from `prom-client`'s `collectDefaultMetrics`):

- Node.js heap size / used
- GC duration + counts
- Event loop lag
- Active handles / requests
- CPU user/system seconds

Custom HTTP metrics (added by an `onResponse` hook):

- `http_requests_total` (counter) labeled by `method`, `route`, `status_code`
- `http_request_duration_seconds` (histogram) with realistic facilitator-latency buckets: 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s

Default label `service="cardano402"` on every series.
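
To make the histogram entry concrete, here is a dependency-free sketch of how cumulative buckets behave and how one label set would look in the text exposition format. It is not the PR's code (which uses `prom-client`); `newHistogram`, `observe`, and `render` are hypothetical helpers, and label values are assumed to need no escaping.

```typescript
// Upper bounds in seconds, matching the bucket list above (5ms to 10s).
const BUCKET_BOUNDS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10];

interface Bucket { le: number; count: number }
interface Histogram { buckets: Bucket[]; sum: number; count: number }

function newHistogram(): Histogram {
  return { buckets: BUCKET_BOUNDS.map((le) => ({ le, count: 0 })), sum: 0, count: 0 };
}

// Record one request duration. Prometheus buckets are cumulative, so an
// observation increments every bucket whose upper bound is >= the value.
function observe(h: Histogram, seconds: number): void {
  for (const b of h.buckets) {
    if (seconds <= b.le) b.count += 1;
  }
  h.sum += seconds;
  h.count += 1;
}

// Render one label set as text-format sample lines (hypothetical helper).
function render(name: string, labels: Record<string, string>, h: Histogram): string {
  const base = Object.entries(labels).map(([k, v]) => `${k}="${v}"`);
  const lines: string[] = [`# TYPE ${name} histogram`];
  for (const b of h.buckets) {
    lines.push(`${name}_bucket{${[...base, `le="${b.le}"`].join(",")}} ${b.count}`);
  }
  // The implicit +Inf bucket always equals the total observation count.
  lines.push(`${name}_bucket{${[...base, `le="+Inf"`].join(",")}} ${h.count}`);
  lines.push(`${name}_sum{${base.join(",")}} ${h.sum}`);
  lines.push(`${name}_count{${base.join(",")}} ${h.count}`);
  return lines.join("\n");
}
```

With this shape, a 30 ms request is counted in every bucket from `le="0.05"` upward, while a 12 s request appears only in `le="+Inf"`. That is why the top bucket bound matters when reading high percentiles.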

Design choices

- Route pattern, not raw URL, as the label. `/files/abc123` and `/files/xyz789` both collapse onto `route="/files/:cid"`, so cardinality stays bounded and the time-series count won't explode even with many distinct file IDs.
- `/metrics` and `/health` excluded from tracking. The former avoids recursive accounting (scraping creates traffic); the latter avoids skewing latency percentiles with k8s-style liveness-probe noise.
- `prom-client` directly, not a wrapper plugin. The wrappers (`fastify-metrics` etc.) add abstraction and another upstream to track for not much gain. ~50 lines of plugin code is clearer and easier to extend with custom metrics later (Redis ping latency, Blockfrost call duration, verify/settle outcomes), which are easy to add when the need shows up.
- `robots.txt` updated to `Disallow: /metrics` so it's not crawl-indexed.
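
A rough sketch of the label logic behind the route-pattern and exclusion choices, written without Fastify so it runs on its own. `FinishedRequest`, `labelsFor`, `seriesKey`, and the `"unmatched"` fallback for requests that hit no registered route are assumptions for illustration. In Fastify 4 the matched pattern should be available as `request.routeOptions.url`, though the PR's actual hook may read it differently.

```typescript
// Route patterns that are served but never counted.
const UNTRACKED = new Set(["/metrics", "/health"]);

// Minimal view of a completed request (hypothetical shape, not Fastify's type).
interface FinishedRequest {
  method: string;
  url: string;            // raw URL, e.g. "/files/abc123" (deliberately unused as a label)
  routePattern?: string;  // templated pattern, e.g. "/files/:cid"; absent if nothing matched
  statusCode: number;
}

interface HttpLabels { method: string; route: string; status_code: string }

// Returns the label set to record, or null when the request should be skipped.
function labelsFor(req: FinishedRequest): HttpLabels | null {
  // Assumption: unmatched requests share one fixed label so that random
  // 404 paths cannot create new series.
  const route = req.routePattern ?? "unmatched";
  if (UNTRACKED.has(route)) return null;
  return { method: req.method, route, status_code: String(req.statusCode) };
}

// One time series exists per distinct label combination.
function seriesKey(l: HttpLabels): string {
  return `${l.method} ${l.route} ${l.status_code}`;
}
```

Feeding a thousand different file IDs through `labelsFor` yields a single series key, which is the bounded-cardinality property the route-pattern choice is after.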

What it doesn't do (yet, deliberately)

- No authentication on `/metrics`. For a single-VPS setup with no separate Prometheus server yet, this is acceptable. The metrics don't leak secrets; at worst they reveal traffic patterns. If/when you stand up Prometheus (or want to lock this down), the easy paths are a shared-secret `?token=...` query param, binding `/metrics` to localhost only, or wrapping it with nginx basic-auth. Defer until there's a real scraper.
- No custom payment-flow metrics. I'd want `verify_total` / `settle_total` labeled by outcome, `blockfrost_call_duration_seconds`, etc. Quick to add once the framework is in, but out of scope for this PR: the goal here is the infrastructure.
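
If the shared-secret option is the one picked later, the check could look roughly like the sketch below. Nothing here is in the PR: `queryParam`, `safeEqual`, and `isAuthorizedScrape` are hypothetical names, and the hand-rolled comparison only keeps the example dependency-free. In a Node.js service `crypto.timingSafeEqual` would be the more usual choice.

```typescript
// Extract one query parameter from a raw request URL such as
// "/metrics?token=abc". Returns undefined if absent or malformed.
function queryParam(rawUrl: string, name: string): string | undefined {
  const q = rawUrl.indexOf("?");
  if (q === -1) return undefined;
  try {
    for (const pair of rawUrl.slice(q + 1).split("&")) {
      const eq = pair.indexOf("=");
      const key = eq === -1 ? pair : pair.slice(0, eq);
      if (decodeURIComponent(key) === name) {
        return eq === -1 ? "" : decodeURIComponent(pair.slice(eq + 1));
      }
    }
  } catch {
    // decodeURIComponent throws on malformed percent-encoding.
    return undefined;
  }
  return undefined;
}

// Compare two strings without returning early at the first mismatch.
function safeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    // charCodeAt returns NaN past the end; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// True only when the request carries the expected token.
function isAuthorizedScrape(rawUrl: string, expectedToken: string): boolean {
  if (expectedToken === "") return false; // never accept an empty secret
  const supplied = queryParam(rawUrl, "token");
  return supplied !== undefined && safeEqual(supplied, expectedToken);
}
```

One trade-off to weigh: a token in the query string tends to end up in access logs, which is part of why the localhost-only and nginx options remain attractive.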

Test plan

- `pnpm typecheck` clean
- `pnpm lint` clean
- `pnpm test`: 34 files / 452 tests passing (+9 new for metrics)
- CI passes on this branch
- After deploy: `curl http://localhost:3000/metrics` returns the Prometheus text format
- After traffic: `curl http://localhost:3000/metrics | grep http_requests_total` shows non-zero counters for hit routes
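
For an automated version of the last check, a small parser over the scraped body may be enough. This is a sketch rather than anything in the PR: `sumSamples` is a hypothetical helper, and it assumes sample lines carry no trailing timestamp.

```typescript
// Sum every sample of one metric in Prometheus text-format output,
// skipping "# HELP" / "# TYPE" comment lines and other metrics.
function sumSamples(body: string, metric: string): number {
  let total = 0;
  for (const line of body.split("\n")) {
    if (line.startsWith("#")) continue;
    // A sample line is `<name>{labels} <value>` or `<name> <value>`.
    if (!(line.startsWith(metric + "{") || line.startsWith(metric + " "))) continue;
    // Assumption: no timestamp, so the value is the last space-separated field.
    const value = Number(line.slice(line.lastIndexOf(" ") + 1));
    if (!Number.isNaN(value)) total += value;
  }
  return total;
}
```

A post-deploy smoke test could fetch `/metrics`, pass the response text to `sumSamples(body, "http_requests_total")`, and fail if the result is zero after sending a few requests.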

🤖 Generated with Claude Code

Closes audit #12 -- the observability gap noted in the 2026-05-15 audit (Sentry covers errors but no latency percentiles, throughput, or queue depth, blocking SLA validation and capacity planning).

Adds a small focused plugin at src/routes/metrics.ts using prom-client directly (no wrapper plugin):

- GET /metrics returns the standard Prometheus text format (v0.0.4)
- Registers Node.js default metrics: heap, GC, event loop lag, CPU
- Adds two custom metrics labeled by method/route/status_code:
  - http_requests_total (counter)
  - http_request_duration_seconds (histogram with realistic facilitator latency buckets: 5ms to 10s)
- Default label `service="cardano402"` on every series
- Route label uses the templated pattern (e.g. "/files/:cid") not the raw URL -- bounded cardinality, won't explode the time-series count
- Skips /metrics itself (recursive accounting) and /health (k8s-style liveness-probe noise that would skew latency percentiles)

robots.txt updated to Disallow /metrics so it isn't crawl-indexed.

9 new tests cover: content type + Prometheus format, default Node.js metrics presence, custom counter/histogram presence, request tracking across multiple calls, route-pattern cardinality boundedness, method + status_code labels, and the /metrics + /health exclusions.

Full suite: 34 files / 452 tests passing (was 33 / 443).

No production behaviour change: the plugin only adds a new GET route and an onResponse hook that does in-memory increments. Memory cost is trivial (prom-client is ~50KB).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

MorganOnCode merged commit a7d1e78 into master May 15, 2026
5 checks passed

MorganOnCode deleted the feat/prometheus-metrics branch May 15, 2026 10:39

