
Observability

Quick Connection URLs

Use these URLs when connecting to the API or opening the observability services locally:

| Service | URL |
| --- | --- |
| API (VS Code local launch) | http://localhost:5174 |
| API health (VS Code local launch) | http://localhost:5174/health |
| Grafana | http://localhost:3001 |
| Prometheus | http://localhost:9090 |
| Loki | http://localhost:3100 |
| Tempo | http://localhost:3200 |
| Aspire Dashboard UI | http://localhost:18888 (when an Aspire profile is running) |
| OTLP gRPC endpoint | http://localhost:4317 |
| OTLP HTTP endpoint | http://localhost:4318 |

For local API telemetry export:

  • use http://localhost:4317 when sending to Aspire Dashboard in Aspire-only mode
  • use http://localhost:18889 when sending to Aspire Dashboard in full observability mode
  • use http://localhost:4317 when sending to Grafana Alloy
  • do not map Aspire Dashboard and Alloy to the same host OTLP ports at the same time; full observability mode avoids this by remapping Aspire to 18889/18890
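The selection rules above can be condensed into a tiny helper. This is an illustrative sketch only: the mode names are invented for this example (they are not flags the project defines), and the ports are the defaults documented here.

```shell
# Illustrative helper: which host OTLP gRPC endpoint a locally-run API
# should target in each mode. Mode names are invented for this sketch;
# the ports are the defaults documented above.
otlp_endpoint_for() {
  case "$1" in
    aspire-only) echo "http://localhost:4317" ;;  # Aspire Dashboard owns 4317
    alloy)       echo "http://localhost:4317" ;;  # Alloy owns 4317
    full)        echo "http://localhost:18889" ;; # Aspire remapped beside Alloy
    *)           return 1 ;;
  esac
}

otlp_endpoint_for full
```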

If you start the API from VS Code:

  • .NET API + Observability: API on http://localhost:5174, Grafana on http://localhost:3001
  • .NET API + Full Observability: API on http://localhost:5174, Grafana on http://localhost:3001, Aspire Dashboard on http://localhost:18888
  • .NET API + Aspire Dashboard: dashboard UI on http://localhost:18888

http://localhost:8080 applies only when the API itself runs inside Docker.

What This Is

This template uses a single observability model based on OpenTelemetry.

That means:

  • the API emits traces, metrics, and correlated logs
  • telemetry leaves the application over OTLP
  • the destination can change without changing business code
  • local development can use .NET Aspire Dashboard
  • shared dev and production-like environments can use Grafana Alloy + Loki + Tempo + Prometheus + Grafana

The design goal is simple:

  • instrumentation lives in the application
  • routing and storage live outside the application
  • observability stays a by-product of the system's design, not a concern spread across all features

Core Terms

OpenTelemetry

OpenTelemetry is the instrumentation standard used by the API.

It defines how the application emits:

  • traces for request and operation flow
  • metrics for counters, histograms, and gauges
  • logs that can be correlated with traces

OTLP

OTLP is the protocol used to export telemetry out of the application.

In this project, the API sends data to:

  • .NET Aspire Dashboard in local dev
  • or Grafana Alloy in the full stack

Grafana Alloy

Grafana Alloy is the collector/gateway in the full stack.

It receives OTLP telemetry from the API and forwards it to:

  • Tempo for traces
  • Loki for logs
  • Prometheus for metrics

Grafana LGTM

LGTM in this repo means:

  • Loki for logs
  • Grafana for dashboards and exploration
  • Tempo for traces
  • Prometheus for metrics

This is the operational stack. Aspire Dashboard is the developer-facing shortcut.

Architecture

High-level architecture

                            Local Dev Option
API -> OpenTelemetry -> OTLP -> Aspire Dashboard

                            Full Stack Option
API -> OpenTelemetry -> OTLP -> Grafana Alloy
                                        |-> Tempo
                                        |-> Loki
                                        |-> Prometheus
                                                |
                                                v
                                             Grafana

Full-stack architecture in this repo

ASP.NET Core API
    |
    | OTLP (gRPC/HTTP)
    v
Grafana Alloy
    |
    | traces -----------------> Tempo
    | logs -------------------> Loki
    | metrics ----------------> Prometheus remote-write
    |
    v
Grafana

Application architecture

Telemetry registration is intentionally centralized:

Project-specific telemetry helpers are isolated under:

This keeps controllers, services, filters, auth handlers, and startup code readable.

What Is Running

Application-side telemetry

The API emits:

  • inbound HTTP traces and metrics
  • outbound HttpClient traces and metrics
  • PostgreSQL traces via Npgsql
  • DragonFly/Redis traces via StackExchangeRedis
  • MongoDB traces via driver diagnostic sources
  • GraphQL traces via Hot Chocolate
  • runtime and process metrics
  • correlated logs with trace/span ids

Project-specific telemetry

The project adds telemetry for behavior that framework packages do not provide directly:

  • startup steps
  • Keycloak readiness
  • auth/BFF failures
  • output cache invalidation
  • output cache outcomes
  • validation failures
  • handled exceptions
  • concurrency conflicts
  • domain conflicts
  • explicit stored procedure spans

What Is Instrumented

Built-in instrumentation packages

The application uses OpenTelemetry-compatible packages for:

  • AspNetCore
  • HttpClient
  • Runtime
  • Process
  • Npgsql
  • StackExchangeRedis
  • HotChocolate
  • MongoDB diagnostic sources

Startup instrumentation

Startup telemetry traces these steps:

  • relational migrations
  • auth bootstrap seeding
  • MongoDB migrations
  • Keycloak readiness retries

Relevant code:

Auth and BFF instrumentation

Failure-only telemetry is recorded for:

  • missing tenant claim
  • unauthorized redirect converted to 401
  • missing refresh token
  • token endpoint rejection
  • token refresh exception
  • cookie refresh failure

Relevant code:

Cache instrumentation

Output cache telemetry includes:

  • invalidation count
  • invalidation duration
  • cache outcome counter with:
    • hit
    • store
    • bypass

Relevant code:

Validation and exception instrumentation

The API records:

  • request rejections by validation
  • individual validation errors
  • handled exception count
  • optimistic concurrency conflicts
  • domain conflicts

Relevant code:

Stored procedure instrumentation

Stored procedures get explicit parent application spans on top of provider-level Npgsql spans.

Relevant code:

How the API Connects to Observability

Application configuration

Observability settings live in appsettings.json:

{
  "Observability": {
    "ServiceName": "APITemplate",
    "Otlp": {
      "Endpoint": "http://localhost:4317"
    },
    "Aspire": {
      "Endpoint": "http://localhost:4317"
    },
    "Exporters": {
      "Aspire": {
        "Enabled": null
      },
      "Otlp": {
        "Enabled": false
      },
      "Console": {
        "Enabled": false
      }
    }
  }
}

Supported keys:

| Key | What it does |
| --- | --- |
| Observability:ServiceName | Service name attached to telemetry resources |
| Observability:Otlp:Endpoint | OTLP collector endpoint, usually Alloy |
| Observability:Aspire:Endpoint | OTLP endpoint for Aspire Dashboard |
| Observability:Exporters:Aspire:Enabled | Force the Aspire exporter on/off |
| Observability:Exporters:Otlp:Enabled | Force the OTLP exporter on/off |
| Observability:Exporters:Console:Enabled | Enable OpenTelemetry console export |

Exporter behavior

Current default behavior is:

  • local non-container development:
    • Aspire exporter enabled
    • OTLP exporter disabled unless explicitly turned on
  • containerized environments:
    • OTLP exporter enabled
    • Aspire exporter disabled
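The defaults above can be mirrored in a small sketch. The real decision lives in the project's telemetry registration code; this function only restates the documented behavior and is not the actual implementation.

```shell
# Sketch of the default exporter selection described above. The real
# decision lives in the project's telemetry registration code; this
# function only mirrors the documented defaults.
default_exporters() {
  # $1: "true" when the API runs inside a container
  if [ "$1" = "true" ]; then
    echo "aspire=off otlp=on"   # containers export over OTLP (e.g. to Alloy)
  else
    echo "aspire=on otlp=off"   # local dev defaults to the Aspire exporter
  fi
}

default_exporters false
```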

This logic lives in:

Environment variable examples

Run the API locally and send telemetry to Alloy:

$env:Observability__Otlp__Endpoint="http://localhost:4317"
$env:Observability__Exporters__Otlp__Enabled="true"
$env:Observability__Exporters__Aspire__Enabled="false"
dotnet run --project src/APITemplate

Run the API locally and send telemetry only to Aspire Dashboard:

$env:Observability__Aspire__Endpoint="http://localhost:4317"
$env:Observability__Exporters__Aspire__Enabled="true"
$env:Observability__Exporters__Otlp__Enabled="false"
dotnet run --project src/APITemplate

Enable console exporter for debugging:

$env:Observability__Exporters__Console__Enabled="true"
dotnet run --project src/APITemplate

How to Run It

Option 1: API locally + Aspire Dashboard

Use this when you want quick inspection without the full Grafana stack.

Start Aspire Dashboard:

docker compose --profile aspire up -d aspire-dashboard

Then run the API locally:

dotnet run --project src/APITemplate

Default endpoints:

  • Aspire Dashboard UI: http://localhost:18888
  • Aspire OTLP gRPC exposed on host: http://localhost:4317
  • Aspire OTLP HTTP exposed on host: http://localhost:4318

Flow:

Local API -> localhost:4317 -> Aspire Dashboard

Option 2: API locally + observability stack without Aspire

Use this when you want realistic operational observability while still debugging the API locally.

Start the stack:

docker compose up -d alloy prometheus loki tempo grafana

Then run the API locally and point OTLP to Alloy:

$env:Observability__Otlp__Endpoint="http://localhost:4317"
$env:Observability__Exporters__Otlp__Enabled="true"
$env:Observability__Exporters__Aspire__Enabled="false"
dotnet run --project src/APITemplate

Flow:

Local API -> localhost:4317 -> Alloy -> Tempo/Loki/Prometheus -> Grafana

Option 3: API locally + full observability stack with Aspire and Grafana

Use this when you want both the LGTM stack and Aspire Dashboard running together.

Start the stack:

ASPIRE_OTLP_GRPC_PORT=18889 ASPIRE_OTLP_HTTP_PORT=18890 docker compose --profile aspire up -d postgres mongodb keycloak-db keycloak dragonfly alloy prometheus loki tempo grafana aspire-dashboard
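The inline ASPIRE_OTLP_*_PORT variables work because the Compose file parameterizes the host side of the Aspire port mappings. The fragment below is a hypothetical sketch of what that mapping could look like; the container-side ports and variable defaults are assumptions, and the real wiring lives in docker-compose.yml.

```yaml
# Hypothetical docker-compose fragment: host ports are variables so
# full mode can move Aspire's OTLP ports off 4317/4318.
services:
  aspire-dashboard:
    ports:
      - "18888:18888"                          # dashboard UI
      - "${ASPIRE_OTLP_GRPC_PORT:-4317}:4317"  # OTLP gRPC (container port assumed)
      - "${ASPIRE_OTLP_HTTP_PORT:-4318}:4318"  # OTLP HTTP (container port assumed)
```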

Then run the API locally and send telemetry to both backends:

$env:Observability__Aspire__Endpoint="http://localhost:18889"
$env:Observability__Otlp__Endpoint="http://localhost:4317"
$env:Observability__Exporters__Aspire__Enabled="true"
$env:Observability__Exporters__Otlp__Enabled="true"
dotnet run --project src/APITemplate

Default endpoints:

  • Grafana UI: http://localhost:3001
  • Aspire Dashboard UI: http://localhost:18888
  • Alloy OTLP gRPC exposed on host: http://localhost:4317
  • Alloy OTLP HTTP exposed on host: http://localhost:4318
  • Aspire OTLP gRPC exposed on host: http://localhost:18889
  • Aspire OTLP HTTP exposed on host: http://localhost:18890

Flow:

Local API -> localhost:4317 -> Alloy -> Tempo/Loki/Prometheus -> Grafana
         -> localhost:18889 -> Aspire Dashboard

Option 4: full Docker environment

Use this when you want everything in containers, including the API.

Start the whole environment:

docker compose up -d --build

In this mode the API container already has the required env vars:

Observability__Otlp__Endpoint: "http://alloy:4317"
Observability__Exporters__Otlp__Enabled: "true"
Observability__Exporters__Aspire__Enabled: "false"

That wiring is in docker-compose.yml.

Option 5: production-like Compose

Use the production-like stack without Aspire:

docker compose -f docker-compose.production.yml up -d --build

This uses:

  • production environment
  • OTLP export to Alloy
  • the same LGTM backend pattern

See docker-compose.production.yml.

Docker Services and Ports

Development compose

The default Compose file starts these observability services:

| Service | Container purpose | Host ports |
| --- | --- | --- |
| alloy | OTLP receiver and telemetry router | 4317, 4318, 12345 |
| prometheus | metrics backend | 9090 |
| loki | logs backend | 3100 |
| tempo | traces backend | 3200 |
| grafana | dashboards and exploration | 3001 |
| aspire-dashboard | optional local telemetry dashboard | 18888, plus 4317/4318 by default or 18889/18890 in full mode |

Important detail:

  • alloy and aspire-dashboard both want OTLP ports on the host
  • in full mode the same aspire-dashboard service is started with host ports 18889 and 18890 to avoid that conflict
  • the provided VS Code launch profiles already separate these modes for you

Useful URLs

| Tool | URL |
| --- | --- |
| API (VS Code local launch) | http://localhost:5174 |
| Grafana | http://localhost:3001 |
| Prometheus | http://localhost:9090 |
| Loki | http://localhost:3100 |
| Tempo | http://localhost:3200 |
| Aspire Dashboard | http://localhost:18888 |
| Health endpoint (VS Code local launch) | http://localhost:5174/health |

If the API runs as a container instead of a local VS Code process, use http://localhost:8080 and http://localhost:8080/health.

How the Full Stack Is Connected

Alloy

Alloy configuration lives in config.alloy.

What it does:

  1. receives OTLP on:
    • 0.0.0.0:4317 for gRPC
    • 0.0.0.0:4318 for HTTP
  2. forwards:
    • traces to Tempo
    • logs to Loki
    • metrics to Prometheus remote write
  3. exposes its own metrics on 12345 for Prometheus scraping

Logs are sent to Loki over its native OTLP HTTP ingest endpoint at /otlp, not through the legacy Loki exporter format. That matters because Grafana Logs Drilldown expects Loki's OpenTelemetry-aware label and metadata model.
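The pipeline described above maps onto Alloy components roughly as follows. This is a trimmed, illustrative sketch, not the repo's actual config.alloy; component labels and the TLS settings are assumptions.

```river
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }
  http { endpoint = "0.0.0.0:4318" }
  output {
    traces  = [otelcol.exporter.otlp.tempo.input]
    logs    = [otelcol.exporter.otlphttp.loki.input]
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls { insecure = true }
  }
}

// Loki's native OTLP ingest, not the legacy Loki exporter format
otelcol.exporter.otlphttp "loki" {
  client { endpoint = "http://loki:3100/otlp" }
}

otelcol.exporter.prometheus "default" {
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint { url = "http://prometheus:9090/api/v1/write" }
}
```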

Tempo

Tempo stores distributed traces.

In this setup:

  • Alloy forwards traces to tempo:4317
  • Grafana queries Tempo on http://tempo:3200

Loki

Loki stores logs.

In this setup:

  • Alloy forwards logs to Loki's native OTLP ingest endpoint at http://loki:3100/otlp
  • Grafana queries Loki on http://loki:3100

Prometheus

Prometheus stores metrics.

In this setup:

  • Alloy remote-writes metrics to http://prometheus:9090/api/v1/write
  • Prometheus also scrapes internal targets like Alloy, Loki, and Tempo
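Two details are easy to miss here: stock Prometheus only accepts pushes on /api/v1/write when started with --web.enable-remote-write-receiver, and the internal scrapes use each service's own metrics port. The fragment below is an illustrative prometheus.yml sketch; the job names and targets are assumptions, not the repo's actual configuration.

```yaml
# Illustrative prometheus.yml fragment: scrape internal targets while
# application metrics arrive via remote write from Alloy.
scrape_configs:
  - job_name: alloy
    static_configs:
      - targets: ["alloy:12345"]
  - job_name: tempo
    static_configs:
      - targets: ["tempo:3200"]
  - job_name: loki
    static_configs:
      - targets: ["loki:3100"]
```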

Prometheus configuration lives in:

Grafana

Default provisioning

Grafana is provisioned from repository files. No manual datasource setup is required.

Provisioning paths:

Datasources

Provisioned datasources:

  • Prometheus
  • Loki
  • Tempo

Datasource provisioning file:

Grafana credentials

Default dev credentials:

  • user: admin
  • password: admin

They can be overridden with:

  • GRAFANA_ADMIN_USER
  • GRAFANA_ADMIN_PASSWORD

What you can do in Grafana

From Grafana you can:

  • query metrics in Prometheus
  • inspect logs in Loki
  • inspect traces in Tempo
  • jump from trace to logs using configured trace-to-log links

VS Code Launch Profiles

This repo includes VS Code profiles for observability workflows:

  • .NET API + Aspire Dashboard
  • .NET API + Observability
  • .NET API + Full Observability

These profiles:

  • start required support services first
  • run the API locally under the debugger
  • keep the API outside Docker so local debugging stays simple

Profile mapping:

  • .NET API + Aspire Dashboard starts aspire-dashboard, so use http://localhost:18888
  • .NET API + Observability starts alloy, grafana, tempo, loki, and prometheus, so use http://localhost:3001
  • .NET API + Full Observability starts the LGTM stack and aspire-dashboard, so use http://localhost:3001 and http://localhost:18888

Use them when you want the easiest developer workflow.

How to Use It Day to Day

Typical development flow

For simple local debugging:

  1. start Aspire Dashboard
  2. run the API locally
  3. hit an endpoint
  4. inspect traces, logs, and metrics in Aspire

For realistic end-to-end validation:

  1. start the full LGTM stack
  2. run the API locally or in Docker
  3. hit REST and GraphQL endpoints
  4. inspect traces in Tempo
  5. inspect logs in Loki
  6. inspect metrics and dashboards in Grafana

Example verification flow

Use any endpoint, for example:

  • GET /health
  • GET /api/v1/Products
  • GET /graphql

Then verify:

  1. a trace exists for the request
  2. child spans exist for database/cache/http calls when applicable
  3. logs have traceId and spanId
  4. request metrics appear in Grafana/Prometheus
  5. custom metrics appear when relevant:
    • validation errors
    • auth failures
    • cache outcomes
    • conflict counters
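Assuming Observability:ServiceName is left at its default of APITemplate, queries like these can drive that verification from Grafana Explore. Treat them as starting points: exact metric and label names depend on the OpenTelemetry semantic-convention and datasource versions in use.

```text
# Tempo (TraceQL): recent traces for the service
{ resource.service.name = "APITemplate" }

# Loki (LogQL): logs for the same service via native OTLP ingest labels
{ service_name = "APITemplate" }

# Prometheus (PromQL): inbound request rate by status code
sum by (http_response_status_code) (
  rate(http_server_request_duration_seconds_count[5m])
)
```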

Example Data Paths

REST request

HTTP GET /api/v1/Products
  -> AspNetCore server span
  -> service/repository work
  -> Npgsql span(s)
  -> Redis span(s) if cache used
  -> request metrics
  -> correlated logs

GraphQL request

POST /graphql
  -> AspNetCore span
  -> HotChocolate request/resolver spans
  -> GraphQL metrics
  -> Npgsql / Mongo / Redis child spans as needed
  -> correlated logs

Startup

Application startup
  -> startup.migrate (postgresql)
  -> startup.seed-auth-bootstrap
  -> startup.migrate (mongodb)
  -> startup.wait-keycloak-ready

How Logs, Traces, and Metrics Correlate

The project uses Serilog for application logging and enriches logs with OpenTelemetry context.

That gives you:

  • traceId
  • spanId
  • request correlation id

This makes it possible to:

  • start from a slow trace and find related logs
  • start from an error log and find the corresponding trace
  • compare traces with metrics spikes
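As a concrete illustration, a correlated log event could look like the following. The field names and values are invented for this example; the actual shape depends on the configured Serilog sink and enrichers.

```json
{
  "Timestamp": "2025-01-01T12:00:00.0000000Z",
  "Level": "Information",
  "MessageTemplate": "HTTP {Method} {Path} responded {StatusCode}",
  "TraceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "SpanId": "00f067aa0ba902b7",
  "RequestId": "0HN4FE0A284AM:00000001"
}
```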

Relevant code:

Troubleshooting

No telemetry visible

Check:

  1. exporter flags are correct
  2. the endpoint is correct
  3. the receiver is listening on the expected host/port
  4. the API is actually producing requests

Useful checks:

docker compose ps
docker compose logs alloy
docker compose logs grafana
docker compose logs aspire-dashboard

Aspire and Alloy both want port 4317

This is expected.

Use one of these modes:

  • Aspire mode
  • observability mode without Aspire
  • full observability mode with remapped Aspire OTLP on 18889 and 18890

The launch profiles already separate these scenarios.

Traces appear but no logs

Check:

  • Alloy is forwarding logs to Loki
  • Loki is healthy
  • the Loki datasource is provisioned in Grafana

If logs appear in dashboards or Explore but not in Logs Drilldown, check that Alloy is exporting logs to Loki via native OTLP ingest. The legacy otelcol.exporter.loki path can still show logs in normal Loki queries, but Drilldown can miss them because the OTLP resource labels and structured metadata are not exposed the same way.

Metrics appear but no application service in dashboards

Check:

  • Observability:ServiceName
  • resource attributes from OTel registration
  • Grafana dashboard query filters

Duplicate DB spans

This project intentionally avoids EntityFrameworkCore tracing because provider-level Npgsql tracing is already enabled.

That avoids duplicate spans for the same PostgreSQL command.

Design Decisions

Why OpenTelemetry everywhere

Because it keeps instrumentation stable and backend choice flexible.

The app does not care whether telemetry ends up in:

  • Aspire Dashboard
  • Grafana LGTM
  • another OTLP-capable collector

Why Alloy instead of putting exporters everywhere

Because the application should export once.

Alloy then becomes the place where you:

  • route telemetry
  • enrich or transform telemetry
  • switch backends later

Why Prometheus and not Mimir

Because this template is optimized for simplicity first.

Prometheus is enough for:

  • local development
  • small shared environments
  • template-level operational baselines

Mimir can be introduced later when scale or retention requires it.

Why Npgsql tracing and not EF Core tracing

Because provider-level PostgreSQL spans are the most useful signal for this project and avoid duplicate DB spans.

Relevant Files

Application

Infrastructure

Summary

If you want the shortest mental model, it is this:

  • the API emits telemetry with OpenTelemetry
  • OTLP is the wire protocol
  • Aspire Dashboard is the quick local viewer
  • Alloy is the collector/router
  • Tempo stores traces
  • Loki stores logs
  • Prometheus stores metrics
  • Grafana is where you explore everything together