[gateway] OTel Resource is empty on semconv schema-URL conflict — service.name lost, breaking trace stitching #7492

@lpcox

Problem

The MCP gateway (ghcr.io/github/gh-aw-mcpg) exports all OpenTelemetry spans with a completely empty resource object — no service.name, no attributes. This prevents observability backends from stitching gateway spans into the gh-aw trace: instead of appearing as children of the agent.setup span, all gateway.request and mcp.tool_call spans render as disconnected orphan roots in a separate, service-less bucket.

The root cause is a semconv schema-URL conflict inside resource.New(). At gateway startup the log shows:

[tracing:provider] Warning: failed to create OTEL resource: error detecting resource: conflicting Schema URL: (opentelemetry.io/redacted) and (opentelemetry.io/redacted)
[tracing:provider] OTLP tracing initialized successfully

When this error occurs, the current code falls back to resource.Empty() — which has zero attributes — losing even the explicitly configured service.name. The SDK's TracerProvider is built with this empty resource, so every exported span has a resource-less OTel payload.

Trace linkage itself is correct (same traceId, correct parentSpanId). This is purely a Resource identity problem.

Source: github/gh-aw#38893

Analysis

File: internal/tracing/provider.go (around line 145–160)

InitProvider calls resource.New() with an explicit resource.WithSchemaURL(semconv.SchemaURL) option. Because semconv is imported from go.opentelemetry.io/otel/semconv/v1.34.0, this pins the schema URL to the v1.34.0 value, `(opentelemetry.io/redacted)`:

import semconv "go.opentelemetry.io/otel/semconv/v1.34.0"

res, err := resource.New(ctx,
    resource.WithTelemetrySDK(),
    resource.WithSchemaURL(semconv.SchemaURL), // → ".../1.34.0"
    resource.WithContainer(),
    resource.WithAttributes(
        semconv.ServiceName(serviceName),
        semconv.ServiceVersion(version.Get()),
    ),
    resource.WithProcessPID(),
    resource.WithHost(),
)
if err != nil {
    // ← ROOT CAUSE: discards all attributes including service.name
    logTracing.Printf("Warning: failed to create OTEL resource: %v", err)
    res = resource.Empty()
}

The built-in detectors shipped with go.opentelemetry.io/otel/sdk v1.44.0 (the version in go.mod) use semconv/v1.41.0 internally. OTel's merge logic returns a conflicting Schema URL error when contributors disagree on the schema URL, and resource.New() returns a partial or empty resource along with the error.

The go.mod lists go.opentelemetry.io/otel/sdk v1.44.0 as a direct dependency, but the code imports semconv/v1.34.0 for attribute key definitions — these two versions are misaligned.

Impact table (from issue reporter's OTLP capture):

| Check | gh-aw spans | gateway (mcpg) spans |
| --- | --- | --- |
| traceId | X | same X |
| parentSpanId of gateway.request | | = agent.setup's spanId ✅ |
| resource.service.name | gh-aw. | absent (empty resource) |

Proposed Solution

Fix 1 (immediate, safe) — improve the error fallback in internal/tracing/provider.go:

Replace res = resource.Empty() with a fallback that always preserves the service identity even when full detection fails:

if err != nil {
    logTracing.Printf("Warning: failed to create OTEL resource: %v", err)
    // Fall back to a minimal resource containing only service.name/version.
    // Calling resource.New with only WithAttributes avoids any schema-URL conflicts
    // between auto-detectors, ensuring service identity is always exported.
    var fallbackErr error
    res, fallbackErr = resource.New(ctx,
        resource.WithAttributes(
            semconv.ServiceName(serviceName),
            semconv.ServiceVersion(version.Get()),
        ),
    )
    if fallbackErr != nil {
        logTracing.Printf("Warning: also failed to create minimal fallback resource: %v", fallbackErr)
        res = resource.Default()
    }
}

Fix 2 (root cause) — align semconv version with OTel SDK v1.44.0 in internal/tracing/provider.go:

The OTel SDK v1.44.0 uses semconv/v1.41.0 internally. Updating the import eliminates the schema-URL conflict entirely and removes the need for the explicit resource.WithSchemaURL(semconv.SchemaURL) option (the SDK detectors will all agree on the same version):

// Before:
import semconv "go.opentelemetry.io/otel/semconv/v1.34.0"

// After:
import semconv "go.opentelemetry.io/otel/semconv/v1.41.0"

Also remove the standalone resource.WithSchemaURL(semconv.SchemaURL) option from resource.New(). Attributes passed via resource.WithAttributes() are plain key/value pairs that carry no schema URL of their own, so they cannot conflict with the detectors; the explicit override is redundant and was the source of the conflict.

Both fixes should be applied together: Fix 2 prevents the error; Fix 1 ensures a graceful degradation if any future semconv drift re-introduces a conflict.

Optionally: Accept gh-aw correlation headers (e.g., x-project-name, gh-aw.run.id) forwarded from the gh-aw CLI and surface them as resource attributes on the gateway resource, enabling cross-service correlation even in UIs that lane traces by service name.

Testing

  1. Start a local OTLP collector (e.g., otelcol with a debug/logging exporter, or Jaeger all-in-one).
  2. Run the gateway with `OTEL_EXPORTER_OTLP_ENDPOINT=(localhost/redacted)` set.
  3. Make an MCP tool call through the gateway.
  4. Assert:
    • The gateway log no longer shows Warning: failed to create OTEL resource.
    • The OTLP payload's resourceSpans[].resource.attributes contains service.name = "mcp-gateway" and service.version.
  5. Add a unit test in internal/tracing/provider_test.go that calls InitProvider with a test OTLP endpoint and asserts res (via TracerProvider().Resource()) contains a non-empty service.name attribute.
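
The payload assertion in step 4 can be automated against a captured OTLP/JSON export. This is a minimal sketch, assuming the collector writes the standard OTLP JSON shape (resourceSpans[].resource.attributes with key/value.stringValue entries). `serviceNames` is a hypothetical helper and the sample payload is invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// otlpExport models only the fields needed to read resource attributes.
type otlpExport struct {
	ResourceSpans []struct {
		Resource struct {
			Attributes []struct {
				Key   string `json:"key"`
				Value struct {
					StringValue string `json:"stringValue"`
				} `json:"value"`
			} `json:"attributes"`
		} `json:"resource"`
	} `json:"resourceSpans"`
}

// serviceNames returns one entry per resourceSpans element; an empty
// string means that resource carries no service.name attribute.
func serviceNames(payload []byte) ([]string, error) {
	var exp otlpExport
	if err := json.Unmarshal(payload, &exp); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(exp.ResourceSpans))
	for _, rs := range exp.ResourceSpans {
		name := ""
		for _, a := range rs.Resource.Attributes {
			if a.Key == "service.name" {
				name = a.Value.StringValue
			}
		}
		names = append(names, name)
	}
	return names, nil
}

func main() {
	// Invented sample: one healthy resource and one empty resource.
	payload := []byte(`{"resourceSpans":[
		{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"mcp-gateway"}}]}},
		{"resource":{}}
	]}`)
	names, err := serviceNames(payload)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for i, n := range names {
		fmt.Printf("resourceSpans[%d] service.name=%q\n", i, n)
	}
}
```

Any empty entry in the returned slice corresponds to the failure mode this issue describes.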

Generated by Gateway Issue Dispatcher · 470.2 AIC · ⊞ 32.8K ·
