fix: replace Node.js OTLP exporter with fetch-based exporter #890

Merged
stack72 merged 1 commit into main from fix/otel-fetch-exporter on Mar 27, 2026
@stack72 (Contributor) commented Mar 27, 2026

Summary

  • Replace @opentelemetry/exporter-trace-otlp-http (Node.js HttpsClientRequest transport) with a custom FetchOtlpExporter that uses Deno's native fetch API
  • The Node.js HTTPS transport fails TLS connections in Deno's compiled binary, causing 10s timeout errors when exporting traces to HTTPS OTLP endpoints like Honeycomb
  • Native fetch bypasses the Node.js compat layer entirely, fixing HTTPS in compiled binaries
  • All export errors are silently swallowed — tracing never interferes with the CLI
  • Added graceful error handling around shutdownTracing() so flush failures are caught
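
The last bullet can be illustrated with a minimal sketch. The helper name `shutdownQuietly` and the `flush` parameter are hypothetical stand-ins for whatever `shutdownTracing()` awaits internally; the actual change in `otel_init.ts` may be structured differently.

```typescript
// Hypothetical helper: `flush` stands in for the provider shutdown/flush call.
async function shutdownQuietly(flush: () => Promise<void>): Promise<boolean> {
  try {
    await flush();
    return true; // spans were flushed
  } catch {
    // Flush failed: swallow the error so the CLI exit path is unaffected.
    return false;
  }
}
```

Returning a boolean rather than rethrowing keeps the caller free to log at debug level without a flush failure ever surfacing to the user.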

What changed

| File | Change |
| --- | --- |
| src/infrastructure/tracing/fetch_otlp_exporter.ts | New FetchOtlpExporter implementing SpanExporter with fetch + JsonTraceSerializer |
| src/infrastructure/tracing/fetch_otlp_exporter_test.ts | 5 unit tests: URL/headers, HTTP errors, network errors, shutdown, timeout |
| src/infrastructure/tracing/otel_init.ts | Swap to FetchOtlpExporter, add try/catch around shutdown |
| deno.json | Remove exporter-trace-otlp-http, add otlp-transformer + core |
| deno.lock | Updated lockfile |
| design/tracing.md | Document new exporter and dependency changes |

Why fetch instead of catching the error?

Catching the timeout would hide the symptom, but traces would still silently fail to export over HTTPS. The HttpsClientRequest transport can't complete TLS handshakes in compiled binaries — the request never succeeds. Switching to fetch fixes the actual transport so traces reach the collector.

Test Plan

  • 5 new unit tests for FetchOtlpExporter (URL, headers, errors, shutdown, timeout)
  • 3 existing otel_init tests pass
  • Full test suite: 3606 tests pass
  • deno check, deno lint, deno fmt all clean
  • Compiled binary, ran multi-step workflow with OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318; 17 spans exported to local Jaeger across the full hierarchy (swamp.cli → swamp.workflow.run → swamp.workflow.job → swamp.workflow.step → swamp.model.method → swamp.driver.execute)
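
For readers unfamiliar with how a base OTLP endpoint maps to the traces URL, here is a rough sketch. It assumes the exporter follows the OTLP/HTTP convention of appending `/v1/traces` to `OTEL_EXPORTER_OTLP_ENDPOINT` and reading headers as comma-separated `key=value` pairs; both function names are hypothetical and the PR's own URL and header handling may differ.

```typescript
// Hypothetical helper: derive the traces URL from a base OTLP endpoint.
function tracesUrlFromEndpoint(endpoint: string): string {
  // Strip trailing slashes so we never produce "//v1/traces".
  return endpoint.replace(/\/+$/, "") + "/v1/traces";
}

// Hypothetical helper: parse "k1=v1,k2=v2" into a header map.
// Values are taken as-is; percent-decoding is omitted in this sketch.
function parseOtlpHeaders(raw: string): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const pair of raw.split(",")) {
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip entries without a key
    const key = pair.slice(0, eq).trim();
    const value = pair.slice(eq + 1).trim();
    if (key) headers[key] = value;
  }
  return headers;
}
```

With the local Jaeger setup above, `tracesUrlFromEndpoint("http://localhost:4318")` yields `http://localhost:4318/v1/traces`.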

Fixes #889

🤖 Generated with Claude Code

The `@opentelemetry/exporter-trace-otlp-http` package uses Node.js
`HttpsClientRequest` for HTTPS connections, which fails in Deno's
compiled binary due to a TLS compatibility issue in the Node.js compat
layer. This causes a 10-second timeout and uncaught promise rejection
when `OTEL_EXPORTER_OTLP_ENDPOINT` points to an HTTPS endpoint (e.g.
Honeycomb).

Replace the Node.js HTTP-based exporter with a custom `FetchOtlpExporter`
that uses Deno's native `fetch` API. This bypasses the Node.js compat
layer entirely, fixing HTTPS connections in compiled binaries.

Changes:
- Add `FetchOtlpExporter` implementing the `SpanExporter` interface,
  using `JsonTraceSerializer` from `@opentelemetry/otlp-transformer`
  for OTLP JSON serialization and native `fetch` for transport
- All export errors are silently swallowed — tracing never interferes
  with the CLI
- Add graceful error handling around `shutdownTracing()` so flush
  failures during shutdown are caught
- Swap dependencies: remove `@opentelemetry/exporter-trace-otlp-http`,
  add `@opentelemetry/otlp-transformer` and `@opentelemetry/core`
- Update `design/tracing.md` to document the new exporter
- Add comprehensive unit tests for the new exporter (URL construction,
  headers, timeout, error handling, shutdown behavior)

Verified end-to-end: compiled binary successfully exports traces to
local Jaeger (17 spans across full workflow hierarchy).

Fixes #889

Co-authored-by: Blake Irvin <blakeirvin@me.com>

@github-actions bot left a comment

Code Review

Clean, well-scoped infrastructure fix. The FetchOtlpExporter is correctly placed in src/infrastructure/tracing/, contains no domain logic, and is properly encapsulated behind the existing otel_init.ts dynamic import. No libswamp boundary violations.

Blocking Issues

None.

Suggestions

  1. HTTP error handling semantics (fetch_otlp_exporter.ts:99-102): Non-ok HTTP responses (e.g. 500) resolve as SUCCESS because #send doesn't throw. The test at line 133 documents this as intentional ("the exporter's job is to send, not to retry"), which is a reasonable design — but worth noting that BatchSpanProcessor won't retry these since it only retries on FAILED. If that's acceptable for this use case (tracing is best-effort), no action needed.

What looks good

  • Private fields via # — clean encapsulation
  • AbortController timeout with proper cleanup in finally
  • Response body draining on error to avoid resource leaks
  • Shutdown guard prevents sends after shutdown
  • 5 well-structured tests covering happy path, HTTP errors, network errors, shutdown, and timeout
  • try/catch around shutdownTracing() is a good defensive addition
  • Design doc updated to reflect the dependency change
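
The AbortController pattern praised above can be sketched in isolation. This is not the PR's code: `withTimeout` is a hypothetical helper, and the only assumption is that the wrapped operation honours an `AbortSignal`, as `fetch` does.

```typescript
// Hypothetical helper: run an abortable operation with a deadline.
async function withTimeout<T>(
  timeoutMs: number,
  run: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  // Abort the operation if it has not settled within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await run(controller.signal);
  } finally {
    // Always clear the timer so it cannot keep the process alive
    // or fire after the operation has already settled.
    clearTimeout(timer);
  }
}
```

A call might look like `withTimeout(10000, (signal) => fetch(url, { method: "POST", body, signal }))`, where `url` and `body` come from the caller.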


@github-actions bot left a comment

Adversarial Review

Critical / High

None found.

Medium

  1. fetch_otlp_exporter.ts:95: body.buffer as ArrayBuffer may send wrong bytes if the Uint8Array is a view into a larger buffer.

    JsonTraceSerializer.serializeRequest() returns a Uint8Array. If that array is a subarray view (non-zero byteOffset, or byteLength < buffer.byteLength), then body.buffer returns the entire underlying ArrayBuffer, not just the visible slice. This would silently send corrupted trace data.

    Breaking example: If a future version of otlp-transformer returns largeBuffer.subarray(offset, offset + len), the fetch sends largeBuffer bytes instead of the intended slice.

    Suggested fix: Pass the Uint8Array directly — fetch accepts it as a body in Deno:

    body: body,

    This is both simpler and immune to the buffer/view mismatch.

  2. fetch_otlp_exporter.ts:99-102 — HTTP 4xx/5xx responses are reported as SUCCESS, preventing BatchSpanProcessor retries.

    #send() doesn't throw on non-ok responses — it drains the body and resolves normally. The .then() success handler on line 69 fires, reporting ExportResultCode.SUCCESS. BatchSpanProcessor uses this code to decide whether to retry; reporting SUCCESS on a 503 or 429 means transient server errors silently drop spans without retry.

    Breaking example: OTLP collector returns 503 during a rolling deploy. Spans are lost with no retry because the exporter reported success.

    Suggested fix: Throw (or reject) on !response.ok so the rejection handler on line 70 fires FAILED:

    if (!response.ok) {
      await response.arrayBuffer();
      throw new Error(`OTLP export failed: ${response.status}`);
    }

    Note: the test at line 130 ("returns FAILED on HTTP error responses") asserts SUCCESS and would need updating.

  3. fetch_otlp_exporter.ts:92-97 — Response body not consumed on the success path, potential connection leak in Deno.

    When response.ok is true, the body is never read. In Deno, unconsumed response bodies hold the underlying HTTP connection open until garbage collection. Under sustained export load, this could accumulate open connections to the collector.

    Suggested fix: Drain on all paths:

    // After the fetch:
    await response.arrayBuffer();
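
Findings 2 and 3 can be combined in one small sketch: drain the body on every path, reject on non-2xx, and map the promise outcome to a result code. `SketchResultCode` is a local stand-in for `ExportResultCode` from `@opentelemetry/core`, and `ResponseLike`, `settleResponse`, and `reportResult` are hypothetical names, not the PR's identifiers.

```typescript
// Local stand-in for ExportResultCode from @opentelemetry/core.
enum SketchResultCode {
  SUCCESS = 0,
  FAILED = 1,
}

// Only the parts of a fetch Response that this sketch needs.
interface ResponseLike {
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Drain the body on every path, then reject on non-2xx statuses.
async function settleResponse(response: ResponseLike): Promise<void> {
  await response.arrayBuffer();
  if (!response.ok) {
    throw new Error(`OTLP export failed: ${response.status}`);
  }
}

// Translate the promise outcome into the callback-style result code.
function reportResult(
  send: Promise<void>,
  done: (code: SketchResultCode) => void,
): void {
  send.then(
    () => done(SketchResultCode.SUCCESS),
    () => done(SketchResultCode.FAILED),
  );
}
```

Whether reporting FAILED is desirable depends on the "tracing is best-effort" stance taken in the PR; the sketch only shows what the stricter behaviour would look like.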

Low

  1. fetch_otlp_exporter.ts:86: serializeRequest returning falsy is silently treated as SUCCESS. If serialization fails (returns undefined), #send resolves, and the success callback reports ExportResultCode.SUCCESS. This is fine for "tracing never blocks the CLI," but it means the caller (BatchSpanProcessor) believes the export succeeded when nothing was sent.

  2. fetch_otlp_exporter.ts:74-76: shutdown() returns immediately without awaiting in-flight exports. If export() has already dispatched a #send() call that's mid-flight, shutdown() won't wait for it. The fetch will be orphaned. In practice this is harmless for a CLI that's about to exit, but it's a deviation from the SpanExporter contract where shutdown should "shut down the exporter and clean up."
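
One way to address the second low finding is to track in-flight sends and await them during shutdown. The class below is a hypothetical sketch, not part of the PR; it also assumes `work()` does not throw synchronously.

```typescript
// Hypothetical tracker: remembers in-flight work so shutdown can wait for it.
class InFlightTracker {
  private pending = new Set<Promise<void>>();
  private stopped = false;

  // Start `work` unless shutdown has begun; failures are swallowed
  // here because only completion matters to shutdown().
  track(work: () => Promise<unknown>): void {
    if (this.stopped) return;
    const settled: Promise<void> = work()
      .then(() => undefined, () => undefined)
      .then(() => {
        this.pending.delete(settled);
      });
    this.pending.add(settled);
  }

  // Refuse new work, then wait for everything already dispatched.
  async shutdown(): Promise<void> {
    this.stopped = true;
    await Promise.all(Array.from(this.pending));
  }
}
```

An exporter could wrap each send in `tracker.track(...)` and delegate its own `shutdown()` to the tracker. Because every send is already bounded by the export timeout, the wait at shutdown stays bounded as well.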

Verdict

PASS: This is a well-structured, well-tested fix for a real problem (Node.js TLS compat in Deno compiled binaries). The medium findings are all in the "tracing should never block the CLI" telemetry path, so none rise to the level of blocking the merge. The body.buffer issue (Medium #1) is the most worth addressing since it's a one-line fix that eliminates a latent data correctness risk.
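
The buffer/view mismatch behind Medium #1 can be reproduced with plain typed arrays, with no OTel dependency. The values below are illustrative only; whether `JsonTraceSerializer` ever returns such a view is not established by this thread.

```typescript
// A 16-byte backing store; the "payload" is a 4-byte view at offset 8.
const backing = new Uint8Array(16).fill(0xff);
backing.set([1, 2, 3, 4], 8);
const view = backing.subarray(8, 12);

// `.buffer` ignores byteOffset/byteLength and exposes all 16 bytes.
const wholeBufferLength = view.buffer.byteLength; // 16

// Copying the view yields a fresh buffer holding exactly the visible bytes.
function exactBytes(v: Uint8Array): Uint8Array {
  return v.slice();
}

const copied = exactBytes(view);
const copiedBufferLength = copied.buffer.byteLength; // 4
```

Passing the `Uint8Array` itself as the request body, as the review suggests, avoids the mismatch because the view's offset and length travel with it.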

@stack72 merged commit c0abc5a into main on Mar 27, 2026
10 checks passed
@stack72 deleted the fix/otel-fetch-exporter branch on March 27, 2026 15:29

Development

Successfully merging this pull request may close these issues.

OTel OTLP exporter times out connecting to HTTPS endpoints from compiled binary