130 changes: 130 additions & 0 deletions develop-docs/sdk/telemetry/spans/implementation.mdx
@@ -0,0 +1,130 @@
---
title: Implementation Guidelines
sidebar_order: 10
---

<Alert level="warning">
🚧 This document is a work in progress.
The steps and suggestions in this document primarily record what SDKs have done so far when implementing Span-First.
This page also serves as a place to document (temporary) decisions, trade-offs, considerations, etc.
</Alert>

<Alert>
This document uses key words such as "MUST", "SHOULD", and "MAY" as defined in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) to indicate requirement levels.
</Alert>

This document provides guidelines for implementing Span-First in SDKs. This is purposefully NOT a full specification. For exact specifications, refer to the other pages under [Spans](..).

Bug: Relative link [Spans](..) on line 16 points to the wrong parent directory.
Severity: HIGH | Confidence: High

🔍 Detailed Analysis

The relative link [Spans](..) on line 16 of implementation.mdx is incorrect. The (..) path points to the parent directory (develop-docs/sdk/telemetry/), which is the Telemetry overview page. The intended target is the Spans overview page at develop-docs/sdk/telemetry/spans/index.mdx. This causes the link to navigate to the wrong page or fail.

💡 Suggested Fix

Change [Spans](..) to [Spans](./) or [Spans](./index.mdx) to correctly reference the Spans overview page.


## How To Approach Span-First in SDKs

If you're implementing Span-First (as a PoC) in your SDK, take an iterative approach in which you implement the functionality incrementally. Here's a rough suggestion for iterations.

1. Add the Span v2 Envelope (type), serialization logic and any utilities necessary to support sending a new envelope. See [Span Protocol](../span-protocol) for more details.

Bug: Incorrect relative links to span-protocol.mdx and span-api.mdx using ../ instead of ./.
Severity: HIGH | Confidence: High

🔍 Detailed Analysis

Relative links to span-protocol.mdx and span-api.mdx in implementation.mdx are incorrect. On lines 22, 29, 40, and 61, the paths use ../ which navigates up to the parent directory (/develop-docs/sdk/telemetry/) instead of referencing sibling files within the same spans/ directory. This prevents users from accessing essential documentation, fragmenting the learning experience.

💡 Suggested Fix

Change ../span-protocol to ./span-protocol (or span-protocol) and ../span-api to ./span-api (or span-api) on lines 22, 29, 40, and 61.

2. Add the top-level `traceLifecycle` (or `trace_lifecycle`) SDK init option, which controls whether traces are sent as transactions or as spans (v2).
   - The allowed values for this option MUST be `'static'` and `'stream'`.
   - By default, the SDK MUST send traces as transactions (`'static'`). Span-First MUST be an opt-in feature.
   - Continue with adding Span-First logic, which MUST only be applied if `traceLifecycle` is set to `'stream'`.
3. As an initial PoC, leave your current transaction APIs in place and convert the transaction event to a v2 spans array to be sent in the new envelope.
   - At this point, you can already start sending spans in batches (i.e. in multiple envelopes) to send more than 1000 spans at once. The maximum number of spans per envelope MUST be limited to 1000 and an envelope MUST only contain spans from one trace (as the trace envelope header is shared).
4. If applicable to your SDK, add new Span APIs to start spans. See [Span API](../span-api) for more details.
   - Most importantly, add the simplest possible `start_span` API that leaves much control to users.
   - Follow up with optional, more convenient APIs later.
   - This new API MUST only be used in conjunction with the new `traceLifecycle` option and therefore only emit new spans (no transactions).
   - This new API MUST NOT expose any old transaction properties or concepts such as `op`, `description`, or `tags`.
   - TBD: Some SDKs already have `startSpan` or similar APIs. The migration path is still TBD but a decision can be made at a later stage.
5. Implement the `captureSpan` [single-span processing pipeline](#single-span-processing-pipeline).
   - Either reuse existing heuristics (e.g. flush when segment span ends) or build a simple span buffer to flush spans (e.g. similar to the existing buffers for logs or metrics).
   - Implementing the more complex [Telemetry Buffer](./../../telemetry-buffer) can happen at a later stage.

Bug: Relative links to Telemetry Buffer on lines 37 and 113 point to the wrong directory.
Severity: HIGH | Confidence: High

🔍 Detailed Analysis

Relative links to Telemetry Buffer on lines 37 and 113 of implementation.mdx are incorrect. The paths ./../../telemetry-buffer and ../../telemetry-buffer both resolve to develop-docs/sdk/telemetry-buffer/. The correct path should be ../telemetry-buffer to navigate up one level to telemetry/ and then into telemetry-buffer/. This results in broken internal documentation links.

💡 Suggested Fix

Change [Telemetry Buffer](./../../telemetry-buffer) and [Telemetry Buffer](../../telemetry-buffer) to [Telemetry Buffer](../telemetry-buffer).

6. Achieve data parity with the existing transaction events.
   - Ensure that the data added by SDK integrations, event processors, etc. to transaction events is also added to the spans (see [Event Processors](#tbd-event-processors)).
   - Most additional data MUST only be added to the segment span. See [Common Attributes](../span-protocol/#common-attribute-keys) for attributes that MUST be added to every span.
   - Mental model: All data our SDKs _automatically_ add to a transaction MUST also be added to the segment span.
7. Implement the span telemetry buffer for proper, weighted span flushing. See [Span Buffer](#span-buffer) for more details.
8. (Optional) Depending on necessity, drop support for sending traces as transactions in the next major release. From this point on, the SDK will by default send spans (v2) only and therefore will no longer be compatible with current self-hosted Sentry installations.
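Steps 2 and 3 above can be illustrated with a minimal Python sketch. It is not taken from any SDK: the function names, the plain-dict span representation, the `trace_id` key, and the choice to raise on an invalid option value are all assumptions made for illustration. Only the allowed option values, the `'static'` default, the 1000-span limit, and the one-trace-per-envelope rule come from the guidelines.

```python
from collections import defaultdict
from typing import Iterable

MAX_SPANS_PER_ENVELOPE = 1000  # limit stated in the guidelines
ALLOWED_TRACE_LIFECYCLES = ("static", "stream")


def resolve_trace_lifecycle(options: dict) -> str:
    """Return the configured lifecycle; transactions ('static') are the default."""
    value = options.get("trace_lifecycle", "static")
    if value not in ALLOWED_TRACE_LIFECYCLES:
        # Assumption: invalid values are rejected rather than silently ignored.
        raise ValueError(
            f"trace_lifecycle must be one of {ALLOWED_TRACE_LIFECYCLES}, got {value!r}"
        )
    return value


def batch_spans(spans: Iterable[dict]) -> list[list[dict]]:
    """Group spans by trace, then split each group into envelope-sized batches."""
    by_trace: dict[str, list[dict]] = defaultdict(list)
    for span in spans:
        # Assumption: each serialized span carries its trace ID under "trace_id".
        by_trace[span["trace_id"]].append(span)

    batches: list[list[dict]] = []
    for trace_spans in by_trace.values():
        for start in range(0, len(trace_spans), MAX_SPANS_PER_ENVELOPE):
            batches.append(trace_spans[start:start + MAX_SPANS_PER_ENVELOPE])
    return batches
```

Each returned batch would then be sent in its own envelope, so no envelope mixes traces or exceeds the span limit.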


## Span APIs

To do: This section needs a few guidelines and implementation hints, including:
- how to set a span active and remove it from the scope once it ends
- languages having to deal with async context management
- edge cases (e.g. adding a span with an explicit parent span that already ended)

## Single-Span Processing Pipeline

SDKs MUST expose a `captureSpan` API that takes a single span once it ends, and then processes and enqueues it into the span buffer. In most cases, this API SHOULD be exposed as a method on the `Client`. SDKs (e.g. JS Browser) MAY choose a different location if necessary.

Here's a rough overview of the steps `captureSpan` should perform, in order:

1. Accept any span that has already ended (i.e. has an `end_timestamp`).
2. Obtain the current, isolation and global scopes and merge the scope data.
3. Apply [common span attributes](../span-protocol/#common-attribute-keys) from the client and the merged scope data to every span.
4. Apply the merged scope data (including scope attributes) to the span if and only if it is a segment span.
5. Apply any span processing hooks (i.e. event processor replacements) to the span.
6. Apply the `before_send_span` callback to the span.
7. Enqueue the span into the span buffer.

The `captureSpan` pipeline MUST NOT:
- drop any span
- buffer spans before enqueuing them
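The ordering above can be sketched in Python as follows. This is a simplified illustration under several assumptions: spans and scopes are plain dicts, the common attributes are passed in already resolved, values already set on a span take precedence over scope data, an unended span raises an error, and a `None` result from `before_send_span` keeps the span unchanged (since the pipeline must not drop spans). None of these details is prescribed by the guidelines.

```python
from typing import Callable, Optional

Span = dict  # hypothetical stand-in for an SDK span object


def capture_span(
    span: Span,
    scopes: list[dict],
    common_attributes: dict,
    hooks: list[Callable[[Span], None]],
    before_send_span: Optional[Callable[[Span], Optional[Span]]],
    buffer: list[Span],
) -> None:
    # Step 1: only spans that have already ended are accepted.
    if span.get("end_timestamp") is None:
        raise ValueError("capture_span expects a span that has ended")

    # Step 2: merge scope data; scopes later in the list override earlier ones.
    merged_scope: dict = {}
    for scope in scopes:  # e.g. [global, isolation, current]
        merged_scope.update(scope)

    attributes = span.setdefault("attributes", {})

    # Step 3: common attributes are applied to every span.
    # Assumption: values already set on the span take precedence.
    for key, value in common_attributes.items():
        attributes.setdefault(key, value)

    # Step 4: the merged scope data is applied to segment spans only.
    if span.get("is_segment"):
        for key, value in merged_scope.items():
            attributes.setdefault(key, value)

    # Step 5: span processing hooks mutate the span in place.
    for hook in hooks:
        hook(span)

    # Step 6: before_send_span may return a modified span.
    # Assumption: a None result keeps the span as it is.
    if before_send_span is not None:
        result = before_send_span(span)
        if result is not None:
            span = result

    # Step 7: enqueue immediately, without intermediate buffering.
    buffer.append(span)
```

In a real SDK the `buffer` argument would be the span buffer owned by the client rather than a list.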

### [TMP solution] Span Filtering

For the moment, we settled on `ignore_spans` being applied prior to span start. This means that the `captureSpan` pipeline doesn't have to handle filtering spans. However, there are some drawbacks with this approach, most prominently:
- Not being able to filter on span names or data that is added/updated post span start
- Not being able to filter entire segments (e.g. `http.server` segments for bot requests resulting in 404 errors)

We might revisit this, which could require changes to the single-span processing pipeline.

For now, this means:
- Whenever `ignore_spans` is applied, SDKs MUST NOT start an actual span. Instead, they SHOULD start a No-op ("non-recording") span, which has no influence on the trace hierarchy.
- SDKs MUST record client outcomes for ignored spans
- SDKs MUST apply `ignore_spans` to every span if at all possible (POTel SDKs are excepted, but encouraged to do so as well)
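A minimal Python sketch of applying `ignore_spans` at span start is shown below. The class names, the matching rules (exact match for strings, `search` for compiled regular expressions), and the `"ignored"` outcome key are assumptions for illustration and are not defined by the guidelines.

```python
import re
from collections import Counter
from typing import Optional, Sequence, Union


class Span:
    is_recording = True

    def __init__(self, name: str, parent: Optional["Span"] = None):
        self.name = name
        self.parent = parent


class NoOpSpan(Span):
    """Non-recording placeholder returned for ignored spans."""

    is_recording = False


def is_ignored(name: str, ignore_spans: Sequence[Union[str, re.Pattern]]) -> bool:
    for pattern in ignore_spans:
        if isinstance(pattern, str):
            if pattern == name:  # assumption: strings must match exactly
                return True
        elif pattern.search(name):
            return True
    return False


def start_span(
    name: str,
    ignore_spans: Sequence[Union[str, re.Pattern]],
    outcomes: Counter,
    parent: Optional[Span] = None,
) -> Span:
    if is_ignored(name, ignore_spans):
        # Record a client outcome; the key name is hypothetical.
        outcomes["ignored"] += 1
        return NoOpSpan(name, parent)
    # Skip non-recording ancestors so ignored spans do not affect the hierarchy.
    while parent is not None and not parent.is_recording:
        parent = parent.parent
    return Span(name, parent)
```

With this approach, a child started under an ignored span is attached to the nearest recording ancestor, so the trace hierarchy looks as if the ignored span never existed.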

### [TBD] Event Processors

Given that spans are no longer events (as opposed to transactions), they don't go through our event processors, which are extensively used throughout the SDKs (clients, integrations) but also by users.
Instead, we need to find another way for users or integrations to enrich and mutate spans.

For user-facing migration, we should try to solve every use case with `ignore_spans` (for filtering) and `before_send_span` (for enrichment, data scrubbing and span mutation).

For SDK-internal processing, we're still evaluating the preferred approach but there are two main options:

1. Expose new APIs for integrations (and secondarily users) to process a span.
   For example via SDK lifecycle hooks (implemented in the JS SDK).
   Every integration would have to listen to this hook and apply its logic to spans.
   SDKs need to add a subscriber to the hook everywhere where they currently add an event processor.
   - Pro: Clear separation and semantics
   - Pro: Easy to implement and maintain
   - Con: Leads to a lot of duplication whenever event processors apply to more than transaction events (these we can eventually drop once span-first becomes the default)
   - Con: Users have to rewrite their event processors or perhaps their integrations. Not many users write their own processors but they definitely exist. Also 3rd party published integrations would be affected.
2. Construct a pseudo-event from the span and invoke event processors during `captureSpan`.
   Once the processors have been applied, back-merge the modified pseudo-event into the span.
   - Pro: Less duplication of code
   - Pro: No/less need to rewrite existing instrumentations/integrations to support span-first
   - Con: Because of the single-span processing approach, we cannot add child spans to the pseudo-event. Even if we somehow made this possible, we have no guarantee that the entire span tree would be present. This is similar to the [span filtering implications](#tmp-solution-span-filtering).
   - Con: Back-merging is complex and might not be able to cover every aspect
   - Con: Very obscure behaviour (to us and users) and contradicts our commitment to move away from events in the future.

SDK authors working on Span-First are encouraged to evaluate both options, try them out and provide perspective as well as better solutions.
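To make option 1 more concrete, here is a small Python sketch of a lifecycle hook that integrations subscribe to instead of registering an event processor. The class, the method names, the example integration, and the `runtime.name` attribute are hypothetical and do not mirror the API of any existing SDK.

```python
from typing import Callable

SpanProcessor = Callable[[dict], None]


class SpanHooks:
    """Hypothetical hook registry that captureSpan would invoke per span."""

    def __init__(self) -> None:
        self._subscribers: list[SpanProcessor] = []

    def on_process_span(self, callback: SpanProcessor) -> None:
        self._subscribers.append(callback)

    def emit_process_span(self, span: dict) -> None:
        # Subscribers run in registration order and mutate the span in place.
        for callback in self._subscribers:
            callback(span)


def setup_runtime_integration(hooks: SpanHooks) -> None:
    """Hypothetical integration that previously relied on an event processor."""

    def add_runtime(span: dict) -> None:
        # Mirror transaction behaviour: enrich the segment span only.
        if span.get("is_segment"):
            span.setdefault("attributes", {})["runtime.name"] = "cpython"

    hooks.on_process_span(add_runtime)
```

The duplication mentioned in the first "Con" shows up here: an integration that also enriches error events would keep its event processor and add a subscriber like `add_runtime` alongside it.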

## Span Buffer

This section is intentionally short because all buffering specification is being added to the [Telemetry Buffer](../../telemetry-buffer) page.

Some rough pointers:
- Given that SDKs SHOULD materialize and freeze the DSC as late as possible, the span buffer SHOULD enqueue span instances and serialize them to JSON at _flush time_.
  Before serialization, the span buffer SHOULD materialize and freeze the DSC on the segment span if this has not already been done.
  This ensures that the `trace` envelope header has the most up-to-date data from the DSC (e.g. relevant for `transaction` names in the DSC).
- SDKs SHOULD follow one of the backend, mobile or browser telemetry buffer specifications.
- It is expected and fine to implement the proper, weighted buffering logic as a final step in the Span-First project.
Intermediate buffers MAY be simpler, for example disregard the priority logic and just buffer until a certain span length, size or time interval is reached.
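As an illustration of such a simpler intermediate buffer, the following Python sketch flushes when a span count or a maximum age is reached and serializes only at flush time. It makes several assumptions: spans are plain dicts, the `{"items": [...]}` payload shape and the default thresholds are placeholders, and the age check runs only when a span is added (a real buffer would likely also use a timer). DSC freezing, per-trace envelopes, and the weighted priority logic are deliberately left out.

```python
import json
import time
from typing import Callable


class SimpleSpanBuffer:
    def __init__(
        self,
        send: Callable[[str], None],
        max_spans: int = 1000,
        max_age_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._max_spans = max_spans
        self._max_age = max_age_seconds
        self._clock = clock
        self._spans: list[dict] = []
        self._first_added_at = 0.0

    def add(self, span: dict) -> None:
        if not self._spans:
            # Remember when the oldest buffered span was enqueued.
            self._first_added_at = self._clock()
        self._spans.append(span)
        too_many = len(self._spans) >= self._max_spans
        too_old = self._clock() - self._first_added_at >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if not self._spans:
            return
        spans, self._spans = self._spans, []
        # Serialize only at flush time so late changes to span data are included.
        self._send(json.dumps({"items": spans}))
```

Passing the clock in as a parameter keeps the buffer easy to test without waiting for real time to elapse.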

## Release

The initial PoC implementation of Span-First **SHOULD** be released in a **minor version** of the SDK.

- This feature is entirely opt-in via `traceLifecycle = 'stream'` and therefore does **not** introduce breaking changes to existing users.
- The default tracing behavior (transaction-based) MUST remain unchanged until Span-First becomes the default in a future major release.
- Release notes and user facing documentation SHOULD clearly describe:
  - the availability of Span-First behind the opt-in flag
  - any known limitations