feat(develop/span-first): Add implementation guidelines page #15717
Changes from all commits
fa8758c
a416952
9face3c
f5ee5d1
8e0553c
982025c
24a4365
ba7af5b
a82cf16
53e523c
482820a
da22b11
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,130 @@ | ||
| --- | ||
| title: Implementation Guidelines | ||
| sidebar_order: 10 | ||
| --- | ||
|
|
||
| <Alert level="warning"> | ||
| 🚧 This document is work in progress. | ||
| The steps and suggestions in this document primarily serve as a means to document what SDKs so far have been doing when implementing Span-First. | ||
| This page also serves as a place to document (temporary) decisions, trade-offs, considerations, etc. | ||
| </Alert> | ||
|
|
||
| <Alert> | ||
| This document uses key words such as "MUST", "SHOULD", and "MAY" as defined in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) to indicate requirement levels. | ||
| </Alert> | ||
|
|
||
| This document provides guidelines for implementing Span-First in SDKs. This is purposefully NOT a full specification. For exact specifications, refer to the other pages under [Spans](..). | ||
|
|
||
| ## How To Approach Span-First in SDKs | ||
|
|
||
| If you're implementing Span-First (as a PoC) in your SDK, take an iterative approach in which you implement the functionality incrementally. Here's a rough suggestion for iterations. | ||
|
|
||
| 1. Add the Span v2 Envelope (type), serialization logic and any utilities necessary to support sending a new envelope. See [Span Protocol](../span-protocol) for more details. | ||
|
Review comment: Bug: Incorrect relative links to …
||
| 2. Add the top-level `traceLifecycle` (or `trace_lifecycle`) SDK init option, which controls whether traces are sent as transactions or as spans (v2). | ||
| - The allowed values for this option MUST be `'static'` and `'stream'`. | ||
| - By default, the SDK MUST send traces as transactions (`'static'`). Span-First MUST be an opt-in feature. | ||
| - Continue with adding Span-First logic which MUST only be applied if `traceLifecycle` is set to `'stream'`. | ||
| 3. As an initial PoC, leave your current transaction APIs in place and convert the transaction event to a v2 spans array to be sent in the new envelope. | ||
| - At this point, you can already start sending spans in batches (i.e. in multiple envelopes) to send more than 1000 spans at once. The maximum number of spans per envelope MUST be limited to 1000 and an envelope MUST only contain spans from one trace (as the trace envelope header is shared). | ||
| 4. If applicable to your SDK, add new Span APIs to start spans. See [Span API](../span-api) for more details. | ||
| - Most importantly, add the simplest possible `start_span` API that leaves much control to users. | ||
| - Follow up with optional, more convenient APIs later. | ||
| - This new API MUST only be used in conjunction with the new `traceLifecycle` option and therefore only emit new spans (no transactions). | ||
| - This new API MUST NOT expose any old transaction properties or concepts such as `op`, `description`, `tags`, etc. | ||
| - TBD: Some SDKs already have `startSpan` or similar APIs. The migration path is still TBD but a decision can be made at a later stage. | ||
| 5. Implement the `captureSpan` [single-span processing pipeline](#single-span-processing-pipeline) | ||
| - Either reuse existing heuristics (e.g. flush when segment span ends) or build a simple span buffer to flush spans (e.g. similar to the existing buffers for logs or metrics). | ||
| - Implementing the more complex [Telemetry Buffer](./../../telemetry-buffer) can happen at a later stage. | ||
|
Review comment: Bug: Relative links to …
||
| 6. Achieve data parity with the existing transaction events. | ||
| - Ensure that the data added by SDK integrations, event processors, etc. to transaction events is also added to the spans (see [Event Processors](#tbd-event-processors)). | ||
| - Most additional data MUST only be added to the segment span. See [Common Attributes](../span-protocol/#common-attribute-keys) for attributes that MUST be added to every span. | ||
| - Mental model: All data our SDKs _automatically_ add to a transaction MUST also be added to the segment span. | ||
| 7. Implement the span telemetry buffer for proper, weighted span flushing. See [Span Buffer](#span-buffer) for more details. | ||
| 8. (Optional) Depending on necessity, drop support for sending traces as transactions in the next major release. From this point on, the SDK will by default send spans (v2) only and therefore will no longer be compatible with current self-hosted Sentry installations. | ||
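As an illustration of step 2, here is a minimal sketch of how an SDK might resolve the lifecycle option. The helper names `resolve_trace_lifecycle` and `is_span_first_enabled` are hypothetical, and raising on an unknown value is an assumption (an SDK could also log a warning and fall back to `'static'`).

```python
from typing import Any, Dict

# The two values the guidelines allow for the option.
ALLOWED_TRACE_LIFECYCLES = ("static", "stream")


def resolve_trace_lifecycle(options: Dict[str, Any]) -> str:
    """Return the configured lifecycle, defaulting to transactions ('static')."""
    value = options.get("trace_lifecycle", "static")
    if value not in ALLOWED_TRACE_LIFECYCLES:
        # Assumption: invalid values are rejected loudly.
        raise ValueError(
            f"trace_lifecycle must be one of {ALLOWED_TRACE_LIFECYCLES}, got {value!r}"
        )
    return value


def is_span_first_enabled(options: Dict[str, Any]) -> bool:
    """Span-First logic must only run when the user opted in via 'stream'."""
    return resolve_trace_lifecycle(options) == "stream"
```

Every Span-First code path can then be guarded by `is_span_first_enabled(options)`, which keeps the default transaction behavior untouched.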
|
|
||
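The batching constraints from step 3 (at most 1000 spans per envelope, and only one trace per envelope) can be sketched as follows. This is an illustrative helper, not an SDK API: the function name `batch_spans` and the dict-based span shape with a `trace_id` key are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List

MAX_SPANS_PER_ENVELOPE = 1000  # limit stated in the guidelines

Span = Dict[str, Any]


def batch_spans(spans: List[Span]) -> Iterator[List[Span]]:
    """Yield batches of at most 1000 spans, each containing a single trace."""
    # Group by trace first, because the trace envelope header is shared.
    by_trace: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        by_trace[span["trace_id"]].append(span)

    # Then split each trace into chunks that respect the per-envelope limit.
    for trace_spans in by_trace.values():
        for start in range(0, len(trace_spans), MAX_SPANS_PER_ENVELOPE):
            yield trace_spans[start:start + MAX_SPANS_PER_ENVELOPE]
```

Each yielded batch would then be serialized into its own envelope.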
|
|
||
| ## Span APIs | ||
|
|
||
| To do: This section needs a few guidelines and implementation hints, including: | ||
| - how to set a span active and remove it from the scope once it ends | ||
| - languages having to deal with async context management | ||
| - edge cases (e.g. adding a span with an explicit parent span that already ended) | ||
|
|
||
| ## Single-Span Processing Pipeline | ||
|
|
||
| SDKs MUST expose a `captureSpan` API that takes a single span once it ends, and then processes and enqueues it into the span buffer. In most cases, this API SHOULD be exposed as a method on the `Client`. SDKs (e.g. JS Browser) MAY choose a different location if necessary. | ||
|
|
||
| Here's a rough overview of what `captureSpan` should do, and in which order: | ||
|
|
||
| 1. Accept any span that already ended (i.e. has an `end_timestamp`) | ||
| 2. Obtain the current, isolation and global scopes and merge the scope data. | ||
| 3. Apply [common span attributes](../span-protocol/#common-attribute-keys) from the client and the merged scope data to every span. | ||
| 4. Apply the merged scope data (including scope attributes) to the span IFF it is a segment span. | ||
| 5. Apply any span processing hooks (i.e. event processor replacements) to the span. | ||
| 6. Apply the `before_send_span` callback to the span. | ||
| 7. Enqueue the span into the span buffer. | ||
|
|
||
| The `captureSpan` pipeline MUST NOT: | ||
| - drop any span | ||
| - buffer spans before enqueuing them | ||
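The seven steps above could be sketched roughly like this. Everything here is illustrative: spans and scopes are plain dicts, the parameter names are hypothetical, and several behaviors are assumptions rather than specified rules (the current scope overrides the isolation and global scopes, attributes already on the span win over scope data, and a `before_send_span` callback returning `None` leaves the span unchanged because the pipeline must not drop spans).

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Span = Dict[str, Any]
Scope = Dict[str, Any]


def merge_scopes(global_scope: Scope, isolation_scope: Scope, current_scope: Scope) -> Scope:
    # Assumption: later scopes override earlier ones (current scope wins).
    merged: Scope = {}
    for scope in (global_scope, isolation_scope, current_scope):
        merged.update(scope)
    return merged


def capture_span(
    span: Span,
    scopes: Tuple[Scope, Scope, Scope],
    client_attributes: Dict[str, Any],
    common_keys: Iterable[str],
    hooks: List[Callable[[Span], None]],
    before_send_span: Optional[Callable[[Span], Optional[Span]]],
    buffer: List[Span],
) -> None:
    # 1. Only spans that already ended are accepted.
    if span.get("end_timestamp") is None:
        raise ValueError("capture_span expects a span that has already ended")

    # 2. Merge global, isolation and current scope data.
    merged = merge_scopes(*scopes)
    attributes = span.setdefault("attributes", {})

    # 3. Common attributes (client + selected scope keys) go on every span.
    common = dict(client_attributes)
    common.update({key: merged[key] for key in common_keys if key in merged})
    for key, value in common.items():
        attributes.setdefault(key, value)

    # 4. The remaining scope data is applied to segment spans only.
    if span.get("is_segment"):
        for key, value in merged.items():
            attributes.setdefault(key, value)

    # 5. Span processing hooks (event processor replacements).
    for hook in hooks:
        hook(span)

    # 6. User callback; a None result keeps the span (no dropping here).
    if before_send_span is not None:
        result = before_send_span(span)
        if result is not None:
            span = result

    # 7. Enqueue immediately; no intermediate buffering in the pipeline.
    buffer.append(span)
```

Note that the function hands every span straight to the buffer, which matches the two MUST NOT constraints: nothing is dropped and nothing is held back before enqueuing.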
|
|
||
| ### [TMP solution] Span Filtering | ||
|
|
||
| For the moment, we settled on `ignore_spans` being applied prior to span start. This means that the `captureSpan` pipeline doesn't have to handle filtering spans. However, there are some drawbacks with this approach, most prominently: | ||
| - Not being able to filter on span names or data that is added/updated post span start | ||
| - Not being able to filter entire segments (e.g. `http.server` segments for bot requests resulting in 404 errors) | ||
|
|
||
| We might revisit this, which could require changes to the single-span processing pipeline. | ||
|
|
||
| For now, though, this means: | ||
| - Whenever `ignore_spans` is applied, SDKs MUST NOT start an actual span. Instead, they SHOULD start a No-op ("non-recording") span, which has no influence on the trace hierarchy. | ||
| - SDKs MUST record client outcomes for ignored spans. | ||
| - SDKs MUST apply `ignore_spans` to every span if at all possible (POTel SDKs are excepted, but encouraged to do so as well). | ||
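A minimal sketch of applying `ignore_spans` before span start might look like the following. The pattern format (exact strings or compiled regular expressions), the `Span` class, and the outcome key `"ignored_span"` are all assumptions made for illustration. The sketch keeps the trace hierarchy intact by re-parenting children of a non-recording span onto the nearest recording ancestor.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Pattern, Union

IgnorePattern = Union[str, Pattern[str]]


@dataclass
class Span:
    name: str
    parent: Optional["Span"] = None
    recording: bool = True  # False marks a no-op ("non-recording") span


def is_ignored(name: str, ignore_spans: List[IgnorePattern]) -> bool:
    # Assumption: strings match exactly, compiled patterns use search().
    return any(
        name == p if isinstance(p, str) else p.search(name) is not None
        for p in ignore_spans
    )


def effective_parent(parent: Optional[Span]) -> Optional[Span]:
    # Skip no-op spans so they have no influence on the trace hierarchy.
    while parent is not None and not parent.recording:
        parent = parent.parent
    return parent


def start_span(
    name: str,
    parent: Optional[Span],
    ignore_spans: List[IgnorePattern],
    outcomes: Dict[str, int],
) -> Span:
    parent = effective_parent(parent)
    if is_ignored(name, ignore_spans):
        # Record a client outcome for the ignored span (placeholder key).
        outcomes["ignored_span"] = outcomes.get("ignored_span", 0) + 1
        return Span(name, parent, recording=False)
    return Span(name, parent)
```

Because filtering happens here, only the name and data known at start time can be matched, which is the drawback described above.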
|
|
||
| ### [TBD] Event Processors | ||
|
|
||
| Given that spans are no longer events (as opposed to transactions), they don't go through our event processors, which are extensively used throughout the SDKs (clients, integrations) but also by users. | ||
| Instead, we need to find another way for users or integrations to enrich and mutate spans. | ||
|
|
||
| For user-facing migration, we should try to solve every use case with `ignore_spans` (for filtering) and `before_send_span` (for enrichment, data scrubbing and span mutation). | ||
|
|
||
| For SDK-internal processing, we're still evaluating the preferred approach but there are two main options: | ||
|
|
||
| 1. Expose new APIs for integrations (and secondarily users) to process a span. | ||
| For example via SDK lifecycle hooks (implemented in the JS SDK). | ||
| Every integration would have to listen to this hook and apply its logic to spans. | ||
| SDKs need to add a subscriber to the hook everywhere where they currently add an event processor. | ||
| - Pro: Clear separation and semantics | ||
| - Pro: Easy to implement and maintain | ||
| - Con: Leads to a lot of duplication whenever event processors apply to more than transaction events (these we can eventually drop once span-first becomes the default) | ||
| - Con: Users have to rewrite their event processors or perhaps their integrations. Not many users write their own processors but they definitely exist. Also 3rd party published integrations would be affected. | ||
| 2. Construct a pseudo-event from the span and invoke event processors during `captureSpan`. | ||
| Once the processors have been applied, back-merge the modified pseudo event into the span. | ||
| - Pro: Less duplication of code | ||
| - Pro: No/less need to rewrite existing instrumentations/integrations to support span-first | ||
| - Con: Because of the single-span processing approach, we cannot add child spans to the pseudo event. Even if we somehow made this possible, we have no guarantee that the entire span tree would be present. This is similar to the [span filtering implications](#tmp-solution-span-filtering). | ||
| - Con: back-merging is complex and might not be able to cover every aspect | ||
| - Con: Very obscure behaviour (to us and users) and contradicts our commitment to move away from events in the future. | ||
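To make option 1 more concrete, here is a small sketch of a hook-based approach. The `Client.on`/`Client.emit` methods, the hook name `"process_span"`, the integration class, and the attribute key are hypothetical and only illustrate the subscription pattern; they are not taken from any existing SDK.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Span = Dict[str, Any]
SpanCallback = Callable[[Span], None]


class Client:
    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[SpanCallback]] = defaultdict(list)

    def on(self, hook_name: str, callback: SpanCallback) -> None:
        """Register a subscriber for a lifecycle hook."""
        self._hooks[hook_name].append(callback)

    def emit(self, hook_name: str, span: Span) -> None:
        """Invoke all subscribers; captureSpan would call this per span."""
        for callback in self._hooks[hook_name]:
            callback(span)


class RequestDataIntegration:
    """Hypothetical integration that previously used an event processor."""

    def setup(self, client: Client) -> None:
        client.on("process_span", self._process_span)

    def _process_span(self, span: Span) -> None:
        # Most additional data belongs on the segment span only.
        if span.get("is_segment"):
            span.setdefault("attributes", {})["example.request_method"] = "GET"
```

The duplication mentioned in the first "Con" shows up here: an integration that also enriches error events would still need its event processor next to this subscriber.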
|
|
||
| SDK authors working on Span-First are encouraged to evaluate both options, try them out and provide perspective as well as better solutions. | ||
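For comparison, option 2 can be sketched as below. The field mapping between span and pseudo-event (`name` to `transaction`, `attributes` to `tags`) is an assumption chosen only to show the round trip, and treating a processor that returns `None` as a no-op is likewise an assumption.

```python
from typing import Any, Callable, Dict, List, Optional

Span = Dict[str, Any]
Event = Dict[str, Any]
EventProcessor = Callable[[Event], Optional[Event]]


def span_to_pseudo_event(span: Span) -> Event:
    # Assumed mapping; child spans cannot be provided in single-span processing.
    return {
        "type": "transaction",
        "transaction": span["name"],
        "tags": dict(span.get("attributes", {})),
        "spans": [],
    }


def back_merge(span: Span, event: Event) -> None:
    # Only fields with a known mapping survive; everything else is lost.
    span["name"] = event.get("transaction", span["name"])
    span["attributes"] = dict(event.get("tags", {}))


def apply_event_processors(span: Span, processors: List[EventProcessor]) -> Span:
    event = span_to_pseudo_event(span)
    for processor in processors:
        result = processor(event)
        if result is not None:  # assumption: None does not drop the span
            event = result
    back_merge(span, event)
    return span
```

A processor that appends to `event["spans"]` or sets a field without a mapping has no effect on the resulting span, which illustrates why back-merging may not cover every aspect.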
|
|
||
| ## Span Buffer | ||
|
|
||
| This section is intentionally short because all buffering specification is being added to the [Telemetry Buffer](../../telemetry-buffer) page. | ||
|
|
||
| Some rough pointers: | ||
| - Given that SDKs SHOULD materialize and freeze the DSC as late as possible, the span buffer SHOULD enqueue span instances and at _flush time_ serialize them to JSON. | ||
| Before serialization, the span buffer SHOULD materialize and freeze the DSC on the segment span if this has not already been done. | ||
| This ensures that the `trace` envelope header has the most up to date data from the DSC (e.g. relevant for `transaction` names in the DSC). | ||
| - SDKs SHOULD follow one of the backend, mobile or browser telemetry buffer specifications. | ||
| - It is expected and fine to implement the proper, weighted buffering logic as a final step in the Span-First project. | ||
| Intermediate buffers MAY be simpler, for example by disregarding the priority logic and just buffering until a certain span length, size or time interval is reached. | ||
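A simple intermediate buffer along these lines could look like the sketch below. The `Span` and `SimpleSpanBuffer` classes, the DSC contents, and the `send(dsc, payload)` callback are hypothetical. The sketch also assumes that the time interval is only checked when a span is enqueued; a real SDK would likely use a timer or background worker as well.

```python
import json
import time
from typing import Any, Callable, Dict, List, Optional


class Span:
    def __init__(self, name: str, trace_id: str, segment: Optional["Span"] = None):
        self.name = name
        self.trace_id = trace_id
        self.segment = segment or self  # a segment span is its own segment
        self.frozen_dsc: Optional[Dict[str, Any]] = None

    def freeze_dsc(self) -> Dict[str, Any]:
        # Materialize the DSC once, as late as possible.
        if self.frozen_dsc is None:
            self.frozen_dsc = {"trace_id": self.trace_id, "transaction": self.name}
        return self.frozen_dsc

    def to_json(self) -> Dict[str, Any]:
        return {"name": self.name, "trace_id": self.trace_id}


class SimpleSpanBuffer:
    def __init__(self, send: Callable[[Dict[str, Any], str], None],
                 max_spans: int = 1000, flush_interval: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._send, self._max, self._interval, self._clock = send, max_spans, flush_interval, clock
        self._spans: List[Span] = []
        self._last_flush = clock()

    def enqueue(self, span: Span) -> None:
        self._spans.append(span)  # keep instances, serialize later
        if (len(self._spans) >= self._max
                or self._clock() - self._last_flush >= self._interval):
            self.flush()

    def flush(self) -> None:
        spans, self._spans = self._spans, []
        self._last_flush = self._clock()
        by_trace: Dict[str, List[Span]] = {}
        for span in spans:  # one envelope per trace
            by_trace.setdefault(span.trace_id, []).append(span)
        for trace_spans in by_trace.values():
            dsc = trace_spans[0].segment.freeze_dsc()  # freeze at flush time
            self._send(dsc, json.dumps([s.to_json() for s in trace_spans]))
```

Because serialization and DSC freezing happen in `flush`, a segment name that is updated after the span was enqueued still ends up in the `trace` envelope header.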
| ## Release | ||
|
|
||
| The initial PoC implementation of Span-First **SHOULD** be released in a **minor version** of the SDK. | ||
|
|
||
| - This feature is entirely opt-in via `traceLifecycle = 'stream'` and therefore does **not** introduce breaking changes for existing users. | ||
| - The default tracing behavior (transaction-based) MUST remain unchanged until Span-First becomes the default in a future major release. | ||
| - Release notes and user-facing documentation SHOULD clearly describe: | ||
| - the availability of Span-First behind the opt-in flag | ||
| - any known limitations | ||
Bug: Relative link `[Spans](..)` on line 16 points to the wrong parent directory.
Severity: HIGH | Confidence: High
Detailed Analysis
The relative link `[Spans](..)` on line 16 of `implementation.mdx` is incorrect. The `(..)` path points to the parent directory (`develop-docs/sdk/telemetry/`), which is the Telemetry overview page. The intended target is the Spans overview page at `develop-docs/sdk/telemetry/spans/index.mdx`. This causes the link to navigate to the wrong page or fail.
Suggested Fix
Change `[Spans](..)` to `[Spans](./)` or `[Spans](./index.mdx)` to correctly reference the Spans overview page.