feat(span-buffer): Add a new write/read option for removing payloads from sets #107809
Conversation
…from sets

There is a hypothesis that some of the CPU usage on Redis comes from copying the compressed payload bytes into the new sets when sets are merged. Add a new write/read method that stores payloads over a certain size under a separate key. These keys are then fetched and their payloads added back into the segment when it is flushed.
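As a rough sketch of the idea (hypothetical names and cutoff; the real implementation uses the `span-buf:p:ld:` prefix, a configurable cutoff option, and the Lua scripts in `src/sentry/spans/buffer.py`):

```python
import redis

# Illustrative constants; the prefix is taken from the diff, the cutoff is really an option.
PAYLOAD_KEY_PREFIX = b"span-buf:p:ld:"
PAYLOAD_CUTOFF_SIZE = 16 * 1024

client = redis.Redis()


def write_span(set_key: bytes, payload_key: bytes, compressed: bytes, ttl: int = 3600) -> None:
    """Store large payloads under their own key; the set only carries a reference."""
    if len(compressed) > PAYLOAD_CUTOFF_SIZE:
        client.setex(payload_key, ttl, compressed)  # payload lives outside the set
        client.sadd(set_key, payload_key)           # merging sets only moves the reference
    else:
        client.sadd(set_key, compressed)            # small payloads stay inline


def read_segment(set_key: bytes) -> list[bytes]:
    """Read a segment, resolving payload references with a single MGET."""
    inline: list[bytes] = []
    payload_keys: list[bytes] = []
    for member in client.smembers(set_key):
        if member.startswith(PAYLOAD_KEY_PREFIX):
            payload_keys.append(member)
        else:
            inline.append(member)
    if payload_keys:
        inline.extend(v for v in client.mget(*payload_keys) if v is not None)
    return inline
```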
```python
if span_data.starswith("span-buf:p:ld:"):
    payload_keys.append(span_data)
else:
    decompressed_spans.extend(self._decompress_batch(span_data))

if payload_keys:
    payloads = self.client.mget(*payload_keys)
    decompressed_spans.extend(map(self._decompressed_batch, payloads))
```
Bug: The payloads variable is reassigned from a dict to a list, causing a TypeError on subsequent dictionary-style access.
Severity: CRITICAL
Suggested Fix
Use a different variable name for the result of self.client.mget(*payload_keys) to avoid overwriting the payloads dictionary. Then, iterate through the results to populate the original payloads dictionary correctly.
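A sketch of that rename against the hunk above; the per-segment bookkeeping on the `payloads` dict (the `del payloads[key]` / `payloads[key].extend(...)` accesses described in the report further down) is left out here:

```python
if payload_keys:
    # A distinct name keeps `payloads` (segment key -> list of span bytes) intact
    fetched_payloads = self.client.mget(*payload_keys)
```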
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/spans/buffer.py#L818-L825
Potential issue: In the `_load_segment_data` method, the `payloads` variable is
initialized as a dictionary. However, it is later reassigned to the result of
`self.client.mget()`, which returns a list. Subsequent code attempts to access this
variable using dictionary key lookups (e.g., `payloads[key]`), which will fail with a
`TypeError` because list indices must be integers, not the `bytes` keys being used. This
occurs when the `read_from_payload_set` feature is enabled and segments are being
flushed.
```python
if write_to_payload_set:
    p.hdel(payload_set_redirect_map_key, *span_ids)
    p.unlink(
        *(self._get_payload_key(project_id, trace_id, s) for s in span_ids)
```
Bug: The _get_payload_key method is called with the wrong number and types of arguments, which will cause a TypeError.
Severity: CRITICAL
Suggested Fix
Combine and decode the project_id and trace_id bytes into a single project_and_trace string before calling _get_payload_key. The call should be updated to self._get_payload_key(f"{project_id.decode()}:{trace_id.decode()}", s).
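In context, the corrected call could look like this sketch (whether the span ids in `span_ids` also need decoding depends on `_get_payload_key`, which is not shown in this hunk):

```python
if write_to_payload_set:
    p.hdel(payload_set_redirect_map_key, *span_ids)
    # _get_payload_key expects a single "project:trace" string, not raw bytes
    project_and_trace = f"{project_id.decode()}:{trace_id.decode()}"
    p.unlink(
        *(self._get_payload_key(project_and_trace, s) for s in span_ids)
    )
```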
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/spans/buffer.py#L956
Potential issue: The `_get_payload_key` method expects two string arguments:
`project_and_trace` and `span_id`. However, in the `done_flush_segments` method, it is
called with three separate `bytes` arguments: `project_id`, `trace_id`, and `s`. This
mismatch in argument count and types will cause a `TypeError` when the cleanup phase of
segment flushing is executed with the `write_to_payload_set` feature enabled.
```python
if write_to_payload_set:
    if (
        not self.add_buffer_payload_sha
```
Bug: The add_buffer_payload_sha attribute is accessed before it is initialized in __init__, which will cause an AttributeError.
Severity: CRITICAL
Suggested Fix
Initialize the attribute in the __init__ method by adding the line self.add_buffer_payload_sha: str | None = None, consistent with how similar attributes like add_buffer_sha are handled.
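The suggested line in context, mirroring how `add_buffer_sha` is described as being handled (the exact surrounding code in `__init__` is assumed here):

```python
# In SpansBuffer.__init__, next to the existing script-SHA attribute:
self.add_buffer_sha: str | None = None
self.add_buffer_payload_sha: str | None = None  # read by _ensure_scripts when write_to_payload_set is on
```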
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/spans/buffer.py#L609
Potential issue: The `SpansBuffer` class `__init__` method does not initialize the
`add_buffer_payload_sha` attribute. However, this attribute is accessed within the
`_ensure_scripts` method when the `write_to_payload_set` feature is enabled. This will
lead to an `AttributeError` when `process_spans` is called, as it tries to access an
attribute that has not been defined on the object instance.
Cursor Bugbot has reviewed your changes and found 5 potential issues.
```python
    decompressed_spans.extend(self._decompress_batch(span_data))

if payload_keys:
    payloads = self.client.mget(*payload_keys)
```
Variable shadowing overwrites payloads dict with list
High Severity
The variable payloads is defined at line 795 as a dictionary mapping segment keys to lists of bytes. Line 824 reassigns payloads to the result of mget(), which is a list. This breaks lines 832 and 836 which expect payloads to remain a dictionary and attempt del payloads[key] and payloads[key].extend(...). When payload keys are fetched, this causes a TypeError.
Additional Locations (1)
| p.delete(b"span-buf:hrs:" + payload_set_key) | ||
| p.delete(b"span-buf:ic:" + payload_set_key) | ||
| p.delete(b"span-buf:ibc:" + payload_set_key) | ||
| p.unlink(payload_set_key) |
Missing payload_set_key in queue zrem call
Medium Severity
The zrem call only removes zset_key and set_key from the flush queue, but not payload_set_key. When reading from payload sets, the queue contains payload_set_key entries, so they won't be removed during cleanup, leading to orphaned queue entries that could cause duplicate flush attempts.
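A sketch of the adjusted cleanup, assuming the flush queue key is held in a variable here called `queue_key` (the actual name is not visible in this hunk):

```python
# Remove all per-segment entries from the flush queue, including the payload set key
p.zrem(queue_key, zset_key, set_key, payload_set_key)
```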
```python
span_data = scan_value[0] if isinstance(scan_value, tuple) else scan_value
decompressed_spans.extend(self._decompress_batch(span_data))
# Payload keys are prefixed with span-buf:p:ld: and have to be fetched separately
if span_data.starswith("span-buf:p:ld:"):
```
```python
if payload_keys:
    payloads = self.client.mget(*payload_keys)
    decompressed_spans.extend(map(self._decompressed_batch, payloads))
```
map results not flattened before extend call
High Severity
The _decompress_batch method returns a list[bytes], so map(...) produces an iterator of lists. Using extend with this iterator adds each list as an element to decompressed_spans rather than flattening the bytes. The results need to be flattened, e.g., with itertools.chain.from_iterable.
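A sketch of the flattening fix using `itertools.chain.from_iterable`, also skipping the `None` results MGET returns for missing keys:

```python
import itertools

if payload_keys:
    fetched_payloads = self.client.mget(*payload_keys)
    decompressed_spans.extend(
        itertools.chain.from_iterable(
            self._decompress_batch(payload)
            for payload in fetched_payloads
            if payload is not None
        )
    )
```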
```python
payload_key = self._get_payload_key(project_and_trace, parent_span_id)
compressed = prepared.keys()[0]  # Compressed payloads only have one key
if payload_cutoff_size > 0 and len(compressed) > payload_cutoff_size:
    p.set(payload_key, compressed)
```
Missing TTL on payload keys causes resource leak
Medium Severity
The p.set(payload_key, compressed) call creates a Redis key without any TTL. Other span buffer keys get TTL set either in the Lua scripts (via expire calls) or explicitly. If a segment is never flushed or cleanup fails, these payload keys will persist in Redis indefinitely, causing a resource leak. The call needs to include a TTL, e.g., using p.setex(payload_key, redis_ttl, compressed).
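A sketch of the suggested change, assuming a `redis_ttl` value is available in this scope as the comment implies:

```python
if payload_cutoff_size > 0 and len(compressed) > payload_cutoff_size:
    # Expire payload keys along with the rest of the segment state
    p.setex(payload_key, redis_ttl, compressed)
```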


There is a hypothesis that some of the CPU usage on Redis is from copying the compressed payload
bytes into the new sets when the sets are merged. Add a new write/read method that stores payloads
over a certain size into a separate key. This key is then fetched and populated when flushing the
segment.
THIS IS MISSING TESTS