perf(eap-items): Index-prune, dedupe, and sort sentry.timestamp on the raw column by phacops · Pull Request #8018 · getsentry/snuba

phacops · 2026-06-11T00:05:24Z

EAP item queries put sentry.timestamp on the read path in three ways — as a range filter, as a redundant copy of the mandatory time-range bound, and as a sort key. All three went through attribute_key_to_expression, which maps sentry.timestamp to CAST(timestamp, <type>). Since timestamp is in the primary key (organization_id, project_id, item_type, timestamp) and the partition key toMonday(timestamp), the CAST defeats granule pruning, partition pruning, and optimize_read_in_order alike — showing up as one shard being consistently slow on skewed data.

Range filters compare the raw column

Range comparisons (<, <=, >, >=) on sentry.timestamp are now rewritten to target the raw DateTime column:

-- before
(CAST(timestamp, 'Float64') AS `sentry.timestamp_TYPE_DOUBLE`) >= 1781040732

-- after
timestamp >= toDateTime('2026-06-09 21:32:12')

The column is second-resolution DateTime, so CAST(timestamp, 'Float64') already yields whole seconds — comparing against toDateTime(value) is equivalent for the integer unix-second values clients send. EQUALS/IN and SELECTing sentry.timestamp (which must still return a number) keep the CAST path.

Dedupe of identical conditions

The rewrite reuses the same canonical toDateTime('YYYY-MM-DD HH:MM:SS') form as the mandatory time-range bound, so a client filter whose bounds equal the request window becomes a byte-identical expression. treeify_or_and_conditions now drops structurally-identical conjuncts from the top-level WHERE AND (A AND A == A) before re-nesting, collapsing those duplicates. Distinct ranges are left untouched — they're all rewritten to the raw column and kept, and the AND of them yields the tightest window. The pass is a no-op (no rebuild) when there are no duplicates.

Nothing downstream dedupes conditions (the ClickHouse formatter joins AND args verbatim), so this is the only place the duplicate gets removed. Because every RPC endpoint calls treeify_or_and_conditions, they all get both the raw-column rewrite (shared filter path) and the dedupe.

ORDER BY targets the raw column

sentry.timestamp sorts had the same problem. The trace-item-table resolver already expanded a first-position, single-project, no-group-by timestamp sort into the full table sort key for optimize_read_in_order, but every other timestamp sort (secondary sort key, alongside a group by, or across multiple projects) fell through to attribute_key_to_expression and ordered by CAST(timestamp, <type>). _convert_order_by now orders by the raw timestamp column for sentry.timestamp in that branch too. The cast is monotonic, so the result order is identical, but ClickHouse can read in primary-key order.

Tests

Tests build the query AST via build_query and assert on the generated WHERE/ORDER BY clauses (no live ClickHouse needed): equal ranges collapse to one bound, different ranges keep both, 3+ distinct bounds are all kept and never use a Float cast, and sentry.timestamp sorts emit the raw timestamp column rather than a CAST.

`sentry.timestamp` is a normalized column that attribute_key_to_expression maps to `CAST(timestamp, 'Float64')`. Wrapping the primary-key/partition column in a CAST stops ClickHouse from using it for granule and partition pruning, so client-supplied range filters on sentry.timestamp degenerate into per-row scans on top of the mandatory time-range condition already applied on the raw column. Rewrite range comparisons (<, <=, >, >=) on sentry.timestamp to compare the raw DateTime `timestamp` column against toDateTime(value) so the condition is index- and partition-prunable and folds with the mandatory time bounds. This mirrors the pattern the trace-item-table resolver already uses to enable optimize_read_in_order. EQUALS/IN and SELECTing sentry.timestamp are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…nge bound Builds on the sentry.timestamp range-filter rewrite: emit the rewritten bound using the same canonical toDateTime('YYYY-MM-DD HH:MM:SS') form as the mandatory time-range condition, so a client filter whose bounds equal the request window produces a byte-identical expression. treeify_or_and_conditions now drops structurally-identical conjuncts from the top-level WHERE AND (A AND A == A) before re-nesting. Nothing downstream in the query pipeline dedupes conditions (the ClickHouse formatter joins AND args verbatim), so without this the duplicate bound would reach ClickHouse. Because every RPC endpoint treeifies, they all get the dedupe. Distinct ranges are left untouched -- the AND of them yields the tightest window -- and the pass is a no-op (no rebuild) when there are no duplicates. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 2053538. Configure here.}

cursor · 2026-06-12T22:59:29Z

+            scalar_value = _scalar_value(v)
+            assert isinstance(scalar_value, (int, float))
+            raw_timestamp = column("timestamp")
+            raw_value = timestamp_seconds_to_datetime_literal(int(scalar_value))


Fractional timestamp ranges truncated wrongly

Medium Severity

For sentry.timestamp range filters with val_float or val_double, the rewrite passes int(scalar_value) into timestamp_seconds_to_datetime_literal. Truncating toward zero changes the bound versus comparing CAST(timestamp, 'Float64') to the full float, so some rows match the old query but not the new one (and vice versa).

^{Reviewed by Cursor Bugbot for commit 2053538. Configure here.}

sentry · 2026-06-12T23:02:56Z

+    conjuncts = flatten_and(condition)
+    deduped: list[Expression] = []
+    for conjunct in conjuncts:
+        if conjunct not in deduped:


Bug: Timestamp filters with floating-point values are silently truncated to integers, losing sub-second precision and potentially altering query boundaries.
_{Severity: MEDIUM}

Suggested Fix

Consider either supporting sub-second precision by not casting to int, or add validation to reject or warn when a client sends a fractional timestamp. If truncation is the desired behavior, add a comment explaining why it is safe and intentional.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: snuba/web/rpc/common/common.py#L354 Potential issue: When processing timestamp filters, floating-point values from `val_float` or `val_double` are explicitly cast to integers using `int(scalar_value)`. This truncates any sub-second precision from the timestamp. A client sending a filter with a fractional timestamp, such as `sentry.timestamp >= 1781040732.7`, will have its filter silently rewritten to use the truncated integer value, `1781040732`. This can lead to unexpected query results, as the filter boundaries are shifted by up to one second without any warning or error to the client.

_{Did we get this right? 👍 / 👎 to inform future reviews.}

…orts When a client orders by sentry.timestamp outside the first-position single-project no-groupby fast path, _convert_order_by mapped it through attribute_key_to_expression, which emits CAST(timestamp, <type>). That cast stops ClickHouse from sorting against the primary-key timestamp column, the same way the CAST defeated index pruning for range filters. Order by the raw DateTime timestamp column instead. The cast is monotonic, so the result order is identical, but ClickHouse can now read in primary-key order. This covers timestamp ordering as a secondary key, alongside group bys, or across multiple projects. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/_OVFmpNByU-pVFTu3TbDsraFzPaFx3DrJX8_H8CH0xk

phacops and others added 2 commits June 10, 2026 17:04

phacops changed the title ~~perf(eap-items): Make sentry.timestamp range filters index-prunable~~ perf(eap-items): Index-prune and dedupe sentry.timestamp range filters Jun 11, 2026

phacops marked this pull request as ready for review June 12, 2026 22:58

phacops requested review from a team as code owners June 12, 2026 22:58

cursor Bot reviewed Jun 12, 2026

View reviewed changes

sentry Bot reviewed Jun 12, 2026

View reviewed changes

phacops changed the title ~~perf(eap-items): Index-prune and dedupe sentry.timestamp range filters~~ perf(eap-items): Index-prune, dedupe, and sort sentry.timestamp on the raw column Jun 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(eap-items): Index-prune, dedupe, and sort sentry.timestamp on the raw column#8018

perf(eap-items): Index-prune, dedupe, and sort sentry.timestamp on the raw column#8018
phacops wants to merge 3 commits into
masterfrom
feat/eap-items-timestamp-filter-index

phacops commented Jun 11, 2026 •

edited

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 12, 2026

Uh oh!

sentry Bot Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

phacops commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 12, 2026

Choose a reason for hiding this comment

Fractional timestamp ranges truncated wrongly

Uh oh!

sentry Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

phacops commented Jun 11, 2026 •

edited

Loading