Skip to content

fix(subscriptions): Skip non-retryable query exceptions in executor poll#8039

Open
phacops wants to merge 1 commit into
masterfrom
fix/snuba-9e1-unbound-transformed-message
Open

fix(subscriptions): Skip non-retryable query exceptions in executor poll#8039
phacops wants to merge 1 commit into
masterfrom
fix/snuba-9e1-unbound-transformed-message

Conversation

@phacops

@phacops phacops commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes the UnboundLocalError: cannot access local variable 'transformed_message' crash in the subscriptions executor consumer, reported in SNUBA-9E1 (2632 occurrences, EAP items subscriptions executor).

In ExecuteQuery.poll(), the except QueryException block only handled exceptions whose __cause__ was a ClickhouseError:

  • a retryable ClickHouse error → raise SubscriptionQueryException (crash & retry)
  • a non-retryable ClickHouse error → log + metric, then fall through
  • anything else (e.g. QueryTooLongException) → fall through with no handling

In both fall-through cases, execution reached self.__next_step.submit(transformed_message) while transformed_message had never been assigned (the assignment is what raised), producing the UnboundLocalError. The reported events are QueryTooLongException ("query is 163075 bytes, too long for ClickHouse"), which is non-retryable and not a ClickhouseError, so it hit the unhandled path.

Fix

Treat every non-retryable failure uniformly — non-retryable ClickHouse error codes and non-ClickHouse causes (like QueryTooLongException) alike: log the exception, emit the subscription_executor_nonretryable_error metric, and continue to skip the message instead of submitting an unassigned value downstream. Only retryable ClickHouse errors are re-raised so the consumer crashes and retries.

Testing

  • Added test_poll_skips_non_retryable_query_exception reproducing the SNUBA-9E1 scenario (a QueryException caused by QueryTooLongException): asserts poll() does not raise, does not call next_step.submit, and emits the non-retryable error metric.
  • Existing test_execute_query_exception (retryable error → SubscriptionQueryException) still passes.

Fixes SNUBA-9E1

🤖 Generated with Claude Code

Agent transcript: https://claudescope.sentry.dev/share/SkrksjFrrBLx2z5LtHtzExnDKAalhvyk2gd41CZYyy0

The executor consumer's `poll()` only handled `QueryException`s whose
cause was a `ClickhouseError`. When the cause was anything else (e.g.
`QueryTooLongException`), neither branch ran, nothing was re-raised, and
execution fell through to `self.__next_step.submit(transformed_message)`
where `transformed_message` had never been assigned, raising
`UnboundLocalError`. The same fall-through happened for non-retryable
ClickHouse error codes.

Treat every non-retryable failure (non-retryable ClickHouse codes and
non-ClickHouse causes alike) uniformly: log it, emit the
`subscription_executor_nonretryable_error` metric, and skip the message
instead of submitting an unassigned value downstream. Only retryable
ClickHouse errors are re-raised.

Fixes SNUBA-9E1

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@phacops phacops requested a review from a team as a code owner June 16, 2026 05:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant