fix: enable SDK-level retry for Slack 503s and extract full message content #7
Open
braghettos wants to merge 1 commit into kagent-dev:main from
…ontent

## Problem

The Slack SDK's default retry configuration only handles TCP connection errors (AsyncConnectionErrorRetryHandler, 1 retry). HTTP 503 responses, which Slack returns transiently under load, are treated as final results and immediately raise SlackApiError. This causes chat_postMessage to fail intermittently with:

`Error: HTTP Error 503: Network communication error: All connection attempts failed`

Additionally, the handle_app_mention handler only reads event["text"] (the plain-text fallback), missing rich content from blocks and attachments that webhook integrations like HyperDX include.

## Root cause

The SDK ships AsyncServerErrorRetryHandler (retries HTTP 500/503) but does NOT enable it by default. The existing AsyncConnectionErrorRetryHandler only catches aiohttp connection exceptions (ServerConnectionError, ClientOSError), not HTTP status codes.

## Fix

**main.py:**

- Create AsyncWebClient with three explicit retry handlers:
  - AsyncServerErrorRetryHandler (3 retries): HTTP 500/503
  - AsyncConnectionErrorRetryHandler (3 retries): TCP failures
  - AsyncRateLimitErrorRetryHandler (2 retries): HTTP 429
- Inject the configured client into AsyncApp

**handlers.py:**

- Add extract_full_message_text() to merge text from event["text"], event["blocks"], and event["attachments"], which gives downstream agents the complete alert context
- Reuse a single httpx.AsyncClient for A2A calls (prevents connection pool leak from creating a new client per request)
- Clean up code structure, use module-level logger, add type hints

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary

- Enable `AsyncServerErrorRetryHandler` on the `AsyncWebClient`. The Slack SDK ships this handler but does not enable it by default, so every transient HTTP 503 from Slack hard-fails with `SlackApiError`.
- Add `AsyncRateLimitErrorRetryHandler` for HTTP 429 and bump `AsyncConnectionErrorRetryHandler` to 3 retries (default is 1).
- Add `extract_full_message_text()` to read full event content from `blocks` and `attachments`, not just the plain-text `event["text"]` fallback. This is critical for webhook integrations (e.g. HyperDX, PagerDuty) that embed alert details in Block Kit.
- Reuse a single `httpx.AsyncClient` for A2A calls instead of creating one per request (prevents TCP connection pool leak).

## Problem
When Slack returns a transient 503, the SDK's default `AsyncConnectionErrorRetryHandler` does not catch it, because 503 is an HTTP-level status code, not a TCP connection exception (`ServerConnectionError`, `ClientOSError`). The SDK's built-in `AsyncServerErrorRetryHandler` handles exactly this case (retries on 500/503) but is not included in `async_default_handlers()`.

This causes intermittent failures like `Error: HTTP Error 503: Network communication error: All connection attempts failed`.
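To make the distinction concrete, here is a stdlib-only sketch of the two failure classes described above: a transport error surfaces as an exception, while a 503 arrives as an ordinary response that has to be inspected. This is a generic illustration and not the Slack SDK's implementation; `call_with_retries`, `make_flaky_sender`, and `RETRYABLE_STATUSES` are hypothetical names, and the retry limits and backoff are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

# Hypothetical names for illustration only; none of this is Slack SDK code.
RETRYABLE_STATUSES = {500, 503}


async def call_with_retries(
    send: Callable[[], Awaitable[Tuple[int, str]]],
    max_retries: int = 3,
    base_delay: float = 0.0,
) -> Tuple[int, str]:
    """Call send(), retrying on connection errors and on HTTP 500/503."""
    attempt = 0
    while True:
        try:
            status, body = await send()
        except ConnectionError:
            # Transport-level failure: no HTTP response was received at all.
            if attempt >= max_retries:
                raise
        else:
            # A response arrived; a handler that only catches exceptions
            # never reaches this branch, so a 503 would be returned as final.
            if status not in RETRYABLE_STATUSES or attempt >= max_retries:
                return status, body
        attempt += 1
        # Exponential backoff; base_delay=0 keeps this demo instantaneous.
        await asyncio.sleep(base_delay * 2 ** (attempt - 1))


def make_flaky_sender(statuses: List[int]) -> Callable[[], Awaitable[Tuple[int, str]]]:
    """Return a fake sender that yields the given status codes in order."""
    remaining = list(statuses)

    async def send() -> Tuple[int, str]:
        return remaining.pop(0), "body"

    return send


# Two transient 503s followed by a success.
result = asyncio.run(call_with_retries(make_flaky_sender([503, 503, 200])))
```

With the status check in place the two 503s are absorbed and the caller only sees the final 200. Without it, the first 503 would be handed back immediately, which matches the intermittent failure described above.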
## Changes

**main.py**

- Create `AsyncWebClient` with explicit retry handlers and inject it into `AsyncApp`:
  - `AsyncServerErrorRetryHandler(max_retry_count=3)`: HTTP 500/503
  - `AsyncConnectionErrorRetryHandler(max_retry_count=3)`: TCP failures
  - `AsyncRateLimitErrorRetryHandler(max_retry_count=2)`: HTTP 429

**handlers.py**

- Add `extract_full_message_text()`, which merges content from `event["text"]`, `event["blocks"]`, and `event["attachments"]`
- `handle_app_mention` now passes the full alert context to the A2A agent
- `httpx.AsyncClient` reuse (was creating a new one per invocation)

## Test plan
- `extract_full_message_text()` correctly extracts HyperDX webhook alert content from blocks/attachments

🤖 Generated with Claude Code
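For reviewers who want to exercise the test-plan item locally, here is a rough sketch of what a helper like `extract_full_message_text()` might look like. It is not the implementation in this PR. The `_block_texts` helper, the deduplication step, and the set of attachment keys read are assumptions, and the sample payload is invented for illustration rather than taken from a real HyperDX webhook. Only section-style blocks (a `text` object plus optional `fields`) are handled; nested `rich_text` elements are skipped.

```python
from typing import Any, Dict, List


def _block_texts(blocks: List[Dict[str, Any]]) -> List[str]:
    """Collect text from section-style blocks (text object plus fields)."""
    texts: List[str] = []
    for block in blocks:
        text_obj = block.get("text")
        if isinstance(text_obj, dict) and text_obj.get("text"):
            texts.append(text_obj["text"])
        for field in block.get("fields") or []:
            if isinstance(field, dict) and field.get("text"):
                texts.append(field["text"])
    return texts


def extract_full_message_text(event: Dict[str, Any]) -> str:
    """Merge event["text"], event["blocks"], and event["attachments"]."""
    parts: List[str] = []
    if event.get("text"):
        parts.append(event["text"])
    parts.extend(_block_texts(event.get("blocks") or []))
    for attachment in event.get("attachments") or []:
        # Legacy attachment fields that commonly carry alert details.
        for key in ("pretext", "title", "text"):
            if attachment.get(key):
                parts.append(attachment[key])
        # Attachments may also embed their own blocks.
        parts.extend(_block_texts(attachment.get("blocks") or []))
    # Drop exact duplicates while preserving order, since the plain-text
    # fallback often repeats a block's content.
    unique: List[str] = []
    for part in parts:
        if part not in unique:
            unique.append(part)
    return "\n".join(unique)


# Invented payload for illustration; not a real HyperDX webhook body.
sample_event = {
    "text": "Alert fired",
    "blocks": [
        {"type": "section", "text": {"type": "mrkdwn", "text": "*High error rate*"}}
    ],
    "attachments": [
        {"title": "checkout-service", "text": "5xx rate above threshold"}
    ],
}
full_text = extract_full_message_text(sample_event)
```

Joining with newlines keeps each source fragment on its own line, which should make the merged text easier for a downstream agent to parse than a single concatenated string.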