
fix: enable SDK-level retry for Slack 503s and extract full message content #7

Open

braghettos wants to merge 1 commit into kagent-dev:main from braghettos:fix/slack-503-retry-and-message-extraction

Conversation

@braghettos

Summary

  • Enable AsyncServerErrorRetryHandler on the AsyncWebClient — the Slack SDK ships this handler but does not enable it by default, causing every transient HTTP 503 from Slack to hard-fail with SlackApiError
  • Add AsyncRateLimitErrorRetryHandler for HTTP 429 and bump AsyncConnectionErrorRetryHandler to 3 retries (default is 1)
  • Add extract_full_message_text() to read full event content from blocks and attachments, not just the plain-text event["text"] fallback — critical for webhook integrations (e.g. HyperDX, PagerDuty) that embed alert details in Block Kit
  • Reuse a single httpx.AsyncClient for A2A calls instead of creating one per request (prevents TCP connection pool leak)

Problem

When Slack returns a transient 503, the SDK's default AsyncConnectionErrorRetryHandler does not catch it because 503 is an HTTP-level status code, not a TCP connection exception (ServerConnectionError, ClientOSError). The SDK's built-in AsyncServerErrorRetryHandler handles exactly this case (retries on 500/503) but is not included in async_default_handlers().

This causes intermittent failures like:

Error: HTTP Error 503: Network communication error: All connection attempts failed
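The gap is easy to see in a minimal sketch (illustrative only, not the SDK's implementation): a connection-error handler only fires when the transport raises an exception, so a successfully delivered HTTP response carrying status 503 sails straight through unless something inspects the status code and re-issues the call.

```python
import asyncio

# Illustrative sketch of what a server-error retry handler does. A handler
# that only catches transport exceptions (ConnectionError etc.) never sees
# these responses, because the HTTP exchange itself succeeded.
RETRYABLE_STATUSES = {500, 503}

async def call_with_server_error_retry(send, max_retry_count=3, backoff=0.0):
    """Re-issue `send()` while its *response* carries a retryable status."""
    attempt = 0
    while True:
        status, body = await send()
        if status not in RETRYABLE_STATUSES or attempt >= max_retry_count:
            return status, body
        attempt += 1
        await asyncio.sleep(backoff * attempt)  # back off between attempts

# Fake transport: returns 503 twice, then succeeds.
responses = iter([(503, ""), (503, ""), (200, "ok")])

async def fake_send():
    return next(responses)

status, body = asyncio.run(call_with_server_error_retry(fake_send))
print(status, body)  # 200 ok
```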

Changes

main.py

  • Construct AsyncWebClient with explicit retry handlers and inject into AsyncApp
  • AsyncServerErrorRetryHandler(max_retry_count=3) — HTTP 500/503
  • AsyncConnectionErrorRetryHandler(max_retry_count=3) — TCP failures
  • AsyncRateLimitErrorRetryHandler(max_retry_count=2) — HTTP 429
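The wiring in main.py looks roughly like this (a sketch based on the handler names above; import paths follow slack_sdk/slack_bolt conventions and may need adjusting for your SDK version):

```python
import os

from slack_bolt.async_app import AsyncApp
from slack_sdk.web.async_client import AsyncWebClient
from slack_sdk.http_retry.builtin_async_handlers import (
    AsyncConnectionErrorRetryHandler,
    AsyncRateLimitErrorRetryHandler,
    AsyncServerErrorRetryHandler,
)

# Build the client with explicit retry handlers instead of relying on
# async_default_handlers(), which omits the server-error handler.
client = AsyncWebClient(
    token=os.environ["SLACK_BOT_TOKEN"],
    retry_handlers=[
        AsyncServerErrorRetryHandler(max_retry_count=3),      # HTTP 500/503
        AsyncConnectionErrorRetryHandler(max_retry_count=3),  # TCP failures
        AsyncRateLimitErrorRetryHandler(max_retry_count=2),   # HTTP 429
    ],
)

# Inject the configured client so Bolt uses it for all API calls.
app = AsyncApp(client=client)
```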

handlers.py

  • New extract_full_message_text() — merges content from event["text"], event["blocks"], and event["attachments"]
  • handle_app_mention now passes the full alert context to the A2A agent
  • Module-level httpx.AsyncClient reuse (was creating a new one per invocation)
  • Code cleanup: type hints, module-level logger, removal of unused imports
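A plausible shape for extract_full_message_text() (an illustrative sketch, not the PR's exact code; field names follow Slack's event payload format, and the real handler may walk more block types):

```python
def extract_full_message_text(event: dict) -> str:
    """Merge text from event["text"], Block Kit blocks, and legacy attachments."""
    parts: list[str] = []

    # 1. Plain-text fallback that Slack includes on most message events.
    if event.get("text"):
        parts.append(event["text"])

    # 2. Block Kit content: section text plus field lists.
    for block in event.get("blocks", []):
        text = block.get("text")
        if isinstance(text, dict) and text.get("text"):
            parts.append(text["text"])
        for field in block.get("fields", []):
            if field.get("text"):
                parts.append(field["text"])

    # 3. Legacy attachments (many webhook integrations still use these);
    #    use "fallback" only when no richer "text" is present.
    for att in event.get("attachments", []):
        att_text = att.get("text") or att.get("fallback")
        for piece in (att.get("pretext"), att.get("title"), att_text):
            if piece:
                parts.append(piece)

    # De-duplicate while preserving order.
    seen: set[str] = set()
    merged = [p for p in parts if not (p in seen or seen.add(p))]
    return "\n".join(merged)


event = {
    "text": "<@U123> alert fired",
    "blocks": [{"type": "section",
                "text": {"type": "mrkdwn", "text": "CPU > 90% on web-1"}}],
    "attachments": [{"title": "HyperDX alert", "text": "threshold breached"}],
}
print(extract_full_message_text(event))
```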

Test plan

  • Verified all 3 retry handlers are active in the running pod
  • Confirmed extract_full_message_text() correctly extracts HyperDX webhook alert content from blocks/attachments
  • Tested outbound Slack API connectivity with 10 rapid calls (all 200 OK)
  • Monitor for 503 errors over the next 24h — they should be silently retried by the SDK

🤖 Generated with Claude Code


## Problem

The Slack SDK's default retry configuration only handles TCP connection
errors (AsyncConnectionErrorRetryHandler, 1 retry). HTTP 503 responses
— which Slack returns transiently under load — are treated as final
results and immediately raise SlackApiError. This causes chat_postMessage
to fail intermittently with:

  Error: HTTP Error 503: Network communication error:
  All connection attempts failed

Additionally, the handle_app_mention handler only reads event["text"]
(the plain-text fallback), missing rich content from blocks and
attachments that webhook integrations like HyperDX include.

## Root cause

The SDK ships AsyncServerErrorRetryHandler (retries HTTP 500/503) but
does NOT enable it by default. The existing
AsyncConnectionErrorRetryHandler only catches aiohttp connection
exceptions (ServerConnectionError, ClientOSError), not HTTP status codes.

## Fix

**main.py:**
- Create AsyncWebClient with three explicit retry handlers:
  - AsyncServerErrorRetryHandler (3 retries) — HTTP 500/503
  - AsyncConnectionErrorRetryHandler (3 retries) — TCP failures
  - AsyncRateLimitErrorRetryHandler (2 retries) — HTTP 429
- Inject the configured client into AsyncApp

**handlers.py:**
- Add extract_full_message_text() to merge text from event["text"],
  event["blocks"], and event["attachments"] — gives downstream agents
  the complete alert context
- Reuse a single httpx.AsyncClient for A2A calls (prevents connection
  pool leak from creating a new client per request)
- Clean up code structure, use module-level logger, add type hints

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
