Fix dead retry loop in asyncio_helper._process_request #2599

Open
hsn8086 wants to merge 1 commit into eternnoir:master from hsn8086:fix-asyncio-helper-retries
@hsn8086 hsn8086 commented Jun 11, 2026

Description

In asyncio_helper._process_request the raise RequestTimeout(...) statement sits inside the retry while loop:

    while not got_result and current_try<MAX_RETRIES-1:
        current_try +=1
        try:
            async with session.request(method=method, url=API_URL.format(token, url), data=params, timeout=timeout, proxy=proxy) as resp:
                got_result = True
                logger.debug("Request: method={0} url={1} params={2} files={3} request_timeout={4} current_try={5}".format(method, url, params, files, request_timeout, current_try).replace(token, token.split(':')[0] + ":{TOKEN}"))
                json_result = await _check_result(url, resp)
                if json_result:
                    return json_result['result']
        except (ApiTelegramException,ApiInvalidJSONException, ApiHTTPException) as e:
            raise e
        except aiohttp.ClientError as e:
            logger.error('Aiohttp ClientError: {0}'.format(e.__class__.__name__))
        except Exception as e:
            logger.error(f'Unknown error: {e.__class__.__name__}')
        if not got_result:
            raise RequestTimeout("Request timeout. Request: method={0} url={1} params={2} files={3} request_timeout={4}".format(method, url, params, files, request_timeout, current_try))

As a result, the first network error (aiohttp.ClientError, a timeout, and so on) always aborts the call immediately and MAX_RETRIES has no effect: got_result is still False when the final check runs, so RequestTimeout is raised before the loop can reach a second iteration. Async bots therefore lose outgoing calls (such as send_message) on any transient connection hiccup.
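The control flow can be reproduced without aiohttp. The following is a minimal sketch, not the library code: `flaky_request`, `process_request_buggy`, `count_attempts` and the `MAX_RETRIES = 3` value are hypothetical stand-ins chosen for illustration, and `ConnectionError` plays the role of `aiohttp.ClientError`.

```python
import asyncio

MAX_RETRIES = 3  # hypothetical value, for illustration only


class RequestTimeout(Exception):
    """Stand-in for the library's RequestTimeout."""


async def flaky_request(attempts):
    # Record the attempt, then always fail like a dropped connection.
    attempts.append(1)
    raise ConnectionError("simulated network error")


async def process_request_buggy(attempts):
    got_result = False
    current_try = 0
    while not got_result and current_try < MAX_RETRIES - 1:
        current_try += 1
        try:
            await flaky_request(attempts)
            got_result = True
        except ConnectionError:
            pass  # the real code only logs here
        # Same placement as in the snippet above: this check is inside
        # the loop, so the first failure raises straight away.
        if not got_result:
            raise RequestTimeout("Request timeout")


def count_attempts():
    attempts = []
    try:
        asyncio.run(process_request_buggy(attempts))
    except RequestTimeout:
        pass
    return len(attempts)
```

Calling `count_attempts()` shows a single attempt regardless of `MAX_RETRIES`, which is the behaviour described above.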

This PR aligns the async helper with the opt-in retry convention that already exists in telebot.apihelper (RETRY_ON_ERROR, RETRY_TIMEOUT):

  • RETRY_ON_ERROR = False (default): behaviour is unchanged — fail fast on the first network error.
  • RETRY_ON_ERROR = True: network errors are retried up to MAX_RETRIES times with RETRY_TIMEOUT seconds between attempts; RequestTimeout is raised only after all attempts are exhausted.
  • Errors reported by the Bot API itself (ApiTelegramException, ApiInvalidJSONException, ApiHTTPException) keep propagating immediately, as before.
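The three rules above can be sketched as follows. This is an illustration under stated assumptions, not the diff in this PR: `process_request`, `make_sender`, `ApiError` and `NetworkError` are hypothetical names (the last two stand in for the Bot API exception classes and `aiohttp.ClientError`), and the flags are passed as keyword arguments here only to keep the sketch self-contained, whereas the PR describes them as module-level flags.

```python
import asyncio

RETRY_ON_ERROR = False   # default: fail fast
RETRY_TIMEOUT = 0.01     # seconds between attempts (small for the sketch)
MAX_RETRIES = 3          # hypothetical value


class RequestTimeout(Exception):
    """Stand-in for the library's RequestTimeout."""


class ApiError(Exception):
    """Stand-in for ApiTelegramException and friends."""


class NetworkError(Exception):
    """Stand-in for aiohttp.ClientError."""


async def process_request(send, retry_on_error=RETRY_ON_ERROR,
                          max_retries=MAX_RETRIES,
                          retry_timeout=RETRY_TIMEOUT):
    attempts = max_retries if retry_on_error else 1
    for current_try in range(1, attempts + 1):
        try:
            return await send()
        except ApiError:
            raise  # errors reported by the API are never retried
        except NetworkError:
            if current_try < attempts:
                await asyncio.sleep(retry_timeout)
    # Raised only after every attempt has failed.
    raise RequestTimeout(f"Request timeout after {attempts} attempt(s)")


def make_sender(failures, calls):
    # Build a coroutine function that fails `failures` times, then succeeds.
    async def send():
        calls.append(1)
        if len(calls) <= failures:
            raise NetworkError("simulated connection drop")
        return {"ok": True}
    return send
```

With `retry_on_error=True`, `asyncio.run(process_request(make_sender(2, calls), retry_on_error=True))` recovers on the third attempt; with the default it raises `RequestTimeout` after one.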

Note for reviewers: when retries are enabled, a request whose response was lost mid-flight can be repeated, i.e. an API call may be executed twice (e.g. a message sent twice). This matches the semantics of the existing sync retry engine, and is one more reason the flag stays off by default.

Describe your tests

How did you test your change?

  • cd tests && py.test: 58 passed, 72 skipped. Master baseline on the same machine: 53 passed, 72 skipped — the 5 extra passes are the new unit tests added here, no regressions.
  • New self-contained unit tests in tests/test_asyncio_helper.py (no TOKEN needed, network stubbed): success path, default fail-fast, recovery after transient errors, attempt count == MAX_RETRIES, immediate propagation of API errors.
  • Live test against a local TCP server that drops the first N connections (real aiohttp I/O): with retries enabled the call recovers after 2 dropped connections (3 connection attempts observed on the server, RETRY_TIMEOUT sleeps in between), the exhaustion path raises RequestTimeout after exactly MAX_RETRIES attempts, and the default path still fails fast with a single attempt.
  • Live getMe against api.telegram.org with the patched module and default flags — works unchanged.
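For readers who want to reproduce the "server drops the first N connections" setup, here is a rough sketch of the idea. It is not the harness used for this PR: it uses plain asyncio streams instead of aiohttp, a made-up line protocol, and hypothetical names (`start_dropping_server`, `fetch_once`, `demo`).

```python
import asyncio


async def start_dropping_server(drop_first):
    seen = []  # one entry per accepted connection

    async def handle(reader, writer):
        seen.append(1)
        if len(seen) <= drop_first:
            writer.close()  # drop the connection without answering
            return
        await reader.readline()
        writer.write(b"ok\n")
        await writer.drain()
        writer.close()

    # Port 0 lets the OS pick a free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    return server, port, seen


async def fetch_once(port):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    try:
        writer.write(b"ping\n")
        await writer.drain()
        line = await reader.readline()
        if not line:
            raise ConnectionError("server closed the connection")
        return line
    finally:
        writer.close()


async def demo(drop_first=2, max_tries=3):
    server, port, seen = await start_dropping_server(drop_first)
    try:
        for _ in range(max_tries):
            try:
                reply = await fetch_once(port)
                return reply, len(seen)
            except OSError:  # includes ConnectionError and resets
                await asyncio.sleep(0.01)  # plays the role of RETRY_TIMEOUT
        return None, len(seen)
    finally:
        server.close()
```

Running `asyncio.run(demo())` with two dropped connections and three tries should return the reply together with the number of connections the server observed.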

Python version: 3.12.3

OS: Ubuntu Linux

Checklist:

  • I added/edited example on new feature/change (if exists) — there is no example covering the sync RETRY_ON_ERROR/RETRY_TIMEOUT flags either; happy to add one if desired
  • My changes won't break backward compatibility — RETRY_ON_ERROR defaults to False, preserving the current fail-fast behaviour exactly
  • I made changes both for sync and async — sync (apihelper) already has working opt-in retries; this brings the async helper to parity

The RequestTimeout raise was inside the retry while-loop, so the first
network error always aborted the call and MAX_RETRIES had no effect.

Align the async helper with the opt-in retry convention of
telebot.apihelper: add RETRY_ON_ERROR / RETRY_TIMEOUT module flags
(default off, preserving current behaviour). When enabled, network
errors are retried up to MAX_RETRIES times with RETRY_TIMEOUT seconds
between attempts, and RequestTimeout is raised only after all attempts
are exhausted. Errors reported by the Bot API itself
(ApiTelegramException etc.) keep propagating immediately.

Also covers the helper with self-contained unit tests.