Additional retry protection throughout sync pipeline #342

Open
bjester wants to merge 5 commits into learningequality:release-v0.8.x from bjester:retry-defense

@bjester bjester commented Jun 30, 2026

Member

Summary

  • Updates urllib3 imports to come directly from the urllib3 package instead of through the requests.packages proxy, because requests stopped bundling the dependency nearly a decade ago. Adds dependency restrictions for requests and urllib3 to enforce versions compatible with Morango's use. The minimum requests version matches the version pinned in Kolibri 0.19.x.
  • Makes buffer creation during transfer idempotent: the same buffer chunk can be pulled or pushed multiple times without error and without inflating the transferred record count.
  • Makes the requests that close sync or transfer sessions tolerant of 404s. Because of the active=True filter, a 404 most likely indicates that the session has already been closed.
  • Adds retry behavior to the SessionWrapper that uses the Retry utility, if configured, to retry low-level connection issues that urllib3 does not retry automatically.
  • Refactors bandwidth tracking in the SessionWrapper to be more self-contained.
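
As a rough illustration of treating a 404 on close as "already closed", here is a minimal sketch. It is not Morango's implementation: `FakeResponse`, `close_session`, and the `send_delete` callable are hypothetical stand-ins, and the real code works with requests responses, whose `raise_for_status()` raises a different exception type.

```python
class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""

    def __init__(self, status_code):
        self.status_code = status_code

    def raise_for_status(self):
        # Simplified: treat every 4xx/5xx status as an error.
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")


def close_session(send_delete):
    """Close a remote session, treating a 404 as 'already closed'.

    `send_delete` is a hypothetical callable that performs the close
    request and returns a response object.
    """
    response = send_delete()
    if response.status_code == 404:
        # Assumption: the session is no longer active on the server,
        # for example because an earlier attempt succeeded but its
        # reply was lost before a retry.
        return
    # Any other error status is still surfaced to the caller.
    response.raise_for_status()
```

With this shape, a retried close that lands after a successful first attempt is a no-op, while a 500 still raises.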

TODO

  • Have tests been written for the new code?
  • Has changelog been written/updated?
  • New dependencies (if any) added to requirements file

Reviewer guidance

Morango integration tests in Kolibri are passing locally with these changes.

Careful attention was given to version-specific gotchas and to supporting the Python versions that Kolibri supports. This PR focuses on lower-level retries; retrying complex behaviors like certificate requests, which would require new nonces, is not addressed. The transfer session API is passive to a session already existing, to support resumption, as long as the request is the same. Overall, the most important area for retries is the buffer transfer, since it involves many requests, which increases the likelihood of a failure.

I've added comments for the reasoning of specific changes.

Issues addressed

Closes #339

AI Usage

AI was used to jumpstart the changes, although its approach got convoluted quickly. So most of this is handcrafted to integrate with requests and urllib3 as smoothly as possible. AI was used to create some tests and to keep tests up-to-date through several iterations of this work.

@bjester bjester requested a review from rtibblesbot June 30, 2026 21:52
rtibblesbot

This comment was marked as resolved.

@bjester bjester requested a review from rtibblesbot July 1, 2026 00:24
rtibblesbot

This comment was marked as resolved.

@bjester bjester requested a review from rtibblesbot July 1, 2026 15:16
rtibblesbot

This comment was marked as resolved.

@bjester bjester requested a review from rtibblesbot July 1, 2026 20:05
rtibblesbot

This comment was marked as resolved.

@bjester bjester requested review from rtibblesbot and removed request for rtibblesbot July 1, 2026 21:02

@rtibblesbot rtibblesbot left a comment

Contributor

Re-review at the same HEAD SHA (ddb551fc) as the prior COMMENTED verdict. 1 of 2 prior findings resolved; 1 still open (see below). One new suggestion from this pass. CI: all real test suites pass; the only failing check (Check if author is contributor) is unrelated repo automation.

  • suggestion (syncsession.py:504): still missing a test_close_transfer_session_raises_500 symmetric to test_close_sync_session_raises_500 (see inline).
  • suggestion (session.py:240): request() now catches bare Exception instead of exceptions.RequestException (see inline).

Prior-finding status

RESOLVED — morango/sync/session.py:222 — request() docstring vs bare-Exception catch mismatch
UNADDRESSED — morango/sync/syncsession.py:504 — missing test_close_transfer_session_raises_500


@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?

Compared the current PR state against findings from a prior review:

  • Retrieved prior bot reviews via the GitHub API
  • Classified each prior finding as RESOLVED, UNADDRESSED, ACKNOWLEDGED, or CONTESTED
  • Only raised NEW findings for newly introduced code
  • Ran the same phased review passes as a first review (core, frontend/backend lenses, manual QA when required)
  • Synthesized one review from the passes and chose the verdict from the findings, CI status, and QA evidence

Comment thread morango/sync/syncsession.py
Comment thread morango/sync/session.py
Comment thread morango/sync/utils.py
:return: The Response
"""
- return context.connection._close_transfer_session(context.transfer_session)
+ context.connection._close_transfer_session(context.transfer_session)

Member Author

The response object wasn't really necessary, so this no longer returns it.

Comment on lines -1770 to +1766
return remote_status
self.close_transfer_session(context)
return transfer_statuses.COMPLETED

Member Author

We already call raise_for_status() before this code, so returning the remote status seemed unnecessary. And because a retry may get a 404, the response wouldn't be accurate anyway. This now relies on the controller to set the status.

).delete()

# run the validation logic to ensure no errors were returned
errors = validate_and_create_buffer_data(

Member Author

This function didn't actually return anything.

Comment thread setup.py
"django-ipware==4.0.2",
- "requests",
+ "requests>=2.27.1",
+ "urllib3>=1.26.0",

Member Author

Needed to require at least this version because of an API change to Retry.

Comment thread morango/sync/utils.py
transfer_session.records_transferred += len(data)
deleted_buffers, _ = Buffer.objects.filter(
transfer_session=transfer_session,
model_uuid__in=[record["model_uuid"] for record in data]

Member Author

Default chunk size is 500. I think this should be okay?

Contributor

Confirmed — the delete/filter queries are scoped to model_uuid__in=[record["model_uuid"] for record in data], i.e. bounded by the chunk itself, not the full transfer session's accumulated buffer set. So each chunk's DB work is O(chunk_size) regardless of how large the overall transfer gets — the O(n²) risk I flagged is gone independent of what the chunk size default is. Thanks for the fix.
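
The chunk-scoped delete-then-create pattern discussed in this thread can be sketched with the standard library's sqlite3 in place of the Django ORM. This is only an illustration under stated assumptions: the table layout, `make_db`, and `store_chunk` are hypothetical, and adjusting the counter by the net number of new rows is an assumed way of keeping the transfer count stable, not necessarily what Morango does.

```python
import sqlite3


def make_db():
    """Create a hypothetical in-memory schema with one transfer session."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE session (id TEXT PRIMARY KEY, records_transferred INTEGER NOT NULL)"
    )
    conn.execute("CREATE TABLE buffer (session_id TEXT, model_uuid TEXT, payload TEXT)")
    conn.execute("INSERT INTO session VALUES ('s1', 0)")
    conn.commit()
    return conn


def store_chunk(conn, session_id, data):
    """Store one chunk so that storing it again changes nothing."""
    if not data:
        return
    uuids = [record["model_uuid"] for record in data]
    placeholders = ",".join("?" for _ in uuids)
    with conn:  # one transaction per chunk
        # Delete only rows matching this chunk's UUIDs, so the work is
        # bounded by the chunk size rather than the whole session.
        deleted = conn.execute(
            f"DELETE FROM buffer WHERE session_id = ? AND model_uuid IN ({placeholders})",
            [session_id, *uuids],
        ).rowcount
        conn.executemany(
            "INSERT INTO buffer (session_id, model_uuid, payload) VALUES (?, ?, ?)",
            [(session_id, r["model_uuid"], r["payload"]) for r in data],
        )
        # Count only the net new rows so a repeated chunk adds zero.
        conn.execute(
            "UPDATE session SET records_transferred = records_transferred + ? WHERE id = ?",
            (len(data) - deleted, session_id),
        )
```

Calling `store_chunk` twice with the same chunk leaves both the buffer rows and the counter unchanged after the first call, which is the property a retried push or pull needs.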

Comment thread morango/sync/session.py

try:
# may raise if retries have been exhausted
retries = retries.increment(

Member Author

increment returns a new Retry object every time, which is why we pass it around.
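
To illustrate why the returned object has to be carried forward, here is a small stdlib-only sketch of the same pattern. `RetryBudget`, `RetriesExhausted`, and `send_with_retries` are hypothetical stand-ins that only model the "increment returns a new object or raises when exhausted" behavior; they are not urllib3's Retry and not the code in this PR.

```python
from dataclasses import dataclass, replace


class RetriesExhausted(Exception):
    """Raised by the hypothetical budget when no retries remain."""


@dataclass(frozen=True)
class RetryBudget:
    """Immutable stand-in that models a retry counter."""

    total: int

    def increment(self, error):
        # Never mutates self: returns a fresh budget with one fewer retry.
        remaining = self.total - 1
        if remaining < 0:
            raise RetriesExhausted(str(error)) from error
        return replace(self, total=remaining)


def send_with_retries(send, retries):
    """Call `send` until it succeeds or the budget is exhausted."""
    while True:
        try:
            return send()
        except ConnectionError as error:
            # May raise if retries have been exhausted. The result must be
            # rebound, otherwise the same budget would be reused forever.
            retries = retries.increment(error)
```

Because the budget is immutable, the caller's original object is unaffected, so each request can start from the same configured budget.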

@bjester bjester requested a review from rtibbles July 1, 2026 22:46
@rtibbles rtibbles self-assigned this Jul 2, 2026

@rtibbles rtibbles left a comment

Member

I understand the changes made here - I get the intention, and also it seems that they are all targeted at the same goal as the issue. The proof here will not come from my code review, so nothing else to say but approve!

except Exception as e:
# always log the error itself
- logger.error(e)
+ logger.exception(e)

Member

👍

Comment thread morango/sync/session.py
)


class ContextualRetryHTTPAdapter(HTTPAdapter):

Member

Nice - I think I came across this recently somewhere either in Kolibri or ricecooker and had wondered why we hadn't used this approach before. I approve!

3 participants