Additional retry protection throughout sync pipeline by bjester · Pull Request #342 · learningequality/morango

bjester · 2026-06-30T21:52:02Z

Summary

Updates urllib3 imports to import directly from the package instead of through the requests.packages proxy, because requests stopped bundling the dependency nearly a decade ago. Adds dependency restrictions for requests and urllib3 to enforce versions compatible with Morango's use. The minimum requests version matches the pinned version in Kolibri 0.19.x.
Updates buffer creation during transfer to be idempotent: the same buffer chunk can be pulled or pushed multiple times without issue and without inflating the record transfer count.
Makes the sync session requests that close sync or transfer sessions tolerant of 404s. A 404 likely indicates the session has already been closed, because of the active=True filter.
Adds retry behavior to the SessionWrapper that utilizes the Retry utility, if configured, to retry low-level connection issues that are not automatically retried by urllib3.
Refactors bandwidth tracking in the SessionWrapper to be more self-contained
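
As a rough illustration of the 404-tolerant close described above, here is a minimal stdlib-only sketch. `HTTPStatusError`, `FakeServer`, and `close_ignoring_404` are hypothetical names made up for this example; they are not Morango's or requests' API, and the real change may be structured differently.

```python
class HTTPStatusError(Exception):
    """Hypothetical stand-in for an HTTP client error that carries a status code."""

    def __init__(self, status_code):
        super().__init__("HTTP {}".format(status_code))
        self.status_code = status_code


def close_ignoring_404(send_close):
    """Call send_close() and treat a 404 as "already closed".

    Returns True if this call closed the session, False if the server
    reported it was already gone. Any other error is re-raised.
    """
    try:
        send_close()
    except HTTPStatusError as error:
        if error.status_code != 404:
            raise
        return False
    return True


class FakeServer:
    """Hypothetical server that only finds sessions that are still active."""

    def __init__(self):
        self.active = True

    def close_session(self):
        if not self.active:
            # mimics a lookup filtered on active=True finding nothing
            raise HTTPStatusError(404)
        self.active = False


server = FakeServer()
first = close_ignoring_404(server.close_session)   # closes the session
second = close_ignoring_404(server.close_session)  # retried close, 404 ignored
```

Only the 404 is swallowed here; a 500 or any other status still propagates to the caller.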

TODO

Have tests been written for the new code?
Has changelog been written/updated?
New dependencies (if any) added to requirements file

Reviewer guidance

Morango integration tests in Kolibri are passing locally with these changes.

Careful attention was given to version-specific gotchas and to ensuring support for Kolibri's supported Python versions. This PR also focuses on lower-level retries: retrying complex behaviors like certificate requests, which would require new nonces, is not addressed. The transfer session API tolerates a session that already exists, to support resumption, as long as the request is the same. Overall, the most important area for retries is the buffer transfer, since it involves many requests, increasing the likelihood of a failure.

I've added comments for the reasoning of specific changes.

Issues addressed

Closes #339

AI Usage

AI was used to jumpstart the changes, although its approach got convoluted quickly. So most of this is handcrafted to integrate with requests and urllib3 as smoothly as possible. AI was used to create some tests and to keep tests up-to-date through several iterations of this work.

rtibblesbot

Re-review at the same HEAD SHA (ddb551fc) as the prior COMMENTED verdict. 1 of 2 prior findings resolved; 1 still open (see below). One new suggestion from this pass. CI: all real test suites pass; the only failing check (Check if author is contributor) is unrelated repo automation.

suggestion — syncsession.py:504 — still missing a test_close_transfer_session_raises_500 symmetric to test_close_sync_session_raises_500 (see inline).
suggestion — session.py:240 — request() now catches bare Exception instead of exceptions.RequestException (see inline).

Prior-finding status

RESOLVED — morango/sync/session.py:222 — request() docstring vs bare-Exception catch mismatch
UNADDRESSED — morango/sync/syncsession.py:504 — missing test_close_transfer_session_raises_500

@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?

Compared the current PR state against findings from a prior review:

Retrieved prior bot reviews via the GitHub API
Classified each prior finding as RESOLVED, UNADDRESSED, ACKNOWLEDGED, or CONTESTED
Only raised NEW findings for newly introduced code
Ran the same phased review passes as a first review (core, frontend/backend lenses, manual QA when required)
Synthesized one review from the passes and chose the verdict from the findings, CI status, and QA evidence

bjester · 2026-07-01T22:38:00Z

-        :return: The Response
        """
-        return context.connection._close_transfer_session(context.transfer_session)
+        context.connection._close_transfer_session(context.transfer_session)


The response object wasn't really necessary, so this no longer returns it.

bjester · 2026-07-01T22:39:59Z

-        return remote_status
+        self.close_transfer_session(context)
+        return transfer_statuses.COMPLETED


We call raise_for_status() before this code, so returning the remote status seemed unnecessary. And because a retry may produce a 404, we wouldn't have an accurate response anyway. This now relies on the controller to set the status.

bjester · 2026-07-01T22:41:54Z

            ).delete()

-            # run the validation logic to ensure no errors were returned
-            errors = validate_and_create_buffer_data(


This function didn't actually return anything.

bjester · 2026-07-01T22:43:10Z

        "django-ipware==4.0.2",
-        "requests",
+        "requests>=2.27.1",
+        "urllib3>=1.26.0",


Needed to pin above this because of an API change to Retry

bjester · 2026-07-01T22:44:40Z

-        transfer_session.records_transferred += len(data)
+        deleted_buffers, _ = Buffer.objects.filter(
+            transfer_session=transfer_session,
+            model_uuid__in=[record["model_uuid"] for record in data]


Default chunk size is 500. I think this should be okay?

Confirmed — the delete/filter queries are scoped to model_uuid__in=[record["model_uuid"] for record in data], i.e. bounded by the chunk itself, not the full transfer session's accumulated buffer set. So each chunk's DB work is O(chunk_size) regardless of how large the overall transfer gets — the O(n²) risk I flagged is gone independent of what the chunk size default is. Thanks for the fix.
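
The idempotent chunk handling discussed in this thread can be sketched with an in-memory dict standing in for the Buffer table. This is only an illustration: `store_chunk` is a hypothetical helper, and the way the counter is adjusted (add the chunk size, subtract the rows that were replaced) is an assumption, since the excerpt does not show the rest of the change.

```python
def store_chunk(buffers, records_transferred, data):
    """Store one chunk of records so that replaying the chunk is harmless.

    buffers: dict mapping model_uuid -> record (stand-in for the Buffer table)
    records_transferred: counter value before this chunk
    data: list of dicts, each with a "model_uuid" key
    Returns the updated counter.
    """
    incoming = {record["model_uuid"]: record for record in data}
    # Delete rows for this chunk only, so the work is bounded by the chunk size
    deleted = 0
    for model_uuid in incoming:
        if buffers.pop(model_uuid, None) is not None:
            deleted += 1
    buffers.update(incoming)
    # Rows that were replaced were already counted by an earlier attempt
    return records_transferred + len(incoming) - deleted


chunk = [{"model_uuid": "a", "v": 1}, {"model_uuid": "b", "v": 2}]
buffers = {}
count = store_chunk(buffers, 0, chunk)
count_after_replay = store_chunk(buffers, count, chunk)  # same chunk again
```

Replaying the same chunk leaves both the stored rows and the counter unchanged, which is the property a retried push or pull needs.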

bjester · 2026-07-01T22:45:49Z

+
+        try:
+            # may raise if retries have been exhausted
+            retries = retries.increment(


increment returns a new Retry object every time, which is why we pass it around.
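
To show why an immutable retry object has to be reassigned and passed along, here is a small stdlib-only sketch. `RetryBudget` and `RetriesExhausted` are hypothetical stand-ins written for this example; they only loosely mirror the pattern of an `increment()` that returns a new object or raises, and they are not urllib3's `Retry` or its exact semantics.

```python
from collections import namedtuple


class RetriesExhausted(Exception):
    """Raised by the hypothetical RetryBudget when no attempts remain."""


class RetryBudget(namedtuple("RetryBudget", ["total"])):
    """Immutable retry counter: increment() returns a NEW budget."""

    def increment(self):
        if self.total <= 0:
            raise RetriesExhausted("no retries left")
        return RetryBudget(self.total - 1)


def call_with_retries(func, retries):
    """Call func(), retrying on ConnectionError while the budget allows."""
    while True:
        try:
            return func()
        except ConnectionError as error:
            try:
                # keep the returned object; the old one is never mutated
                retries = retries.increment()
            except RetriesExhausted:
                raise error


def make_flaky(failures):
    """Build a function that fails `failures` times, then returns "ok"."""
    state = {"left": failures}

    def flaky():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("dropped")
        return "ok"

    return flaky


budget = RetryBudget(2)
result = call_with_retries(make_flaky(2), budget)
```

Because the original `budget` is never modified, the same starting budget can be reused for the next request without resetting anything.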

rtibbles

I understand the changes made here - I get the intention, and also it seems that they are all targeted at the same goal as the issue. The proof here will not come from my code review, so nothing else to say but approve!

rtibbles · 2026-07-02T21:07:56Z

        except Exception as e:
            # always log the error itself
-            logger.error(e)
+            logger.exception(e)


rtibbles · 2026-07-02T21:09:43Z

+    )
+
+
+class ContextualRetryHTTPAdapter(HTTPAdapter):


Nice - I think I came across this recently somewhere either in Kolibri or ricecooker and had wondered why we hadn't used this approach before. I approve!

bjester requested a review from rtibblesbot June 30, 2026 21:52

This comment was marked as resolved.

bjester force-pushed the retry-defense branch from 30e73c4 to 52ed285 Compare July 1, 2026 00:22

bjester requested a review from rtibblesbot July 1, 2026 00:24

Post 2.17 requests declares urllib3 a normal dep

73bca99

bjester force-pushed the retry-defense branch from 52ed285 to 5adb818 Compare July 1, 2026 15:14

bjester requested a review from rtibblesbot July 1, 2026 15:16

bjester added 2 commits July 1, 2026 13:01

Allow repeat push/pull of buffers

d3a64a1

Ignore 404s on close/destroy requests due to possible retry

0fd9fa3

bjester force-pushed the retry-defense branch from 5adb818 to c6611a0 Compare July 1, 2026 20:03

bjester requested a review from rtibblesbot July 1, 2026 20:05

bjester added 2 commits July 1, 2026 14:00

Add retry behavior for low level connection issues not captured by urllib3's retries

ea045b9

Update changelog

ddb551f

bjester force-pushed the retry-defense branch from c6611a0 to ddb551f Compare July 1, 2026 21:00

bjester requested review from rtibblesbot and removed request for rtibblesbot July 1, 2026 21:02

rtibblesbot reviewed Jul 1, 2026

Comment thread morango/sync/syncsession.py

Comment thread morango/sync/session.py

Comment thread morango/sync/utils.py

bjester commented Jul 1, 2026

bjester requested a review from rtibbles July 1, 2026 22:46

rtibbles self-assigned this Jul 2, 2026

rtibbles approved these changes Jul 2, 2026

3 participants
