Additional retry protection throughout sync pipeline#342
Conversation
rtibblesbot
left a comment
There was a problem hiding this comment.
Re-review at the same HEAD SHA (ddb551fc) as the prior COMMENTED verdict. 1 of 2 prior findings resolved; 1 still open (see below). One new suggestion from this pass. CI: all real test suites pass; the only failing check (Check if author is contributor) is unrelated repo automation.
- suggestion —
syncsession.py:504— still missing atest_close_transfer_session_raises_500symmetric totest_close_sync_session_raises_500(see inline). - suggestion —
session.py:240—request()now catches bareExceptioninstead ofexceptions.RequestException(see inline).
Prior-finding status
RESOLVED — morango/sync/session.py:222 — request() docstring vs bare-Exception catch mismatch
UNADDRESSED — morango/sync/syncsession.py:504 — missing test_close_transfer_session_raises_500
@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly
How was this generated?
Compared the current PR state against findings from a prior review:
- Retrieved prior bot reviews via the GitHub API
- Classified each prior finding as RESOLVED, UNADDRESSED, ACKNOWLEDGED, or CONTESTED
- Only raised NEW findings for newly introduced code
- Ran the same phased review passes as a first review (core, frontend/backend lenses, manual QA when required)
- Synthesized one review from the passes and chose the verdict from the findings, CI status, and QA evidence
| :return: The Response | ||
| """ | ||
| return context.connection._close_transfer_session(context.transfer_session) | ||
| context.connection._close_transfer_session(context.transfer_session) |
There was a problem hiding this comment.
The response object wasn't really necessary, so this no longer returns it.
| return remote_status | ||
| self.close_transfer_session(context) | ||
| return transfer_statuses.COMPLETED |
There was a problem hiding this comment.
Before this code, we call raise_for_status() so this seemed unnecessary and because we may get a 404 with a retry, we wouldn't have an accurate response. This now relies on the controller to set the status.
| ).delete() | ||
|
|
||
| # run the validation logic to ensure no errors were returned | ||
| errors = validate_and_create_buffer_data( |
There was a problem hiding this comment.
This function didn't actually return anything.
| "django-ipware==4.0.2", | ||
| "requests", | ||
| "requests>=2.27.1", | ||
| "urllib3>=1.26.0", |
There was a problem hiding this comment.
Needed to pin above this because of an API change to Retry
| transfer_session.records_transferred += len(data) | ||
| deleted_buffers, _ = Buffer.objects.filter( | ||
| transfer_session=transfer_session, | ||
| model_uuid__in=[record["model_uuid"] for record in data] |
There was a problem hiding this comment.
Default chunk size is 500. I think this should be okay?
There was a problem hiding this comment.
Confirmed — the delete/filter queries are scoped to model_uuid__in=[record["model_uuid"] for record in data], i.e. bounded by the chunk itself, not the full transfer session's accumulated buffer set. So each chunk's DB work is O(chunk_size) regardless of how large the overall transfer gets — the O(n²) risk I flagged is gone independent of what the chunk size default is. Thanks for the fix.
|
|
||
| try: | ||
| # may raise if retries have been exhausted | ||
| retries = retries.increment( |
There was a problem hiding this comment.
increment returns a new Retry object every time, hence why we pass it around.
rtibbles
left a comment
There was a problem hiding this comment.
I understand the changes made here - I get the intention, and also it seems that they are all targeted at the same goal as the issue. The proof here will not come from my code review, so nothing else to say but approve!
| except Exception as e: | ||
| # always log the error itself | ||
| logger.error(e) | ||
| logger.exception(e) |
| ) | ||
|
|
||
|
|
||
| class ContextualRetryHTTPAdapter(HTTPAdapter): |
There was a problem hiding this comment.
Nice - I think I came across this recently somewhere either in Kolibri or ricecooker and had wondered why we hadn't used this approach before. I approve!
Summary
urllib3imports to import directly from the package instead of throughrequests.packagesproxy, because it stopped bundling the dependency nearly a decade ago. Dependency restrictions were added forrequestsandurllib3to enforce compatible versions for Morango's use. Minimumrequestsversion matches pinned version in Kolibri 0.19.xactive=Truefilter.SessionWrapperthat utilizes theRetryutility, if configured, to retry low-level connection issues that are not automatically retried byurllib3.SessionWrapperto be more self-containedTODO
Reviewer guidance
Morango integration tests in Kolibri are passing locally with these changes.
Careful attention was given to version specific gotchas and to ensure support for Kolibri's supported python versions. This PR also focuses on lower-level retries-- retrying complex behaviors like certificate requests, which would require new nonces, are not addressed. The transfer session API is passive to one already existing, to support resumption, as long as the request is the same. Overall, the most important area for retries is the buffer transfer since it would involve many requests, increasing the likelihood of a failure.
I've added comments for the reasoning of specific changes.
Issues addressed
Closes #339
AI Usage
AI was used to jumpstart the changes, although its approach got convoluted quickly. So most of this is handcrafted to integrate with
requestsandurllib3as smoothly as possible. AI was used to create some tests and to keep tests up-to-date through several iterations of this work.