Fix heap out-of-memory issue in large file upload #407
Describe your changes
Fix for the out-of-memory issue during large file uploads
This was a classic producer-consumer problem.
What was happening:
When uploading large files, we use two threads:
One thread reads chunks from the file and puts them in a queue
Another thread takes chunks from the queue and uploads them to SDM
The problem was that reading chunks recently became much faster (2-3 seconds per chunk), while uploading still takes considerably longer. As a result, the reader thread ran far ahead of the uploader and kept filling up the queue.
The queue was configured to hold up to 50 chunks. With each chunk being 20 MB, that is potentially ~1 GB of memory sitting in the queue, but our heap is only 305 MB. That is why we were running out of memory around chunk 10-15.
The fix:
I changed the queue capacity from 50 down to 4 chunks, which caps the queue at roughly 80 MB. The BlockingQueue handles the backpressure automatically: when the queue is full, the reader thread blocks until the uploader catches up, so the two threads naturally stay in sync. A sketch of the pattern is shown below.
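A minimal sketch of the bounded producer-consumer pattern described above, assuming the 20 MB chunk size and capacity of 4 from this PR. The class, field, and method names (`ChunkedUploadSketch`, `readNextChunk`, `uploadChunk`) are illustrative placeholders, not the actual code in this change.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ChunkedUploadSketch {

    // Values taken from the description above: 20 MB chunks, at most 4 queued (~80 MB worst case).
    private static final int CHUNK_SIZE_BYTES = 20 * 1024 * 1024;
    private static final int QUEUE_CAPACITY = 4;

    // Bounded queue: put() blocks the reader once 4 chunks are waiting,
    // so the reader can never run far ahead of the uploader.
    private final BlockingQueue<byte[]> chunkQueue = new ArrayBlockingQueue<>(QUEUE_CAPACITY);

    // Producer: reads chunks from the file and enqueues them.
    Runnable reader = () -> {
        try {
            byte[] chunk;
            while ((chunk = readNextChunk()) != null) {
                chunkQueue.put(chunk); // blocks while the queue is full
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    };

    // Consumer: takes chunks from the queue and uploads them to SDM.
    Runnable uploader = () -> {
        try {
            while (true) { // the real code would stop on an end-of-stream marker
                byte[] chunk = chunkQueue.take(); // blocks while the queue is empty
                uploadChunk(chunk);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    };

    // Placeholders for the real file-reading and SDM upload calls.
    private byte[] readNextChunk() { return null; }
    private void uploadChunk(byte[] chunk) { }
}
```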
I also cleaned up some retry logic that used RxJava and created a lot of temporary objects, replacing it with a simpler retry loop (sketched below).
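A rough sketch of what a plain retry loop could look like, assuming three attempts and a linear back-off; the attempt count, delay, and method names are assumptions for illustration, not the values used in this PR.

```java
import java.io.IOException;

public class RetryUploadSketch {

    // Simple retry loop in place of the RxJava-based retry; limits are assumed, not from the PR.
    void uploadWithRetry(byte[] chunk) throws IOException, InterruptedException {
        final int maxAttempts = 3;
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                uploadChunk(chunk); // the actual SDM upload call would go here
                return;             // success: stop retrying
            } catch (IOException e) {
                lastFailure = e;
                Thread.sleep(1000L * attempt); // linear back-off between attempts
            }
        }
        throw lastFailure; // all attempts failed
    }

    // Placeholder for the real upload call.
    private void uploadChunk(byte[] chunk) throws IOException { }
}
```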
Result: memory usage per upload dropped from ~1 GB to ~100 MB. This change should handle even multi-GB files without issues.
Type of change
Checklist before requesting a review
Upload Screenshots/lists of the scenarios tested
EU12 - AWS (Repository without trendmicro)

US31 - GCP (Repository with trendmicro)

Single-tenant Integration tests: https://github.com/cap-java/sdm/actions/runs/21087767783
Multi-tenant Integration tests: https://github.com/cap-java/sdm/actions/runs/21087862188