@yashmeet29 yashmeet29 commented Jan 17, 2026

Describe your changes

Fix for the large file upload out-of-memory (OOM) issue.
This was a classic producer-consumer problem.

What was happening:

When we upload large files, we use two threads:

  • One thread reads chunks from the file and puts them in a queue.
  • Another thread takes chunks from the queue and uploads them to SDM.

The problem was that reading chunks recently became very fast (2-3 seconds per chunk), while uploading still takes much longer. As a result, the reader thread ran far ahead of the uploader and kept filling the queue.

The queue was sized to hold up to 50 chunks. At 20MB per chunk, that's potentially 1GB of memory just sitting in the queue - but our heap is only 305MB. That's why we were running out of memory around chunk 10-15.

The fix:

I changed the queue size from 50 down to 4 chunks, which caps the queue at roughly 80MB (4 x 20MB) - much more reasonable. I used a BlockingQueue, which handles the backpressure automatically: when the queue fills up, the reader thread blocks until the uploader catches up, so the two threads naturally stay in sync (see the sketch below).
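For illustration, here is a minimal sketch of the bounded producer-consumer pattern described above. The class and method names (`BoundedChunkPipeline`, `readNextChunk`, `uploadChunk`) and the poison-pill shutdown are assumptions made for the example, not the actual SDM code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedChunkPipeline {

    // Cap the queue at 4 chunks: 4 x 20MB = ~80MB worst case in memory.
    private static final int QUEUE_CAPACITY = 4;

    // Sentinel object signalling end of file (illustrative choice).
    private static final byte[] POISON_PILL = new byte[0];

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(QUEUE_CAPACITY);

        Thread reader = new Thread(() -> {
            try {
                byte[] chunk;
                while ((chunk = readNextChunk()) != null) {
                    // put() blocks when the queue is full, so the reader
                    // can never run more than 4 chunks ahead of the uploader.
                    queue.put(chunk);
                }
                queue.put(POISON_PILL); // signal end of file
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread uploader = new Thread(() -> {
            try {
                while (true) {
                    byte[] chunk = queue.take(); // blocks when the queue is empty
                    if (chunk == POISON_PILL) break;
                    uploadChunk(chunk);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        uploader.start();
        reader.join();
        uploader.join();
    }

    // Placeholder: in the real code this reads the next 20MB chunk from the file.
    private static byte[] readNextChunk() { return null; }

    // Placeholder: in the real code this uploads one chunk to SDM.
    private static void uploadChunk(byte[] chunk) { /* upload */ }
}
```

The key point is that `ArrayBlockingQueue.put()` blocks once the queue holds 4 chunks, so the memory held in the queue can never exceed ~80MB regardless of how fast the reader runs.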
I also cleaned up some retry logic that used RxJava and allocated a lot of temporary objects, and replaced it with a simpler retry loop (sketched below).
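A sketch of what the simpler retry loop could look like; the retry count, backoff values, and the `Runnable`-based signature are illustrative assumptions, not the actual implementation in this PR:

```java
public final class SimpleRetry {

    /**
     * Runs the given upload action, retrying on failure with exponential backoff.
     * Usage (hypothetical): SimpleRetry.runWithRetry(() -> uploadChunk(chunk));
     */
    public static void runWithRetry(Runnable uploadAction) throws InterruptedException {
        final int maxAttempts = 3;   // illustrative value
        long backoffMillis = 1_000;  // illustrative value

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                uploadAction.run();  // attempt the upload
                return;              // success: stop retrying
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    throw e;         // out of attempts: surface the failure
                }
                Thread.sleep(backoffMillis); // plain sleep, no RxJava schedulers
                backoffMillis *= 2;          // simple exponential backoff
            }
        }
    }
}
```

Unlike an RxJava `retryWhen` chain, this allocates nothing per attempt beyond the exception itself, which is what removes the churn of temporary objects.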
Result: memory usage went from ~1GB down to ~100MB per upload. This change should now handle even multi-GB files without issues.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist before requesting a review

  • I have followed the Java Development Guidelines for SAP.
  • I have tested the functionality on my cloud environment.
  • I have provided sufficient automated/unit tests for the code.
  • I have increased or maintained the test coverage.
  • I have run integration tests on my cloud environment.
  • I have checked the Black Duck portal for vulnerabilities after my commit.

Upload Screenshots/lists of the scenarios tested

  • I have uploaded screenshots or added lists of the scenarios tested in the description.

EU12 - AWS (Repository without trendmicro)
[Screenshot: 2026-01-17 at 10:52:06 AM]

US31 - GCP (Repository with trendmicro)
[Screenshot: 2026-01-17 at 12:31:47 AM]

Single-tenant Integration tests: https://github.com/cap-java/sdm/actions/runs/21087767783

Multi-tenant Integration tests: https://github.com/cap-java/sdm/actions/runs/21087862188

pom.xml (outdated review thread)

```diff
 <properties>
-  <revision>1.6.3-SNAPSHOT</revision>
+  <revision>1.0.0-RC1</revision>
```
Reviewer comment: Please change before merging

Author reply (@yashmeet29): Done

@rishikunnath2747 rishikunnath2747 merged commit 3703995 into develop Jan 18, 2026
9 checks passed
@rishikunnath2747 rishikunnath2747 deleted the fix_HeapOOMForLargeFile branch January 18, 2026 15:09