fix: prevent memory leak by closing unused context #1640
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
When scraping many URLs continuously, browser contexts accumulate in memory and are never cleaned up. The existing cleanup mechanism only runs when browsers go idle, which never happens under continuous load. This causes memory to grow unbounded until the process crashes or becomes unresponsive.
Fixes #943
List of files changed and why
browser_manager.py: Add _context_refcounts tracking, cleanup_contexts(), and release_context() methodsasync_crawler_strategy.py: Release context ref in finally block after crawldeploy/docker/api.py: Trigger context cleanup after each requestHow Has This Been Tested?
This has been tested locally by running the following script and comparing the before/after memory usage with both the master version and the patched version through a docker compose.
The script simply perform 100 scrape with 8 concurrency and report the status code repartition:
https://gist.github.com/Martichou/27555055d130d1c65f6a8457fbeb2a22
Result of the test:
Unpatched version:
Patched version:
It may not have eliminated every leaks (1% gains between run for unknown reason), but closing the browser using the kill browser endpoint make the memory go back to 10%.
Checklist: