Conversation

@Martichou

Summary

When scraping many URLs continuously, browser contexts accumulate in memory and are never cleaned up. The existing cleanup mechanism only runs when browsers go idle, which never happens under continuous load. This causes memory to grow unbounded until the process crashes or becomes unresponsive.

Fixes #943

Small note: I'm not used to Python, and I won't lie, Claude helped me a bit here, but I've reviewed what it did and tested it. So this is not just yet another piece of AI slop :)

List of files changed and why

  • browser_manager.py: Add _context_refcounts tracking, cleanup_contexts(), and release_context() methods (sketched below)
  • async_crawler_strategy.py: Release context ref in finally block after crawl
  • deploy/docker/api.py: Trigger context cleanup after each request
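
Taken together, the fix is a small reference-counting layer around the cached contexts. Below is a minimal sketch of the idea: `_context_refcounts`, `release_context()`, and `cleanup_contexts()` are the names this PR introduces, while the locking, the signature keys, and the `factory`/`do_crawl` callables are illustrative assumptions, not the actual crawl4ai code.

```python
import asyncio


class BrowserManager:
    """Caches browser contexts keyed by a config signature (illustrative)."""

    def __init__(self):
        self.contexts = {}            # signature -> context object
        self._context_refcounts = {}  # signature -> number of active crawls
        self._lock = asyncio.Lock()

    async def get_context(self, signature, factory):
        """Return a cached context (creating it on demand) and bump its refcount."""
        async with self._lock:
            if signature not in self.contexts:
                self.contexts[signature] = await factory()
            self._context_refcounts[signature] = (
                self._context_refcounts.get(signature, 0) + 1
            )
            return self.contexts[signature]

    async def release_context(self, signature):
        """Drop one reference once a crawl is done with the context."""
        async with self._lock:
            if signature in self._context_refcounts:
                self._context_refcounts[signature] -= 1

    async def cleanup_contexts(self):
        """Close and forget every context no in-flight crawl is using."""
        async with self._lock:
            for signature in list(self.contexts):
                if self._context_refcounts.get(signature, 0) <= 0:
                    context = self.contexts.pop(signature)
                    self._context_refcounts.pop(signature, None)
                    await context.close()


async def crawl_one(manager, signature, factory, do_crawl):
    """Mirrors the strategy change: release in `finally` so the refcount
    drops even when the crawl raises."""
    context = await manager.get_context(signature, factory)
    try:
        return await do_crawl(context)
    finally:
        await manager.release_context(signature)
        # In the PR, deploy/docker/api.py additionally calls
        # manager.cleanup_contexts() once per API request, instead of
        # waiting for the browser to go idle.
```

The important property is that cleanup can now run after every request rather than only on idle, because the refcounts tell it which contexts are safe to close.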

How Has This Been Tested?

This has been tested locally by running the following script and comparing before/after memory usage for both the master version and the patched version, each running via Docker Compose.

The script simply performs 100 scrapes with a concurrency of 8 and reports the status code distribution:
https://gist.github.com/Martichou/27555055d130d1c65f6a8457fbeb2a22
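
For reference, a script along these lines reproduces the shape of that test. This is not the gist itself: the endpoint path, port, and payload are assumptions about the Docker API, and example.com stands in for the real URL list.

```python
import asyncio
from collections import Counter

import httpx

TOTAL_REQUESTS = 100
CONCURRENCY = 8
URL = "https://example.com"


async def scrape(client, semaphore, counts):
    async with semaphore:
        try:
            resp = await client.post(
                "http://localhost:11235/crawl",  # assumed endpoint/port
                json={"urls": [URL]},            # assumed payload shape
                timeout=120,
            )
            counts[resp.status_code] += 1
        except httpx.HTTPError:
            counts["error"] += 1


async def main():
    counts = Counter()
    semaphore = asyncio.Semaphore(CONCURRENCY)
    async with httpx.AsyncClient() as client:
        await asyncio.gather(
            *(scrape(client, semaphore, counts) for _ in range(TOTAL_REQUESTS))
        )
    # Report the status code distribution, e.g. Counter({200: 98, 500: 2})
    print(counts)


if __name__ == "__main__":
    asyncio.run(main())
```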

Result of the test:

Unpatched version:

Baseline memory usage: 4.5%
End of first test run: 23.4%
End of second test run: 27.6%
End of third test run: 32.8%

Patched version:

Baseline memory usage: 5.7%
End of first test run: 11.2%
End of second test run: 12.3%
End of third test run: 13.4%

It may not have eliminated every leak (there is still a ~1% increase between runs for unknown reasons), but closing the browser via the kill-browser endpoint makes memory go back to 10%.
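
The PR does not say how those percentages were sampled; one plausible way, assuming the service runs in a Docker container, is to read them from `docker stats` (the container name below is a placeholder):

```python
import subprocess


def container_mem_percent(container: str = "crawl4ai") -> str:
    """Return the container's memory usage as reported by `docker stats`."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.MemPerc}}", container],
        capture_output=True,
        text=True,
        check=True,
    )
    return out.stdout.strip()  # e.g. "23.4%"


if __name__ == "__main__":
    print(container_mem_percent())
```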

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added/updated unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

aravindkarnam and others added 2 commits November 24, 2025 13:29
When scraping many URLs continuously, browser contexts accumulated in
memory and were never cleaned up. The existing cleanup only ran when
browsers went idle, which never happened under continuous load.
See: unclecode#943.

Key changes:
- browser_manager.py: Add _context_refcounts tracking, cleanup_contexts(),
  and release_context() methods
- async_crawler_strategy.py: Release context ref in finally block after crawl
- deploy/docker/api.py: Trigger context cleanup after each request

This fixes, or at least drastically improves, the memory leaks in my testing.
@ntohidi ntohidi changed the base branch from main to develop November 26, 2025 08:18