Set time limit and interrupt crawling that takes too long #1586
Replies: 1 comment
-

Use `CrawlerRunConfig` timeouts:

```python
config = CrawlerRunConfig(
    page_timeout=30000,      # 30s hard cutoff for page navigation
    wait_for_timeout=10000,  # 10s cutoff for any wait_for conditions
)
results = await crawler.arun_many(urls, config=config)
```

If a page doesn't load within `page_timeout`, that URL's crawl is cut off and returned as a failure instead of blocking the rest of the batch. For your 10K URL case, you can also tune the dispatcher:

```python
from crawl4ai import MemoryAdaptiveDispatcher

dispatcher = MemoryAdaptiveDispatcher(
    max_session_permit=20,   # max concurrent crawls
    fairness_timeout=120.0,  # prioritize URLs waiting > 2 min
)
results = await crawler.arun_many(urls, config=config, dispatcher=dispatcher)
```
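If your crawl4ai version supports streaming (`stream=True` on `CrawlerRunConfig`), you can also consume results as each URL finishes, so a straggler only delays its own result rather than the whole batch. A minimal sketch under that assumption, where `handle()` is a placeholder for your own processing:

```python
stream_config = CrawlerRunConfig(
    page_timeout=30000,
    stream=True,  # assumption: streaming mode is available in your version
)

# Results arrive one by one as each crawl completes
async for result in await crawler.arun_many(urls, config=stream_config, dispatcher=dispatcher):
    if result.success:
        handle(result)  # placeholder: your own per-result processing
    else:
        print(f"Gave up on {result.url}: {result.error_message}")
```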
If you're seeing URLs that hang for hours even with these timeouts set, also disable the optional waits:

```python
config = CrawlerRunConfig(
    page_timeout=30000,
    wait_for_timeout=10000,
    delay_before_return_html=0,  # no extra delay before capturing HTML
    scan_full_page=False,        # don't scroll the full page if not needed
)
```

Failed URLs will have `success=False` on their result object, so you can filter them out afterwards.
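Since each result carries `success` and `error_message`, you can collect the stragglers and retry them in a second pass. A minimal sketch — the longer retry timeout is an illustrative choice, not a recommendation:

```python
# Inspect why each URL was cut off
for r in results:
    if not r.success:
        print(f"{r.url}: {r.error_message}")

# Optional second pass with a more generous timeout for the stragglers
failed_urls = [r.url for r in results if not r.success]
retry_config = CrawlerRunConfig(page_timeout=120000)
retry_results = await crawler.arun_many(failed_urls, config=retry_config)
```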
-
Hi,

I am crawling 10K URLs using `crawler.arun_many` with a batch size of 100. In some batches, a few URLs take extremely long (several hours) and block the entire process. I want to know if it is possible to add a time limit so that any URL job taking longer than X seconds is cancelled, or keeps only the content crawled so far. I'm willing to sacrifice those URLs to keep the runtime of the entire dataset bounded. Thank you.
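For reference, an outer safeguard on the caller's side can cap each batch regardless of library settings. A minimal sketch of the setup described above, where `BATCH_TIMEOUT` and the batching loop are illustrative rather than crawl4ai features:

```python
import asyncio

BATCH_SIZE = 100
BATCH_TIMEOUT = 600  # seconds; hypothetical upper bound per batch

all_results = []
for i in range(0, len(urls), BATCH_SIZE):
    batch = urls[i:i + BATCH_SIZE]
    try:
        # wait_for cancels the batch task if it overruns the deadline;
        # results from a cancelled batch are discarded
        batch_results = await asyncio.wait_for(
            crawler.arun_many(batch, config=config),
            timeout=BATCH_TIMEOUT,
        )
        all_results.extend(batch_results)
    except asyncio.TimeoutError:
        print(f"Batch starting at index {i} exceeded {BATCH_TIMEOUT}s; skipped")
```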