Add support for AI/LLM-based HTML parsing (selectors)

At recent events I attended, I was asked about AI/LLM-based HTML parsing. I also found a few dedicated AI-based scraping frameworks, such as [ScrapeGraphAI](https://github.com/ScrapeGraphAI/Scrapegraph-ai) and [Parsera](https://github.com/raznem/parsera), that appear to be gaining traction.

Right now, we provide an AI-selector workflow only through the `PlaywrightCrawler`, as described in the [Stagehand guide](https://crawlee.dev/python/docs/guides/playwright-crawler-stagehand).

This means:
- AI-based selectors are supported only for Playwright, not for HTTP-based crawlers.
- Even for `PlaywrightCrawler`, the integration is not very smooth compared to the tools mentioned above.

Example from `ScrapeGraphAI`:

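```python
from scrapegraphai.graphs import SmartScraperGraph

# Placeholder configuration; adjust the model and API key for your provider
graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_API_KEY",
        "model": "openai/gpt-4o-mini",
    },
}

# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
    prompt="Extract useful information from the webpage, including a description of what the company does, founders and social media links",
    source="https://scrapegraphai.com/",
    config=graph_config,
)

# Run the pipeline
result = smart_scraper_graph.run()
```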

It might be worth exploring a more native solution:
- Improve the Stagehand integration so that AI-based selectors in Playwright crawlers are as straightforward as in the dedicated AI-scraping libraries.
- Introduce an AI/LLM-powered crawler built on top of `AbstractHttpCrawler`, enabling AI/LLM selectors for HTTP-based scraping as well.

This could make Crawlee more usable for AI/LLM-based extraction and for quickly prototyping scrapers without writing CSS/XPath selectors by hand.
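To make the HTTP-based idea more concrete, here is a minimal sketch of the extraction step on its own. It is not Crawlee API: `html_to_text`, `extract_with_llm`, and the `llm` callable are hypothetical names, and the LLM client is left as a plain function so any provider could be plugged in. A real integration would presumably call something like this from a request handler on the fetched response body.

```python
import json
from html.parser import HTMLParser
from typing import Any, Callable


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce an HTML document to its visible text to keep the prompt small."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def extract_with_llm(html: str, prompt: str, llm: Callable[[str], str]) -> Any:
    """Hypothetical helper: ask `llm` to extract data from `html` as JSON."""
    message = (
        f"{prompt}\n\n"
        "Respond with a single JSON object only.\n\n"
        f"Page text:\n{html_to_text(html)}"
    )
    # Assumes the model returns valid JSON; real code would validate and retry.
    return json.loads(llm(message))
```

Keeping the model behind a plain callable means the same helper can be exercised in tests with a stub that returns canned JSON, without network access or an API key.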

Add support for AI/LLM-based HTML parsing (selectors) #1593
