Implement/document a way to pass custom information to handlers #525

@honzajavorek

Description

For testability and a clean code structure, I'd like to have some information dependency-injected top-down from the main function, but I don't know how to do that. I'll illustrate the problem with a constant, but imagine my program has click options that affect how the scraper behaves, so the value isn't necessarily immutable and the issue is the same. This is my program:

```python
import re
import asyncio
from enum import StrEnum, auto

import click
from crawlee.beautifulsoup_crawler import (
    BeautifulSoupCrawler,
    BeautifulSoupCrawlingContext,
)
from crawlee.router import Router


LENGTH_RE = re.compile(r"(\d+)\s+min")


class Label(StrEnum):
    DETAIL = auto()


router = Router[BeautifulSoupCrawlingContext]()


@click.command()
def edison():
    asyncio.run(scrape())


async def scrape():
    crawler = BeautifulSoupCrawler(request_handler=router)
    await crawler.run(["https://edisonfilmhub.cz/program"])
    await crawler.export_data("edison.json", dataset_name="edison")


@router.default_handler
async def default_handler(context: BeautifulSoupCrawlingContext):
    await context.enqueue_links(selector=".program_table .name a", label=Label.DETAIL)


@router.handler(Label.DETAIL)
async def detail_handler(context: BeautifulSoupCrawlingContext):
    context.log.info(f"Scraping {context.request.url}")

    description = context.soup.select_one(".filmy_page .desc3").text
    length_min = LENGTH_RE.search(description).group(1)
    # TODO get starts_at, then calculate ends_at

    await context.push_data(
        {
            "url": context.request.url,
            "title": context.soup.select_one(".filmy_page h1").text.strip(),
            "csfd_url": context.soup.select_one(".filmy_page .hrefs a")["href"],
        },
        dataset_name="edison",
    )
```

In the main function, I have certain information I want to pass down. For example, I want "edison" to be an argument:

```python
@click.command()
def edison():
    slug = "edison"
    asyncio.run(scrape(slug))
```

Then this is easy:

```python
async def scrape(slug: str):
    crawler = BeautifulSoupCrawler(request_handler=router)
    await crawler.run(["https://edisonfilmhub.cz/program"])
    await crawler.export_data(f"{slug}.json", dataset_name=slug)
```

But then, how do I pass that slug down to the handlers? I have no idea. What do you suggest as the best approach?
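
A minimal sketch of one possible approach, using only the APIs already shown above: instead of registering handlers at module level, build the router inside a factory function so the handlers close over the injected value (`create_router` is a hypothetical name, not a Crawlee API):

```python
def create_router(slug: str) -> Router[BeautifulSoupCrawlingContext]:
    # Handlers defined here capture `slug` from the enclosing scope,
    # so no module-level state is needed.
    router = Router[BeautifulSoupCrawlingContext]()

    @router.default_handler
    async def default_handler(context: BeautifulSoupCrawlingContext) -> None:
        await context.enqueue_links(
            selector=".program_table .name a", label=Label.DETAIL
        )

    @router.handler(Label.DETAIL)
    async def detail_handler(context: BeautifulSoupCrawlingContext) -> None:
        await context.push_data(
            {"url": context.request.url},
            dataset_name=slug,  # the injected value, via the closure
        )

    return router


async def scrape(slug: str):
    crawler = BeautifulSoupCrawler(request_handler=create_router(slug))
    await crawler.run(["https://edisonfilmhub.cz/program"])
    await crawler.export_data(f"{slug}.json", dataset_name=slug)
```

Another possible route is the per-request `user_data` mapping, sketched here under the assumption that `Request.from_url` accepts a `user_data` argument and that the installed version of `enqueue_links` forwards `user_data` to the enqueued requests (check the signatures in your version):

```python
from crawlee import Request  # import path may differ between versions


async def scrape(slug: str):
    crawler = BeautifulSoupCrawler(request_handler=router)
    # Attach the slug to the initial request; handlers read it back
    # from context.request.user_data.
    await crawler.run(
        [Request.from_url("https://edisonfilmhub.cz/program", user_data={"slug": slug})]
    )
    await crawler.export_data(f"{slug}.json", dataset_name=slug)


@router.default_handler
async def default_handler(context: BeautifulSoupCrawlingContext):
    slug = context.request.user_data["slug"]
    # Assumes enqueue_links accepts user_data; if not, a
    # transform_request_function could set it on each new request.
    await context.enqueue_links(
        selector=".program_table .name a",
        label=Label.DETAIL,
        user_data={"slug": slug},
    )
```

The closure version keeps the handlers easy to unit-test with a fake context; the `user_data` version ties the value to each request, so it should survive if the request queue is persisted.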
