BitingSnakes/silkworm-example-products

Product Spiders

Silkworm spiders for marketplace product listings from prom.ua, bon.ua, and olx.ua.

Crawl contract

  • prom.ua
    • Entry shape: search results such as https://prom.ua/search?search_term=кроссовки
    • Transport: http
    • Product fields: title, url, image, price, seller name from JSON-LD, listing id
  • bon.ua
    • Entry shape: category listing such as https://bon.ua/electronika-i-bitovaya-tehnika
    • Transport: http
    • Product fields: title, url, image, price, location, posted date, listing id
  • olx.ua
    • Entry shape: category listing such as https://www.olx.ua/uk/elektronika/
    • Transport: http
    • Product fields: title, url, image, price, location, posted date, listing id
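The field lists above can be read as one shared record with a few site-specific optional fields. A minimal sketch of such a record follows; the class name `ProductItem`, the attribute names, and the `to_jsonl` helper are assumptions made for illustration and may not match the schema the spiders actually emit.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class ProductItem:
    """Hypothetical record shape; names are illustrative, not the repo's schema."""

    site: str
    listing_id: str
    title: str
    url: str
    image: str | None = None
    price: str | None = None
    seller_name: str | None = None  # listed for prom.ua only (from JSON-LD)
    location: str | None = None  # listed for bon.ua and olx.ua
    posted_date: str | None = None  # listed for bon.ua and olx.ua

    def to_jsonl(self) -> str:
        """Serialize to a single JSON Lines row, keeping Cyrillic text readable."""
        return json.dumps(asdict(self), ensure_ascii=False)
```

Fields a site does not expose stay `None`, so rows from all three sites can share one JSONL layout.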

Setup

uv sync --prerelease=allow

Run

Prom:

uv run product-spiders --site prom_ua --query "кроссовки" --max-items 10 --output-path output/prom_ua_items.jsonl

BON:

uv run product-spiders --site bon_ua --category electronika-i-bitovaya-tehnika --max-items 10 --output-path output/bon_ua_items.jsonl

OLX:

uv run product-spiders --site olx_ua --category elektronika --max-items 10 --output-path output/olx_ua_items.jsonl

Override the starting URL explicitly if you want a different search or category page:

uv run product-spiders --site prom_ua --start-url "https://prom.ua/search?search_term=ноутбук" --max-items 5
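The relationship between `--query`, `--category`, and `--start-url` can be sketched as a small URL builder. This is an illustration under assumptions: the function name `build_start_url` is hypothetical, the URL patterns are inferred from the example entry URLs in the crawl contract, and the rule that an explicit start URL takes precedence is assumed rather than confirmed by the CLI source.

```python
from urllib.parse import urlencode


def build_start_url(site, query=None, category=None, start_url=None):
    """Return the first URL to crawl for a site (hypothetical helper)."""
    # Assumption: an explicit start URL overrides query and category.
    if start_url:
        return start_url
    if site == "prom_ua":
        if not query:
            raise ValueError("prom_ua needs a query or a start URL")
        # urlencode percent-encodes non-ASCII search terms as UTF-8.
        return "https://prom.ua/search?" + urlencode({"search_term": query})
    if not category:
        raise ValueError(f"{site} needs a category or a start URL")
    if site == "bon_ua":
        return f"https://bon.ua/{category}"
    if site == "olx_ua":
        return f"https://www.olx.ua/uk/{category}/"
    raise ValueError(f"unknown site: {site}")
```

For example, `build_start_url("olx_ua", category="elektronika")` yields the OLX category URL shown in the crawl contract.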

Prefect harness

Run the thin Prefect 3 flow locally:

uv run python -m prefect_app.main

The flow invokes the repository CLI instead of duplicating scraper logic.
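Invoking the CLI from an orchestration layer might look like the sketch below. It uses only the flags shown in the Run section; the helper names `build_cli_command` and `run_spider_cli` are hypothetical, and the actual `prefect_app.main` module may be structured differently. In a Prefect flow, a function like `run_spider_cli` would typically be wrapped in a task.

```python
import subprocess


def build_cli_command(site, *, query=None, category=None, max_items=10, output_path=None):
    """Build the argv list for the product-spiders CLI (hypothetical helper)."""
    cmd = ["uv", "run", "product-spiders", "--site", site]
    if query is not None:
        cmd += ["--query", query]
    if category is not None:
        cmd += ["--category", category]
    cmd += ["--max-items", str(max_items)]
    if output_path is not None:
        cmd += ["--output-path", output_path]
    return cmd


def run_spider_cli(site, **options):
    """Run the CLI as a subprocess; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_cli_command(site, **options), check=True)
```

Passing the arguments as a list avoids shell quoting issues with non-ASCII queries such as "кроссовки".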

Validation

uv run ruff format .
uv run ruff check .
uv run pyrefly check
uv run pytest

Notes and risks

  • olx.ua listing cards were present in static HTML during validation, but Silkworm's generic crawl blueprint returned zero items on the same page. The handwritten spider uses the verified selectors directly.
  • prom.ua seller names come from card-level JSON-LD, not a stable visible seller selector.
  • prom.ua is fetched through a Silkworm fetch_html() runner in the CLI. During live validation on March 26, 2026, run_spider() hit redirect handling issues against Prom, while the Silkworm fetch client returned the expected listing HTML.
  • bon.ua listing pages do not reliably expose seller names. The spider extracts only what the listing page provides and does not follow detail pages to look up sellers.
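Reading a seller name from JSON-LD, as the prom.ua note describes, could be sketched with the standard library alone. The layout assumed here (a schema.org Product whose `offers.seller.name` holds the seller) is an assumption; Prom's real markup and the selectors the spider uses may differ. `JsonLdCollector` and `seller_name` are hypothetical names.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def seller_name(doc):
    """Return offers.seller.name if present (assumed layout), else None."""
    offers = doc.get("offers") if isinstance(doc, dict) else None
    if isinstance(offers, list):
        offers = offers[0] if offers else None
    seller = offers.get("seller") if isinstance(offers, dict) else None
    return seller.get("name") if isinstance(seller, dict) else None
```

Returning `None` for missing keys keeps a card without seller markup from aborting the crawl.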
