https://lablnet.com/project/alibabascraper
This is a robust web scraper that extracts data from the Alibaba and Yiwugo (义乌购) websites. It is multi-threaded and uses Playwright to scrape efficiently. The Alibaba scraper can crawl the entire site (a full run takes roughly 4-6 months), while the Yiwugo module targets China's largest wholesale market platform.
| Platform | Directory | Description |
|---|---|---|
| Alibaba | `alibaba/` | International wholesale marketplace |
| Yiwugo | `yiwugo/` | China's Yiwu wholesale market (义乌购): 75,000+ shops, 4M+ products |
- Clone the repository.
- Run `npm install` to install the dependencies.
- Copy `.env.example` to `.env` and update the values.
- Run `node ./alibaba/categories.js` to get the categories and store them in the database.
- Run `node ./alibaba/processProducts.js` to start the scraper.
  - Because the terminal cannot stay open for the whole run, use `nohup` to run the script in the background: `nohup node ./alibaba/processProducts.js &`
  - The script creates a `categories_queue1` queue file in the root directory and keeps running until the queue is empty.
- Run `node ./yiwugo/categories.js` to scrape Yiwugo product categories.
- Run `node ./yiwugo/processProducts.js` to start scraping products from all categories.
  - To run it in the background: `nohup node ./yiwugo/processProducts.js &`
  - The script reads from `yiwugo_categories_queue.txt` and processes entries until the queue is empty.
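The "process until the queue is empty" behavior described above can be pictured with a minimal sketch. It is not the repository's actual code: it assumes the queue file holds one entry per line, and `popFromQueue`, `drainQueue`, and the `handler` callback are hypothetical names introduced only for illustration.

```javascript
const fs = require("node:fs");

// Hypothetical helper: remove and return the first non-empty line of a
// newline-delimited queue file. Returns null when the file is missing or empty.
// Assumption: one queue entry per line (the real file format may differ).
function popFromQueue(queuePath) {
  if (!fs.existsSync(queuePath)) return null;
  const lines = fs
    .readFileSync(queuePath, "utf8")
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean);
  if (lines.length === 0) return null;
  const [next, ...rest] = lines;
  // Write the remaining entries back so a restarted run resumes where it stopped.
  fs.writeFileSync(queuePath, rest.join("\n"));
  return next;
}

// Hypothetical driver: keep popping entries and handing them to `handler`
// until the queue is empty. A failing entry is logged and skipped.
async function drainQueue(queuePath, handler) {
  let processed = 0;
  let entry;
  while ((entry = popFromQueue(queuePath)) !== null) {
    try {
      await handler(entry);
      processed += 1;
    } catch (err) {
      console.error(`Failed to process ${entry}: ${err.message}`);
    }
  }
  return processed;
}
```

Calling `drainQueue("yiwugo_categories_queue.txt", scrapeCategory)` with your own `scrapeCategory` function would then work through the file entry by entry. Persisting the shortened queue after every pop is one way to make a long `nohup` run resumable after a crash.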
Each scraped Yiwugo product includes:
| Field | Description |
|---|---|
| `title` | Product name (Chinese) |
| `price` | Unit price or price range |
| `minOrder` | Minimum order quantity |
| `supplier` | Shop/supplier name |
| `supplierLink` | Link to supplier's Yiwugo store |
| `location` | Market district/address in Yiwu |
| `category` | Product category |
| `images` | Product image URLs |
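As an illustration of the record shape in the table above, here is a small sketch that checks a scraped product for missing fields. The field names come from the table; the value types (strings, plus an array for `images`), the sample values, and the `missingFields` helper are assumptions made for this example, not part of the scraper.

```javascript
// Field names taken from the table above.
const REQUIRED_FIELDS = [
  "title",
  "price",
  "minOrder",
  "supplier",
  "supplierLink",
  "location",
  "category",
  "images",
];

// Made-up sample record; every value here is a placeholder, not real scraped data.
const exampleProduct = {
  title: "示例商品",
  price: "1.20-1.80",
  minOrder: "100",
  supplier: "Example Shop",
  supplierLink: "https://example.com/shop/123",
  location: "Example District, Yiwu",
  category: "Example Category",
  images: ["https://example.com/img/1.jpg"],
};

// Hypothetical helper: list the fields that are absent, blank, or an empty array.
function missingFields(product) {
  return REQUIRED_FIELDS.filter((field) => {
    const value = product[field];
    if (Array.isArray(value)) return value.length === 0;
    return value === undefined || value === null || String(value).trim() === "";
  });
}
```

A check like this could run before saving a record, so that incomplete products are logged instead of being stored silently.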
Tip: For a hosted, no-code version of the Yiwugo scraper, check out the Yiwugo Scraper on Apify Store.
- Scrape data from Alibaba and Yiwugo websites
- Multi-threaded (worker threads for parallel category processing)
- Save data to Amazon DynamoDB
- Proxy support
- Chinese text encoding handled (Yiwugo)
- Proper error handling and logging
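To make the worker-thread feature more concrete, the sketch below shows one way to spread categories across Node.js worker threads. It is an assumption-laden illustration rather than the project's implementation: `splitIntoChunks`, `runWorker`, and `processInParallel` are hypothetical names, and the worker body only returns placeholder strings where real scraping would happen.

```javascript
const { Worker, isMainThread, parentPort, workerData } = require("node:worker_threads");

// Distribute items round-robin into at most `chunkCount` non-empty chunks.
function splitIntoChunks(items, chunkCount) {
  const chunks = Array.from({ length: chunkCount }, () => []);
  items.forEach((item, index) => chunks[index % chunkCount].push(item));
  return chunks.filter((chunk) => chunk.length > 0);
}

// Start one worker that re-runs this same file with its share of categories.
function runWorker(categories) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(__filename, { workerData: { categories } });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}

// Run all chunks in parallel and merge the per-worker results.
async function processInParallel(categories, workerCount) {
  const chunks = splitIntoChunks(categories, workerCount);
  const results = await Promise.all(chunks.map(runWorker));
  return results.flat();
}

if (!isMainThread) {
  // Worker side. Placeholder: a real worker would scrape each category here.
  const done = workerData.categories.map((category) => `scraped:${category}`);
  parentPort.postMessage(done);
}
```

From the main thread, `processInParallel(["toys", "bags", "hats"], 2)` would start two workers and resolve with the combined results. Round-robin splitting keeps the chunks close in size, although it does not account for some categories being much larger than others.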

License: MIT