WebScraper Pro 🕸️

A configurable Python web scraping tool that extracts structured data from multiple webpages and exports the results to CSV.
Built for automation, data collection, and Upwork-style client projects.

✨ Features

Scrapes multiple pages using a URL pattern with a {page} placeholder
Fully configurable via JSON (no code changes needed)
Extracts data using CSS selectors (quotes, authors, tags, or any other fields)
Saves clean structured data to CSV
Logs scraping progress to logs/scraper.log
Simple command-line interface for clients and non-technical users
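
As a rough illustration of the logging feature, the sketch below sets up a file logger that writes to logs/scraper.log using only the standard library. The function name `setup_logger`, the logger name "webscraper", and the message format are assumptions made for this example; the project's own modules may configure logging differently.

```python
import logging
from pathlib import Path


def setup_logger(log_path: str = "logs/scraper.log") -> logging.Logger:
    """Return a logger that appends INFO-level messages to log_path.

    Hypothetical helper: the name, default path handling, and format
    are illustrative and not taken from the project's source.
    """
    path = Path(log_path)
    # Create the logs/ directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger("webscraper")
    logger.setLevel(logging.INFO)

    # Attach the file handler only once, so repeated calls do not
    # produce duplicate log lines.
    if not logger.handlers:
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calling `setup_logger().info("Fetched page 1")` would then append a timestamped line to logs/scraper.log.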

🧱 Project Structure

webscraper_pro/
├─ README.md
├─ LICENSE
├─ requirements.txt
├─ .gitignore
├─ data/
│  ├─ sample_urls.txt
│  └─ output/
├─ logs/
├─ webscraper/
│  ├─ __init__.py
│  ├─ config_example.json
│  ├─ cli.py
│  ├─ scraper.py
│  ├─ parser.py
│  └─ storage.py

⚙️ Configuration

Example config file: webscraper/config_example.json

{
    "base_url": "https://quotes.toscrape.com/page/{page}/",
    "start_page": 1,
    "end_page": 3,
    "selectors": {
        "quote": ".quote .text",
        "author": ".quote .author",
        "tags": ".quote .tags .tag"
    }
}

Fields explained:

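base_url: URL pattern; it must contain the {page} placeholder so the scraper can iterate over pages
start_page / end_page: first and last page numbers to scrape, inclusive (the example covers pages 1 to 3)
selectors: maps each output field name to the CSS selector used to extract it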

You can change the URL pattern and selectors in this JSON to scrape other paginated websites, not just the quotes example.
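
To make the role of the fields concrete, here is a minimal sketch of how a config like the one above could be loaded and expanded into page URLs. The function names `load_config` and `build_page_urls` are hypothetical and the validation rules are assumptions; the actual code in scraper.py may differ.

```python
import json


def load_config(path: str) -> dict:
    """Read a JSON config file (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def build_page_urls(config: dict) -> list[str]:
    """Expand {page} for every page from start_page to end_page, inclusive."""
    base_url = config["base_url"]
    if "{page}" not in base_url:
        raise ValueError("base_url must contain the {page} placeholder")

    start, end = config["start_page"], config["end_page"]
    if start > end:
        raise ValueError("start_page must not be greater than end_page")

    # str.replace is used instead of str.format so that any other
    # braces in the URL are left untouched.
    return [base_url.replace("{page}", str(n)) for n in range(start, end + 1)]


# Same values as webscraper/config_example.json (selectors omitted here).
example_config = {
    "base_url": "https://quotes.toscrape.com/page/{page}/",
    "start_page": 1,
    "end_page": 3,
}
page_urls = build_page_urls(example_config)
```

With the example values, `page_urls` holds three URLs ending in /page/1/, /page/2/, and /page/3/.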

▶️ How to Run

Create and activate a virtual environment:

python3 -m venv .venv
source .venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Run the scraper:

python -m webscraper.cli --config webscraper/config_example.json --output data/output/quotes.csv
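
For readers curious how the two flags in this command might be parsed, the following is a minimal sketch using argparse from the standard library. Only the flag names --config and --output come from the command above; the function name `build_parser`, the help texts, and the choice to make both flags required are assumptions, and cli.py may be implemented differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the --config and --output flags (illustrative)."""
    parser = argparse.ArgumentParser(
        prog="webscraper",
        description="Scrape paginated pages and export the results to CSV.",
    )
    # Treating both flags as required is an assumption for this sketch.
    parser.add_argument("--config", required=True, help="Path to the JSON config file")
    parser.add_argument("--output", required=True, help="Path of the CSV file to write")
    return parser


# Parse the same arguments as the command shown above.
args = build_parser().parse_args(
    ["--config", "webscraper/config_example.json",
     "--output", "data/output/quotes.csv"]
)
```

After parsing, the values are available as `args.config` and `args.output`.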

Result:

Fetches pages 1–3
Extracts quotes, authors, and tags
Saves them to data/output/quotes.csv
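
The export step can be pictured with a short sketch based on the standard csv module. The function name `write_rows`, the "; " separator for multi-valued fields such as tags, and the sample row are all assumptions for illustration; storage.py may format its output differently.

```python
import csv
import io


def write_rows(rows, fieldnames, out) -> None:
    """Write dict rows as CSV to an open text stream (hypothetical helper)."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        # Fields with several values (e.g. tags) are joined into one cell.
        # The "; " separator is an assumption, not the project's format.
        flat = {
            key: "; ".join(value) if isinstance(value, list) else value
            for key, value in row.items()
        }
        writer.writerow(flat)


# Placeholder data, not real scraped content.
sample_rows = [
    {"quote": "Hello, world", "author": "Jane Doe", "tags": ["greeting", "test"]},
]

buffer = io.StringIO()
write_rows(sample_rows, ["quote", "author", "tags"], buffer)
csv_text = buffer.getvalue()
```

To write to a real file instead of an in-memory buffer, open it with `open(path, "w", newline="", encoding="utf-8")` and pass the file object as `out`; the `newline=""` argument is what the csv module documentation recommends to avoid extra blank lines on some platforms.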

📜 License

This project is licensed under the MIT License.
You are free to use, modify, distribute, and incorporate the code into your own projects, provided the copyright and license notice is kept with copies of the code.

See the full license in the included LICENSE file.

📝 Notes

This project is for demonstration and educational purposes.
Always respect website terms of service and robots.txt when scraping real websites.
The scraper is modular and easy to extend for more complex automation.
