A web scraping pipeline that gathers verified business contact data from publicly available directories across selected industries.
- Collects and saves all winery profile URLs from Europages.
- Visits those profile pages (or linked websites) and extracts valid email addresses.
- Produces and submits the final cleaned datasets (CSV).
The final output files must follow these exact formats:

| NAME | EMAILS | COUNTRY |
|---|---|---|

| NAME | LINK |
|---|---|
Step 1 – Setup
- Imported the libraries.
- Identified the 3 source pages.
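A minimal sketch of the setup step. The three Europages search-result URLs are assumptions for illustration (the actual page parameters may differ); the imports match the modules listed in the tech stack below.

```python
# Libraries used throughout the pipeline (requests, bs4, and pandas
# come pre-installed on Google Colab).
import re
import requests
import pandas as pd
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Hypothetical example: the three Europages search-result pages
# (the page-number query parameter is an assumption for illustration).
BASE_URL = "https://www.europages.co.uk"
SOURCE_PAGES = [
    f"{BASE_URL}/en/search?isPserpFirst=1&q=winery+supplies&page={n}"
    for n in (1, 2, 3)
]
```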
Step 2 – Data Extraction
- Scraped data from each page.
- Saved each page's .html links into separate CSV files.
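The extraction step can be sketched as below. To stay self-contained, it parses a small inline HTML sample instead of a live Europages response; the CSS selector, company names, and output file name are assumptions for illustration.

```python
import csv
from bs4 import BeautifulSoup

# Inline stand-in for one fetched search-result page.
sample_html = """
<ul>
  <li><a href="/en/company/example-winery-supplies-1.html">Example Winery Supplies 1</a></li>
  <li><a href="/en/company/example-winery-supplies-2.html">Example Winery Supplies 2</a></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")
rows = [
    {"NAME": a.get_text(strip=True), "LINK": a["href"]}
    for a in soup.select("a[href$='.html']")  # keep only .html profile links
]

# One CSV per source page, in the NAME | LINK format.
with open("page_1_links.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["NAME", "LINK"])
    writer.writeheader()
    writer.writerows(rows)
```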
Step 3 – Data Merging
- Merged the three CSV files into one master file.
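A sketch of the merge step, assuming the three per-page CSVs share the same NAME | LINK columns. Small in-memory tables stand in for `pd.read_csv` on the real files, and the de-duplication on LINK is an assumption about how repeats across pages are handled.

```python
import pandas as pd

# Stand-ins for the three per-page CSVs (in the pipeline these would
# come from pd.read_csv("page_N_links.csv")).
parts = [
    pd.DataFrame({"NAME": ["A"], "LINK": ["/en/company/a.html"]}),
    pd.DataFrame({"NAME": ["B"], "LINK": ["/en/company/b.html"]}),
    pd.DataFrame({"NAME": ["A"], "LINK": ["/en/company/a.html"]}),  # duplicate
]

# Stack the tables and drop repeated profile links.
master = pd.concat(parts, ignore_index=True).drop_duplicates(subset="LINK")
master.to_csv("master_links.csv", index=False)
```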
Step 4 – URL Construction
- Combined the standard base URL with the variable link parts.
- Generated full industry URLs.
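The URL construction step amounts to joining the fixed base URL with each saved relative link; `urllib.parse.urljoin` handles this. The relative path below is an illustrative example, not a real profile.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.europages.co.uk"

# A relative .html link as saved in Step 2 (illustrative example).
relative_link = "/en/company/example-winery-supplies-1.html"

# urljoin combines the base with the variable link part.
full_url = urljoin(BASE_URL, relative_link)
```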
Step 5 – Final Outputs
- Created the NAME | LINK CSV.
- Created the NAME | EMAILS | COUNTRY CSV.
Output file formats:
- NAME | LINK (.html)
- NAME | LINK
- NAME | LINK | COUNTRY
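The final-output step can be sketched as follows: pull email addresses out of page text with a regex and write the NAME | EMAILS | COUNTRY CSV. The snippet runs on an inline string rather than a fetched page, the company and country values are illustrative, and the regex is a common pragmatic pattern, not a full RFC 5322 validator.

```python
import re
import pandas as pd

# Pragmatic email pattern (assumption; not a full RFC 5322 validator).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Stand-in for the text of one fetched profile page.
page_text = "Contact us at info@example-winery.com or sales@example-winery.com."

# Deduplicate and sort the matches for a stable EMAILS field.
emails = sorted(set(EMAIL_RE.findall(page_text)))

# Assemble one row of the final NAME | EMAILS | COUNTRY table
# (NAME and COUNTRY values here are illustrative).
final = pd.DataFrame(
    [{"NAME": "Example Winery Supplies", "EMAILS": "; ".join(emails), "COUNTRY": "France"}]
)
final.to_csv("winery_contacts.csv", index=False)
```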
- Language: Python 3
- Modules: requests, BeautifulSoup (bs4), pandas, urllib.parse, re
- Platform: Google Colab
MIT License
| Sources |
|---|
| https://www.notion.so/digiole/Scalable-Web-Scraping-Pipeline-21425969342680b7a99ef9f999a96f06 |
| https://www.europages.co.uk/en/search?isPserpFirst=1&q=winery+supplies |
| https://beautiful-soup-4.readthedocs.io/en/latest/#quick-start |
| https://pandas.pydata.org/docs/user_guide/index.html |
| W3Schools |
| Stack Overflow |
| YouTube |
| ChatGPT |
| Gemini AI |
| GeeksforGeeks |