Interactive Web Scraper (Static Websites)
Project Overview
This project is an Interactive Web Scraping Tool built in Python. Users enter a website URL and an HTML tag at runtime, and the tool extracts the matching content from static websites.
The extracted data is displayed in a clean and user-friendly table format.
Objective
To fetch data from a website and present it in a user-friendly way using a simple web scraping library.
The project covers the following steps:
Select a website and identify the data to be scraped.
Utilize a web scraping library to fetch the data.
Design a user-friendly presentation format.
Test the program with different websites.
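For the first step, identifying which tag holds the data, it can help to see which tags a page uses most. The helper below is a hypothetical sketch, not part of the project: tag_summary is an illustrative name, and viewing the page source or using browser developer tools serves the same purpose.

```python
from collections import Counter
from typing import List, Tuple

from bs4 import BeautifulSoup


def tag_summary(html: str, top: int = 5) -> List[Tuple[str, int]]:
    """Count how often each tag appears in the HTML, most common first."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) matches every tag in the document.
    counts = Counter(element.name for element in soup.find_all(True))
    return counts.most_common(top)
```

For a live page, pass in the downloaded HTML, for example `tag_summary(requests.get(url, timeout=10).text)`, and pick a tag from the top of the list.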
Features
User enters website URL
User selects HTML tag (h1, p, a, div, etc.)
Extracts the first 20 matching elements to keep the output readable
Displays results in a formatted table
Simple CLI-based interaction
Error handling for failed HTTP requests
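The URL and tag prompts could be validated along these lines before any request is sent. This is a minimal sketch, not the project's actual code: the helper names validate_url, validate_tag and read_inputs are hypothetical, and accepting only absolute http(s) URLs and bare tag names is an assumption.

```python
import re
from typing import Tuple
from urllib.parse import urlparse

# Hypothetical rule: a tag name starts with a letter, then letters, digits or hyphens.
TAG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def validate_url(url: str) -> bool:
    """Accept absolute http(s) URLs such as http://quotes.toscrape.com."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_tag(tag: str) -> bool:
    """Accept a bare tag name like 'span' (no angle brackets or attributes)."""
    return TAG_NAME.fullmatch(tag.strip()) is not None


def read_inputs() -> Tuple[str, str]:
    """Prompt until the user supplies a usable URL and tag name."""
    while True:
        url = input("Enter website URL: ").strip()
        if validate_url(url):
            break
        print("Please enter a full URL starting with http:// or https://")
    while True:
        tag = input("Enter HTML tag to scrape: ").strip().lower()
        if validate_tag(tag):
            return url, tag
        print("Please enter a tag name such as h1, p, a or div.")
```

Rejecting input such as `quotes.toscrape.com` (no scheme) or `<p>` up front gives a clearer message than the exception requests or BeautifulSoup would otherwise produce.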
Technologies Used
Python 3
Requests
BeautifulSoup4
Tabulate
Installation
1️⃣ Clone the Repository
git clone https://github.com/Dhaniyal-Jose/web-scraping.git
cd web-scraping
2️⃣ Create a Virtual Environment (Optional)
python -m venv venv
venv\Scripts\activate # Windows
source venv/bin/activate # macOS/Linux
3️⃣ Install Dependencies
pip install -r requirements.txt
requirements.txt
requests
beautifulsoup4
tabulate
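This version works best with static websites. Suggested sites for testing:
+----------------------+----------------------------+--------------------+
| Website              | URL                        | Suggested HTML Tag |
+----------------------+----------------------------+--------------------+
| Quotes Practice Site | http://quotes.toscrape.com | span               |
| Example Website      | https://example.com        | p                  |
| Books Practice Site  | http://books.toscrape.com  | h3                 |
| Wikipedia            | https://www.wikipedia.org  | a                  |
+----------------------+----------------------------+--------------------+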
Does NOT work properly with:
YouTube
Amazon
Flipkart
(These sites build much of their content with JavaScript, which requests does not execute, so the data is missing from the fetched HTML.)
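Because requests downloads HTML without running any JavaScript, a page that is assembled in the browser often arrives as a nearly empty shell. The helper below is a rough, hypothetical heuristic, not part of the project: it flags HTML where the requested tag is missing, script tags are present, and little visible text remains once script, style and noscript blocks are removed. The 200-character threshold is an arbitrary assumption, and a site that blocks bots can fail for reasons this check does not see.

```python
from bs4 import BeautifulSoup


def looks_js_rendered(html: str, tag: str, min_text: int = 200) -> bool:
    """Rough guess: is the content we want probably injected by JavaScript?"""
    soup = BeautifulSoup(html, "html.parser")
    has_matches = bool(soup.find_all(tag))
    has_scripts = bool(soup.find_all("script"))
    # Drop non-visible blocks so only text a reader would see is measured.
    for hidden in soup(["script", "style", "noscript"]):
        hidden.decompose()
    visible_text = soup.get_text(strip=True)
    # Typical app shell: no matching tags, scripts present, little visible text.
    return not has_matches and has_scripts and len(visible_text) < min_text
```

A True result is a hint to try a browser-based tool instead, not proof; a False result only means the raw HTML already contains something usable.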
Example Usage
Enter website URL: http://quotes.toscrape.com
Enter HTML tag to scrape: span
Output:
+--------+------------------------------------------+
| S.No   | Content                                  |
+--------+------------------------------------------+
| 1      | “The world as we have created it...”     |
| 2      | Albert Einstein                          |
+--------+------------------------------------------+
How It Works
The program sends an HTTP request using requests.
The HTML content is parsed using BeautifulSoup.
All elements matching the specified HTML tag are extracted.
The first 20 results are displayed in a formatted table using tabulate.
Limitations
Only works with static websites.
Cannot scrape JavaScript-rendered pages.
Some websites may block automated requests.
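Some sites reject requests that carry the default python-requests User-Agent, and a request without a timeout can hang indefinitely. The sketch below uses a hypothetical fetch_html helper and a made-up header value: it sends an explicit User-Agent and timeout and turns request failures into a readable message instead of a traceback. It cannot get around deliberate blocking; the site may still answer with 403 or 429.

```python
from typing import Optional

import requests

# Hypothetical header value: identify the script honestly; some sites still block it.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; interactive-web-scraper/1.0)"}


def fetch_html(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the page HTML, or None if the request fails or is refused."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=timeout)
        response.raise_for_status()  # turns 403/429/5xx responses into HTTPError
    except requests.exceptions.HTTPError as exc:
        print(f"Server refused the request ({exc.response.status_code}).")
        return None
    except requests.exceptions.RequestException as exc:
        # Covers malformed URLs, DNS failures, timeouts and connection errors.
        print(f"Could not fetch {url}: {exc}")
        return None
    return response.text
```

Catching HTTPError before the broader RequestException keeps the status code available for the message, since HTTPError is a subclass of RequestException.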
Learning Outcomes
Understanding static web scraping
Using BeautifulSoup for HTML parsing
Extracting elements using HTML tags
Formatting output in table structure
Handling HTTP request exceptions
Developed By
Dhaniyal Jose
B.Tech CSE Student | Software Development Enthusiast