GitHub - Dhaniyal-Jose/web-scraping: Interactive Web Scraper using Python (Requests + BeautifulSoup) with CLI-based user input and formatted table output.

🌐 Interactive Web Scraper (Static Websites)

📌 Project Overview

This project is an Interactive Web Scraping Tool built using Python. It allows users to dynamically enter a website URL and an HTML tag to extract content from static websites.

The extracted data is displayed in a clean and user-friendly table format.

🎯 Objective

To fetch data from a website and present it in a user-friendly way using a simple web scraping library.

This project completes the following steps:

Select a website and identify the data to be scraped.

Utilize a web scraping library to fetch the data.

Design a user-friendly presentation format.

Test the program with different websites.

🚀 Features

User enters website URL

User selects HTML tag (h1, p, a, div, etc.)

Extracts first 20 elements for clean display

Displays results in formatted table

Simple CLI-based interaction

Error handling included
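
The URL and tag prompts listed above are the natural place to catch bad input before any request is sent. The sketch below shows one way this could be done with the standard library only. The helper names `normalize_url` and `normalize_tag` are hypothetical and are not taken from scraper.py; defaulting to `http://` when no scheme is typed is also an assumption.

```python
import re
from urllib.parse import urlparse

# Hypothetical helpers for illustration; these names are not from scraper.py.
TAG_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def normalize_url(raw: str) -> str:
    """Trim whitespace and default to http:// when no scheme is given (assumption)."""
    url = raw.strip()
    if "://" not in url:
        url = "http://" + url
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid http(s) URL: {raw!r}")
    return url


def normalize_tag(raw: str) -> str:
    """Accept 'h1', ' P ', or '<a>' and return a bare lowercase tag name."""
    tag = raw.strip().strip("<>").lower()
    if not TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"Not a valid HTML tag name: {raw!r}")
    return tag


print(normalize_url("quotes.toscrape.com"))
print(normalize_tag(" <H3> "))
```

In an interactive loop, a `ValueError` from either helper could be caught and the prompt repeated instead of letting the program exit.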

🛠 Technologies Used

Python 3

Requests

BeautifulSoup4

Tabulate

📦 Installation

1️⃣ Clone the Repository

    git clone https://github.com/Dhaniyal-Jose/web-scraping.git
    cd web-scraping

2️⃣ Create Virtual Environment (Optional)

    python -m venv venv
    venv\Scripts\activate # Windows

3️⃣ Install Dependencies

    pip install -r requirements.txt

📄 requirements.txt

    requests
    beautifulsoup4
    tabulate

▶️ How to Run

    python scraper.py

🌍 Sample Websites for Testing

⚠ This version works best with static websites.

| Website | URL | Suggested HTML Tag |
|---|---|---|
| Quotes Practice Site | http://quotes.toscrape.com | span |
| Example Website | https://example.com | p |
| Books Practice Site | http://books.toscrape.com | h3 |
| Wikipedia | https://www.wikipedia.org | a |

❌ Does NOT work properly with:

YouTube

Amazon

Instagram

Flipkart

(These are JavaScript-rendered websites, so the HTML fetched by Requests does not contain the content that appears in a browser.)
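
When a page is rendered by JavaScript, the scraper typically finds empty elements or none at all. The sketch below is a rough heuristic, not part of scraper.py, that turns that situation into a readable hint. The function name `describe_result` and the rule "no non-empty matches plus at least one script tag" are assumptions chosen for illustration, and the rule can misfire on ordinary static pages.

```python
from bs4 import BeautifulSoup


def describe_result(html: str, tag: str) -> str:
    """Summarise what the static HTML contains for the requested tag (heuristic)."""
    soup = BeautifulSoup(html, "html.parser")
    # Keep only elements that actually carry visible text.
    matches = [el for el in soup.find_all(tag) if el.get_text(strip=True)]
    if matches:
        return f"Found {len(matches)} non-empty <{tag}> element(s)."
    script_count = len(soup.find_all("script"))
    if script_count:
        return (
            f"No non-empty <{tag}> elements, but {script_count} <script> tag(s): "
            "the content may be rendered by JavaScript."
        )
    return f"No non-empty <{tag}> elements found in the static HTML."


# A made-up page shell of the kind a JavaScript-rendered site often returns.
JS_SHELL = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
print(describe_result(JS_SHELL, "div"))
print(describe_result("<p>Hello</p>", "p"))
```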

🧪 Example Usage

    Enter website URL: http://quotes.toscrape.com
    Enter HTML tag to scrape: span

Output:

    +--------+------------------------------------------+
    | S.No   | Content                                  |
    +--------+------------------------------------------+
    | 1      | “The world as we have created it...”     |
    | 2      | Albert Einstein                          |
    +--------+------------------------------------------+

🧠 How It Works

The program sends an HTTP request using requests.

The HTML content is parsed using BeautifulSoup.

The specified HTML tag is extracted.

The first 20 results are displayed in a formatted table using tabulate.

⚠ Limitations

Only works with static websites.

Cannot scrape JavaScript-rendered pages.

Some websites may block automated requests.
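
Blocked or failed requests usually surface as an HTTP error status, a timeout, or a connection error. The sketch below shows one way to report these cleanly with the exception classes that Requests provides. The name `fetch_html_safely`, the User-Agent string, and the timeout value are assumptions for illustration; scraper.py may handle errors differently, and sending a User-Agent header does not guarantee that a site will accept the request.

```python
import requests

# Assumed values for illustration; scraper.py may use different ones.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; interactive-scraper-demo)"}
TIMEOUT_SECONDS = 10


def fetch_html_safely(url: str):
    """Return (html, None) on success or (None, error_message) on failure."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=TIMEOUT_SECONDS)
        response.raise_for_status()  # turns 4xx/5xx responses (e.g. 403) into HTTPError
    except requests.exceptions.HTTPError as exc:
        return None, f"Server returned an error status: {exc}"
    except requests.exceptions.Timeout:
        return None, f"Request timed out after {TIMEOUT_SECONDS} seconds."
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, malformed URLs, and other request failures.
        return None, f"Request failed: {exc}"
    return response.text, None


# A malformed URL fails before any network traffic, which makes it a safe demo.
html, error = fetch_html_safely("not-a-url")
print(error)
```

Returning an error message instead of raising keeps the command-line loop simple: the caller can print the message and ask for another URL.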

📚 Learning Outcomes

Understanding static web scraping

Using BeautifulSoup for HTML parsing

Extracting elements using HTML tags

Formatting output in table structure

Handling HTTP request exceptions

👨‍💻 Developed By

Dhaniyal Jose

B.Tech CSE Student | Software Development Enthusiast
