Dhaniyal-Jose/web-scraping

Interactive Web Scraper (Static Websites)

Project Overview

This project is an interactive web scraping tool built with Python. At runtime, the user enters a website URL and an HTML tag, and the tool extracts the matching content from static websites.

The extracted data is displayed as a formatted table.

Objective

To fetch data from a website and present it in a user-friendly way using a simple web scraping library.

This project completes the following steps:

Select a website and identify the data to be scraped.

Utilize a web scraping library to fetch the data.

Design a user-friendly presentation format.

Test the program with different websites.

Features

User enters website URL

User selects HTML tag (h1, p, a, div, etc.)

Extracts first 20 elements for clean display

Displays results in formatted table

Simple CLI-based interaction

Error handling included

Technologies Used

Python 3

Requests

BeautifulSoup4

Tabulate

Installation

1. Clone the Repository

    git clone https://github.com/Dhaniyal-Jose/web-scraping.git
    cd web-scraping

2. Create Virtual Environment (Optional)

    python -m venv venv
    venv\Scripts\activate   # Windows

3. Install Dependencies

    pip install -r requirements.txt

requirements.txt

    requests
    beautifulsoup4
    tabulate

How to Run

    python scraper.py

Sample Websites for Testing

Note: This version works best with static websites.

| Website | URL | Suggested HTML Tag |
|---|---|---|
| Quotes Practice Site | http://quotes.toscrape.com | span |
| Example Website | https://example.com | p |
| Books Practice Site | http://books.toscrape.com | h3 |
| Wikipedia | https://www.wikipedia.org | a |

Does NOT work properly with:

YouTube

Amazon

Instagram

Flipkart

(These sites build most of their content with JavaScript after the page loads, so the HTML returned by a plain HTTP request does not contain it.)
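To illustrate why tag-based scraping comes back empty on such sites, here is a minimal sketch that parses two made-up HTML snippets. `STATIC_PAGE`, `JS_PAGE`, and `count_tags` are hypothetical names invented for this illustration and are not taken from `scraper.py`. In the second snippet the heading exists only as a string inside a script, so the parser never sees it as an element.

```python
from bs4 import BeautifulSoup

# Made-up markup: the heading is present in the HTML the server sends.
STATIC_PAGE = "<html><body><h3>Example title</h3></body></html>"

# Made-up markup: the heading would only exist after a browser runs the script.
JS_PAGE = """
<html><body>
<div id="app"></div>
<script>
  document.getElementById("app").innerHTML = "<h3>Example title</h3>";
</script>
</body></html>
"""


def count_tags(html, tag):
    """Count the <tag> elements that BeautifulSoup finds in the raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all(tag))


print(count_tags(STATIC_PAGE, "h3"))  # the heading is found as an element
print(count_tags(JS_PAGE, "h3"))      # script text is not parsed into elements
```

Because `requests` only downloads the raw HTML and never executes scripts, a page built like the second snippet yields no matching elements.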

Example Usage

    Enter website URL: http://quotes.toscrape.com
    Enter HTML tag to scrape: span

Output:

    +--------+------------------------------------------+
    | S.No   | Content                                  |
    +--------+------------------------------------------+
    | 1      | “The world as we have created it...”     |
    | 2      | Albert Einstein                          |
    +--------+------------------------------------------+

How It Works

The program sends an HTTP request using requests.

The HTML content is parsed using BeautifulSoup.

The specified HTML tag is extracted.

The first 20 results are displayed in a formatted table using tabulate.

Limitations

Only works with static websites.

Cannot scrape JavaScript-rendered pages.

Some websites may block automated requests.
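One way to cope with blocked or failing requests is sketched below, assuming the `requests` library listed under Technologies Used. The helper name `fetch_html`, the `User-Agent` string, and the timeout are hypothetical choices for this example. Sending a custom header does not guarantee that a site will accept the request.

```python
import requests

# Hypothetical header value; some sites reject the default python-requests agent.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; static-scraper-demo/0.1)"}


def fetch_html(url, timeout=10):
    """Return the page HTML, or None if the request fails for any reason."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into HTTPError
    except requests.exceptions.RequestException as exc:
        # Covers invalid URLs, connection errors, timeouts, and HTTP errors.
        print(f"Request failed: {exc}")
        return None
    return response.text
```

Returning `None` instead of raising lets a CLI loop report the problem and ask for another URL. Whether the original program behaves this way is not stated in the README.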

Learning Outcomes

Understanding static web scraping

Using BeautifulSoup for HTML parsing

Extracting elements using HTML tags

Formatting output in table structure

Handling HTTP request exceptions

Developed By

Dhaniyal Jose
B.Tech CSE Student | Software Development Enthusiast
