An AI-powered web scraping and semantic analysis platform built with Python, Streamlit, Gemini embeddings, and FAISS.
WEBSAGE helps users extract, analyze, search, and visualize web data intelligently across multiple domains such as e-commerce, jobs, travel, and real estate.
- 🔎 Smart Web Scraping using BeautifulSoup
- 🧠 Semantic Search with Gemini embeddings
- ⚡ Vector Similarity Search powered by FAISS
- 📊 Interactive Analytics Dashboard using Streamlit
- 🌐 Multi-domain scraping support:
  - E-commerce products
  - Job listings
  - Travel data
  - Real estate listings
- 📁 HTML Report Export for analysis results
- 🔐 Basic login/user session support
- 🖥️ Windows executable packaging support

- Frontend/UI: Streamlit
- Backend: Python
- Web Scraping: BeautifulSoup, Requests
- AI Embeddings: Gemini API
- Vector Database: FAISS
- Data Processing: Pandas, NumPy
- Storage: JSON / Local files
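
To illustrate what the semantic search step does conceptually, here is a minimal sketch of cosine-similarity ranking over embedding vectors. It is not the code in `app.py`: it uses plain Python instead of FAISS so it runs without dependencies, and the three-dimensional vectors are made-up stand-ins for real Gemini embeddings. The function names `normalize` and `top_k` are hypothetical. An exact inner-product index over unit-length vectors (for example FAISS's `IndexFlatIP`) produces the same ranking.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def top_k(query, corpus, k=2):
    """Return the k (score, text) pairs most similar to the query vector.

    corpus is a list of (text, embedding) pairs. The embeddings here are
    hypothetical placeholders for what an embedding model would return.
    """
    q = normalize(query)
    scored = []
    for text, emb in corpus:
        e = normalize(emb)
        scored.append((sum(a * b for a, b in zip(q, e)), text))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
corpus = [
    ("Wireless headphones, $59", [0.9, 0.1, 0.0]),
    ("Backend engineer, remote", [0.1, 0.9, 0.1]),
    ("2-bedroom apartment downtown", [0.0, 0.2, 0.9]),
]
results = top_k([0.8, 0.2, 0.1], corpus, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

With a real index, the scraped text would be embedded once, stored, and only the query embedded at search time.
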
WEBSAGE/
│── app.py
│── faiss_index/
│── .streamlit/
│── analysis_results.html
│── users.json
│── requirements.txt
│── README.md
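
The layout above includes an `analysis_results.html` file, which matches the HTML report export feature. As a rough sketch of how such a report could be produced, the snippet below renders a list of scraped records as an HTML table using only the standard library. The function names `rows_to_html` and `write_report` and the record fields are assumptions made for illustration, not the project's actual code; since Pandas is part of the stack, `DataFrame.to_html` would be a natural alternative.

```python
import html

def rows_to_html(rows, title="WEBSAGE analysis results"):
    """Render a list of dicts as a standalone HTML page containing one table."""
    # Column order follows the keys of the first record (hypothetical schema).
    columns = list(rows[0].keys()) if rows else []
    head = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    body_rows = []
    for row in rows:
        # Escape every cell so scraped markup cannot break the report.
        cells = "".join(
            f"<td>{html.escape(str(row.get(c, '')))}</td>" for c in columns
        )
        body_rows.append(f"<tr>{cells}</tr>")
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><meta charset='utf-8'><title>{html.escape(title)}</title></head>\n"
        f"<body><h1>{html.escape(title)}</h1>\n"
        f"<table border='1'><thead><tr>{head}</tr></thead>\n"
        f"<tbody>{''.join(body_rows)}</tbody></table>\n"
        "</body></html>\n"
    )

def write_report(rows, path="analysis_results.html"):
    """Write the rendered page to disk and return the path."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(rows_to_html(rows))
    return path
```

Calling `write_report([{"title": "Laptop", "price": "499"}])` would create the report file in the current directory.
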

    git clone https://github.com/NithinGowda67/WEBSAGE.git
    cd WEBSAGE
    python -m venv venv

Windows:

    venv\Scripts\activate

Mac/Linux:

    source venv/bin/activate

Install the dependencies:

    pip install -r requirements.txt

Create a .env file:

    GEMINI_API_KEY=your_api_key_here

Run the app:

    streamlit run app.py

- Product price comparison
- Job market intelligence
- Travel data insights
- Property listing analysis
- AI-powered semantic search over scraped content
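
Semantic search depends on the `GEMINI_API_KEY` value from the `.env` file created during setup. The sketch below shows one way application code could load that key using only the standard library. The helper names `load_env_file` and `get_gemini_key` are hypothetical, and the actual project may rely on a package such as `python-dotenv` instead.

```python
import os

def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a file into os.environ.

    Values already present in the environment are left untouched.
    """
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            value = value.strip().strip('"').strip("'")
            os.environ.setdefault(key.strip(), value)

def get_gemini_key():
    """Return the API key, or raise a clear error if it is missing."""
    load_env_file()
    key = os.environ.get("GEMINI_API_KEY")
    if not key or key == "your_api_key_here":
        raise RuntimeError("Set GEMINI_API_KEY in .env before running the app")
    return key
```

Failing early with a readable message avoids a confusing authentication error later, when the first embedding request is made.
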
- 🌍 Live multi-page crawling
- ☁️ Cloud deployment (AWS/GCP)
- 👥 Multi-user authentication
- 📈 Advanced ML insights
- 🗂️ PostgreSQL / MongoDB storage
- 🤖 Chat-based search assistant
Nithin Kumar N
- Computer Science Engineer
- AI | Full Stack | AWS | Blockchain Enthusiast
GitHub: https://github.com/NithinGowda67
If you like this project, star the repository and share your feedback.