An automated email scraping and classification system designed for IIT Gandhinagar's insIIT platform. The system periodically fetches emails, classifies them into categories, and stores structured data for various campus services.
- Automated Email Fetching: Scrapes new emails every 5 minutes
- Multi-Category Classification: Events, Mess Menu, Medical Updates, Opportunities, Timetables
- Non-AI Pre-filtering: Rule-based filtering to remove irrelevant emails
- Structured Data Storage: Follows predefined schemas for consistent data
- Secure API Access: REST API endpoints that require authentication
- Attachment Handling: Extracts and processes email attachments
- Auto Event Updates: Campus events and announcements
- Medical Unavailability: Doctor schedules and availability
- Mess Menu Updates: Daily meal schedules
- Timetable Updates: Academic schedule changes
- Opportunity Updates: Internships, jobs, scholarships
Email Server (Gmail via IMAP)
↓
Email Access Module (OAuth2)
↓
Pre-Classifier (Rule-based)
↓
Email Parser Module
↓
AI/ML Classifier Module
↓
Database (MongoDB/SQL)
↓
REST API Service
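Read top to bottom, the diagram is a chain of stages in which each module consumes the previous module's output. Here is a minimal sketch of that chaining. It assumes each module exposes a plain Python function and that messages travel between stages as dictionaries. The parameter names below are hypothetical, not the repository's actual interfaces. The polling loop uses the 5-minute interval from the feature list as a simple stand-in for what APScheduler or Celery would schedule in the real service.

```python
import time
from typing import Callable, Iterable, Optional

# Matches "Scrapes new emails every 5 minutes"; a real deployment would hand this to APScheduler/Celery.
POLL_INTERVAL_SECONDS = 5 * 60

Email = dict  # hypothetical message shape passed between stages


def run_pipeline(
    fetch: Callable[[], Iterable[Email]],
    pre_filter: Callable[[Email], bool],
    parse: Callable[[Email], Email],
    classify: Callable[[Email], Email],
    store: Callable[[Email], None],
) -> int:
    """Run one fetch -> pre-filter -> parse -> classify -> store pass; return how many emails were stored."""
    stored = 0
    for raw in fetch():
        if not pre_filter(raw):  # rule-based pre-classifier drops irrelevant mail before parsing
            continue
        store(classify(parse(raw)))
        stored += 1
    return stored


def poll(
    run_once: Callable[[], int],
    iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call run_once, then wait POLL_INTERVAL_SECONDS; loop forever unless iterations is given."""
    done = 0
    while iterations is None or done < iterations:
        run_once()
        done += 1
        sleep(POLL_INTERVAL_SECONDS)
```

Because each stage is passed in as a function, every team can test its own module in isolation by substituting stubs for the others.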
- Email Access: Gmail API + OAuth2
- Scheduler: APScheduler / Celery
- Backend: Python (Flask/FastAPI)
- Database: MongoDB / PostgreSQL
- Authentication: JWT / OAuth2
- API: RESTful API
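The stack lists JWT for authenticating API clients. The following is a minimal, standard-library-only sketch of issuing and verifying an HS256 JWT, to show what the token check on each request involves. The function names, the secret, and the choice of an `exp` claim are assumptions made for illustration. A production service would more likely rely on a maintained library such as PyJWT than on hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data: bytes) -> str:
    # JWTs use unpadded base64url (RFC 7515)
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    padding = "=" * (-len(segment) % 4)  # restore the stripped padding
    return base64.urlsafe_b64decode(segment + padding)


def issue_token(claims: dict, secret: bytes, ttl_seconds: int = 3600) -> str:
    """Create an HS256-signed JWT carrying the given claims plus an expiry (exp)."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = dict(claims, exp=int(time.time()) + ttl_seconds)
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url_encode(signature)


def verify_token(token: str, secret: bytes) -> dict:
    """Return the token's claims, or raise ValueError if it is malformed, forged, or expired."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":  # reject algorithm substitution such as "none"
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):  # constant-time comparison
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims
```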
Team Members (Email Access Module): Hardik, Satyam
Responsibilities:
- Implement OAuth2 authentication with Gmail
- Fetch new emails from IMAP server
- Extract email fields (from, to, subject, body, attachments)
- Handle email connection and session management
Module Location: src/email_access/
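After a message has been downloaded as raw bytes, the field extraction listed above can be done with Python's standard `email` package. Below is a minimal sketch of that step. The returned dictionary layout is an assumption, not a schema defined by the project. The OAuth2 login and fetch are left out because they need live credentials.

```python
from email import policy
from email.parser import BytesParser


def extract_fields(raw_message: bytes) -> dict:
    """Pull from/to/subject/body/attachments out of a raw RFC 822 message."""
    # policy.default makes the parser return an EmailMessage, which provides get_body/iter_attachments.
    msg = BytesParser(policy=policy.default).parsebytes(raw_message)

    # Prefer the plain-text body; fall back to HTML if that is all the sender provided.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    attachments = []
    for part in msg.iter_attachments():
        attachments.append({
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "data": part.get_payload(decode=True),  # transfer-encoding (e.g. base64) removed
        })

    return {
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "subject": str(msg.get("Subject", "")),
        "body": body,
        "attachments": attachments,
    }
```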
Team Member (Pre-Classifier): Muder
Responsibilities:
- Create rule-based filtering logic
- Filter emails based on sender/subject patterns
- Remove spam and irrelevant emails
- Define filtering rules
Module Location: src/pre_classifier/
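Rule-based pre-filtering can be as simple as a set of sender and subject patterns checked before any parsing or AI work runs. The following minimal sketch uses regular expressions. The specific patterns are made-up examples: the `@iitgn.ac.in` sender rule and the spam phrases are not the project's actual rules.

```python
import re
from dataclasses import dataclass, field
from email.utils import parseaddr
from typing import List


@dataclass
class FilterRules:
    # Made-up example rules; defining the real list is part of this module's work.
    allowed_senders: List[str] = field(default_factory=lambda: [r"@iitgn\.ac\.in$"])
    blocked_subjects: List[str] = field(default_factory=lambda: [
        r"\blottery\b",
        r"\bclaim your (?:prize|reward)\b",
        r"^(?:re:\s*)?out of office\b",
    ])


def is_relevant(sender: str, subject: str, rules: FilterRules) -> bool:
    """Keep an email only if the sender address is allowed and no blocked subject pattern matches."""
    _, address = parseaddr(sender)  # "Name <addr>" -> "addr"
    address = address.lower()
    if not any(re.search(p, address) for p in rules.allowed_senders):
        return False
    return not any(re.search(p, subject, re.IGNORECASE) for p in rules.blocked_subjects)
```

Keeping the rules in data rather than code makes it possible to add or change rules without touching the filtering logic.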
Team Members (Email Parser): Ishani, Lakshika
Responsibilities:
- Parse email content and extract text
- Handle various email formats (HTML, plain text)
- Process and extract attachments
- Clean and normalize email content
Module Location: src/parser/
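For HTML bodies, the standard library's `html.parser` is enough to strip tags, and whitespace normalization covers most of the remaining cleanup. Here is a minimal sketch of both steps. Two choices in it are assumptions about what "clean and normalize" should mean for this project: it skips `<script>`/`<style>` contents, and it collapses whitespace while dropping blank lines. A library such as BeautifulSoup is a common alternative.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text from HTML, ignoring <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities such as &amp; and &nbsp;
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("br", "p", "div", "li", "tr"):
            self._chunks.append("\n")  # keep line structure for block-level tags

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        return "".join(self._chunks)


def normalize_text(text: str) -> str:
    """Collapse runs of spaces/tabs/non-breaking spaces, trim each line, and drop blank lines."""
    lines = (re.sub(r"[ \t\u00a0]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def body_to_text(body: str, is_html: bool) -> str:
    """Convert an HTML or plain-text email body to clean, normalized text."""
    if is_html:
        extractor = _TextExtractor()
        extractor.feed(body)
        extractor.close()
        body = extractor.text()
    return normalize_text(body)
```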
Team Member (AI/ML Classifier): Yashvardhan
Responsibilities:
- Categorize emails (events, mess, medical, opportunities, timetables)
- Format data according to predefined schemas
- Implement classification logic
- Validate classified data
Module Location: src/classifier/
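The module is planned as an AI/ML classifier. Before a model is in place, a keyword baseline is a useful way to pin down the category labels and the record schema the classifier has to fill. The sketch below is that rule-based stand-in, not the ML approach itself. The five category names follow the feature list. The keyword lists, the `ClassifiedEmail` fields, and the validation rules are hypothetical.

```python
import re
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional

# Hypothetical keyword lists; the five categories come from the feature list.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "events": ["event", "workshop", "talk", "fest", "competition"],
    "mess": ["mess", "menu", "breakfast", "lunch", "dinner"],
    "medical": ["doctor", "medical", "health centre", "unavailable"],
    "opportunities": ["internship", "job", "scholarship", "fellowship"],
    "timetable": ["timetable", "time table", "class schedule", "rescheduled"],
}


@dataclass
class ClassifiedEmail:
    # Hypothetical schema; the project's predefined schemas may differ.
    category: str
    subject: str
    sender: str
    body: str


def classify(subject: str, body: str) -> Optional[str]:
    """Return the category with the most whole-word keyword hits, or None if nothing matches."""
    text = f"{subject} {body}".lower()
    scores = {
        cat: sum(len(re.findall(rf"\b{re.escape(kw)}\b", text)) for kw in kws)
        for cat, kws in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)  # ties resolve to the earlier category in the dict
    return best if scores[best] > 0 else None


def to_record(subject: str, sender: str, body: str) -> Optional[dict]:
    """Classify and validate; reject emails with no category or an empty subject/sender."""
    category = classify(subject, body)
    if category is None or not subject.strip() or not sender.strip():
        return None
    return asdict(ClassifiedEmail(category, subject.strip(), sender.strip(), body.strip()))
```

The whole-word match (`\b`) matters here: a plain substring test would count "mess" inside "message".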
Team Member (Database): Arpit
Responsibilities:
- Design and implement database schema
- Create database models
- Implement CRUD operations
- Handle data persistence and retrieval
- Optimize database queries
Module Location: src/database/
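The stack leaves the choice between MongoDB and PostgreSQL open. The following minimal sketch illustrates the CRUD layer against Python's built-in `sqlite3` without committing to either database. The `emails` table, its columns, and the function names are hypothetical. Similar SQL should carry over to PostgreSQL through a driver such as psycopg, which uses `%s` placeholders instead of `?`; SQLite-specific syntax such as `INSERT OR IGNORE` and `AUTOINCREMENT` would need PostgreSQL equivalents. The unique `message_id` keeps re-fetched emails from being stored twice, and the index on `category` supports the per-category reads the API would serve.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS emails (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id TEXT UNIQUE NOT NULL,   -- Message-ID header, used to skip duplicates
    category   TEXT NOT NULL,
    subject    TEXT NOT NULL,
    body       TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_emails_category ON emails(category);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows can be converted with dict(row)
    conn.executescript(SCHEMA)
    return conn


def create(conn: sqlite3.Connection, message_id: str, category: str, subject: str, body: str) -> bool:
    """Insert a record; return False if this message_id was already stored."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO emails (message_id, category, subject, body) VALUES (?, ?, ?, ?)",
            (message_id, category, subject, body),
        )
    return cur.rowcount == 1


def read_by_category(conn: sqlite3.Connection, category: str, limit: int = 20) -> List[dict]:
    """Return the newest records in a category, newest first."""
    rows = conn.execute(
        "SELECT * FROM emails WHERE category = ? ORDER BY id DESC LIMIT ?", (category, limit)
    ).fetchall()
    return [dict(r) for r in rows]


def update_category(conn: sqlite3.Connection, message_id: str, category: str) -> bool:
    with conn:
        cur = conn.execute(
            "UPDATE emails SET category = ? WHERE message_id = ?", (category, message_id)
        )
    return cur.rowcount == 1


def delete(conn: sqlite3.Connection, message_id: str) -> bool:
    with conn:
        cur = conn.execute("DELETE FROM emails WHERE message_id = ?", (message_id,))
    return cur.rowcount == 1
```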
Click the "Fork" button in the top-right corner of the repository page.
# Clone your fork
git clone https://github.com/YOUR_USERNAME/Email-Scrapper.git
cd Email-Scrapper
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On Unix/MacOS:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Install development dependencies
pip install -r requirements-dev.txt  # if available
# Update main branch
git checkout main
git pull origin main
# Create feature branch
git checkout -b feature/your-module-name
- Work only in your assigned module directory
- Follow the code standards (see below)
- Write tests for your code
- Update documentation as needed
git add .
git commit -m "feat(module-name): brief description"
git push origin feature/your-module-name
Then create a Pull Request on GitHub.