Email Spam Detection 📨

Overview

This project uses machine learning to detect whether an email is spam. The goal is to classify each email as Spam or Ham (not spam) based on its text content.
The project walks through data preprocessing, feature extraction, model building, and evaluation in Python.

Features

Cleans and preprocesses raw email text
Converts text data into numerical form using TF-IDF vectorization
Implements machine learning models such as Logistic Regression for classification
Evaluates model performance using metrics like accuracy, precision, recall, and F1 score
Includes exploratory data analysis (EDA) for better understanding of the dataset
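The cleaning and TF-IDF steps listed above can be sketched as follows, assuming scikit-learn's TfidfVectorizer. The clean_text helper and its rules (lowercasing, stripping URLs and non-letters, dropping stop words) are illustrative assumptions, not necessarily what detection.ipynb does. The sketch uses scikit-learn's built-in English stop-word list in place of NLTK's so that no corpus download is needed.

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer


def clean_text(text: str) -> str:
    """Hypothetical cleaner: lowercase, drop URLs and non-letters, remove stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # strip URLs
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only letters and whitespace
    tokens = [t for t in text.split() if t not in ENGLISH_STOP_WORDS]
    return " ".join(tokens)


emails = [
    "WINNER!! Claim your FREE prize now at http://example.com",
    "Hi, are we still meeting for lunch tomorrow?",
]
cleaned = [clean_text(e) for e in emails]

# Fit TF-IDF on the cleaned text: each email becomes a sparse vector of term weights.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(cleaned)

print(cleaned)
print(X.shape)  # (number of emails, vocabulary size)
```

TF-IDF weights a word higher when it is frequent in one email but rare across the collection, so generic words contribute little while distinctive words (often promotional terms in spam) stand out.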

Technologies Used

Language: Python
Libraries: pandas, numpy, scikit-learn, matplotlib, seaborn, nltk
Environment: Jupyter Notebook

Dataset

The dataset used in this project is mail_data.csv, which contains email messages and their corresponding labels (spam or ham).
Each record includes:

Email text – The actual content of the email
Label – Indicates whether the email is spam or not
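Loading the dataset and turning the labels into numbers might look like the sketch below. The column names Message and Category are assumptions (the README only describes an email-text column and a label column), so adjust them to match the actual header of mail_data.csv. A tiny in-memory CSV stands in for the real file so the example runs on its own.

```python
import io

import pandas as pd


def load_mail_data(source, text_col="Message", label_col="Category"):
    """Load the dataset and map labels to integers (spam -> 1, ham -> 0).

    The default column names are assumptions; change them to match mail_data.csv.
    """
    df = pd.read_csv(source)
    df = df[[text_col, label_col]].dropna().copy()  # drop rows missing text or label
    df[label_col] = df[label_col].str.strip().str.lower()
    df["label_num"] = df[label_col].map({"spam": 1, "ham": 0})  # unknown labels become NaN
    return df


# In-memory stand-in for mail_data.csv so the sketch runs without the file.
sample_csv = io.StringIO(
    "Category,Message\n"
    "ham,Are we still on for lunch?\n"
    "spam,You have won a FREE cruise! Reply WIN now\n"
    "ham,Please send the report by Friday\n"
)
df = load_mail_data(sample_csv)
print(df["label_num"].value_counts())
```

With the real file, the call would be load_mail_data("mail_data.csv").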

Project Structure


Email_Spam_Detection/
│
├── mail_data.csv              # Dataset file
├── detection.ipynb            # Main Jupyter notebook (data preprocessing, training, evaluation)
├── detection-checkpoint.ipynb # Backup notebook file
└── README.md                  # Project documentation

Steps to Run the Project

1. Clone the repository

git clone https://github.com/Pallabi26313/Email_Spam_Detection.git
cd Email_Spam_Detection

2. Install dependencies

Make sure Python is installed, then install the required libraries (the notebook package provides the jupyter notebook command used in step 3):

pip install pandas numpy scikit-learn matplotlib seaborn nltk notebook

3. Run the notebook

jupyter notebook detection.ipynb

4. Execute the cells sequentially to:

Load and explore the dataset
Preprocess the text data
Train and test the model
Evaluate model performance
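The train and evaluate steps can be sketched as a scikit-learn pipeline of TfidfVectorizer and LogisticRegression. The toy emails, the 70/30 split, and random_state=42 are illustrative assumptions rather than the notebook's actual settings, and this toy run does not reproduce the reported accuracy.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-in data; in the project these would come from mail_data.csv.
texts = [
    "WIN a FREE iPhone now, click here",
    "Congratulations, you have been selected for a cash prize",
    "URGENT: claim your reward before midnight",
    "Cheap loans approved instantly, reply YES",
    "Exclusive offer: free entry to win 1000 pounds",
    "Can you send me the meeting notes?",
    "Lunch at 1pm tomorrow works for me",
    "Here is the draft of the report for review",
    "Happy birthday! See you at dinner tonight",
    "Reminder: team call moved to Thursday",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1 = spam, 0 = ham

# Stratify so both classes appear in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, stratify=labels, random_state=42
)

# Chaining TF-IDF and the classifier means the vectorizer is fitted on training data only.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(
    classification_report(
        y_test, y_pred, labels=[0, 1], target_names=["ham", "spam"], zero_division=0
    )
)
```

classification_report prints precision, recall, and F1 for each class alongside overall accuracy, covering the metrics listed under Features.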

Results

After training and testing, the model achieved:

Accuracy: 96%

Other models such as SVM, Random Forest, or XGBoost may improve accuracy further and are worth comparing on the same data.
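One way to compare candidate models fairly is cross-validation on identical data, sketched below with scikit-learn's LinearSVC and RandomForestClassifier alongside Logistic Regression. XGBoost is left out because it is not among the listed dependencies. The toy data and the hyperparameters are assumptions. Scoring uses F1 on the spam class because spam datasets are often imbalanced, in which case accuracy alone can look good even when much spam is missed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in data; replace with the texts and labels from mail_data.csv.
texts = [
    "WIN a FREE iPhone now, click here",
    "Congratulations, you have been selected for a cash prize",
    "URGENT: claim your reward before midnight",
    "Cheap loans approved instantly, reply YES",
    "Exclusive offer: free entry to win 1000 pounds",
    "Can you send me the meeting notes?",
    "Lunch at 1pm tomorrow works for me",
    "Here is the draft of the report for review",
    "Happy birthday! See you at dinner tonight",
    "Reminder: team call moved to Thursday",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1 = spam, 0 = ham

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

scores = {}
for name, clf in candidates.items():
    # A fresh TF-IDF step per model, refitted inside each cross-validation fold.
    pipe = make_pipeline(TfidfVectorizer(), clf)
    # 5-fold stratified cross-validation, F1 computed for the spam class (label 1).
    cv_scores = cross_val_score(pipe, texts, labels, cv=5, scoring="f1")
    scores[name] = cv_scores.mean()
    print(f"{name}: mean F1 = {scores[name]:.3f}")
```

On a dataset this small the scores are noisy; the comparison only becomes meaningful on the full dataset.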

Future Enhancements

Add a web app interface using Streamlit or Flask
Try deep learning models (LSTM or BERT) for better text understanding
Improve text preprocessing using advanced NLP techniques
Add visualization dashboards for real-time spam classification
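A Streamlit or Flask front end for real-time classification would need a fitted model it can load once and a function that scores a single email. The sketch below assumes the same TF-IDF plus Logistic Regression pipeline and uses joblib (installed with scikit-learn) for persistence. The file name spam_model.joblib, the predict_email helper, and the 0.5 threshold are hypothetical choices.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data; a real app would load a model trained on mail_data.csv.
texts = [
    "WIN a FREE iPhone now, click here",
    "URGENT: claim your reward before midnight",
    "Exclusive offer: free entry to win a cash prize",
    "Can you send me the meeting notes?",
    "Lunch at 1pm tomorrow works for me",
    "Reminder: team call moved to Thursday",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = ham

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# Persist the fitted pipeline so the app can load it once at startup.
model_path = os.path.join(tempfile.mkdtemp(), "spam_model.joblib")  # hypothetical file name
joblib.dump(model, model_path)
loaded = joblib.load(model_path)


def predict_email(text: str, threshold: float = 0.5) -> tuple[str, float]:
    """Hypothetical helper: return ("spam" or "ham", spam probability) for one email."""
    # predict_proba columns follow loaded.classes_ == [0, 1], so column 1 is spam.
    spam_prob = float(loaded.predict_proba([text])[0][1])
    return ("spam" if spam_prob >= threshold else "ham"), spam_prob


print(predict_email("Claim your FREE prize now"))
```

Saving the whole pipeline, rather than the classifier alone, keeps the fitted TF-IDF vocabulary and the model together, so the app cannot accidentally pair a model with a mismatched vectorizer.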

Conclusion

This project demonstrates how machine learning can classify emails as spam or not spam. It covers the end-to-end workflow, from data preprocessing to model evaluation, which makes it a useful beginner project for NLP and text classification.

Author

Pallabi Ghosh

GitHub: Pallabi26313
Email: pallabighosh7142@gmail.com
