University Project
Alvaro Bello
Development Date: 14/05/2023
This project implements a simple spam filter for SMS messages using the Naive Bayes classifier in MATLAB. It classifies messages as either ham (non-spam) or spam based on their word content.
The classifier is trained on the well-known SMSSpamCollection dataset. It uses a Bag of Words model to estimate word frequencies in spam and ham messages, and uses those frequencies to predict whether a new message is ham or spam.
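The idea can be sketched on a toy example. This is a minimal illustration, not the project's actual `main.m`: the four training messages are made up, and the Laplace (add-one) smoothing and log-probability scoring are assumptions about how such a classifier is commonly built, not confirmed details of this implementation.

```matlab
% Toy training data (made up for illustration; not taken from SMSSpamCollection).
messages = {'win a free prize now', 'free entry win cash', ...
            'are we still meeting for lunch', 'call me when you get home'};
isSpam   = [true, true, false, false];

% Tokenize each message into lowercase words.
tokens = cellfun(@(m) strsplit(lower(m), ' '), messages, 'UniformOutput', false);

% Vocabulary: every distinct word seen during training.
vocab = unique([tokens{:}]);
V     = numel(vocab);

% Bag of Words: count how often each vocabulary word occurs in each class.
spamCounts = zeros(1, V);
hamCounts  = zeros(1, V);
for i = 1:numel(tokens)
    [~, idx] = ismember(tokens{i}, vocab);
    for j = idx(:)'
        if isSpam(i)
            spamCounts(j) = spamCounts(j) + 1;
        else
            hamCounts(j) = hamCounts(j) + 1;
        end
    end
end

% Per-word log-likelihoods with add-one smoothing (assumed), plus class priors.
logPwSpam    = log((spamCounts + 1) / (sum(spamCounts) + V));
logPwHam     = log((hamCounts  + 1) / (sum(hamCounts)  + V));
logPriorSpam = log(mean(isSpam));
logPriorHam  = log(mean(~isSpam));

% Classify a new message by comparing the two log-scores.
newMessage   = 'win a free lunch';
words        = strsplit(lower(newMessage), ' ');
[known, idx] = ismember(words, vocab);
idx          = idx(known);   % ignore words never seen in training

scoreSpam = logPriorSpam + sum(logPwSpam(idx));
scoreHam  = logPriorHam  + sum(logPwHam(idx));

if scoreSpam > scoreHam
    label = 'spam';
else
    label = 'ham';
end
fprintf('"%s" -> %s\n', newMessage, label);
```

On this toy data the example message is labeled spam, because "win", "a" and "free" appear only in the spam messages and outweigh "lunch". Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on longer messages.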
SMS-Spam-Filter/
├── data/                        # Folder storing datasets
│   └── smsspamcollection/
│       └── SMSSpamCollection    # Raw dataset file
│
├── extra_functions/             # Provided helper functions (from La Salle University)
├── main.m                       # Main MATLAB script
│
├── Project_Documentation.pdf    # Detailed documentation of the project
└── README.md
The script was executed 10 times in total:
- 5 runs with the `crossDeleteWords` function enabled.
- 5 runs without it.
Across these runs, the average classification accuracy was 95%. This indicates that the Naive Bayes approach is effective on the SMSSpamCollection dataset, and that even a simple implementation can give reliable results in spam detection tasks.
- Make sure you have MATLAB installed.
- Download or clone this repository.
- Run the main script:
>> main

- Programming Language: MATLAB
- IDE: MATLAB IDE
- Dataset: SMSSpamCollection
- Helper functions provided by La Salle University.
- Developed as part of a practical assignment on spam detection and text classification.