Email Classification Project

This is my project for classifying emails into different types (Type 2, Type 3, and Type 4). I tried two different ways to do this:

  1. Chained approach - where each model feeds into the next one
  2. Hierarchical approach - where models are organized in a tree structure

My Project Files

The main folders are:

  • config - has some settings
  • data - for loading emails and cleaning them
  • models - contains my classifier code
  • utils - helper functions I wrote

The main file to run is main.py.

Stuff you need to install

You need these packages:

  • scikit-learn (for ML algorithms)
  • numpy
  • pandas (not using it much yet)
  • nltk (for text processing)

I think Python 3.7 or newer should work fine.

How to run it

Just run this command in the terminal:

python main.py

How it works

Chained Approach

In this approach:

  • First I predict if it's Type 2 or not
  • Then I use that result to help predict if it's Type 3
  • Finally I use both previous results to predict Type 4

The idea is that the Types might be related to each other, so an earlier prediction can help with the later ones.
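
Here is a minimal sketch of the chained idea. It is not the actual code in models, just an illustration under some assumptions: each email already has numeric features, the Type 2, Type 3 and Type 4 labels are integer-coded columns of one array, and LogisticRegression stands in for whatever classifier is really used. The names fit_chain, predict_chain, X_demo and Y_demo are made up for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_chain(X, Y):
    """Fit one classifier per label column; later models also see earlier labels."""
    models = []
    for i in range(Y.shape[1]):
        # During training, the true earlier labels are appended as extra features.
        X_aug = np.hstack([X, Y[:, :i]])
        models.append(LogisticRegression(max_iter=1000).fit(X_aug, Y[:, i]))
    return models


def predict_chain(models, X):
    """Predict the columns in order, feeding each prediction into the next model."""
    preds = np.empty((X.shape[0], 0))
    for model in models:
        X_aug = np.hstack([X, preds])
        step = model.predict(X_aug).reshape(-1, 1)
        preds = np.hstack([preds, step])
    return preds.astype(int)


# Toy data (an assumption for illustration): 3 numeric features standing in
# for text features, and binary labels for Type 2, Type 3 and Type 4.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
Y_demo = np.column_stack([
    (X_demo[:, 0] > 0).astype(int),
    (X_demo[:, 1] > 0).astype(int),
    (X_demo[:, 2] > 0).astype(int),
])

chain = fit_chain(X_demo, Y_demo)
chain_preds = predict_chain(chain, X_demo)  # one row per email, one column per Type
```

At prediction time the true labels are not available, so the predicted ones are used instead. That means a mistake on Type 2 can carry over into Type 3 and Type 4. If a label has more than two values, one-hot encoding it before appending it as a feature would probably be a better choice than the raw integer code used here.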

Hierarchical Approach

In this approach:

  • First figure out Type 2
  • Depending on Type 2 result, use a specific model for Type 3
  • Then use both results to pick the right model for Type 4
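
A rough sketch of the tree structure could look like the code below. Again, this is an illustration and not the code in models. It assumes numeric features and integer-coded label columns, and it uses DecisionTreeClassifier only as a placeholder. The names fit_hierarchy, predict_hierarchy, X_demo and Y_demo are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_hierarchy(X, Y):
    """Fit one model per branch, keyed by the tuple of earlier labels (the prefix)."""
    models = {}
    for level in range(Y.shape[1]):
        # Every distinct combination of earlier labels gets its own model.
        # At level 0 the only prefix is (), which covers all rows.
        for prefix in {tuple(row) for row in Y[:, :level].tolist()}:
            mask = (Y[:, :level] == np.array(prefix, dtype=Y.dtype)).all(axis=1)
            model = DecisionTreeClassifier(random_state=0)
            models[prefix] = model.fit(X[mask], Y[mask, level])
    return models


def predict_hierarchy(models, X, n_levels):
    """Walk down the tree: each predicted label selects the model for the next level."""
    preds = []
    for x in X:
        prefix = ()
        for _ in range(n_levels):
            label = int(models[prefix].predict(x.reshape(1, -1))[0])
            prefix = prefix + (label,)
        preds.append(prefix)
    return np.array(preds)


# Toy data (an assumption for illustration): 3 numeric features and
# binary labels for Type 2, Type 3 and Type 4.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
Y_demo = np.column_stack([
    (X_demo[:, 0] > 0).astype(int),
    (X_demo[:, 1] > 0).astype(int),
    (X_demo[:, 2] > 0).astype(int),
])

hier_models = fit_hierarchy(X_demo, Y_demo)
hier_preds = predict_hierarchy(hier_models, X_demo, n_levels=3)
```

Each branch model is trained only on the emails that reached that branch, so deeper models see less data. Predicting one email at a time keeps the sketch easy to follow but would be slow on a large dataset.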

I'm still working on improving the accuracy. The hierarchical one is more complex, since it needs a separate model for each branch, but it might work better for some email types.