clairewangjia/Tweet-Text-Classifier: Pre-process and classify 6000 tweets into 10 pre-labeled classes

A0176605B Wang Jia. Lab 1: Microblog Classification. Deadline: 19 Feb 2018

Environment Setting

Python 3.6

Installation

nltk
simplejson
pickle (included in the Python standard library; no installation needed)
numpy
scipy
scikit-learn

Usage

Early Fusion Classifier

Run 'early_processor.py' to preprocess the tweet attributes. This covers feature extraction and combination, data cleaning (e.g., removing URLs, punctuation, and times), word tokenization, stemming, and removal of low-frequency words and stop words.
Run 'early_classifier.py', which uses a random forest classifier. The performance metrics (classification score, average precision, average recall, F1 score) are printed to the screen.
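
The preprocessing and early-fusion steps above can be sketched as follows. This is a minimal, library-free illustration and not the code in 'early_processor.py': the field names ('text', 'description', 'hashtags'), the tiny stop-word list, the `crude_stem` suffix stripper (a stand-in for a real stemmer such as the one in nltk), and the `min_count` threshold are all assumptions made for the example. The random forest classifier itself is omitted.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "at"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\s*(?:am|pm)?\b", re.IGNORECASE)
TOKEN_RE = re.compile(r"[a-z]+")


def crude_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean_and_tokenize(text):
    """Remove URLs and times, drop punctuation, tokenize, filter stop words, stem."""
    text = URL_RE.sub(" ", text)
    text = TIME_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text.lower())  # keeps alphabetic runs only
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def early_fusion(tweet):
    """Early fusion: concatenate the text fields into one string, then clean it."""
    parts = [
        tweet.get("text", ""),
        tweet.get("description", ""),
        " ".join(tweet.get("hashtags", [])),
    ]
    return clean_and_tokenize(" ".join(parts))


def drop_rare_tokens(docs, min_count=2):
    """Remove tokens that occur fewer than min_count times across all documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    return [[tok for tok in doc if counts[tok] >= min_count] for doc in docs]


# Made-up example tweets, not taken from the dataset.
tweets = [
    {"text": "Watching the game at 10:30pm http://t.co/abc !!!",
     "description": "Sports fan", "hashtags": ["NBA", "games"]},
    {"text": "Loved the games today", "description": "", "hashtags": ["NBA"]},
]
docs = drop_rare_tokens([early_fusion(t) for t in tweets])
print(docs)
```

In an early-fusion setup, the token lists produced this way would be turned into a single feature vector per tweet before being passed to one classifier.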

Late Fusion Classifier

Run 'late_processor.py' to preprocess the tweet attributes. This covers data cleaning (e.g., removing URLs, punctuation, and times), word tokenization, stemming, and removal of low-frequency words and stop words. The three features (tweet text, description, hashtags) are extracted and processed separately.
Run 'late_classifier.py', which uses three individual classifiers and a combined model. The performance metrics (classification score, average precision, average recall, F1 score) of the three single-feature models and the final combined model are printed to the screen.
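
One common way to build a combined model in late fusion is to average the class probabilities produced by the single-feature classifiers. The sketch below illustrates that idea only; how 'late_classifier.py' actually combines its three models is not stated here, so the averaging rule, the `fuse_probabilities` and `predict` helpers, the optional weights, and the class names are assumptions.

```python
def fuse_probabilities(per_feature_probs, weights=None):
    """Combine class-probability dicts from several models by weighted averaging."""
    if weights is None:
        weights = [1.0] * len(per_feature_probs)  # equal weight per model
    total = sum(weights)
    classes = set().union(*per_feature_probs)  # every class seen by any model
    return {
        c: sum(w * p.get(c, 0.0) for w, p in zip(weights, per_feature_probs)) / total
        for c in classes
    }


def predict(fused):
    """Return the class with the highest fused probability."""
    return max(fused, key=fused.get)


# Made-up outputs of three hypothetical single-feature models for one tweet.
text_probs = {"sports": 0.6, "politics": 0.4}
description_probs = {"sports": 0.2, "politics": 0.8}
hashtag_probs = {"sports": 0.9, "politics": 0.1}

fused = fuse_probabilities([text_probs, description_probs, hashtag_probs])
print(predict(fused))
```

Passing `weights` lets a more reliable feature count for more than the others; with equal weights this reduces to a plain average.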
