The project is inspired by the automatic tagging of transaction records by major banks.
We will break the project down into three distinct steps:
- Natural Language Processing (NLP) to vectorise the description column of the transactions
- Manual clustering to identify the distinct categories based on the description of each transaction
- Classification of transaction records based on the distinct categories found above
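
The three steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the project's implementation: the sample descriptions and category names are invented, the hand-assigned labels stand in for the manual clustering step, and the helper names (`vectorise`, `cosine`, `classify`) are hypothetical. It uses bag-of-words counts and a nearest-centroid rule with cosine similarity; a library such as scikit-learn would typically replace these hand-written helpers.

```python
import math
import re
from collections import Counter

# Hypothetical transaction descriptions with hand-assigned categories.
# The labels stand in for the manual clustering step; all data is invented.
labelled = [
    ("WOOLWORTHS 1234 SYDNEY", "groceries"),
    ("COLES EXPRESS MELBOURNE", "groceries"),
    ("UBER TRIP HELP.UBER.COM", "transport"),
    ("OPAL TOP UP SYDNEY", "transport"),
]


def vectorise(text):
    """Step 1 (NLP): turn a description into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Step 2 (clustering): sum the vectors of each category into one centroid.
centroids = {}
for description, category in labelled:
    centroids.setdefault(category, Counter()).update(vectorise(description))


def classify(text):
    """Step 3 (classification): pick the category with the closest centroid."""
    vector = vectorise(text)
    best = max(centroids, key=lambda category: cosine(vector, centroids[category]))
    # With no shared words at all, refuse to guess.
    return best if cosine(vector, centroids[best]) > 0 else "uncategorised"


print(classify("UBER TRIP 5678"))
print(classify("WOOLWORTHS METRO"))
```

Descriptions that share no words with any category fall back to `"uncategorised"`, which is one simple way to flag records that need a new category.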
The project breaks down into the following tasks:
- Load data
- Clean data
- Explore data
- Find the column which describes the transaction
- NLP
- Clustering (done manually for now, by visual inspection)
- Classification
- Testing
- Evaluation
- Production pipeline
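
The first few tasks in the list (load, clean, and find the description column) might look roughly like the sketch below. It assumes a CSV export; the column names, the sample rows, and the "most letters" heuristic for spotting the description column are all assumptions made for illustration, not details taken from the project.

```python
import csv
import io

# Hypothetical bank export; column names and rows are invented.
RAW = """Date,Amount,Narrative
01/07/2020,-23.50,WOOLWORTHS 1234 SYDNEY
02/07/2020,-12.00,  uber trip help.uber.com
03/07/2020,,OPAL TOP UP
"""


def load(source):
    """Load data: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(source)))


def clean(rows):
    """Clean data: trim and upper-case values, drop rows with an empty field."""
    cleaned = [{key: value.strip().upper() for key, value in row.items()} for row in rows]
    return [row for row in cleaned if all(row.values())]


def find_description_column(rows):
    """Heuristic (an assumption): the free-text column holds the most letters."""
    def letter_count(column):
        return sum(ch.isalpha() for row in rows for ch in row[column])
    return max(rows[0].keys(), key=letter_count)


rows = clean(load(RAW))
column = find_description_column(rows)
print(len(rows), column)
```

In a real pipeline the same three functions would read from a file rather than an inline string, and the chosen column would feed the NLP step.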
Resources:
- Natural Language Processing module in General Assembly Data Science Course
- Machine Learning, NLP: Text Classification using scikit-learn, python and NLTK
- NLP in Python by Alice Zhao - PyOhio
- Australian Post codes repository
