Description
Background
Under Google Code-in, I used the sentiment analysis model in TextAnalysis.jl to analyse the Amazon reviews dataset. I performed basic text pre-processing to improve the model's metrics. Some of the tasks undertaken were:
- stemming words in each review
- removing corrupted characters
- removing definite and indefinite articles
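
As an illustration of the last step, here is a minimal sketch in plain Julia. The helper `remove_articles` is hypothetical and is not the TextAnalysis.jl API; it only shows the idea of dropping "a", "an" and "the" as whole words, ignoring case.

```julia
# Hypothetical helper (not part of TextAnalysis.jl): drop the articles
# "a", "an" and "the" when they appear as whole words, ignoring case,
# together with the whitespace that follows them.
remove_articles(text::AbstractString) = replace(text, r"\b(a|an|the)\b\s*"i => "")

review = "The battery died after an hour and a half"
cleaned_review = remove_articles(review)
println(cleaned_review)
```

The word boundaries (`\b`) keep words such as "and" or "after" intact, since only standalone articles match.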
I also found that remove_numbers!() (another pre-processing function mentioned in the docs) throws an error when run. On further inspection, I found that it is still not implemented in src/preprocessing.jl. It is an issue worth looking into.
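
Until that is resolved, numbers can be stripped from the raw text before it is handed to the package. The sketch below is a stopgap in plain Julia under that assumption; `strip_digits` and `squeeze_spaces` are hypothetical names and do not reproduce whatever remove_numbers!() is meant to do internally.

```julia
# Hypothetical stand-in, not part of TextAnalysis.jl: remove every run of digits.
strip_digits(text::AbstractString) = replace(text, r"\d+" => "")

# Collapse the repeated whitespace left behind and trim both ends.
squeeze_spaces(text::AbstractString) = strip(replace(text, r"\s+" => " "))

raw_review = "Bought 2 of these in 2019, both broke after 3 weeks"
without_numbers = squeeze_spaces(strip_digits(raw_review))
println(without_numbers)
```

Punctuation that was attached to a number is left in place, so a later punctuation-stripping step would still be needed.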
Also, a BoundsError occurred midway through the run:
BoundsError: attempt to access 32×5000 Array{Float32,2} at index [Base.Slice(Base.OneTo(32)), 5001]
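
The message says that column 5001 was requested from a matrix with only 5000 columns. A minimal sketch that reproduces the same kind of error and guards against it is shown here. The matrix shape is taken from the message; the cause (an index one past the last column) and the helper `lookup_column` are assumptions for illustration, not the package's actual code.

```julia
# Shape taken from the error message: 32 rows, 5000 columns.
embedding = zeros(Float32, 32, 5000)

# Asking for column 5001 throws a BoundsError, as in the report.
err = try
    embedding[:, 5001]
catch e
    e
end
println(err isa BoundsError)

# Hypothetical guard: return the column if the index is valid, `nothing` otherwise.
function lookup_column(matrix::AbstractMatrix, idx::Integer)
    1 <= idx <= size(matrix, 2) || return nothing
    return matrix[:, idx]
end

println(lookup_column(embedding, 5001) === nothing)
println(length(lookup_column(embedding, 5000)))
```

Whether out-of-range indices should be skipped, clamped, or mapped to a dedicated "unknown" column is a design decision for the model code.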
This didn't affect the run, and the results I got are presented below.
Result
I learnt how precision, recall and F1 score each measure a different aspect of how well the model performs. It was a wonderful learning experience.
Precision : 0.583117838593833
Recall : 0.5144996465068449
F1Score : 0.5466638895622987
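
For reference, the F1 score is the harmonic mean of precision and recall, so the three numbers above can be checked against each other. The sketch below does that in plain Julia; `f1_from` is a local helper written for this check, not a function from TextAnalysis.jl.

```julia
# F1 is the harmonic mean of precision and recall.
f1_from(p, r) = 2 * p * r / (p + r)

# Values copied from the results above.
p_reported  = 0.583117838593833
r_reported  = 0.5144996465068449
f1_reported = 0.5466638895622987

f1_computed = f1_from(p_reported, r_reported)
println(f1_computed)
println(isapprox(f1_computed, f1_reported; atol = 1e-5))
```

The reported F1 score agrees with the value recomputed from the reported precision and recall.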