We would like a function that takes in a vocabulary frequency table, a bag-of-words, and:
- Computes TFIDF on the vocabulary
- Returns the vocabulary with the lowest-TFIDF words removed. The cutoff for low TFIDF should be an arbitrary, hard-coded parameter.
We will know this is done when we can run the cleaning script and see a difference in the local vocabulary file.
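A minimal sketch of what such a function could look like, to make the request concrete. Everything specific here is an assumption rather than part of the ticket: the name `prune_low_tfidf`, the constant `TFIDF_CUTOFF` and its value, the input layout (a dict mapping each document id to its `{word: count}` table), and the choice to score each word by its highest TFIDF across documents.

```python
import math
from collections import Counter

# Hypothetical hard-coded cutoff; the value is arbitrary.
TFIDF_CUTOFF = 0.05


def prune_low_tfidf(bag_of_words, cutoff=TFIDF_CUTOFF):
    """Return the bag-of-words with low-TFIDF words removed.

    bag_of_words: dict mapping a document id to a {word: count} dict
    (an assumed layout, not specified by the ticket).
    """
    n_docs = len(bag_of_words)

    # Document frequency: how many documents contain each word.
    doc_freq = Counter()
    for counts in bag_of_words.values():
        doc_freq.update(w for w, c in counts.items() if c > 0)

    # Score each word by its maximum TFIDF over all documents (assumption).
    best = {}
    for counts in bag_of_words.values():
        total = sum(counts.values())
        if total == 0:
            continue
        for word, count in counts.items():
            if count <= 0:
                continue
            tf = count / total
            idf = math.log(n_docs / doc_freq[word])
            best[word] = max(best.get(word, 0.0), tf * idf)

    # Keep only words whose best score reaches the cutoff.
    keep = {w for w, score in best.items() if score >= cutoff}
    return {
        doc: {w: c for w, c in counts.items() if w in keep}
        for doc, counts in bag_of_words.items()
    }
```

With this unsmoothed IDF, a word that appears in every document scores zero and is always removed, so a smoothed variant may be preferable if that turns out to be too aggressive.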