
TFIDF Filtration #57

@pmhauck

Description

We would like a function that takes in a vocabulary frequency table (a bag-of-words) and:

  1. Computes TFIDF for each word in the vocabulary.
  2. Returns the vocabulary with the lowest-TFIDF words removed. The cutoff for low TFIDF should be set by an arbitrary, hard-coded parameter.
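One point to settle before implementing: IDF depends on how many *documents* contain each word, so a single aggregate frequency table is not enough on its own. The sketch below assumes the input is one bag-of-words per document, with the vocabulary table being their sum. It scores each word by its maximum TF-IDF across documents (one reasonable choice; mean or sum would also work) and drops a hard-coded fraction of the lowest-scoring words. The names `tfidf_scores`, `filter_vocabulary`, and `TFIDF_DROP_FRACTION` are hypothetical, not existing project code.

```python
import math
from collections import Counter

# Hypothetical hard-coded cutoff: drop this fraction of the vocabulary
# with the lowest TFIDF scores.
TFIDF_DROP_FRACTION = 0.2


def tfidf_scores(docs: list[Counter]) -> dict[str, float]:
    """Score each word by its maximum TF-IDF over all documents.

    Assumption: docs holds one bag-of-words (word -> count) per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each word occurs.
    df = Counter()
    for bag in docs:
        df.update(w for w, c in bag.items() if c > 0)

    scores: dict[str, float] = {}
    for bag in docs:
        total = sum(bag.values())
        if total == 0:
            continue
        for word, count in bag.items():
            if count <= 0:
                continue
            tf = count / total                   # term frequency in this doc
            idf = math.log(n_docs / df[word])    # 0 for words in every doc
            scores[word] = max(scores.get(word, 0.0), tf * idf)
    return scores


def filter_vocabulary(vocab: Counter, docs: list[Counter],
                      drop_fraction: float = TFIDF_DROP_FRACTION) -> Counter:
    """Return a copy of vocab without its lowest-TFIDF words."""
    scores = tfidf_scores(docs)
    # Words that never occur in any document get the lowest score (0.0).
    ranked = sorted(vocab, key=lambda w: scores.get(w, 0.0))
    n_drop = int(len(ranked) * drop_fraction)
    dropped = set(ranked[:n_drop])
    return Counter({w: c for w, c in vocab.items() if w not in dropped})


if __name__ == "__main__":
    docs = [
        Counter({"the": 4, "cat": 2, "sat": 1}),
        Counter({"the": 3, "dog": 2, "ran": 1}),
        Counter({"the": 5, "cat": 1, "dog": 1}),
    ]
    vocab = sum(docs, Counter())
    print(filter_vocabulary(vocab, docs))
```

In this toy example "the" occurs in every document, so its IDF is zero and it is the one word removed from the five-word vocabulary. A fraction-based cutoff keeps the number of dropped words predictable regardless of corpus size; an absolute score threshold would be the alternative if that is preferred.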

We will know this is done when we can run the cleaning script and see a difference in the local vocabulary file.

