Description
The PMI calculation in quanteda.textstats seems faster and more robust than the current widyr-based calculation, which can fail for very large matrices.
This requires converting the tidy data into a dfm with tidytext::cast_dfm(). To match the expected format, either set a dummy "value" of 1 or count multiple occurrences of the same entity within a document.
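The two value options for the cast can be sketched as follows. This is a language-neutral illustration in plain Python rather than R: `tidy_rows` and `cast_document_feature` are hypothetical names, the data is made up, and nothing here corresponds to the tidytext or quanteda API.

```python
from collections import Counter

# Hypothetical tidy data: one row per (document, entity) mention.
tidy_rows = [
    ("doc1", "berlin"),
    ("doc1", "merkel"),
    ("doc1", "berlin"),  # repeated mention within the same document
    ("doc2", "berlin"),
    ("doc2", "paris"),
]


def cast_document_feature(rows, binary=False):
    """Build {document: {feature: value}} from tidy (document, feature) rows.

    binary=True  -> dummy value of 1 for every document/feature pair
    binary=False -> number of occurrences of the feature in the document
    """
    counts = Counter(rows)  # counts identical (document, feature) rows
    matrix = {}
    for (doc, feature), n in counts.items():
        matrix.setdefault(doc, {})[feature] = 1 if binary else n
    return matrix


count_matrix = cast_document_feature(tidy_rows)
binary_matrix = cast_document_feature(tidy_rows, binary=True)
```

Which option is appropriate depends on whether the downstream PMI or co-occurrence step should treat repeated mentions as extra evidence or only as presence in the document.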
While at it, we should also implement native corpus/dfm support, including when using simple co-occurrence counts rather than PMI.
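To make the difference between the two edge weights concrete, here is a minimal Python sketch that computes both from document-level presence. It assumes the common document-based definition PMI(a, b) = log(p(a, b) / (p(a) p(b))), with probabilities estimated as document fractions; the widyr and quanteda.textstats implementations may estimate these differently. `docs`, `cooccurrence_counts`, and `pmi_weights` are hypothetical names.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical input: the set of entities present in each document.
docs = {
    "doc1": {"berlin", "merkel"},
    "doc2": {"berlin", "paris"},
    "doc3": {"berlin", "merkel"},
    "doc4": {"paris"},
}


def cooccurrence_counts(docs):
    """Count the documents in which each unordered feature pair co-occurs."""
    pair_counts = Counter()
    for features in docs.values():
        # Sorting gives every pair a canonical (a, b) order.
        for a, b in combinations(sorted(features), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def pmi_weights(docs):
    """Document-level PMI for every pair that co-occurs at least once."""
    n_docs = len(docs)
    feature_counts = Counter(f for features in docs.values() for f in features)
    weights = {}
    for (a, b), n_ab in cooccurrence_counts(docs).items():
        p_ab = n_ab / n_docs
        p_a = feature_counts[a] / n_docs
        p_b = feature_counts[b] / n_docs
        weights[(a, b)] = math.log(p_ab / (p_a * p_b))
    return weights


counts = cooccurrence_counts(docs)
pmi = pmi_weights(docs)
```

In this toy data the pair ("berlin", "merkel") gets a positive PMI because it co-occurs more often than independence would predict, while ("berlin", "paris") gets a negative one even though its raw co-occurrence count is non-zero.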
This would only affect calculate_network(), plus potentially some data checks so that the functions using it accept corpus/dfm objects.
If full corpora are supported, document that using all words for text network analysis is generally not recommended. Also check how well the method scales for large corpora.