Implement quanteda.textstats PMI calculation / corpus and dfm support #2

The quanteda.textstats PMI calculation seems faster and more robust than the current widyr calculation, which can fail for very large matrices.

Requires turning tidy data into a dfm with `tidytext::cast_dfm()`. To match the expected format, potentially set a dummy "value" of 1, or count multiple occurrences of the same entity in a document.
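A minimal sketch of the conversion, assuming one row per entity mention. The helper `entities_to_dfm()` and the column names `doc_id` and `entity` are hypothetical and not taken from the package; only `dplyr::count()` and `tidytext::cast_dfm()` are existing functions.

```r
# Sketch only: `entities_to_dfm()`, `doc_id` and `entity` are hypothetical
# names used for illustration. Needs dplyr, tidytext and quanteda installed.
entities_to_dfm <- function(tidy_entities) {
  # Count how often each entity occurs in each document; the count column
  # `n` serves as the "value" that cast_dfm() expects.
  counts <- dplyr::count(tidy_entities, doc_id, entity, name = "n")
  tidytext::cast_dfm(counts, document = doc_id, term = entity, value = n)
}

# Toy input: one row per entity mention.
tidy_entities <- data.frame(
  doc_id = c("d1", "d1", "d1", "d2", "d2"),
  entity = c("NATO", "NATO", "EU", "EU", "UN"),
  stringsAsFactors = FALSE
)

entity_dfm <- entities_to_dfm(tidy_entities)
```

If only presence or absence matters, the counts could instead be replaced by a constant 1 before casting, or the resulting dfm could be binarised with `quanteda::dfm_weight(scheme = "boolean")`.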

While at it, should also implement native corpus/dfm support, including when using simple co-occurrence counts rather than PMI.
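For reference, the two weighting options can be sketched in base R on a toy document-feature matrix. This is an illustration of document-level co-occurrence and the textbook PMI definition, log(p(x, y) / (p(x) p(y))), under the assumption that probabilities are estimated as document shares. It is not the widyr or quanteda.textstats implementation, and the data are made up.

```r
# Toy document-feature counts (rows = documents, columns = entities).
m <- matrix(
  c(2, 1, 0,
    0, 1, 1,
    1, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("d1", "d2", "d3"), c("NATO", "EU", "UN"))
)

# Binarise: an entity either occurs in a document or it does not.
present <- (m > 0) * 1

# Simple co-occurrence: number of documents in which both entities occur.
cooc <- crossprod(present)
diag(cooc) <- 0  # drop self-pairs

# Document-level PMI: log of p(x, y) / (p(x) * p(y)),
# with probabilities estimated as shares of documents (an assumption).
n_docs <- nrow(present)
p_single <- colSums(present) / n_docs
pmi <- log((cooc / n_docs) / outer(p_single, p_single))
diag(pmi) <- NA  # self-pairs are not meaningful
```

Pairs that never co-occur get a PMI of `-Inf` in this sketch, so a real implementation would need to drop or smooth them. For large inputs the dense `matrix` would also have to be replaced by a sparse one.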

Would only affect `calculate_network()` (and potentially some data checks, so that the functions using it accept corpus/dfm objects).

If full corpora are supported, document that using all words for text network analysis is generally not recommended. Also check how the method scales to large corpora.
