Implement quanteda.textstats PMI calculation / corpus and dfm support #2

@TimBMK

Description

The PMI calculation in quanteda.textstats appears to be faster and more robust than the current widyr-based calculation, which can fail for very large matrices.
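For reference, here is a common definition of pointwise mutual information (PMI) for two entities x and y. It assumes document-level co-occurrence, where N is the number of documents, n_x is the number of documents containing x, and n_xy is the number of documents containing both x and y. widyr and quanteda.textstats may estimate these probabilities differently, for example from token counts or with smoothing, so this is not necessarily what either package computes:

$$
\operatorname{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)}, \qquad p(x) = \frac{n_x}{N}, \quad p(x, y) = \frac{n_{xy}}{N}
$$

A positive PMI means x and y co-occur more often than they would if they were independent. A PMI of 0 means they co-occur exactly as often as independence predicts.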

This requires turning tidy data into a dfm with tidytext::cast_dfm(), either by setting a dummy "value" of 1 or by counting multiple occurrences of the same entity within a document, so that the data match the expected input format.
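The two input options mentioned above can be illustrated with a small, language-agnostic sketch. It is written in Python and is not tidytext::cast_dfm() itself. It takes tidy (document, entity) rows and builds a sparse document-feature mapping, either with a dummy value of 1 per document/entity pair or with counts of repeated occurrences. The helper name cast_doc_feature and the example entities are hypothetical:

```python
from collections import Counter, defaultdict


def cast_doc_feature(rows, binary=False):
    """Build a sparse document-feature mapping from tidy (document, entity) rows.

    Hypothetical helper illustrating the two options from the issue:
    binary=True  -> dummy value of 1 for every entity present in a document
    binary=False -> count repeated occurrences of an entity in a document
    """
    dfm = defaultdict(Counter)
    for doc, entity in rows:
        dfm[doc][entity] += 1
    if binary:
        # Collapse any count > 1 into a presence indicator of 1
        return {doc: Counter(dict.fromkeys(feats, 1)) for doc, feats in dfm.items()}
    return {doc: Counter(feats) for doc, feats in dfm.items()}


rows = [("d1", "EU"), ("d1", "EU"), ("d1", "NATO"), ("d2", "NATO")]
counts = cast_doc_feature(rows)
dummy = cast_doc_feature(rows, binary=True)
```

The choice matters downstream. Binary values treat the document as the unit of co-occurrence. Counts weight documents in which an entity is mentioned repeatedly.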

While at it, we should also implement native corpus/dfm support, including when simple co-occurrence counts are used rather than PMI.
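To illustrate how plain co-occurrence counts differ from PMI, here is a minimal Python sketch. It is not the widyr or quanteda.textstats implementation. It assumes document-level co-occurrence (duplicates within a document ignored) and estimates p(x) = n_x / N and p(x, y) = n_xy / N from document frequencies. The function name cooccurrence_and_pmi and the example data are hypothetical:

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_and_pmi(docs):
    """Return (co-occurrence counts, PMI) for unordered entity pairs.

    docs: mapping of document id -> iterable of entities.
    Assumes document-level probabilities; real packages may normalize differently.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    pair_freq = Counter()
    for entities in docs.values():
        unique = sorted(set(entities))  # sorted so each pair has one canonical order
        doc_freq.update(unique)
        # Each unordered pair is counted once per document in which both appear
        pair_freq.update(combinations(unique, 2))
    pmi = {}
    for (x, y), n_xy in pair_freq.items():
        p_xy = n_xy / n_docs
        p_x = doc_freq[x] / n_docs
        p_y = doc_freq[y] / n_docs
        pmi[(x, y)] = math.log(p_xy / (p_x * p_y))
    return pair_freq, pmi


docs = {"d1": ["EU", "NATO"], "d2": ["EU", "NATO", "UN"], "d3": ["UN"], "d4": ["EU"]}
counts, pmi = cooccurrence_and_pmi(docs)
```

The raw counts can serve directly as edge weights in the "simple co-occurrence" mode. PMI instead rescales each count by how frequent the two entities are on their own.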

This would only affect calculate_network() (and possibly some data checks, so that the functions using it accept corpus/dfm objects).

If full corpora are supported, document that using all words for text network analysis is generally not recommended, and check how well the method scales to large corpora.
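The scalability concern can be made concrete. With V distinct features, an undirected network has up to V(V - 1)/2 possible edges, so using every word in a large corpus grows the pair space quadratically. The Python sketch below computes that bound and shows one common mitigation: dropping features that appear in too few documents before building the network. This is similar in spirit to quanteda's dfm_trim() with min_docfreq, but it is not that function. The helper names and the threshold are hypothetical:

```python
from collections import Counter


def max_pairs(n_features):
    """Upper bound on undirected feature pairs (network edges): V * (V - 1) / 2."""
    return n_features * (n_features - 1) // 2


def trim_by_docfreq(docs, min_docfreq=2):
    """Drop features that occur in fewer than min_docfreq documents.

    docs: mapping of document id -> list of features (hypothetical input format).
    """
    doc_freq = Counter()
    for features in docs.values():
        doc_freq.update(set(features))
    keep = {f for f, n in doc_freq.items() if n >= min_docfreq}
    return {doc: [f for f in features if f in keep] for doc, features in docs.items()}


bound = max_pairs(10_000)
trimmed = trim_by_docfreq({"d1": ["a", "b"], "d2": ["a", "c"]}, min_docfreq=2)
```

For example, a vocabulary of 10,000 words already allows close to 50 million candidate edges, which is why restricting input to entities or trimmed features is usually advisable.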

Metadata

Assignees

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
