After the Python Workflow
Mariana Montes edited this page Sep 26, 2018 · 3 revisions
This R code recovers the cosine similarity matrix from the two files generated in the workflow (type and token level). The input_file argument must be the complete filename, exactly as you would need it to open the file. The function unzips the .pac file, loads both the .npy file and the .meta file, adds the item names to the rows and columns of the matrix, and removes the unzipped files.
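library(RcppCNPy)  # npyLoad() reads the .npy matrix
library(rjson)     # fromJSON() reads the .meta file

get_tokvecs <- function(input_file) {
  # Extract the archive into the working directory; unzip() returns the extracted paths
  temp <- unzip(input_file, unzip = "internal")
  # Assumes the .meta file is listed first and the .npy file second
  tokvecs <- npyLoad(temp[2])
  metadata <- fromJSON(file = temp[1])
  # The item names are the keys of the "dim-freq-dict" entry in the metadata
  dimid2item <- names(metadata$`dim-freq-dict`)
  dimnames(tokvecs) <- list(dimid2item, dimid2item)
  # Remove the extracted files again
  file.remove(temp[1], temp[2])
  return(tokvecs)
}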
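Because the returned matrix carries the item names on both dimensions, similarities can be looked up by name rather than by index. The sketch below illustrates this on a toy matrix standing in for the output of get_tokvecs(). The item names and values are invented for the example and do not come from the workflow.

```r
# Toy stand-in for the matrix returned by get_tokvecs();
# the item names and similarity values are hypothetical.
items <- c("item_a", "item_b", "item_c")
sims <- matrix(c(1.0, 0.8, 0.1,
                 0.8, 1.0, 0.3,
                 0.1, 0.3, 1.0),
               nrow = 3, byrow = TRUE)
dimnames(sims) <- list(items, items)

# Look up the similarity of one pair by name
pair_sim <- sims["item_a", "item_b"]

# Rank the other items by their similarity to item_a (item_a itself is left out)
neighbours <- sort(sims["item_a", setdiff(items, "item_a")], decreasing = TRUE)

print(pair_sim)
print(neighbours)
```

With a real file, the same indexing would apply to the result of get_tokvecs(), provided the names used are among those in the .meta file.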