After the Python Workflow

Mariana Montes edited this page Sep 26, 2018 · 3 revisions

This R code recovers the cosine similarity matrix from the two files generated in the workflow (type and token level). The input_file argument must be the complete filename, i.e. the name you would normally need to open the file. The script unzips the .pac file, loads both the .npy and the .meta file, assigns row and column names to the matrix, and removes the unzipped files.

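library(RcppCNPy)  # npyLoad() reads NumPy .npy files
library(rjson)     # fromJSON() reads the JSON metadata

get_tokvecs <- function(input_file) {
  # unzip() extracts into the working directory and returns the extracted paths
  temp <- unzip(input_file, unzip = "internal")
  # Pick the files by extension instead of relying on their order in the archive
  npy_file <- grep("\\.npy$", temp, value = TRUE)[1]
  meta_file <- grep("\\.meta$", temp, value = TRUE)[1]
  tokvecs <- npyLoad(npy_file)
  metadata <- fromJSON(file = meta_file)
  # Row and column names are the keys of the metadata's dim-freq-dict entry
  dimid2item <- names(metadata$`dim-freq-dict`)
  dimnames(tokvecs) <- list(dimid2item, dimid2item)
  # Clean up the unzipped files
  file.remove(npy_file, meta_file)
  return(tokvecs)
}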
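
Once the matrix has row and column names, similarities can be looked up by item rather than by index. The following is only a minimal sketch of that idea: it uses a small hand-made matrix in place of the real output of get_tokvecs, and the item names (item_a, item_b, item_c) and values are hypothetical, not taken from the workflow.

```r
# Toy stand-in for the matrix returned by get_tokvecs();
# the item names and similarity values are made up for illustration.
items <- c("item_a", "item_b", "item_c")
cos_sim <- matrix(c(1.0, 0.8, 0.1,
                    0.8, 1.0, 0.3,
                    0.1, 0.3, 1.0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(items, items))

# Look up a single similarity by name
sim_ab <- cos_sim["item_a", "item_b"]

# Rank the other items by their similarity to item_a
others <- colnames(cos_sim) != "item_a"
neighbours <- sort(cos_sim["item_a", others], decreasing = TRUE)

# Turn similarities into distances, e.g. as input for hclust() or cmdscale()
cos_dist <- as.dist(1 - cos_sim)
```

With real data, cos_sim would be replaced by the result of calling get_tokvecs on the complete filename of the .pac file.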
