After the Python Workflow

The R code below recovers the cosine similarity matrix from the two files generated in the workflow (type and token level). The input_file argument must be the complete filename, that is, the name you would normally use to open the file. The script unzips the .pac file, loads both the .npy and the .meta file, assigns the item names to the rows and columns of the matrix, and removes the unzipped files.

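library(RcppCNPy)
library(rjson)

get_tokvecs <- function(input_file) {
  # Extract the archive; unzip() returns the paths of the extracted files
  temp <- unzip(input_file, unzip = "internal")
  # Pick the files by extension rather than by their position in the archive
  npy_file <- grep("\\.npy$", temp, value = TRUE)[1]
  meta_file <- grep("\\.meta$", temp, value = TRUE)[1]
  tokvecs <- npyLoad(npy_file)
  metadata <- fromJSON(file = meta_file)
  # Item names are the keys of the "dim-freq-dict" entry in the metadata
  dimid2item <- names(metadata$`dim-freq-dict`)
  dimnames(tokvecs) <- list(dimid2item, dimid2item)
  # Clean up the extracted files
  file.remove(temp)
  return(tokvecs)
}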
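
Because the function returns an ordinary R matrix with the same item names on rows and columns, similarities can be looked up by name. The sketch below illustrates this with a small made-up matrix standing in for the real output of get_tokvecs(); the item names, the values, and the nearest() helper are all invented for illustration and are not part of the workflow.

```r
# Toy stand-in for the matrix returned by get_tokvecs(); values are invented.
items <- c("cat", "dog", "car")
sims <- matrix(c(1.0, 0.8, 0.1,
                 0.8, 1.0, 0.2,
                 0.1, 0.2, 1.0),
               nrow = 3, byrow = TRUE,
               dimnames = list(items, items))

# Hypothetical helper: the n items most similar to `item`, excluding itself.
nearest <- function(m, item, n = 1) {
  row <- m[item, ]                  # named vector of similarities to `item`
  row <- row[names(row) != item]    # drop the item's similarity to itself
  head(sort(row, decreasing = TRUE), n)
}

# Look up a single similarity by name, then the closest neighbour of "cat".
sims["cat", "dog"]
nearest(sims, "cat")
```

With a real file, sims would instead be the result of calling get_tokvecs() on the complete .pac filename.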
