- preprocess.py
When using a SentenceTransformer pretrained model to encode each document (title + abstract), the document collection is determined by the "files_path" variable in preprocess.py.
Why did you comment out "data/keyphrase/json/kp20k/kp20k_train.json" (adding # at the beginning of that line)?
I think the documents from kp20k_train.json should be included when computing the cross-document attention, just as your paper describes. A sketch of the change I have in mind is below.
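To illustrate, this is roughly the edit I mean (only kp20k_train.json comes from the issue; the other entry is a placeholder for whatever files_path already lists):

```python
# In preprocess.py: uncomment the kp20k training set so its documents are part
# of the collection used for cross-document attention.
files_path = [
    "data/keyphrase/json/kp20k/kp20k_train.json",  # currently commented out with a leading '#'
    # ... the remaining paths stay as they are ...
]
```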
- the size of e and u_k/v_k
I changed the SentenceTransformer model, so the size of u_k/v_k is different now. Should word_vec_size be set to the new SentenceTransformer model's embedding size? (A sketch of what I mean is below.)
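To make this concrete, here is a minimal sketch of how I check the new embedding size (the model name is only an example of a swapped-in encoder; word_vec_size refers to the option in this repo's config):

```python
from sentence_transformers import SentenceTransformer

# Example of a swapped-in encoder; the actual model I use may differ.
model = SentenceTransformer("all-mpnet-base-v2")

# Output dimension of the encoder, e.g. 768 here instead of the original model's size.
emb_dim = model.get_sentence_embedding_dimension()

# Encode one document (title + abstract) the same way preprocess.py does,
# just to confirm the size of the resulting u_k / v_k vectors.
doc_vec = model.encode("Some Title. Some abstract about keyphrase generation.")
assert doc_vec.shape[-1] == emb_dim

print(emb_dim)  # my question: should word_vec_size be set to this value?
```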