
2 issues: the range of documents used when computing cross-document attention, and the size of SentenceTransformer's embeddings u_k/v_k vs. the sentential encoding e #6

@xxr5566833

Description

  1. preprocess.py

When using SentenceTransformer's pretrained model to encode a document (title + abstract), the document collection is determined by the `files_path` variable in preprocess.py.

Why did you comment out the line `"data/keyphrase/json/kp20k/kp20k_train.json"` (by adding `#` at the beginning of the line)?

I think the documents in kp20k_train.json should be included when computing the cross-document attention, as your paper shows.
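For context, here is a hypothetical sketch of what the collection list might look like with the training split restored. The exact variable shape and the valid/test paths are illustrative assumptions, not taken from the repo; only the kp20k_train.json path comes from the question above:

```python
# Hypothetical sketch of the document collection in preprocess.py.
# Only the kp20k_train.json path is quoted from the issue; the other
# entries are illustrative placeholders.
files_path = [
    "data/keyphrase/json/kp20k/kp20k_train.json",  # the line that was commented out
    "data/keyphrase/json/kp20k/kp20k_valid.json",  # assumed placeholder
    "data/keyphrase/json/kp20k/kp20k_test.json",   # assumed placeholder
]
```

If the training split really should participate in cross-document attention, uncommenting that first entry would bring its documents back into the collection that preprocess.py encodes.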

  2. The size of e and u_k/v_k

I changed the SentenceTransformer model, so I have a different size for u_k/v_k. Should `word_vec_size` be set to the SentenceTransformer model's embedding size?
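The sizes have to agree because cross-document attention takes dot products between the sentential encoding e and the keys u_k (and mixes the values v_k back in), which is only defined when they share a dimension. A minimal sketch in plain Python, assuming standard scaled dot-product attention (this is generic attention, not the repo's actual implementation):

```python
import math

def attention_scores(query, doc_embeddings):
    """Scaled dot-product attention weights between one document's
    sentential encoding (query, playing the role of e) and the
    embeddings of the other documents (playing the role of u_k)."""
    d = len(query)
    scores = []
    for emb in doc_embeddings:
        # If you swap in a SentenceTransformer model with a different
        # output size, this is exactly where a mismatch would break.
        assert len(emb) == d, "u_k/v_k must match the encoder's embedding size"
        scores.append(sum(q * e for q, e in zip(query, emb)) / math.sqrt(d))
    # Softmax over the cross-document scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]
```

Practically, a SentenceTransformer model reports its output size via `model.get_sentence_embedding_dimension()`, so setting `word_vec_size` (or adding a linear projection) to match that value is the natural fix when changing models.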
