
2 issues: the range of documents used when computing cross-document attention, and the size of SentenceTransformer's embeddings u_k/v_k vs. the sentential encoding e #6

@xxr5566833

Description

  1. preprocess.py

When using SentenceTransformer's pretrained model to encode a document (title + abstract), the document collection is determined by the `files_path` variable in preprocess.py.

Why did you comment out the line `"data/keyphrase/json/kp20k/kp20k_train.json"` (by adding `#` at the beginning of the line)?

I think the documents in kp20k_train.json should be included when computing the cross-document attention, as your paper shows.
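For context, here is a hypothetical sketch of what the collection list might look like with the training split restored. The exact variable shape and the valid/test paths are illustrative assumptions, not taken from the repo; only the kp20k_train.json path comes from the question above:

```python
# Hypothetical sketch of the document collection in preprocess.py.
# Only the kp20k_train.json path is quoted from the issue; the other
# entries are illustrative placeholders.
files_path = [
    "data/keyphrase/json/kp20k/kp20k_train.json",  # the line that was commented out
    "data/keyphrase/json/kp20k/kp20k_valid.json",  # assumed placeholder
    "data/keyphrase/json/kp20k/kp20k_test.json",   # assumed placeholder
]
```

If the training split really should participate in cross-document attention, uncommenting that first entry would bring its documents back into the collection that preprocess.py encodes.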

  2. The size of e and u_k/v_k

I changed the SentenceTransformer model, so I have a different size for u_k/v_k. Should `word_vec_size` be set to the SentenceTransformer model's embedding size?
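The sizes have to agree because cross-document attention takes dot products between the sentential encoding e and the keys u_k (and mixes the values v_k back in), which is only defined when they share a dimension. A minimal sketch in plain Python, assuming standard scaled dot-product attention (this is generic attention, not the repo's actual implementation):

```python
import math

def attention_scores(query, doc_embeddings):
    """Scaled dot-product attention weights between one document's
    sentential encoding (query, playing the role of e) and the
    embeddings of the other documents (playing the role of u_k)."""
    d = len(query)
    scores = []
    for emb in doc_embeddings:
        # If you swap in a SentenceTransformer model with a different
        # output size, this is exactly where a mismatch would break.
        assert len(emb) == d, "u_k/v_k must match the encoder's embedding size"
        scores.append(sum(q * e for q, e in zip(query, emb)) / math.sqrt(d))
    # Softmax over the cross-document scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]
```

Practically, a SentenceTransformer model reports its output size via `model.get_sentence_embedding_dimension()`, so setting `word_vec_size` (or adding a linear projection) to match that value is the natural fix when changing models.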
