Slow tokenization

https://github.com/NKWBTB/PrefScore/blob/05e2ffba0672d16aec02fee710f992927a2b3a25/model.py#L10-L25

Each text piece or (article, summary) pair is tokenized individually and then moved to the GPU. This is very slow.

At the very least, we should tokenize them once per batch, after the batch is loaded.

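	# Assumed imports: these come from earlier in model.py and are not part of the linked lines.
	import torch.nn as nn
	from transformers import BertModel, BertTokenizer

	import config as CFG

	class Scorer(nn.Module):
	    def __init__(self):
	        super(Scorer, self).__init__()
	        self.tokenizer = BertTokenizer.from_pretrained(CFG.BERT_MODEL)
	        self.model = BertModel.from_pretrained(CFG.BERT_MODEL)
	        self.fc = nn.Linear(self.model.config.hidden_size, 1)

	    def forward(self, article, summary):
	        # Tokenization and the host-to-device copy both happen here, on every forward call.
	        inputs = self.tokenizer(article, summary, padding='longest', truncation="longest_first", return_tensors='pt').to(CFG.DEVICE)
	        outputs = self.model(**inputs)
	        x = self.fc(outputs.pooler_output)  # one scalar score per (article, summary) pair
	        return x

	class Siamese(nn.Module):
	    ...  # rest of model.py not shown in this snippet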
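A minimal sketch of the batch-level tokenization suggested above, assuming the dataset yields `(article, summary)` string tuples and is consumed through a `torch.utils.data.DataLoader`. `PairCollator` is a hypothetical name, not something in the repo; the padding and truncation settings are copied from `Scorer.forward`. It is written as a class rather than a closure so it stays picklable when the `DataLoader` uses worker processes.

```python
from typing import Any, Callable, List, Sequence, Tuple


class PairCollator:
    """Hypothetical collate_fn: tokenizes a whole batch of (article, summary) pairs at once.

    `tokenizer` is any callable following the Hugging Face tokenizer calling
    convention, e.g. the BertTokenizer instance Scorer.__init__ creates today.
    """

    def __init__(self, tokenizer: Callable[..., Any]):
        self.tokenizer = tokenizer

    def __call__(self, batch: Sequence[Tuple[str, str]]) -> Any:
        # Split the batch of pairs into two parallel lists of strings.
        articles: List[str] = [article for article, _ in batch]
        summaries: List[str] = [summary for _, summary in batch]
        # A single tokenizer call for the whole batch instead of one per example,
        # using the same padding/truncation settings as Scorer.forward.
        return self.tokenizer(
            articles,
            summaries,
            padding="longest",
            truncation="longest_first",
            return_tensors="pt",
        )


# Usage sketch (assumes `dataset` yields (article, summary) tuples):
#   loader = DataLoader(dataset, batch_size=8, collate_fn=PairCollator(tokenizer))
#   for batch in loader:
#       inputs = batch.to(CFG.DEVICE)  # one host-to-device copy per batch
#       outputs = model(**inputs)
```

With this, `Scorer.forward` would take the already-encoded inputs and call `self.model(**inputs)` directly instead of tokenizing. Switching from `BertTokenizer` (the pure-Python implementation) to `BertTokenizerFast` may also reduce tokenization time.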

Slow tokenization #1