Hello guys,
Just as with GloVe, I created a dictionary over all the words in my vocabulary, with words as keys and their 768-dimensional BERT embedding vectors as values.
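For reference, this is roughly how I build the dictionary. It's a minimal sketch assuming the Hugging Face `transformers` library and `bert-base-uncased`; the model name and the subword-averaging step are just illustrations of the general idea, not necessarily the exact code I run:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_word(word):
    # BERT has no single static vector per word, so tokenize the word
    # and average its subword vectors into one 768-dim embedding.
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    hidden = outputs.last_hidden_state[0]   # (seq_len, 768)
    # Drop the [CLS] and [SEP] tokens, average the remaining subwords.
    return hidden[1:-1].mean(dim=0)         # (768,)

vocab = ["hello", "world"]  # example word list; mine covers the full vocabulary
embedding_dict = {w: embed_word(w) for w in vocab}
```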
But when I use this dictionary to train the model, the loss becomes NaN within the first epoch.
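One thing I could check is whether the stored vectors themselves already contain bad values before the training loop ever runs (a minimal sketch, assuming the `embedding_dict` from above):

```python
import torch

# Scan the dictionary for NaN/inf values in the stored embeddings.
bad = [w for w, v in embedding_dict.items()
       if torch.isnan(v).any() or torch.isinf(v).any()]
print(f"{len(bad)} of {len(embedding_dict)} embeddings contain NaN/inf")
```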
- How can I handle this problem?
- What are the possible reasons for the loss becoming NaN?
- Is building a dictionary of embedding vectors like this a good approach?