Open
Description
I am confused by Figure 8 in your paper. Assuming the shared vocabulary size is significantly larger than the embedding size in all models (which appears to be the case from Figure 2), I would expect matrices A and B each to span the whole embedding space. I would also expect a randomly initialized matrix to span the full embedding space. In that case, the maximum canonical angle would always be 0. What am I missing?
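To make the expectation concrete, here is a minimal NumPy sketch. It rests on assumptions of mine, not on the paper's code: A and B are stand-in random matrices of shape (vocabulary size, embedding size), the sizes `V` and `d` are arbitrary, and the helper names are hypothetical. It computes canonical angles two ways: between the row spaces (subspaces of the embedding space, which is the reading I describe above) and between the column spaces (subspaces of the vocabulary-sized space, which is only my guess at an alternative reading).

```python
import numpy as np


def orthonormal_basis(X, rtol=1e-10):
    """Return an orthonormal basis for the column space of X, computed via SVD."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    rank = int(np.sum(s > rtol * s[0]))  # numerical rank
    return U[:, :rank]


def canonical_angles(X, Y):
    """Canonical (principal) angles in radians between the column spaces of X and Y."""
    Qx, Qy = orthonormal_basis(X), orthonormal_basis(Y)
    # The singular values of Qx^T Qy are the cosines of the canonical angles.
    cosines = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))


# Assumed sizes for illustration only: vocabulary V much larger than embedding size d.
rng = np.random.default_rng(0)
V, d = 100, 8
A = rng.standard_normal((V, d))  # stand-in for one embedding matrix
B = rng.standard_normal((V, d))  # stand-in for another (or a random init)

# Row spaces: the V rows of each matrix are vectors in R^d and span all of R^d.
row_angles = canonical_angles(A.T, B.T)

# Column spaces: the d columns of each matrix span a d-dimensional subspace of R^V.
col_angles = canonical_angles(A, B)

print("max canonical angle, row spaces:   ", row_angles.max())
print("max canonical angle, column spaces:", col_angles.max())
```

Under these assumptions the row-space angles are all zero up to floating-point error, which matches my expectation, while the column-space angles are generally not. If Figure 8 measures something other than the row-space case, that may be the piece I am missing.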