Explanation of figure 8 #2

I am quite confused about figure 8 in your paper. Assuming the shared vocabulary size is significantly larger than the embedding size in all models (which it appears to be from figure 2), I would expect the rows of matrices A and B to span the whole embedding space. I would also expect the rows of a randomly initialized matrix to span the full embedding space. In that case, both subspaces would be the entire embedding space, so the maximum canonical angle would always be 0. What am I missing?
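
To make my expectation concrete, here is a minimal NumPy sketch. Everything in it is an assumption for illustration: the sizes `n_vocab` and `d` are made up, random Gaussian matrices stand in for A and B, and the helper names are hypothetical. It computes canonical (principal) angles between the row spaces, which live in the embedding space, and for contrast between the column spaces, which live in the vocabulary-sized space. I do not know which of the two the paper uses.

```python
import numpy as np

def principal_angles(Qa, Qb):
    """Principal angles (radians) between the spans of two orthonormal bases.

    Qa and Qb hold orthonormal basis vectors as columns.
    """
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against values slightly above 1 from rounding.
    return np.arccos(np.clip(s, -1.0, 1.0))

def row_space_basis(M):
    """Orthonormal basis (as columns) of the row space of M, a subspace of R^d."""
    _, s, Vt = np.linalg.svd(M, full_matrices=False)
    tol = s.max() * max(M.shape) * np.finfo(M.dtype).eps
    rank = int(np.sum(s > tol))
    return Vt[:rank].T

def col_space_basis(M):
    """Orthonormal basis of the column space of M (assumes full column rank)."""
    Q, _ = np.linalg.qr(M)
    return Q

# Hypothetical sizes: vocabulary much larger than the embedding dimension.
rng = np.random.default_rng(0)
n_vocab, d = 1000, 16
A = rng.standard_normal((n_vocab, d))  # stand-in for one embedding matrix
B = rng.standard_normal((n_vocab, d))  # stand-in for another (or a random init)

row_angles = principal_angles(row_space_basis(A), row_space_basis(B))
col_angles = principal_angles(col_space_basis(A), col_space_basis(B))

max_row_angle = float(row_angles.max())
max_col_angle = float(col_angles.max())
print(f"max angle between row spaces (in R^{d}): {max_row_angle:.2e}")
print(f"max angle between column spaces (in R^{n_vocab}): {max_col_angle:.3f}")
```

Under these assumptions both row spaces are all of R^d, so every angle between them is 0 up to rounding, which is the situation I describe above. The column spaces are two d-dimensional subspaces of the much larger vocabulary-sized space and generally do not coincide, so their angles are not 0.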
