Explanation of figure 8 #2

@UFO-101

I am quite confused about Figure 8 in your paper. Assuming the shared vocabulary size is significantly larger than the embedding size in all models (which it appears to be from Figure 2), I would expect matrices A and B to each span the whole embedding space. I would also expect a randomly initialized matrix to span the full embedding space. In that case, every canonical angle, including the maximum, would always be 0. What am I missing?
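
To make the reasoning concrete, here is a minimal NumPy sketch of what I mean. It rests on my own assumptions, not on anything confirmed by the paper: A and B are stand-in random Gaussian matrices of shape (vocab, d) with one embedding per row, the sizes are made up, and `principal_angles` is a hypothetical helper I wrote for illustration. If the subspaces being compared are the row spans in the d-dimensional embedding space, the angles come out as numerically zero.

```python
import numpy as np

def principal_angles(X, Y):
    """Principal (canonical) angles, in radians, between the column spaces of X and Y."""
    # Orthonormal bases for the two column spaces (reduced QR).
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    # Singular values of Qx^T Qy are the cosines of the principal angles.
    cosines = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

rng = np.random.default_rng(0)
vocab_size, d = 1000, 16  # hypothetical sizes, chosen only so that vocab_size >> d

# Stand-ins for the embedding matrices (assumption: rows are token embeddings).
A = rng.standard_normal((vocab_size, d))
B = rng.standard_normal((vocab_size, d))

# Row spans live in R^d. With vocab_size >> d both matrices have rank d,
# so both row spans are all of R^d and every principal angle is ~0.
row_angles = principal_angles(A.T, B.T)
print("max angle between row spans:", row_angles.max())

# For contrast only: the column spans are d-dimensional subspaces of R^vocab_size,
# and for random matrices they are far from aligned.
col_angles = principal_angles(A, B)
print("max angle between column spans:", col_angles.max())
```

The first result is the one my question is about. The second is only there for comparison, since I do not know which of the two constructions Figure 8 actually uses.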
