Why is FastInstDecoder.meta_pos_embed uninitialized? #57

@jneeven

In

self.meta_pos_embed = nn.Parameter(torch.empty(1, hidden_dim, meta_pos_size, meta_pos_size))
a set of weights is created that represents the learnable positional embeddings for the pixel features, as described in the paper. However, it never receives an explicit initialization: torch.empty allocates uninitialized memory, so the initial values are arbitrary and can even be NaN or inf. Why is this? Wouldn't it make sense to give it some kind of initialization, either sine/cosine or normally distributed?
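For illustration, here is a minimal sketch of the kind of explicit initialization I mean (hidden_dim and meta_pos_size are made-up example values; trunc_normal_ is just one common choice for learnable positional embeddings, e.g. in ViT-style models):

```python
import torch
import torch.nn as nn

hidden_dim, meta_pos_size = 256, 10  # example values, not the repo's actual config

# As in the current code: torch.empty leaves the memory uninitialized,
# so the parameter may start out with arbitrary values (even NaN/inf).
meta_pos_embed = nn.Parameter(torch.empty(1, hidden_dim, meta_pos_size, meta_pos_size))

# An explicit initialization guarantees finite, sensibly-scaled starting values.
nn.init.trunc_normal_(meta_pos_embed, std=0.02)
```

After the trunc_normal_ call, every entry is finite and drawn from a truncated normal with small standard deviation, so training starts from a well-defined state regardless of what the allocator returned.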

Also, could you provide some insight into why the resolution of the learnable positional embeddings is directly tied to the number of queries? If one only wants to detect 2-3 objects per image, there is no need for 100 queries, yet 100 positional embedding values would still be required.

Thank you!
