In `FastInst/fastinst/modeling/transformer_decoder/fastinst_decoder.py` (line 52, commit 4996a61):

```python
self.meta_pos_embed = nn.Parameter(torch.empty(1, hidden_dim, meta_pos_size, meta_pos_size))
```
a set of weights is created that represents the learnable positional embeddings for the pixel features, as described in the paper. However, since `torch.empty` allocates uninitialized memory, the parameter never receives an explicit initialization, so its initial values are whatever happens to be in memory and can even be NaN. Why is this? Wouldn't it make sense to give it some kind of initialization, either sine/cosine or normally distributed?
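For reference, a common pattern in ViT-style models is to initialize learnable positional embeddings with a truncated normal distribution. A minimal sketch of what that could look like here (the `hidden_dim` and `meta_pos_size` values below are purely illustrative, and the truncated-normal choice is my assumption, not something FastInst prescribes):

```python
import torch
import torch.nn as nn

# Illustrative sizes, not taken from the FastInst configs
hidden_dim, meta_pos_size = 256, 10

# Same shape as the parameter in fastinst_decoder.py
meta_pos_embed = nn.Parameter(torch.empty(1, hidden_dim, meta_pos_size, meta_pos_size))

# ViT-style truncated-normal init; guarantees finite starting values
# (an assumption on my part, not the repo's actual init scheme)
nn.init.trunc_normal_(meta_pos_embed, std=0.02)
```

With an init like this, the embeddings start small and finite rather than depending on uninitialized memory.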
Also, could you provide some insight into why the resolution of the learnable positional embeddings is directly tied to the number of queries? If one only wants to detect 2-3 objects per image, there is no need for 100 queries, yet 100 positional embedding values are still required.
Thank you!