Dear authors,
Thank you for making the code public. While reading the loss computation for prototypical networks in this line, I was wondering what the rationale is behind multiplying the batch loss by the number of query samples. It seems to me that the loss was already computed over all query samples in the batch in this line, so this scaling may not be necessary. I may have misunderstood something, so please let me know what you think. Thank you!
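
To make the question concrete, here is a minimal sketch in plain Python. It assumes the batch loss is a mean of the per-query cross-entropy, which I have not confirmed against the repository. The logits, labels, and variable names are made up for illustration and are not taken from the code.

```python
import math

def cross_entropy(logits, label):
    # Numerically stable log-sum-exp, then the negative log-likelihood of the true class.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]

# Hypothetical episode: 4 query samples, 3 classes (values are invented).
query_logits = [
    [2.0, 0.5, -1.0],
    [0.1, 1.2, 0.3],
    [-0.5, 0.0, 1.5],
    [1.0, 1.0, 1.0],
]
query_labels = [0, 1, 2, 0]

per_query = [cross_entropy(l, y) for l, y in zip(query_logits, query_labels)]
n_query = len(per_query)

# Assumption: the batch loss is already averaged over all query samples.
mean_loss = sum(per_query) / n_query

# The extra multiplication I am asking about.
scaled_loss = mean_loss * n_query

# Under the assumption above, the scaled loss is just the summed loss.
sum_loss = sum(per_query)
print(math.isclose(scaled_loss, sum_loss))
```

If that assumption holds, the multiplication only turns the mean into a sum, so the magnitude of the loss and its gradient would grow with the number of query samples per episode.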