Why is GCE implemented as `F.cross_entropy(logits, targets, reduction='none') * (Yg.squeeze().detach()**self.q)*self.q` instead of the closed form `(1 - Yg.squeeze().detach()**self.q) / self.q`?
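
One way to compare the two forms is to look at their gradients rather than their values. The sketch below is only a sanity check under assumptions the snippet does not confirm: that `Yg` holds the softmax probability of the target class, and that `.detach()` makes the weight `Yg**q` a constant during backprop. It uses plain Python instead of PyTorch, and the helper names (`gce_loss`, `weighted_ce_grad`, `numeric_grad`) are made up for illustration. It differentiates the closed form `(1 - p**q) / q` numerically and compares the result with `p**q` times the usual cross-entropy gradient, `softmax - one_hot`.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gce_loss(logits, target, q):
    """Closed-form GCE: (1 - p_target**q) / q."""
    p = softmax(logits)[target]
    return (1.0 - p ** q) / q

def numeric_grad(f, logits, eps=1e-6):
    """Central finite-difference gradient of f with respect to each logit."""
    grad = []
    for i in range(len(logits)):
        up = list(logits)
        up[i] += eps
        down = list(logits)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def weighted_ce_grad(logits, target, q):
    """Gradient of CE * w, with w = p_target**q held constant (assumed role of detach())."""
    probs = softmax(logits)
    w = probs[target] ** q
    # d(CE)/d(logit_i) = softmax_i - 1[i == target]
    return [w * (p - (1.0 if i == target else 0.0)) for i, p in enumerate(probs)]

# Arbitrary example values, chosen only for the check.
logits, target, q = [2.0, -1.0, 0.5], 0, 0.7

grad_closed_form = numeric_grad(lambda z: gce_loss(z, target, q), logits)
grad_weighted_ce = weighted_ce_grad(logits, target, q)

max_diff = max(abs(a - b) for a, b in zip(grad_closed_form, grad_weighted_ce))
print(max_diff)  # expected to be tiny (finite-difference error only)
```

If the check holds, the weighted cross-entropy form gives the same gradient with respect to the logits as the closed form, even though the two loss values differ. The extra `* self.q` in the snippet would then only rescale that gradient by the constant `q`. Note also that the closed form, written with `.detach()` on `Yg` as in the question, would pass no gradient through `Yg` at all. Whether either of these is the actual reason for the implementation is a guess on my part.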