Why does the KL divergence loss need to be multiplied by the number of classes?

https://github.com/alinlab/Confident_classifier/blob/462db01967f8a96374f2ab6a534b7c81fd872d2f/src/run_joint_confidence.py#L156
I'm a bit confused about why you need to multiply the KL divergence loss by the **number of classes**.
I can't find this factor in the definition given in your paper. Could you briefly explain it?