In the `train_one_epoch` function, I notice that the first time `loss.backward()` is called, Ci has not yet been clustered by k-means. Clustering does not take effect until the second batch of the first epoch. This confuses me. In other words, why is `loss.backward()` called before the k-means step? According to the paper, k-means should run before the loss is computed, right?
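
To make the ordering concrete, here is a minimal sketch of what I think is happening. It is not the repository's code: `run_epoch`, `cluster_first`, and the `clustered` flag are hypothetical names I made up, and the real loss and k-means calls are replaced by comments. It only tracks whether the centers had already been clustered at the point where the loss would be computed for each batch.

```python
# Hypothetical stand-in for the training loop; none of these names
# come from the repository.
def run_epoch(num_batches, cluster_first):
    """Return, per batch, whether the centers were already clustered
    at the moment the loss would be computed."""
    clustered = False  # assumption: centers start unclustered
    history = []
    for _ in range(num_batches):
        if cluster_first:
            clustered = True       # k-means step runs before the loss
        history.append(clustered)  # loss and loss.backward() would run here
        if not cluster_first:
            clustered = True       # k-means step runs after backward
    return history


observed = run_epoch(3, cluster_first=False)  # ordering I see in the code
expected = run_epoch(3, cluster_first=True)   # ordering I expect from the paper
print(observed)  # [False, True, True]
print(expected)  # [True, True, True]
```

With the k-means step placed after `loss.backward()`, only the first batch sees unclustered centers, which matches what I observe.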