How did you solve the problem of the loss always being NaN when training SV and MV with ESAM?

Also, were you actually able to reproduce the results of the ESAM paper?