Hello,
Thank you for presenting awesome ideas with your work and addressing fundamental issues in previous works.
In the Training Setup section of your paper, the learning rate is given as 2e-3, whereas your implementation uses 2e-4.
2e-4 sounds more reasonable (given the HiFi-GAN baseline). However, I couldn't achieve balanced training with this value; the output always ended up with a slight metallic artifact.
I am 1M steps in with 2e-3 and it looks better, but I still have doubts about it.
Can you explain the discrepancy?
Thank you