I would like to ask how the step hint was calculated, and whether the LR schedule could be based on runtime rather than on steps. If the step hint was derived from NadamW, a step-based schedule would put an algorithm like Shampoo, which takes longer per step, at a disadvantage.
Thanks!