Thanks for providing the source code of this fantastic architecture. I'm trying to clarify the learning rate decay in your `opts.lua` file: is the learning rate decayed by 1e-1 or 1e-7 every 100 epochs? From your training, it seems you didn't set the `-d` parameter, so would the decay default to 1e-7?
However, the comment you gave for `lrDecayEvery` is:

`--lrDecayEvery (default 100) Decay learning rate every X epoch by 1e-1`
So I'd like to ask whether the decay rate should be 1e-7 or 1e-1 every 100 epochs.
Also, what do you mean by "# samples" in this line?

`-d,--learningRateDecay (default 1e-7) learning rate decay (in # samples)`
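To make my question concrete, here is a Python sketch of the two decay mechanisms as I currently understand them. The per-sample formula is the one Torch's `optim.sgd` applies when `learningRateDecay` is set (it rescales the step on every optimizer call, which is why the comment says "in # samples"), while the per-epoch factor and period are taken from the `lrDecayEvery` comment. Please correct me if either reading is wrong:

```python
def per_call_lr(base_lr, nevals, lr_decay=1e-7):
    """Effective LR after `nevals` optimizer calls, as in optim.sgd:
    clr = lr / (1 + nevals * learningRateDecay)."""
    return base_lr / (1 + nevals * lr_decay)

def per_epoch_lr(base_lr, epoch, decay_every=100, factor=1e-1):
    """Base LR after epoch-level decay (lrDecayEvery-style):
    multiply by `factor` once every `decay_every` epochs."""
    return base_lr * factor ** (epoch // decay_every)

# With the defaults, the per-call decay is tiny early in training
# (1e-7 per evaluation), whereas the epoch-level schedule drops the
# rate by a full order of magnitude every 100 epochs.
```

If both are active, the two schedules compound, which is the source of my confusion about whether the effective decay is 1e-1 or 1e-7.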
Finally, could you describe the preprocessing you applied to the training/evaluation data?