papers/Highway Networks.ipynb
|
Nice and well-explained! For now, I thought that "paper" Recipes should aim to faithfully reproduce results from a paper, i.e., use the same hyperparameters and achieve the same results. That's probably not always feasible, though. Still, to get a bit closer to the paper, you could follow the hints in Section 4: Transform gate biases were initialized to -2 for MNIST, not -4, and throughout the paper they use 50 hidden units, not 40. It would be cool to try reproducing Figure 2, or at least to add a final comment that you didn't try that but we'd welcome pull requests doing so. |
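For reference, the two settings mentioned above (50 units per layer, transform gate biases at -2) could be sketched roughly like this. This is a plain-Python illustration and not the notebook's Lasagne code: the names `init_highway_layer` and `highway_forward`, the uniform weight initialization, and the ReLU nonlinearity are all assumptions made for the example. Only the gating formula y = H(x)·T(x) + x·(1 − T(x)) and the two hyperparameter values come from the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, W, b):
    # W holds one row of weights per output unit.
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def init_highway_layer(num_units=50, gate_bias=-2.0, seed=0):
    # Hypothetical helper: the uniform weight init is an assumption, not the paper's.
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(num_units)

    def weights():
        return [[rng.uniform(-scale, scale) for _ in range(num_units)]
                for _ in range(num_units)]

    return {
        "W_h": weights(), "b_h": [0.0] * num_units,
        # A negative gate bias makes the layer start out close to the identity.
        "W_t": weights(), "b_t": [gate_bias] * num_units,
    }


def highway_forward(x, params):
    # H(x): ordinary nonlinear transform (ReLU is an assumption here).
    h = [max(0.0, v) for v in dense(x, params["W_h"], params["b_h"])]
    # T(x): transform gate in (0, 1).
    t = [sigmoid(v) for v in dense(x, params["W_t"], params["b_t"])]
    # y = H(x) * T(x) + x * (1 - T(x)); the carry gate is 1 - T(x).
    return [t_i * h_i + (1.0 - t_i) * x_i for h_i, t_i, x_i in zip(h, t, x)]
```

With a gate bias of -2 and small gate weights, T(x) starts near sigmoid(-2) ≈ 0.12, so each layer initially passes most of its input through unchanged; a bias of -4 pushes that further towards pure carry behaviour.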
|
I don't have much time to work on this right now (beyond cosmetic improvements) so I'm okay with moving it to 'examples' as well if you prefer. |
I still think it fits in "papers" with the perspective of it reproducing Figure 2 sometime. Don't worry if you can't add a comment in the end, we can also add an Issue for that. |
|
I was actually planning to have a look at it today. But I can do a new PR after you merge them as well, up to you. |
|
If you're working on them further, that's great; let's wait till you're
On Sun, Aug 30, 2015 at 10:25 AM, Sander Dieleman notifications@github.com
|
Sure :) I'll try to sort out @f0k's remarks by tonight. |
|
Just tried initializing the biases to I will have a go at reproducing (a subset of) figure 2 though! |
|
I tried changing some parameter values to match the paper better, but in the end I decided to leave most of them as they were because it just caused trouble :) I haven't gotten around to doing anything with the figure either; if anyone wants to do that in a separate PR, feel free. |
Yes, if it's difficult to find settings that work, we should just leave it at that. I didn't expect that to cause trouble. Thanks for trying and documenting it! Looks good to merge from my side. Eben, you could add an Issue about reproducing Figure 2; maybe somebody is interested in trying that sometime. |
Here's a revamped version of the notebook I did of "Highway Networks" by Srivastava et al. (2015): http://arxiv.org/abs/1505.00387