Conversation
… Fix weights and keys initialization
Traceback (most recent call last):
  File "fancy_mlp.py", line 74, in <module>
    main()
  File "fancy_mlp.py", line 71, in main
    train(inputs, keys, layer_one_weights, layer_two_weights)
  File "fancy_mlp.py", line 53, in train
    layer_one_weights += np.dot(layer_one_prediction.T, layer_one_change_in_error)
ValueError: operands could not be broadcast together with shapes (3,4) (4,4) (3,4)
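The `ValueError` above can be reproduced in isolation. A minimal sketch, assuming shapes consistent with the traceback (4 samples, 3 input features, 4 hidden units; the variable values are illustrative, not the actual ones from `fancy_mlp.py`):

```python
import numpy as np

# Hypothetical shapes inferred from the traceback.
inputs = np.random.rand(4, 3)
layer_one_weights = np.random.rand(3, 4)                   # shape (3, 4)
layer_one_prediction = np.dot(inputs, layer_one_weights)   # shape (4, 4)
layer_one_change_in_error = np.random.rand(4, 4)           # shape (4, 4)

try:
    # (4, 4).T dot (4, 4) -> (4, 4): cannot be added in place to (3, 4)
    layer_one_weights += np.dot(layer_one_prediction.T, layer_one_change_in_error)
except ValueError as exc:
    print(exc)

# The fix: multiply the delta by the transpose of the layer's *inputs*,
# which yields a (3, 4) update matching the weights' shape.
layer_one_weights += np.dot(inputs.T, layer_one_change_in_error)
```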
Increase training steps to 40000
# Figure out how wrong our output for layer_one was by seeing how wrong the layer_two_prediction was
layer_one_error = np.dot(layer_two_change_in_error, layer_two_weights.T)
# Figure out how wrong our output for layer_one was by seeing how wrong the layer_two_prediction was
Whoops, I have been trying out vim and messed up the spacing. Reverting it :)
# Just like in simple_mlp.py
layer_one_change_in_error = layer_one_error * applySigmoid(layer_one_error, True)
# Just like in simple_mlp.py
layer_one_change_in_error = layer_one_error * applySigmoid(layer_one_prediction, True)
Wait, why did this change? This should be layer_one_error
Again, this is inconsistent with the simple_mlp.py implementation and will prevent convergence.
It is even inconsistent with the layer_two_change_in_error definition above.
layer_*N*_change_in_error should be defined as layer_*N*_error * applySigmoid(layer_*N*_prediction, True).
I refer to the Wikipedia article on Backpropagation (see "Phase 2: weight update" under "Algorithm in code"), where for each weight:
- The weight's output delta and input activation are multiplied to find the gradient of the weight.
- A ratio of the weight's gradient is subtracted from the weight.
Give it a try on your own laptop, you will see the difference!
To check Python 3 compatibility you can create a conda virtual environment with Python 3, but I bet you know about it ;)
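The delta rule described above can be sketched in a few lines. This assumes an `applySigmoid` helper like the one used in these files, where `deriv=True` expects a value that is already a sigmoid output (the helper's exact signature is an assumption, as is the illustrative data):

```python
import numpy as np

def applySigmoid(x, deriv=False):
    # Assumed helper: with deriv=True, x is expected to already be a
    # sigmoid output s, so s * (1 - s) is the sigmoid derivative there.
    if deriv:
        return x * (1 - x)
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative values. The rule: layer_N_change_in_error =
# layer_N_error * applySigmoid(layer_N_prediction, True) -- the error
# times the derivative at the layer's own prediction, never at the error.
layer_one_prediction = np.array([[0.2, 0.8],
                                 [0.6, 0.4]])
layer_one_error = np.array([[0.10, -0.05],
                            [0.02,  0.03]])
layer_one_change_in_error = layer_one_error * applySigmoid(layer_one_prediction, True)
print(layer_one_change_in_error)
```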
print_data(iter, inputs, keys, weights, prediction)
# adjust your weights accordingly.
layer_one_weights += np.dot(inputs.T, layer_one_change_in_error)
layer_two_weights += np.dot(layer_one_prediction.T,
I'll look at this more later, but shouldn't this be
layer_one_weights += np.dot(layer_one_prediction.T, layer_one_change_in_error)
layer_two_weights += np.dot(layer_two_prediction.T, layer_two_change_in_error)
Then it's inconsistent with simple_mlp.py.
In there you update weights by doing: weights += np.dot(inputs.T, change_in_error), which works fine.
Have you tried running your fancy_mlp implementation?
First off, it throws an error since the shapes do not match (see the PR commit message for details).
Now if I only change layer_one_weights += np.dot(layer_one_prediction.T, layer_one_change_in_error) and layer_two_weights += np.dot(layer_two_prediction.T, layer_two_change_in_error), it does not converge.
This is the training output after 40000 steps:
Output After Training:
[[0.49949126]
 [0.49944283]
 [0.49888797]
 [0.50120373]]
This is why I also change layer_one_change_in_error. Again, this is consistent with the simple_mlp.py implementation.
And here is the result after 40000 steps:
Output After Training:
[[0.0052001 ]
[0.99379807]
[0.99356872]
[0.00481841]]
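Putting both fixes together, the whole corrected training loop can be sketched end to end. This is a minimal sketch, not the actual fancy_mlp.py: the XOR-style `inputs`/`keys` data and the weight initialization are assumptions, though the variable names and update rules follow the ones discussed above:

```python
import numpy as np

def applySigmoid(x, deriv=False):
    # Assumed helper: with deriv=True, x is already a sigmoid output.
    if deriv:
        return x * (1 - x)
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical XOR-style data; the actual inputs/keys may differ.
inputs = np.array([[0, 0, 1],
                   [0, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]])
keys = np.array([[0], [1], [1], [0]])

np.random.seed(1)
layer_one_weights = 2 * np.random.random((3, 4)) - 1
layer_two_weights = 2 * np.random.random((4, 1)) - 1

for step in range(40000):
    # Forward pass.
    layer_one_prediction = applySigmoid(np.dot(inputs, layer_one_weights))
    layer_two_prediction = applySigmoid(np.dot(layer_one_prediction, layer_two_weights))

    # Backward pass: delta = error * sigmoid'(prediction) for each layer.
    layer_two_error = keys - layer_two_prediction
    layer_two_change_in_error = layer_two_error * applySigmoid(layer_two_prediction, True)
    layer_one_error = np.dot(layer_two_change_in_error, layer_two_weights.T)
    layer_one_change_in_error = layer_one_error * applySigmoid(layer_one_prediction, True)

    # Weight update: each delta is multiplied by the transpose of that
    # layer's *input* activation, as in simple_mlp.py.
    layer_two_weights += np.dot(layer_one_prediction.T, layer_two_change_in_error)
    layer_one_weights += np.dot(inputs.T, layer_one_change_in_error)

print("Output After Training:")
print(layer_two_prediction)
```

With these updates the outputs move close to the 0/1 targets within 40000 steps, whereas updating each layer by its own prediction's transpose leaves them stuck near 0.5.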
I refactored the current code base so that it works with Python 3 as well as Python 2 (mostly adding parentheses to print statements and ditching `xrange` for the built-in `range`). I updated the `README.md` section accordingly. I tested both using `conda` virtual environments (plain Python 2.7/3.6 with `numpy`).
I fixed the broken `fancy_mlp.py` implementation as it was throwing a `ValueError`. I also increased the number of training steps in `fancy_mlp.py` from `20000` to `40000` as convergence has proven to be slow.
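The two compatibility changes described boil down to something like the following hypothetical before/after sketch (not the actual diff):

```python
# Python 2-only style that had to go:
#   print "step", step
#   for step in xrange(3): ...

# Works on both Python 2 and 3:
from __future__ import print_function  # a no-op on Python 3

for step in range(3):      # range exists in both; xrange was Python 2 only
    print("step", step)    # parenthesized print works in both
```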