Start with optimization.ipynb, a Jupyter notebook that defines a function and finds its minimum using gradient descent. The gradients are computed by automatic differentiation (AD), using a small AD module that works on scalars only.
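
To give a feel for how scalar AD and gradient descent fit together, here is a minimal sketch. It is not the notebook's module: it assumes a forward-mode design based on dual numbers, and the names Dual, f, grad and gradient_descent, as well as the objective itself, are hypothetical and chosen only for illustration.

```python
class Dual:
    """Hypothetical scalar dual number: a value plus its derivative."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def _lift(x):
        # Treat plain numbers as constants (derivative 0).
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual._lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return Dual._lift(other) - self

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def f(x):
    # Example objective (an assumption): a parabola with its minimum at x = 3.
    return (x - 3) * (x - 3) + 1


def grad(fn, x):
    # Seed the input with derivative 1 and read the derivative of the output.
    return fn(Dual(x, 1.0)).der


def gradient_descent(fn, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x = x - lr * grad(fn, x)  # step against the gradient
    return x


x_min = gradient_descent(f, x0=0.0)
print(x_min)  # close to 3, the minimiser of f
```

Forward mode is used here only because it is the shortest scheme to write down for one scalar input. The notebook's AD module may well use a different approach, such as reverse mode.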
The notebook Advanced_NN.ipynb demonstrates how a version of AD that operates on 2D matrices is used in a neural network.
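
The matrix case can be sketched along the following lines. This is an illustration under stated assumptions, not the code from Advanced_NN.ipynb: it assumes NumPy, a reverse-mode design, and a hypothetical Tensor class supporting only matrix multiplication, tanh and a mean-squared-error loss. The data, layer sizes and learning rate are made up.

```python
import numpy as np


class Tensor:
    """Hypothetical 2D reverse-mode AD node (not the notebook's class)."""

    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=float)
        self.grad = np.zeros_like(self.data)
        self.parents = parents
        self.backward_fn = None  # set by the operation that created this node

    def __matmul__(self, other):
        out = Tensor(self.data @ other.data, (self, other))

        def backward_fn():
            # For C = A @ B: dL/dA = dL/dC @ B^T, dL/dB = A^T @ dL/dC
            self.grad += out.grad @ other.data.T
            other.grad += self.data.T @ out.grad

        out.backward_fn = backward_fn
        return out

    def tanh(self):
        t = np.tanh(self.data)
        out = Tensor(t, (self,))

        def backward_fn():
            self.grad += (1.0 - t ** 2) * out.grad  # d tanh(x)/dx = 1 - tanh(x)^2

        out.backward_fn = backward_fn
        return out

    def mse(self, target):
        diff = self.data - target
        out = Tensor(np.mean(diff ** 2), (self,))

        def backward_fn():
            self.grad += out.grad * 2.0 * diff / diff.size

        out.backward_fn = backward_fn
        return out

    def backward(self):
        # Order the graph so each node is processed after all of its consumers.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = np.ones_like(self.data)
        for node in reversed(order):
            if node.backward_fn is not None:
                node.backward_fn()


# Made-up regression data and a two-layer network, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
Y = np.tanh(X @ np.array([[1.0], [-2.0]]))
W1 = Tensor(rng.normal(size=(2, 4)) * 0.5)
W2 = Tensor(rng.normal(size=(4, 1)) * 0.5)


def loss_fn():
    return ((Tensor(X) @ W1).tanh() @ W2).mse(Y)


print("initial loss:", float(loss_fn().data))
for _ in range(200):
    for w in (W1, W2):
        w.grad = np.zeros_like(w.data)  # clear gradients from the last step
    loss_fn().backward()
    for w in (W1, W2):
        w.data -= 0.1 * w.grad  # plain gradient descent update
print("final loss:", float(loss_fn().data))
```

The only structural change from the scalar case is that each operation's backward rule now involves transposes and matrix products instead of plain multiplication. The training loop is the same gradient descent as before.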