In this post, I’ll describe how a neural network with two hidden layers works. The code is highly unoptimized to make it as simple to understand as possible. I’ll train the model on a part of the MNIST dataset, so you will need to download this file, which contains both the labels (first column) and the pixel values. The size of y is 42000×1 and the size of X is 42000×784. Each row of X is a flattened 28×28 grayscale image of a handwritten digit, and each element of y is a digit from 0 to 9.
Here is the whole code; the explanations follow after it:
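Below is a minimal sketch of the whole program, assuming the data comes as a CSV named "train.csv" with the label in the first column. The hidden-layer width (64 units), learning rate, and epoch count are arbitrary choices for illustration, and bias terms are omitted to keep the code short. The labels are one-hot encoded so that the network can have ten outputs, one per digit.

```python
import numpy as np

# --- Load the data: first column is the label, the rest are 784 pixels ---
data = np.loadtxt("train.csv", delimiter=",", skiprows=1)
labels = data[:, 0].astype(int)          # 42000 digits, values 0..9
X = data[:, 1:] / 255.0                  # 42000x784, scaled to [0, 1]
y = np.zeros((labels.size, 10))          # one-hot encode the labels
y[np.arange(labels.size), labels] = 1.0  # 42000x10

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# --- Initialize the three weight matrices (two hidden layers of 64 units) ---
rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (784, 64))
W2 = rng.normal(0, 0.1, (64, 64))
W3 = rng.normal(0, 0.1, (64, 10))

lr = 0.1 / X.shape[0]   # learning rate, scaled by the number of examples
for epoch in range(50):
    # Forward pass: two sigmoid hidden layers, linear output
    a1 = sigmoid(X @ W1)
    a2 = sigmoid(a1 @ W2)
    yhat = a2 @ W3

    # Loss: half the squared error
    loss = 0.5 * np.sum((yhat - y) ** 2)

    # Backward pass (Steps 1-6 below)
    dyhat = yhat - y                         # Step 1: dL/dyhat
    dW3 = a2.T @ dyhat                       # Step 2: gradient of W3
    dz2 = (dyhat @ W3.T) * a2 * (1 - a2)     # Step 3: gradient at hidden layer 2
    dW2 = a1.T @ dz2                         # Step 4: gradient of W2
    dz1 = (dz2 @ W2.T) * a1 * (1 - a1)       # Step 5: gradient at hidden layer 1
    dW1 = X.T @ dz1                          # Step 6: gradient of W1

    # Step 7: gradient-descent update
    W1 -= lr * dW1
    W2 -= lr * dW2
    W3 -= lr * dW3

    print(f"epoch {epoch}, loss {loss:.4f}")
```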
The neural network part is pretty short:
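Following the sketch above, the forward pass is just three matrix multiplications, with a sigmoid after each hidden layer:

```python
a1 = sigmoid(X @ W1)   # first hidden layer, 42000x64
a2 = sigmoid(a1 @ W2)  # second hidden layer, 42000x64
yhat = a2 @ W3         # linear output, 42000x10, one score per digit class
```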
The most interesting part is probably backpropagation, which I’ll walk through step by step:
Step 1. Calculate the loss and its derivative
The first thing you need to know is which loss function you are going to use. Here I use the squared error (more precisely, half the squared error, so that the factor of 2 cancels when differentiating).
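With yhat and y as in the sketch above, the loss is:

```python
# Half the squared error, summed over all outputs and examples
loss = 0.5 * np.sum((yhat - y) ** 2)
```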
To propagate the loss backwards, we first need the derivative of the loss w.r.t. the prediction vector yhat.
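For half the squared error this derivative is simply the residual:

```python
# dL/dyhat for L = 0.5 * sum((yhat - y)**2); the 1/2 cancels the 2
dyhat = yhat - y
```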
If you use any other loss function, you need to find its derivative w.r.t. yhat.
Step 2. Calculate the gradient of the matrix of parameters W3
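Since the output layer in the sketch above is linear, yhat = a2 @ W3, the gradient of W3 is:

```python
# dL/dW3 = a2.T @ dL/dyhat; shape 64x10, matching W3
dW3 = a2.T @ dyhat
```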
Step 3. Calculate the gradient of the second hidden layer
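The loss flows back first through W3 and then through the sigmoid of the second hidden layer:

```python
# Through the linear layer: dL/da2 = dyhat @ W3.T
da2 = dyhat @ W3.T
# Through the sigmoid: its derivative is a2 * (1 - a2)
dz2 = da2 * a2 * (1 - a2)
```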
Step 4. Calculate the gradient of the matrix of parameters W2
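Because a2 = sigmoid(a1 @ W2) in the sketch, the gradient of W2 has the same form as in Step 2:

```python
# dL/dW2 = a1.T @ dz2; shape 64x64, matching W2
dW2 = a1.T @ dz2
```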
Step 5. Calculate the gradient of the first hidden layer
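This is the same pattern as Step 3, one layer further down:

```python
# Through W2, then through the first sigmoid
da1 = dz2 @ W2.T
dz1 = da1 * a1 * (1 - a1)
```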
Step 6. Calculate the gradient of the matrix of parameters W1
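Here the input X plays the role that a1 played in Step 4:

```python
# dL/dW1 = X.T @ dz1; shape 784x64, matching W1
dW1 = X.T @ dz1
```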
Step 7. Update parameters W1, W2, and W3
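A plain gradient-descent step moves each matrix against its gradient:

```python
# lr is the learning rate from the sketch above
W1 -= lr * dW1
W2 -= lr * dW2
W3 -= lr * dW3
```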
The output will look like the following:
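With the print statement from the sketch above, each epoch logs the epoch number and the current loss; the exact values depend on the initialization and hyperparameters:

```
epoch 0, loss ...
epoch 1, loss ...
epoch 2, loss ...
...
```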