Purpose: for educational purposes only. The code demonstrates a supervised learning task using a very simple neural network.
Here is a follow-up post featuring slightly more complicated code:
The core component of the code, the learning algorithm, is only 10 lines:
The loop above runs for 50 iterations (epochs) and fits the matrix of attributes X to the vector of classes y. I am going to use 4 records from the Iris flower dataset. The attributes (X) are sepal length, sepal width, petal length, and petal width. My example uses 2 of the 3 classes found in the original dataset: Iris Setosa (0) and Iris Virginica (1). Predictions are stored in the vector pred.
Neural network architecture. The values of W0, W1, layer_1 and pred change over the course of training the network, while X and y must not be changed:
Matrix X has one row per record in the batch and one column per attribute, so in this example it is 4 x 4.
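To make the layout concrete, here is a minimal C++ sketch of how X and y could be stored as flat row-major arrays. The `Matrix` struct and the `make_X` / `make_y` helpers are hypothetical names introduced for illustration. The four records are taken from the standard Iris dataset and are not necessarily the rows used in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: a dense matrix stored row by row in one flat array.
struct Matrix {
    std::size_t rows;         // batch size (number of records)
    std::size_t cols;         // number of attributes per record
    std::vector<float> data;  // rows * cols values, row-major

    // Element (r, c) lives at index r * cols + c.
    float at(std::size_t r, std::size_t c) const {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
};

// Attributes: sepal length, sepal width, petal length, petal width.
// Illustrative records only (assumption, see the note above).
inline Matrix make_X() {
    return Matrix{4, 4, {
        5.1f, 3.5f, 1.4f, 0.2f,    // Iris Setosa
        4.9f, 3.0f, 1.4f, 0.2f,    // Iris Setosa
        6.2f, 3.4f, 5.4f, 2.3f,    // Iris Virginica
        5.9f, 3.0f, 5.1f, 1.8f}};  // Iris Virginica
}

// Matching class labels: 0 = Iris Setosa, 1 = Iris Virginica.
inline Matrix make_y() {
    return Matrix{4, 1, {0.0f, 0.0f, 1.0f, 1.0f}};
}
```

A flat array is used because that is the form in which data is usually copied to GPU memory; the same indexing rule, r * cols + c, applies on both sides.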
Line 3. Finding the values of the hidden layer:
To calculate the hidden layer, we first multiply the 4 x 4 matrix X by the 4 x 8 matrix W0. Then we apply an activation function; in this case, the sigmoid function.
A subroutine for matrix multiplication:
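A minimal sketch of such a routine, written as plain C++ rather than as a CUDA kernel so that it runs without a GPU. The name `dot`, the argument order, and the row-major layout are assumptions, not the listing from this post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out (rows x cols) = m1 (rows x inner) * m2 (inner x cols), all row-major.
// On a GPU each (r, c) pair would typically be handled by its own thread;
// here the two outer loops play that role.
std::vector<float> dot(const std::vector<float>& m1,
                       const std::vector<float>& m2,
                       std::size_t rows, std::size_t inner, std::size_t cols) {
    assert(m1.size() == rows * inner && m2.size() == inner * cols);
    std::vector<float> out(rows * cols, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < inner; ++k)
                acc += m1[r * inner + k] * m2[k * cols + c];
            out[r * cols + c] = acc;
        }
    }
    return out;
}
```

For the hidden layer, the call would be along the lines of `dot(X, W0, 4, 4, 8)`.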
A subroutine for the sigmoid function:
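A minimal C++ sketch of the sigmoid and its derivative. The function names are assumptions. Note that `sigmoid_d` takes the already activated value s = sigmoid(x), which is a common convention in small networks like this one because the layer outputs are already stored.

```cpp
#include <cmath>
#include <vector>

// Logistic sigmoid of a single value: 1 / (1 + e^(-x)).
inline float sigmoid(float x) {
    return 1.0f / (1.0f + std::exp(-x));
}

// Derivative expressed through the sigmoid output s = sigmoid(x):
// d/dx sigmoid(x) = s * (1 - s).
inline float sigmoid_d(float s) {
    return s * (1.0f - s);
}

// Apply the sigmoid to every element of a matrix in place.
inline void sigmoid_inplace(std::vector<float>& m) {
    for (float& v : m) v = sigmoid(v);
}
```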
Sigmoid function (red) and its first derivative (blue):
Line 4. Finding the matrix of predictions pred. To do so, we multiply the 4 x 8 matrix l1 by the 8 x 1 matrix W1. Then we apply the activation function:
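The two forward steps (lines 3 and 4) can be sketched together in plain C++. This version fuses the multiplication and the activation into one hypothetical helper, `layer_forward`, which is a design choice made here for brevity. The 4 x 8 shape of W0 is inferred from the 4 x 8 shape of l1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One fully connected layer: out = sigmoid(in * W).
// in is rows x inner, W is inner x cols, out is rows x cols (row-major).
std::vector<float> layer_forward(const std::vector<float>& in,
                                 const std::vector<float>& W,
                                 std::size_t rows, std::size_t inner,
                                 std::size_t cols) {
    assert(in.size() == rows * inner && W.size() == inner * cols);
    std::vector<float> out(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < inner; ++k)
                acc += in[r * inner + k] * W[k * cols + c];
            out[r * cols + c] = 1.0f / (1.0f + std::exp(-acc));
        }
    }
    return out;
}

// Forward pass with the shapes from the text:
// X (4 x 4), W0 (4 x 8), l1 (4 x 8), W1 (8 x 1), pred (4 x 1).
std::vector<float> predict(const std::vector<float>& X,
                           const std::vector<float>& W0,
                           const std::vector<float>& W1) {
    const std::size_t batch = 4, attrs = 4, hidden = 8, outputs = 1;
    std::vector<float> l1 = layer_forward(X, W0, batch, attrs, hidden);
    return layer_forward(l1, W1, batch, hidden, outputs);
}
```

With all weights set to zero, every prediction comes out as exactly 0.5, which is a quick sanity check that the shapes line up.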
Line 5. Determine the vector of prediction errors pred_d. First, subtract pred from y. Then, calculate the derivative of the sigmoid at pred (since pred is already a sigmoid output, this is pred * (1 - pred)) and, finally, multiply the results of these two operations elementwise.
A CUDA kernel that finds the difference between two matrices:
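As a CPU-side illustration of what such a kernel computes, here is a plain C++ sketch. The name `subtract` is an assumption. In an actual kernel the loop index would normally be derived from the thread and block indices, so that each thread handles one element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out[i] = m1[i] - m2[i] for two matrices of identical shape.
std::vector<float> subtract(const std::vector<float>& m1,
                            const std::vector<float>& m2) {
    assert(m1.size() == m2.size());
    std::vector<float> out(m1.size());
    for (std::size_t i = 0; i < m1.size(); ++i)
        out[i] = m1[i] - m2[i];
    return out;
}
```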
Elementwise multiplication of two vectors:
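A minimal C++ sketch of the elementwise product, followed by how line 5 could combine it with the error term. Both function names are hypothetical, and the second factor is assumed to be the sigmoid derivative computed from the activated output, pred * (1 - pred).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out[i] = a[i] * b[i] for two vectors (or matrices) of identical size.
std::vector<float> multiply_elementwise(const std::vector<float>& a,
                                        const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] * b[i];
    return out;
}

// pred_d = (y - pred) * pred * (1 - pred), computed element by element.
std::vector<float> output_delta(const std::vector<float>& y,
                                const std::vector<float>& pred) {
    assert(y.size() == pred.size());
    std::vector<float> error(y.size()), slope(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        error[i] = y[i] - pred[i];              // prediction error
        slope[i] = pred[i] * (1.0f - pred[i]);  // sigmoid derivative
    }
    return multiply_elementwise(error, slope);
}
```

The slope factor is largest at pred = 0.5 and shrinks toward 0 as pred approaches 0 or 1, so confident predictions receive smaller corrections.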
Line 6. Backpropagate the prediction errors to l_1_d. First, multiply pred_d by the transposed W1. Then, calculate the derivative of the sigmoid at l1 (that is, l1 * (1 - l1)) and, finally, multiply the results of these two operations elementwise.
A subroutine that multiplies a matrix by a transposed matrix:
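A plain C++ sketch of such a routine. The name `dot_m1_m2T` and the argument order are assumptions. The transpose is never materialised; the second matrix is simply read with swapped indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out (rows x cols) = m1 (rows x inner) * transpose(m2),
// where m2 is stored as cols x inner (row-major).
std::vector<float> dot_m1_m2T(const std::vector<float>& m1,
                              const std::vector<float>& m2,
                              std::size_t rows, std::size_t inner,
                              std::size_t cols) {
    assert(m1.size() == rows * inner && m2.size() == cols * inner);
    std::vector<float> out(rows * cols, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < inner; ++k)
                acc += m1[r * inner + k] * m2[c * inner + k];  // m2 read as m2^T
            out[r * cols + c] = acc;
        }
    }
    return out;
}
```

For line 6, pred_d is 4 x 1 and W1 is 8 x 1, so a call along the lines of `dot_m1_m2T(pred_d, W1, 4, 1, 8)` yields a 4 x 8 matrix, the same shape as l1.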
Line 7. Update the weights W1 with the result of multiplying the transposed l1 by pred_d:
This line computes the weight update. To do that, we multiply the transposed matrix l1 (8 x 4) by the matrix pred_d (4 x 1), which gives an 8 x 1 matrix, the same shape as W1.
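A plain C++ sketch of a transposed-times-matrix update. The name `accumulate_m1T_m2` is hypothetical. It is assumed here that the product is added directly to the existing weights, which corresponds to a learning rate of 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// W (a x b) += transpose(m1) * m2, where m1 is n x a and m2 is n x b.
// The transpose is implicit: m1 is read column by column.
void accumulate_m1T_m2(const std::vector<float>& m1,
                       const std::vector<float>& m2,
                       std::vector<float>& W,
                       std::size_t n, std::size_t a, std::size_t b) {
    assert(m1.size() == n * a && m2.size() == n * b && W.size() == a * b);
    for (std::size_t i = 0; i < a; ++i) {
        for (std::size_t j = 0; j < b; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                acc += m1[k * a + i] * m2[k * b + j];
            W[i * b + j] += acc;  // assumption: no learning-rate scaling
        }
    }
}
```

The same routine would cover both weight updates: n = 4, a = 8, b = 1 for W1 (with l1 and pred_d), and n = 4, a = 4, b = 8 for W0 (with X and l_1_d).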
Line 8. Update the weights W0 with the result of multiplying the transposed X by l_1_d:
… and run