**Purpose:** For educational purposes only. The code demonstrates a supervised learning task using a very simple neural network.

**Reference:** inspired by Andrew Trask's post.

**The core component of the code, the learning algorithm, is only 10 lines**:

The training loop runs for 50 iterations (epochs) and fits the vector of attributes **X** to the vector of classes **y**. I am going to use 4 records from the Iris flower dataset. The attributes (**X**) are sepal length, sepal width, petal length, and petal width. In my example, I use 2 of the 3 classes found in the original dataset: Iris Setosa (**0**) and Iris Virginica (**1**). Predictions are stored in vector **pred**.
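As a CPU-side reference, the loop can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the CUDA listing itself: the helper names, the uniform random initialisation in [-1, 1] with a fixed seed, the absence of a learning-rate factor, and the four Iris records (picked for illustration, not necessarily the ones used in this post) are all my own choices. The hidden layer size of 8 and the 50 epochs follow the text.

```python
import math
import operator
import random

def matmul(a, b):
    """Matrix product of two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def sigmoid(m):
    """Elementwise sigmoid, written so that math.exp never overflows."""
    def s(v):
        if v >= 0:
            return 1.0 / (1.0 + math.exp(-v))
        e = math.exp(v)
        return e / (1.0 + e)
    return [[s(v) for v in row] for row in m]

def sigmoid_d(s):
    """Sigmoid derivative expressed through the sigmoid output s: s * (1 - s)."""
    return [[v * (1.0 - v) for v in row] for row in s]

def elementwise(op, a, b):
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def train(X, y, hidden=8, epochs=50, seed=0):
    rng = random.Random(seed)  # assumed initialisation: uniform in [-1, 1]
    W0 = [[rng.uniform(-1.0, 1.0) for _ in range(hidden)] for _ in X[0]]
    W1 = [[rng.uniform(-1.0, 1.0)] for _ in range(hidden)]
    for _ in range(epochs):
        layer_1 = sigmoid(matmul(X, W0))                      # Line 3: hidden layer
        pred = sigmoid(matmul(layer_1, W1))                   # Line 4: predictions
        pred_d = elementwise(operator.mul,                    # Line 5: prediction errors
                             elementwise(operator.sub, y, pred), sigmoid_d(pred))
        l_1_d = elementwise(operator.mul,                     # Line 6: back propagation
                            matmul(pred_d, transpose(W1)), sigmoid_d(layer_1))
        W1 = elementwise(operator.add, W1, matmul(transpose(layer_1), pred_d))  # Line 7
        W0 = elementwise(operator.add, W0, matmul(transpose(X), l_1_d))         # Line 8
    return W0, W1, pred

# Four illustrative Iris records: two Setosa (0) and two Virginica (1).
X = [[5.1, 3.5, 1.4, 0.2],
     [4.9, 3.0, 1.4, 0.2],
     [6.3, 3.3, 6.0, 2.5],
     [5.8, 2.7, 5.1, 1.9]]
y = [[0.0], [0.0], [1.0], [1.0]]

W0, W1, pred = train(X, y)
print([round(p[0], 4) for p in pred])
```

The comments map each statement to the line numbers discussed in the rest of the post. The printed values depend on the assumed seed and initialisation, so they will not necessarily match the output of the CUDA program.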

**Neural network architecture**. The values of **W0**, **W1**, **layer_1** and **pred** change over the course of training the network, while **X** and **y** must not be changed:

The size of matrix **X** is the size of the batch by the number of attributes.

**Line 3**. Finding the values of the hidden layer:

To calculate the hidden layer, we first multiply the 4 x 4 matrix **X** by the 4 x 8 matrix **W0**, which gives a 4 x 8 result. Then we apply an activation function; in this case, the sigmoid function.
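To make the indexing concrete, here is a plain-Python sketch of this step. It assumes the matrices are stored as flat row-major arrays, as is common for CUDA kernels; the function names `dot` and `sigmoid` and the all-zero placeholder inputs are mine and only serve to check shapes.

```python
import math

def dot(a, b, a_rows, a_cols, b_cols):
    """Multiply a (a_rows x a_cols) by b (a_cols x b_cols); both flat, row-major."""
    out = [0.0] * (a_rows * b_cols)
    for i in range(len(out)):  # on a GPU, each output element could be one thread
        row, col = divmod(i, b_cols)
        out[i] = sum(a[row * a_cols + k] * b[k * b_cols + col]
                     for k in range(a_cols))
    return out

def sigmoid(values):
    """Elementwise sigmoid 1 / (1 + exp(-v))."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

# A 2 x 2 matrix times a 2 x 3 matrix.
a = [1.0, 2.0,
     3.0, 4.0]
b = [1.0, 0.0, 2.0,
     0.0, 1.0, 3.0]
product = dot(a, b, 2, 2, 3)
print(product)  # [1.0, 2.0, 8.0, 3.0, 4.0, 18.0]

# Shape check for the hidden layer: (4 x 4) times (4 x 8) gives 4 x 8 = 32 values.
X = [0.0] * 16   # placeholder inputs
W0 = [0.0] * 32  # placeholder weights
layer_1 = sigmoid(dot(X, W0, 4, 4, 8))
print(len(layer_1), layer_1[0])  # 32 0.5
```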

A subroutine for matrix multiplication:

A subroutine for the sigmoid function:

Sigmoid function (red) and its first derivative (blue graph):
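The derivative has a convenient form: if s = sigmoid(x), then the slope at x is s * (1 - s), so it can be computed from the sigmoid output alone. The sketch below (function names are my own) checks that identity against a central finite difference.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_d(s):
    """Derivative of the sigmoid, given the sigmoid output s."""
    return s * (1.0 - s)

print(sigmoid(0.0), sigmoid_d(sigmoid(0.0)))  # 0.5 0.25

# Compare with a numerical derivative at an arbitrary point.
x, h = 1.5, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
analytic = sigmoid_d(sigmoid(x))
print(abs(numeric - analytic) < 1e-8)  # True
```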

**Line 4**. Finding the matrix with predictions **pred**. In order to do so, we will need to multiply a 4 x 8 matrix **l1** by an 8 x 1 matrix **W1**. Then, we will need to apply an activation function:

**Line 5**. Determine the vector of prediction errors **pred_d**. First, subtract **pred** from **y**. Then, calculate the sigmoid derivative of **pred** and, finally, multiply the results of these two operations elementwise.
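A small worked example of this step, as a plain-Python sketch. The helper names and the prediction values are made up for illustration; the derivative is taken in its s * (1 - s) form, where s is the sigmoid output.

```python
def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def multiply(a, b):
    """Elementwise product of two equally sized vectors."""
    return [x * y for x, y in zip(a, b)]

def sigmoid_d(s):
    return [v * (1.0 - v) for v in s]

y = [0.0, 0.0, 1.0, 1.0]
pred = [0.5, 0.25, 0.5, 0.75]  # hypothetical predictions

pred_d = multiply(subtract(y, pred), sigmoid_d(pred))
print(pred_d)  # [-0.125, -0.046875, 0.125, 0.046875]
```

The sign of each entry says in which direction the prediction should move, and the derivative factor shrinks the correction where the sigmoid is already close to 0 or 1.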

A CUDA kernel that computes the difference between two matrices:

Elementwise multiplication of two vectors:

**Line 6**. Backpropagate the prediction errors to **l_1_d**. First, multiply **pred_d** by the transposed **W1**. Then, calculate the sigmoid derivative of **l1** and, finally, multiply the results of these two operations elementwise.
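The transpose does not have to be materialised: the product can read the second matrix with swapped indices. Below is a plain-Python sketch of that idea on flat row-major arrays; the name `dot_nt`, the reduced sizes (2 records, 3 hidden units) and the values are assumptions made for illustration.

```python
def dot_nt(a, b, a_rows, a_cols, b_rows):
    """Multiply a (a_rows x a_cols) by the transpose of b (b_rows x a_cols)."""
    out = [0.0] * (a_rows * b_rows)
    for i in range(len(out)):
        row, col = divmod(i, b_rows)
        # Row `col` of b plays the role of column `col` of its transpose.
        out[i] = sum(a[row * a_cols + k] * b[col * a_cols + k]
                     for k in range(a_cols))
    return out

def sigmoid_d(s):
    return [v * (1.0 - v) for v in s]

pred_d = [1.0, 2.0]       # 2 x 1, hypothetical prediction errors
W1 = [1.0, 0.5, -1.0]     # 3 x 1, hypothetical weights
l1 = [0.5] * 6            # 2 x 3, hypothetical hidden layer outputs

back = dot_nt(pred_d, W1, 2, 1, 3)  # 2 x 3
l_1_d = [e * d for e, d in zip(back, sigmoid_d(l1))]
print(back)   # [1.0, 0.5, -1.0, 2.0, 1.0, -2.0]
print(l_1_d)  # [0.25, 0.125, -0.25, 0.5, 0.25, -0.5]
```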

A subroutine that multiplies a matrix by a transposed matrix:

**Line 7**. Update weights **W1** with the result of matrix multiplication of transposed **l1** and **pred_d**:

This line computes the weight updates. In order to do that, we need to perform matrix multiplication of the transposed matrix **l1** by the matrix **pred_d**.

**Line 8**. Update weights **W0** with the result of matrix multiplication of transposed **X** and **l_1_d**:
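Both weight updates (Lines 7 and 8) multiply a transposed matrix by a matrix and add the result to the weights. A plain-Python sketch on flat row-major arrays follows. The name `dot_tn`, the reduced sizes and the values are hypothetical, and I assume the product is added as is, without a learning-rate factor, since the text does not mention one.

```python
def dot_tn(a, b, a_rows, a_cols, b_cols):
    """Multiply the transpose of a (a_rows x a_cols) by b (a_rows x b_cols)."""
    out = [0.0] * (a_cols * b_cols)
    for i in range(len(out)):
        row, col = divmod(i, b_cols)
        # Column `row` of a plays the role of row `row` of its transpose.
        out[i] = sum(a[k * a_cols + row] * b[k * b_cols + col]
                     for k in range(a_rows))
    return out

l1 = [1.0, 2.0, 3.0,
      4.0, 5.0, 6.0]      # 2 x 3, hypothetical hidden layer outputs
pred_d = [1.0, 0.5]       # 2 x 1, hypothetical prediction errors
W1 = [1.0, 1.0, 1.0]      # 3 x 1, hypothetical weights

update = dot_tn(l1, pred_d, 2, 3, 1)  # 3 x 1
W1 = [w + u for w, u in zip(W1, update)]
print(update)  # [3.0, 4.5, 6.0]
print(W1)      # [4.0, 5.5, 7.0]
```

For Line 8 the same helper would be called with **X** and **l_1_d** in place of **l1** and **pred_d**.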

**Complete code:**

**Output**:

Compile…

… and run