**Purpose:** For educational purposes only. The code demonstrates a supervised learning task using a very simple neural network. In my next post, I am going to replace the vast majority of subroutines with CUDA kernels.

**Reference:** Andrew Trask's post.

**The core component of the code, the learning algorithm, is only 10 lines**:
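In sketch form, the loop looks like the following (a reconstruction rather than the exact original; the helpers are compact versions of the subroutines described in the rest of the post, and the `train` wrapper is added here only so the fragment is self-contained):

```cpp
#include <cmath>
#include <vector>

using std::vector;

// Compact versions of the helper subroutines covered later in the post.
vector<float> dot(const vector<float>& a, const vector<float>& b,
                  int rows, int inner, int cols) {       // (rows x inner) * (inner x cols)
    vector<float> out(rows * cols, 0.f);
    for (int r = 0; r != rows; ++r)
        for (int c = 0; c != cols; ++c)
            for (int k = 0; k != inner; ++k)
                out[r * cols + c] += a[r * inner + k] * b[k * cols + c];
    return out;
}
vector<float> sigmoid(const vector<float>& v) {          // 1 / (1 + e^-x), elementwise
    vector<float> out(v.size());
    for (std::size_t i = 0; i != v.size(); ++i) out[i] = 1.f / (1.f + std::exp(-v[i]));
    return out;
}
vector<float> d_sigmoid(const vector<float>& v) {        // s * (1 - s); v holds sigmoid outputs
    vector<float> out(v.size());
    for (std::size_t i = 0; i != v.size(); ++i) out[i] = v[i] * (1.f - v[i]);
    return out;
}
vector<float> operator-(const vector<float>& a, const vector<float>& b) {
    vector<float> out(a.size());
    for (std::size_t i = 0; i != a.size(); ++i) out[i] = a[i] - b[i];
    return out;
}
vector<float> operator*(const vector<float>& a, const vector<float>& b) {
    vector<float> out(a.size());
    for (std::size_t i = 0; i != a.size(); ++i) out[i] = a[i] * b[i];
    return out;
}
vector<float> operator+(const vector<float>& a, const vector<float>& b) {
    vector<float> out(a.size());
    for (std::size_t i = 0; i != a.size(); ++i) out[i] = a[i] + b[i];
    return out;
}
vector<float> transpose(const float* m, int R, int C) {  // R x C -> C x R, row-major
    vector<float> out(R * C);
    for (int i = 0; i != R; ++i)
        for (int j = 0; j != C; ++j)
            out[j * R + i] = m[i * C + j];
    return out;
}

// The core learning loop: fit W so that sigmoid(X * W) approximates y.
vector<float> train(const vector<float>& X, const vector<float>& y, vector<float> W) {
    vector<float> pred;
    for (unsigned i = 0; i != 50; ++i) {                          // 50 epochs
        pred = sigmoid(dot(X, W, 4, 4, 1));                       // line 3: predictions
        vector<float> pred_error = y - pred;                      // line 4: error
        vector<float> pred_delta = pred_error * d_sigmoid(pred);  // line 5: deltas
        vector<float> W_delta =
            dot(transpose(X.data(), 4, 4), pred_delta, 4, 4, 1);  // line 6: weight updates
        W = W + W_delta;                                          // line 7: update W
    }
    return pred;
}
```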

The loop above runs for 50 iterations (epochs) and fits the vector of attributes **X** to the vector of classes **y** through the vector of weights **W**. I am going to use 4 records from the Iris flower dataset. The attributes (**X**) are sepal length, sepal width, petal length, and petal width. In my example, I use 2 of the 3 classes found in the original dataset: Iris setosa (**0**) and Iris virginica (**1**). Predictions are stored in the vector **pred**.

**Neural network architecture**. The values of vectors **W** and **pred** change over the course of training the network, while vectors **X** and **y** must not be changed:

The size of matrix **X** is the batch size by the number of attributes (4 x 4 in this example).
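For example (a sketch of the setup; these four rows are genuine Iris records, two setosa followed by two virginica, and the starting weights are just a uniform guess):

```cpp
#include <vector>

// Each row of X is one flower: sepal length, sepal width, petal length, petal width.
std::vector<float> X {
    5.1f, 3.5f, 1.4f, 0.2f,   // Iris setosa    -> class 0
    4.9f, 3.0f, 1.4f, 0.2f,   // Iris setosa    -> class 0
    6.2f, 3.4f, 5.4f, 2.3f,   // Iris virginica -> class 1
    5.9f, 3.0f, 5.1f, 1.8f    // Iris virginica -> class 1
};
std::vector<float> y {0.f, 0.f, 1.f, 1.f};     // ground-truth classes (4 x 1)
std::vector<float> W {0.5f, 0.5f, 0.5f, 0.5f}; // initial weights (4 x 1)
```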

**Line 3**. Make predictions:

To calculate predictions, we first multiply the 4 x 4 matrix **X** by the 4 x 1 matrix **W**. Then we apply an activation function; in this case, the sigmoid function.

A subroutine for matrix multiplication:
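A version consistent with the description (a sketch, assuming matrices are stored row-major in flat `std::vector<float>`s, as elsewhere in the post):

```cpp
#include <vector>

// Multiplies m1 (m1_rows x m1_columns) by m2 (m1_columns x m2_columns).
// Both matrices are stored row-major in flat vectors.
std::vector<float> dot(const std::vector<float>& m1, const std::vector<float>& m2,
                       int m1_rows, int m1_columns, int m2_columns) {
    std::vector<float> output(m1_rows * m2_columns, 0.f);
    for (int row = 0; row != m1_rows; ++row)
        for (int col = 0; col != m2_columns; ++col)
            for (int k = 0; k != m1_columns; ++k)
                output[row * m2_columns + col] +=
                    m1[row * m1_columns + k] * m2[k * m2_columns + col];
    return output;
}
```

In the loop it is called as `dot(X, W, 4, 4, 1)`, producing the 4 x 1 vector of logits.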

A subroutine for the sigmoid function:
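A sketch of that subroutine:

```cpp
#include <cmath>
#include <vector>

// Applies the logistic function 1 / (1 + e^(-x)) to every element,
// squashing each logit into the range (0, 1).
std::vector<float> sigmoid(const std::vector<float>& m1) {
    std::vector<float> output(m1.size());
    for (std::size_t i = 0; i != m1.size(); ++i)
        output[i] = 1.f / (1.f + std::exp(-m1[i]));
    return output;
}
```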

Sigmoid function (red) and its first derivative (blue graph):

**Line 4**. Calculate **pred_error**, which is simply the difference between the predictions and the truth:

In order to subtract one vector from another, we will need to overload the “-” operator:
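One way to write that overload (a sketch, assuming both vectors have the same size):

```cpp
#include <vector>

// Elementwise difference of two equally sized vectors, so that
// "y - pred" reads like ordinary arithmetic.
std::vector<float> operator-(const std::vector<float>& m1, const std::vector<float>& m2) {
    std::vector<float> difference(m1.size());
    for (std::size_t i = 0; i != m1.size(); ++i)
        difference[i] = m1[i] - m2[i];
    return difference;
}
```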

**Line 5**. Determine the vector of deltas **pred_delta**:

In order to perform elementwise multiplication of two vectors, we will need to overload the “*” operator:
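A sketch of that overload, mirroring the subtraction operator:

```cpp
#include <vector>

// Elementwise (Hadamard) product of two equally sized vectors, used for
// pred_error * d_sigmoid(pred).
std::vector<float> operator*(const std::vector<float>& m1, const std::vector<float>& m2) {
    std::vector<float> product(m1.size());
    for (std::size_t i = 0; i != m1.size(); ++i)
        product[i] = m1[i] * m2[i];
    return product;
}
```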

A subroutine for the derivative of the sigmoid function (d_sigmoid):
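A sketch of that subroutine. Note that it is written in terms of the sigmoid's *output*: if s = sigmoid(x), then ds/dx = s * (1 - s), which is why the loop can pass the already-squashed `pred` values straight in:

```cpp
#include <vector>

// Derivative of the sigmoid, expressed via its output:
// for s = sigmoid(x), ds/dx = s * (1 - s).
std::vector<float> d_sigmoid(const std::vector<float>& m1) {
    std::vector<float> output(m1.size());
    for (std::size_t i = 0; i != m1.size(); ++i)
        output[i] = m1[i] * (1.f - m1[i]);
    return output;
}
```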

Basically, we use the first derivative to find the slope of the line tangent to the graph of the sigmoid function. At x = 0 the slope equals 0.25. The further x is from 0, the closer the slope is to 0: at x = ±10 the slope equals 0.000045. Hence, the deltas will be small if either the error is small or the network is very confident about its prediction (i.e. abs(x) is greater than 4).

**Line 6**. Calculate **W_delta**:

This line computes the weight updates. In order to do that, we need to multiply the transposed matrix **X** by the matrix **pred_delta**.

The subroutine that transposes matrices:
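A sketch consistent with how the loop calls it, `transpose(&X[0], 4, 4)`: it takes a raw pointer to the vector's data rather than the vector itself.

```cpp
#include <vector>

// Transposes an R x C matrix stored row-major at m, returning the C x R
// result. Taking a raw float* lets the caller pass &X[0] (or X.data())
// without copying the vector.
std::vector<float> transpose(const float* m, const int R, const int C) {
    std::vector<float> mT(C * R);
    for (int i = 0; i != R; ++i)
        for (int j = 0; j != C; ++j)
            mT[j * R + i] = m[i * C + j];
    return mT;
}
```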

**Line 7**. Update the weights **W**:

In order to perform matrix addition operation, we need to overload the “+” operator:
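A sketch of that overload, completing the set of vector operators:

```cpp
#include <vector>

// Elementwise sum of two equally sized vectors, used for W = W + W_delta.
std::vector<float> operator+(const std::vector<float>& m1, const std::vector<float>& m2) {
    std::vector<float> sum(m1.size());
    for (std::size_t i = 0; i != m1.size(); ++i)
        sum[i] = m1[i] + m2[i];
    return sum;
}
```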

**Complete code:**

Output:

Hi, I copied your code after understanding it and tried to run it in Dev-C++, but it shows many errors like: “X must be initialized by constructor, not by {…}”, and “extended initializer lists only available with -std=c++11 or -std=gnu++11”, and there are many more. Please help me.


Yeah, it must be the version of the compiler. I tested the code with C++11. I guess if you can rewrite the code in a way that works for your compiler, you will learn much more, and in detail. Otherwise, you can use the compile command in the comment at the very top of the code.


nice code. why did you pass in the X vector as a pointer in the transpose func?


I would love to read this, but the advertisement keeps making the page scroll to the bottom…


Try to read it here: https://translate.google.com/translate?sl=ru&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=https%3A%2F%2Fcognitivedemons.wordpress.com%2F2017%2F07%2F06%2Fa-neural-network-in-10-lines-of-c-code%2F&edit-text=&act=url
