A Neural Network in 10 lines of CUDA C++ Code

Purpose: for educational purposes only. The code demonstrates a supervised learning task using a very simple neural network.

Reference: inspired by Andrew Trask's post.

The core component of the code, the learning algorithm, is only 10 lines:

The loop above runs for 50 iterations (epochs) and fits the matrix of attributes X to the vector of classes y. I use 4 records from the Iris flower dataset. The attributes (X) are sepal length, sepal width, petal length, and petal width. My example uses 2 of the 3 classes found in the original dataset: Iris Setosa (0) and Iris Virginica (1). Predictions are stored in the vector pred.

Neural network architecture. The values of W0, W1, layer_1 (written l1 below) and pred change over the course of training the network, while X and y stay fixed:

Matrix X has one row per record in the batch and one column per attribute, so here it is 4 x 4.
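
As a minimal sketch of that layout, the inputs can be stored as flat row-major arrays. This is plain C++ host code, not the post's own listing. The names kRecords, kAttributes and at are hypothetical, and the four measurements are illustrative Iris-style records, since the author's actual selection is not shown.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sizes: 4 records per batch, 4 attributes per record.
constexpr std::size_t kRecords = 4;
constexpr std::size_t kAttributes = 4;

// Row-major storage: one row per record.
// Columns: sepal length, sepal width, petal length, petal width (cm).
// Illustrative values only; substitute the records you actually use.
const float X[kRecords * kAttributes] = {
    5.1f, 3.5f, 1.4f, 0.2f,   // Iris Setosa
    4.9f, 3.0f, 1.4f, 0.2f,   // Iris Setosa
    6.3f, 3.3f, 6.0f, 2.5f,   // Iris Virginica
    5.8f, 2.7f, 5.1f, 1.9f};  // Iris Virginica

// One class label per record: 0 = Iris Setosa, 1 = Iris Virginica.
const float y[kRecords] = {0.0f, 0.0f, 1.0f, 1.0f};

// Element (row, col) of a row-major matrix with `cols` columns.
inline float at(const float* m, std::size_t cols,
                std::size_t row, std::size_t col) {
    assert(col < cols);
    return m[row * cols + col];
}
```

With this layout, at(X, kAttributes, 2, 2) reads the petal length of the third record.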

Line 3. Finding the values of the hidden layer:

To calculate the hidden layer, we first multiply the 4 x 4 matrix X by the 4 x 8 matrix W0, which gives the 4 x 8 matrix l1. Then we apply an activation function; in this case, the sigmoid function.

A subroutine for matrix multiplication:
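
For reference, here is a sketch of what such a subroutine computes, written as plain C++ host code rather than as the post's kernel. The name matmul and the row-major layout are assumptions. In a CUDA kernel the two outer loops would typically be replaced by a per-thread index, with each thread computing one element of C.

```cpp
#include <cassert>
#include <cstddef>

// C (m x k) = A (m x n) * B (n x k), all matrices stored row-major.
void matmul(const float* A, const float* B, float* C,
            std::size_t m, std::size_t n, std::size_t k) {
    assert(A && B && C);
    for (std::size_t row = 0; row < m; ++row) {
        for (std::size_t col = 0; col < k; ++col) {
            float sum = 0.0f;
            // Dot product of row `row` of A with column `col` of B.
            for (std::size_t i = 0; i < n; ++i) {
                sum += A[row * n + i] * B[i * k + col];
            }
            C[row * k + col] = sum;
        }
    }
}
```

A CPU version like this is also handy for checking the output of a GPU kernel on small inputs.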

A subroutine for the sigmoid function:
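
A minimal host-side sketch of the sigmoid, again in plain C++ and not taken from the post. The names sigmoid and sigmoid_inplace are hypothetical. Each element is independent, so in a kernel the loop would map naturally onto one thread per element.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Logistic sigmoid: maps any real input into the interval (0, 1).
inline float sigmoid(float x) {
    return 1.0f / (1.0f + std::exp(-x));
}

// Apply the sigmoid to every element of `data` in place.
void sigmoid_inplace(float* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        data[i] = sigmoid(data[i]);
    }
}
```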

Figure: the sigmoid function (red) and its first derivative (blue).

Line 4. Finding the matrix of predictions pred. To do so, we multiply the 4 x 8 matrix l1 by the 8 x 1 matrix W1, which gives a 4 x 1 matrix. Then we apply the activation function:
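
Both layers follow the same pattern: a matrix product followed by the sigmoid. The sketch below fuses the two steps into one hypothetical helper, dense_sigmoid, in plain C++. The post uses separate subroutines, so this is an alternative formulation, not the author's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// out (rows x cols) = sigmoid(in (rows x inner) * W (inner x cols)), row-major.
// With rows = 4, inner = 8, cols = 1 this matches the shapes given for pred.
void dense_sigmoid(const float* in, const float* W, float* out,
                   std::size_t rows, std::size_t inner, std::size_t cols) {
    assert(in && W && out);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            float z = 0.0f;  // pre-activation value
            for (std::size_t i = 0; i < inner; ++i) {
                z += in[r * inner + i] * W[i * cols + c];
            }
            out[r * cols + c] = 1.0f / (1.0f + std::exp(-z));
        }
    }
}
```

Calling dense_sigmoid(l1, W1, pred, 4, 8, 1) would produce the predictions, and dense_sigmoid(X, W0, l1, 4, 4, 8) the hidden layer.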

Line 5. Determine the vector of prediction errors pred_d. First, subtract pred from y. Then, calculate the derivative of the sigmoid at pred; because pred is already a sigmoid output, this is pred * (1 - pred). Finally, multiply (elementwise) the results of these two operations.
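
A sketch of this step in plain C++, assuming the activation term is the sigmoid's derivative, as in the usual backpropagation rule. For s = sigmoid(x) the derivative equals s * (1 - s), so it can be computed directly from the stored output. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Derivative of the sigmoid, expressed through its output s = sigmoid(x).
inline float sigmoid_d_from_output(float s) {
    return s * (1.0f - s);
}

// pred_d[i] = (y[i] - pred[i]) * sigmoid'(.) evaluated via pred[i].
void output_delta(const float* y, const float* pred, float* pred_d,
                  std::size_t count) {
    assert(y && pred && pred_d);
    for (std::size_t i = 0; i < count; ++i) {
        const float error = y[i] - pred[i];
        pred_d[i] = error * sigmoid_d_from_output(pred[i]);
    }
}
```

The derivative is largest at pred = 0.5 and shrinks toward 0 or 1, so confident predictions receive smaller corrections.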

A CUDA kernel that finds the difference between two matrices:
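
What the kernel computes can be sketched on the host as follows. This is plain C++, with subtract as a hypothetical name, and it treats both matrices as flat arrays of equal length. A kernel version would typically assign one element to each thread instead of looping.

```cpp
#include <cassert>
#include <cstddef>

// out[i] = a[i] - b[i] for two matrices with `count` elements each.
void subtract(const float* a, const float* b, float* out, std::size_t count) {
    assert(a && b && out);
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = a[i] - b[i];
    }
}
```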

Elementwise multiplication of two vectors:
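
A host-side sketch of the elementwise (Hadamard) product in plain C++. The name multiply_elementwise is an assumption, not the post's identifier.

```cpp
#include <cassert>
#include <cstddef>

// out[i] = a[i] * b[i]; no summation, unlike a dot product.
void multiply_elementwise(const float* a, const float* b, float* out,
                          std::size_t count) {
    assert(a && b && out);
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = a[i] * b[i];
    }
}
```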

Line 6. Backpropagate the prediction errors to l_1_d. First, multiply pred_d (4 x 1) by the transposed W1 (1 x 8), which gives a 4 x 8 matrix. Then, calculate the derivative of the sigmoid at l1, which is l1 * (1 - l1). Finally, multiply (elementwise) the results of these two operations.

A subroutine that multiplies a matrix by a transposed matrix:
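
One way to sketch this in plain C++ is to read B by rows instead of building the transpose in memory. The name matmul_b_transposed and the row-major layout are assumptions. For this step the call would use m = 4, n = 1, k = 8, with pred_d as A and W1 as B.

```cpp
#include <cassert>
#include <cstddef>

// C (m x k) = A (m x n) * transpose(B), where B is stored as k x n, row-major.
void matmul_b_transposed(const float* A, const float* B, float* C,
                         std::size_t m, std::size_t n, std::size_t k) {
    assert(A && B && C);
    for (std::size_t row = 0; row < m; ++row) {
        for (std::size_t col = 0; col < k; ++col) {
            float sum = 0.0f;
            // Row `col` of B is column `col` of transpose(B).
            for (std::size_t i = 0; i < n; ++i) {
                sum += A[row * n + i] * B[col * n + i];
            }
            C[row * k + col] = sum;
        }
    }
}
```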

Line 7. Update weights W1 with the result of matrix multiplication of transposed l1 and pred_d:

This line computes the weight update for W1. To do that, we multiply the transposed matrix l1 (8 x 4) by the matrix pred_d (4 x 1), which gives an 8 x 1 matrix, the same shape as W1.
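
A sketch of such an update in plain C++. It assumes the product is added to the weights with an implicit learning rate of 1, as in Trask's formulation; the post's exact update rule is not shown here. The name add_at_times_b is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// W (n x k) += transpose(A) * B, where A is m x n and B is m x k, row-major.
// For the W1 update: A = l1 (4 x 8), B = pred_d (4 x 1), W = W1 (8 x 1).
void add_at_times_b(const float* A, const float* B, float* W,
                    std::size_t m, std::size_t n, std::size_t k) {
    assert(A && B && W);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < k; ++j) {
            float sum = 0.0f;
            // Column i of A is row i of transpose(A).
            for (std::size_t r = 0; r < m; ++r) {
                sum += A[r * n + i] * B[r * k + j];
            }
            W[i * k + j] += sum;
        }
    }
}
```

The same helper would cover the W0 update by passing X as A and l_1_d as B.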

Line 8. Update weights W0 with the result of matrix multiplication of transposed X and l_1_d:
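
Putting the steps together, the whole loop can be sketched as a single CPU reference function in plain C++. This is not the post's CUDA listing: the name train, the constants N, D and H, the row-major layout, and the additive update with an implicit learning rate of 1 are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t N = 4;  // records in the batch
constexpr std::size_t D = 4;  // attributes per record
constexpr std::size_t H = 8;  // hidden units

inline float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Runs `epochs` full-batch updates. X is N x D, y and pred hold N values,
// W0 is D x H and W1 is H x 1, all row-major. On return, pred holds the
// predictions from the last forward pass.
void train(const float* X, const float* y, float* W0, float* W1,
           float* pred, int epochs) {
    assert(X && y && W0 && W1 && pred);
    float l1[N * H];     // hidden layer activations
    float pred_d[N];     // output-layer error term
    float l_1_d[N * H];  // hidden-layer error term
    for (int e = 0; e < epochs; ++e) {
        for (std::size_t r = 0; r < N; ++r) {
            // l1 = sigmoid(X * W0)
            for (std::size_t h = 0; h < H; ++h) {
                float z = 0.0f;
                for (std::size_t d = 0; d < D; ++d) {
                    z += X[r * D + d] * W0[d * H + h];
                }
                l1[r * H + h] = sigmoid(z);
            }
            // pred = sigmoid(l1 * W1)
            float z = 0.0f;
            for (std::size_t h = 0; h < H; ++h) z += l1[r * H + h] * W1[h];
            pred[r] = sigmoid(z);
            // pred_d = (y - pred) * pred * (1 - pred)
            pred_d[r] = (y[r] - pred[r]) * pred[r] * (1.0f - pred[r]);
            // l_1_d = (pred_d * W1^T) * l1 * (1 - l1), using W1 before its update
            for (std::size_t h = 0; h < H; ++h) {
                const float s = l1[r * H + h];
                l_1_d[r * H + h] = pred_d[r] * W1[h] * s * (1.0f - s);
            }
        }
        // W1 += l1^T * pred_d
        for (std::size_t h = 0; h < H; ++h) {
            for (std::size_t r = 0; r < N; ++r) {
                W1[h] += l1[r * H + h] * pred_d[r];
            }
        }
        // W0 += X^T * l_1_d
        for (std::size_t d = 0; d < D; ++d) {
            for (std::size_t h = 0; h < H; ++h) {
                for (std::size_t r = 0; r < N; ++r) {
                    W0[d * H + h] += X[r * D + d] * l_1_d[r * H + h];
                }
            }
        }
    }
}
```

The caller supplies the initial weights, for example small random values, and then calls train(X, y, W0, W1, pred, 50) to match the 50 epochs mentioned above. Note that l_1_d is computed before W1 is changed, which follows the order of Lines 6 and 7.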

Complete code:

Output:

Compile…

… and run
