# A Neural Network in 10 lines of CUDA C++ Code

Purpose: For educational purposes only. The code demonstrates a supervised learning task using a very simple neural network.

Reference: inspired by Andrew Trask's post.

Here is a follow-up post featuring slightly more complicated code:

Neural Network in C++ (Part 2: MNIST Handwritten Digits Dataset)

The core component of the code, the learning algorithm, is only 10 lines:
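To trace what those lines compute, here is a plain C++ (host-side, no GPU) sketch of the same training loop. It is not the author's CUDA code. It assumes row-major `std::vector<float>` matrices, hypothetical helper names (`dot`, `transpose`, `sub`, `mul`, `add`, `sigmoid`, `sigmoid_d`, `train`), weights drawn from a fixed-seed uniform distribution on [-1, 1], no learning rate, and four sample Setosa/Virginica rows from the Iris dataset chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Mat = std::vector<float>;  // row-major matrix stored as a flat vector

// c (m x k) = a (m x n) * b (n x k)
Mat dot(const Mat& a, const Mat& b, int m, int n, int k) {
    assert(a.size() == std::size_t(m) * n && b.size() == std::size_t(n) * k);
    Mat c(std::size_t(m) * k, 0.0f);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < k; ++j)
            for (int t = 0; t < n; ++t)
                c[i * k + j] += a[i * n + t] * b[t * k + j];
    return c;
}

// Returns the n x m transpose of a row-major m x n matrix
Mat transpose(const Mat& a, int m, int n) {
    Mat t(a.size());
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            t[j * m + i] = a[i * n + j];
    return t;
}

// Elementwise helpers for same-size operands
Mat sub(Mat a, const Mat& b) { for (std::size_t i = 0; i < a.size(); ++i) a[i] -= b[i]; return a; }
Mat mul(Mat a, const Mat& b) { for (std::size_t i = 0; i < a.size(); ++i) a[i] *= b[i]; return a; }
Mat add(Mat a, const Mat& b) { for (std::size_t i = 0; i < a.size(); ++i) a[i] += b[i]; return a; }

Mat sigmoid(Mat v) { for (float& x : v) x = 1.0f / (1.0f + std::exp(-x)); return v; }
// Sigmoid derivative written in terms of the sigmoid output s: s * (1 - s)
Mat sigmoid_d(Mat s) { for (float& x : s) x *= 1.0f - x; return s; }

// Trains a 4-8-1 network for `epochs` iterations and returns the last predictions
Mat train(int epochs) {
    const Mat X = {5.1f, 3.5f, 1.4f, 0.2f,   // Iris Setosa
                   4.9f, 3.0f, 1.4f, 0.2f,   // Iris Setosa
                   6.2f, 3.4f, 5.4f, 2.3f,   // Iris Virginica
                   5.9f, 3.0f, 5.1f, 1.8f};  // Iris Virginica
    const Mat y = {0.0f, 0.0f, 1.0f, 1.0f};

    std::mt19937 gen(1);  // fixed seed (an assumption) for reproducible runs
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    Mat W0(4 * 8), W1(8 * 1);
    for (float& w : W0) w = dist(gen);
    for (float& w : W1) w = dist(gen);

    Mat pred;
    for (int i = 0; i < epochs; ++i) {
        Mat l1 = sigmoid(dot(X, W0, 4, 4, 8));                                      // Line 3: 4 x 8
        pred = sigmoid(dot(l1, W1, 4, 8, 1));                                       // Line 4: 4 x 1
        Mat pred_d = mul(sub(y, pred), sigmoid_d(pred));                            // Line 5: 4 x 1
        Mat l_1_d = mul(dot(pred_d, transpose(W1, 8, 1), 4, 1, 8), sigmoid_d(l1));  // Line 6: 4 x 8
        W1 = add(W1, dot(transpose(l1, 4, 8), pred_d, 8, 4, 1));                    // Line 7: 8 x 1
        W0 = add(W0, dot(transpose(X, 4, 4), l_1_d, 4, 4, 8));                      // Line 8: 4 x 8
    }
    return pred;
}
```

`train(50)` runs the 50 epochs used in the post, and the comments mark which numbered line each statement mirrors. In a CUDA version, each helper call would typically become a kernel launch operating on device memory.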

The loop above runs for 50 iterations (epochs) and fits the matrix of attributes X to the vector of classes y. I am going to use 4 records from the Iris flower dataset. The attributes (X) are sepal length, sepal width, petal length, and petal width. In my example, I use 2 of the 3 classes found in the original dataset: Iris Setosa (0) and Iris Virginica (1). Predictions are stored in vector pred.

Neural network architecture. The values of W0, W1, layer_1 and pred change over the course of training, while X and y must stay unchanged:

The size of matrix X is the batch size by the number of attributes, here 4 x 4 (4 records, 4 attributes).
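To make the layout concrete: a flat, row-major buffer is one common way to store such a matrix, since one contiguous block can be copied to the GPU with a single `cudaMemcpy` call. The sketch below assumes that layout, hypothetical names (`batch_size`, `n_attributes`, `at`), and four sample Setosa/Virginica rows from the Iris dataset chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

constexpr int batch_size = 4;    // records per batch (rows of X)
constexpr int n_attributes = 4;  // sepal length, sepal width, petal length, petal width

// X as one contiguous row-major buffer: row r holds record r
const std::vector<float> X = {
    5.1f, 3.5f, 1.4f, 0.2f,  // Iris Setosa
    4.9f, 3.0f, 1.4f, 0.2f,  // Iris Setosa
    6.2f, 3.4f, 5.4f, 2.3f,  // Iris Virginica
    5.9f, 3.0f, 5.1f, 1.8f,  // Iris Virginica
};

// Element (row, col) of a row-major matrix that has `cols` columns
float at(const std::vector<float>& m, int row, int col, int cols) {
    return m[static_cast<std::size_t>(row) * cols + col];
}
```

For example, `at(X, 2, 2, n_attributes)` reads the petal length of the third record, stored at flat index 2 * 4 + 2 = 10.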

Line 3. Finding the values of the hidden layer:

To calculate the hidden layer, we first multiply the 4 x 4 matrix X by the 4 x 8 matrix W0, which gives a 4 x 8 result. Then we apply an activation function; in this case, a sigmoid.

A subroutine for matrix multiplication:
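A host-side C++ sketch of what such a subroutine computes, written so that each output element is produced independently, the way one CUDA thread per element would produce it. The names `dot_element` and `dot` are hypothetical, and matrices are assumed to be row-major `std::vector<float>`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The work one CUDA thread would do if it owned output element (row, col):
// c[row][col] = sum over t of a[row][t] * b[t][col]
float dot_element(const std::vector<float>& a, const std::vector<float>& b,
                  int row, int col, int n, int k) {
    float sum = 0.0f;
    for (int t = 0; t < n; ++t)
        sum += a[row * n + t] * b[t * k + col];
    return sum;
}

// Host emulation of running m x k such threads: c (m x k) = a (m x n) * b (n x k)
std::vector<float> dot(const std::vector<float>& a, const std::vector<float>& b,
                       int m, int n, int k) {
    assert(a.size() == std::size_t(m) * n && b.size() == std::size_t(n) * k);
    std::vector<float> c(std::size_t(m) * k);
    for (int row = 0; row < m; ++row)      // on a GPU, these two loops become the thread grid
        for (int col = 0; col < k; ++col)
            c[row * k + col] = dot_element(a, b, row, col, n, k);
    return c;
}
```

For Line 3, `dot(X, W0, 4, 4, 8)` yields the 4 x 8 hidden-layer input before the sigmoid. In a real kernel, `row` and `col` would typically be derived from `blockIdx`, `blockDim` and `threadIdx` instead of loop counters.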

A subroutine for the sigmoid function:
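A host-side sketch of the sigmoid and of its derivative. Because sigma'(x) = sigma(x) * (1 - sigma(x)), the derivative can be computed from values that are already sigmoid outputs (such as pred and l1) without calling `exp` again. The function names are assumptions.

```cpp
#include <cmath>
#include <vector>

// Elementwise logistic sigmoid: s(x) = 1 / (1 + e^-x)
std::vector<float> sigmoid(std::vector<float> v) {
    for (float& x : v) x = 1.0f / (1.0f + std::exp(-x));
    return v;
}

// Sigmoid derivative, given values s that are already sigmoid outputs:
// s'(x) = s(x) * (1 - s(x))
std::vector<float> sigmoid_d(std::vector<float> s) {
    for (float& x : s) x = x * (1.0f - x);
    return s;
}
```

For example, `sigmoid({0.0f})` returns `{0.5f}`, and `sigmoid_d` of that result returns `{0.25f}`, the maximum slope of the sigmoid.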

Sigmoid function (red) and its first derivative (blue graph).

Line 4. Finding the matrix with predictions pred. To do so, we multiply the 4 x 8 matrix l1 by the 8 x 1 matrix W1, which gives a 4 x 1 result. Then we apply the activation function:

Line 5. Determine the vector of prediction errors pred_d. First, subtract pred from y. Then, calculate the derivative of the sigmoid at the output; because pred is already a sigmoid output, this is pred * (1 - pred). Finally, multiply the results of these two operations elementwise.

A CUDA kernel that computes the difference between two matrices:
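CUDA runs the same kernel body in many threads; a thread typically finds its element as `blockIdx.x * blockDim.x + threadIdx.x` and skips indices past the end. The host-side C++ sketch below emulates that pattern for an elementwise difference such as y - pred. `ThreadCtx`, `diff_kernel` and `diff` are hypothetical names, and the nested loops only imitate a `<<<blocks, threads>>>` launch on the CPU.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host stand-in for CUDA's built-in index variables
struct ThreadCtx { int blockIdx_x, blockDim_x, threadIdx_x; };

// Body of a difference "kernel": each thread handles one element, c = a - b
void diff_kernel(const ThreadCtx& t, const float* a, const float* b, float* c, int n) {
    int i = t.blockIdx_x * t.blockDim_x + t.threadIdx_x;  // global thread index
    if (i < n)                                            // the grid may overshoot n
        c[i] = a[i] - b[i];
}

// Emulates diff_kernel<<<blocks, threads>>>(...) with sequential loops
std::vector<float> diff(const std::vector<float>& a, const std::vector<float>& b,
                        int threads = 256) {
    assert(a.size() == b.size() && threads > 0);
    int n = static_cast<int>(a.size());
    int blocks = (n + threads - 1) / threads;  // round up so every element gets a thread
    std::vector<float> c(a.size());
    for (int bx = 0; bx < blocks; ++bx)
        for (int tx = 0; tx < threads; ++tx)
            diff_kernel({bx, threads, tx}, a.data(), b.data(), c.data(), n);
    return c;
}
```

With 4 elements and 3 threads per block, the launch rounds up to 2 blocks (6 threads), and the bounds check keeps the two surplus threads idle.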

Elementwise multiplication of two vectors:
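A host-side sketch of the elementwise product, and of how Line 5 can combine it with the difference y - pred and the sigmoid derivative pred * (1 - pred). The names `mul` and `pred_delta` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Elementwise (Hadamard) product of two same-size vectors
std::vector<float> mul(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i] * b[i];
    return c;
}

// Line 5 assembled from elementwise steps: pred_d = (y - pred) * pred * (1 - pred)
std::vector<float> pred_delta(const std::vector<float>& y, const std::vector<float>& pred) {
    assert(y.size() == pred.size());
    std::vector<float> err(y.size()), slope(pred.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        err[i] = y[i] - pred[i];                // prediction error y - pred
        slope[i] = pred[i] * (1.0f - pred[i]);  // sigmoid derivative at the output
    }
    return mul(err, slope);
}
```

For instance, with y = {0, 0, 1, 1} and every prediction at 0.5, `pred_delta` returns {-0.125, -0.125, 0.125, 0.125}.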

Line 6. Backpropagate the prediction errors to l_1_d. First, multiply pred_d by the transpose of W1. Then, calculate the sigmoid derivative at l1, that is, l1 * (1 - l1). Finally, multiply the results of these two operations elementwise.

A subroutine that multiplies a matrix by a transposed matrix:
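A multiply-by-transpose subroutine does not need to build the transposed matrix: it can read b[j][t] wherever the transpose's element [t][j] is needed. The host-side C++ sketch below assumes row-major `std::vector<float>` storage; `dot_bt` and `hidden_delta` are hypothetical names, and `hidden_delta` hard-codes the 4 x 1, 8 x 1 and 4 x 8 shapes of Line 6.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// c (m x k) = a (m x n) * transpose(b), where b is stored row-major as k x n.
// b[j][t] is read in place of transpose(b)[t][j], so no transposed copy is made.
std::vector<float> dot_bt(const std::vector<float>& a, const std::vector<float>& b,
                          int m, int n, int k) {
    assert(a.size() == std::size_t(m) * n && b.size() == std::size_t(k) * n);
    std::vector<float> c(std::size_t(m) * k, 0.0f);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < k; ++j)
            for (int t = 0; t < n; ++t)
                c[i * k + j] += a[i * n + t] * b[j * n + t];
    return c;
}

// Line 6: l_1_d = (pred_d * transpose(W1)) elementwise-times l1 * (1 - l1)
// pred_d is 4 x 1, W1 is 8 x 1 and l1 is 4 x 8, so the result is 4 x 8.
std::vector<float> hidden_delta(const std::vector<float>& pred_d, const std::vector<float>& W1,
                                const std::vector<float>& l1) {
    assert(pred_d.size() == 4 && W1.size() == 8 && l1.size() == 32);
    std::vector<float> d = dot_bt(pred_d, W1, 4, 1, 8);
    for (std::size_t i = 0; i < d.size(); ++i) d[i] *= l1[i] * (1.0f - l1[i]);
    return d;
}
```

Reading b with swapped indices avoids allocating and filling a temporary transposed copy on every iteration.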

Line 7. Update weights W1 with the result of matrix multiplication of transposed l1 and pred_d:

This line computes the weight update for W1: the transpose of l1 (8 x 4) multiplied by pred_d (4 x 1) gives an 8 x 1 matrix, which is added to W1.
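Lines 7 and 8 need the other variant, a transposed matrix times a matrix. A host-side sketch, again assuming row-major `std::vector<float>` storage and hypothetical names (`dot_at`, `update_W1`):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// c (n x k) = transpose(a) * b, where a is m x n and b is m x k (both row-major)
std::vector<float> dot_at(const std::vector<float>& a, const std::vector<float>& b,
                          int m, int n, int k) {
    assert(a.size() == std::size_t(m) * n && b.size() == std::size_t(m) * k);
    std::vector<float> c(std::size_t(n) * k, 0.0f);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < k; ++j)
            for (int t = 0; t < m; ++t)
                c[i * k + j] += a[t * n + i] * b[t * k + j];  // a[t][i] is transpose(a)[i][t]
    return c;
}

// Line 7: returns W1 + transpose(l1) * pred_d, with l1 4 x 8, pred_d 4 x 1 and W1 8 x 1
std::vector<float> update_W1(std::vector<float> W1, const std::vector<float>& l1,
                             const std::vector<float>& pred_d) {
    assert(W1.size() == 8);
    std::vector<float> dW1 = dot_at(l1, pred_d, 4, 8, 1);
    for (std::size_t i = 0; i < W1.size(); ++i) W1[i] += dW1[i];
    return W1;
}
```

Line 8 has the same form: `dot_at(X, l_1_d, 4, 4, 8)` yields the 4 x 8 update that is added to W0.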

Line 8. Update weights W0 with the result of matrix multiplication of transposed X and l_1_d:
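Collected in matrix notation, writing $\sigma$ for the sigmoid, $\odot$ for elementwise multiplication, $\hat{y}$ for pred, $\delta_2$ for pred_d and $\delta_1$ for l_1_d, Lines 3 to 8 read as follows, with the shapes for the 4-record batch on the right:

$$
\begin{aligned}
l_1 &= \sigma(X W_0) && (4\times4)(4\times8) \to 4\times8 \\
\hat{y} &= \sigma(l_1 W_1) && (4\times8)(8\times1) \to 4\times1 \\
\delta_2 &= (y - \hat{y}) \odot \hat{y} \odot (1 - \hat{y}) && 4\times1 \\
\delta_1 &= (\delta_2 W_1^{\top}) \odot l_1 \odot (1 - l_1) && (4\times1)(1\times8) \to 4\times8 \\
W_1 &\leftarrow W_1 + l_1^{\top} \delta_2 && (8\times4)(4\times1) \to 8\times1 \\
W_0 &\leftarrow W_0 + X^{\top} \delta_1 && (4\times4)(4\times8) \to 4\times8
\end{aligned}
$$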

Complete code:

Output:

Compile…

… and run

## 4 thoughts on “A Neural Network in 10 lines of CUDA C++ Code”

1. Tejas Arlimatti says:

Do you know how the OpenCV DNN module can be implemented using CUDA for GPU processing?


• cognitivedemons says:

I have not looked into such things yet because the goal was to build custom NNs from scratch.


• Tejas Arlimatti says:

Ok thank you.
