A Neural Network in 10 lines of CUDA C++ Code

Purpose: For educational purposes only. The code demonstrates a supervised learning task using a very simple neural network.

Reference: inspired by Andrew Trask’s post.

The core component of the code, the learning algorithm, is only 10 lines:
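A minimal sketch of what this loop could look like on the host side, assuming the kernels shown later in the post (dot, sigmoid, d_sigmoid, matrix_sub, elementwise_mul, dot_Bt, dot_At_add); the kernel names, the temporary device buffers tmp_p and tmp_h, and the launch configurations are my assumptions, not necessarily the author's exact code:

for (unsigned i = 0; i != 50; ++i) {
    // line 3: layer_1 = sigmoid(X * W0)
    dot<<<1, dim3(8, 4)>>>(X, W0, layer_1, 4, 4, 8);
    sigmoid<<<1, 32>>>(layer_1, 4 * 8);
    // line 4: pred = sigmoid(layer_1 * W1)
    dot<<<1, dim3(1, 4)>>>(layer_1, W1, pred, 4, 8, 1);
    sigmoid<<<1, 4>>>(pred, 4);
    // line 5: pred_d = (y - pred) * d_sigmoid(pred)
    matrix_sub<<<1, 4>>>(y, pred, pred_d, 4);
    d_sigmoid<<<1, 4>>>(pred, tmp_p, 4);
    elementwise_mul<<<1, 4>>>(pred_d, tmp_p, pred_d, 4);
    // line 6: layer_1_d = (pred_d * W1^T) * d_sigmoid(layer_1)
    dot_Bt<<<1, dim3(8, 4)>>>(pred_d, W1, layer_1_d, 4, 1, 8);
    d_sigmoid<<<1, 32>>>(layer_1, tmp_h, 4 * 8);
    elementwise_mul<<<1, 32>>>(layer_1_d, tmp_h, layer_1_d, 4 * 8);
    // line 7: W1 += layer_1^T * pred_d
    dot_At_add<<<1, dim3(1, 8)>>>(layer_1, pred_d, W1, 4, 8, 1);
    // line 8: W0 += X^T * layer_1_d
    dot_At_add<<<1, dim3(8, 4)>>>(X, layer_1_d, W0, 4, 4, 8);
}
cudaDeviceSynchronize();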

The loop above runs for 50 iterations (epochs) and fits the vector of attributes X to the vector of classes y. I am going to use 4 records from the Iris flower dataset. The attributes (X) are sepal length, sepal width, petal length, and petal width. In my example, I use 2 of the 3 classes found in the original dataset: Iris setosa (0) and Iris virginica (1). Predictions are stored in the vector pred.

Neural network architecture. Values of vectors W0, W1, layer_1 and pred change over the course of training the network, while vectors X and y must not be changed:

The size of matrix X is the batch size by the number of attributes.

Line 3. Finding the values of the hidden layer:

In order to calculate the hidden layer, first of all, we will need to multiply the 4 x 4 matrix X by the 4 x 8 matrix W0. Then, we will need to apply an activation function; in this case, we will use a sigmoid function.

A subroutine for matrix multiplication:
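A sketch of a naive kernel with one thread per output element (the name and signature are my assumptions):

// C = A * B, where A is (m x n) and B is (n x k), stored row-major.
__global__ void dot(const float* A, const float* B, float* C,
                    const int m, const int n, const int k)
{
    const int row = blockIdx.y * blockDim.y + threadIdx.y;
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < m && col < k) {
        float sum = 0.0f;
        for (int i = 0; i != n; ++i)
            sum += A[row * n + i] * B[i * k + col];
        C[row * k + col] = sum;
    }
}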

A subroutine for the sigmoid function:
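A sketch of an in-place elementwise sigmoid kernel, together with the derivative kernel that lines 5 and 6 will need (again, the names and signatures are my assumptions):

// Z[i] = 1 / (1 + exp(-Z[i])), applied elementwise in place.
__global__ void sigmoid(float* Z, const int size)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < size)
        Z[i] = 1.0f / (1.0f + expf(-Z[i]));
}

// Derivative of the sigmoid in terms of the activation a = sigmoid(x):
// d_sigmoid(a) = a * (1 - a).
__global__ void d_sigmoid(const float* A, float* dA, const int size)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < size)
        dA[i] = A[i] * (1.0f - A[i]);
}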

Sigmoid function (red) and its first derivative (blue):

Line 4. Finding the matrix of predictions pred. In order to do so, we will need to multiply the 4 x 8 matrix layer_1 by the 8 x 1 matrix W1. Then, we will need to apply an activation function:

Line 5. Determine the vector of prediction errors pred_d. First, subtract pred from y. Then, calculate d_sigmoid(pred), the derivative of the sigmoid, and, finally, multiply (elementwise) the results of these two operations.

A CUDA kernel that finds the difference between two matrices:
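A sketch (the kernel name and signature are my assumptions):

// C[i] = A[i] - B[i], elementwise.
__global__ void matrix_sub(const float* A, const float* B, float* C, const int size)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < size)
        C[i] = A[i] - B[i];
}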

A CUDA kernel for the elementwise multiplication of two vectors:
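A sketch (name and signature assumed):

// C[i] = A[i] * B[i], elementwise.
__global__ void elementwise_mul(const float* A, const float* B, float* C, const int size)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < size)
        C[i] = A[i] * B[i];
}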

Line 6. Backpropagate the prediction errors to layer_1_d. First, multiply pred_d by the transposed W1. Then, calculate d_sigmoid(layer_1), the derivative of the sigmoid, and, finally, multiply (elementwise) the results of these two operations.

A subroutine that multiplies a matrix by a transposed matrix:
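A sketch of such a kernel (the name dot_Bt is my assumption). For line 6 it would be launched with A = pred_d (4 x 1) and B = W1 (8 x 1), producing the 4 x 8 matrix layer_1_d before the elementwise part:

// C = A * B^T, where A is (m x n) and B is (k x n), so C is (m x k).
__global__ void dot_Bt(const float* A, const float* B, float* C,
                       const int m, const int n, const int k)
{
    const int row = blockIdx.y * blockDim.y + threadIdx.y;
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < m && col < k) {
        float sum = 0.0f;
        for (int i = 0; i != n; ++i)
            sum += A[row * n + i] * B[col * n + i];
        C[row * k + col] = sum;
    }
}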

 

Line 7. Update the weights W1 with the result of the matrix multiplication of the transposed layer_1 and pred_d:

This line computes the weight updates for W1. In order to do that, we need to perform the matrix multiplication of the transposed matrix layer_1 by the matrix pred_d.
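A sketch of a kernel that multiplies a transposed matrix by a matrix and accumulates the result into the weights in place (the name dot_At_add and the launch configuration are my assumptions). For line 7, A = layer_1 (4 x 8) and B = pred_d (4 x 1), so W1 (8 x 1) receives the update:

// W += A^T * B, where A is (n x m) and B is (n x k), so W is (m x k).
__global__ void dot_At_add(const float* A, const float* B, float* W,
                           const int n, const int m, const int k)
{
    const int row = blockIdx.y * blockDim.y + threadIdx.y; // row of A^T, i.e. column of A
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < m && col < k) {
        float sum = 0.0f;
        for (int i = 0; i != n; ++i)
            sum += A[i * m + row] * B[i * k + col];
        W[row * k + col] += sum;
    }
}

dot_At_add<<<1, dim3(1, 8)>>>(layer_1, pred_d, W1, 4, 8, 1);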

 

Line 8. Update the weights W0 with the result of the matrix multiplication of the transposed X and layer_1_d:
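This reuses the same kernel as line 7. A sketch of the launch (configuration assumed), with A = X (4 x 4) and B = layer_1_d (4 x 8), so W0 (4 x 8) receives the update:

dot_At_add<<<1, dim3(8, 4)>>>(X, layer_1_d, W0, 4, 4, 8);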

 

Complete code:

Output:

Compile…

… and run


Presenting the Importance of Random Initialization of the Weights

The problem of weight initialization is explained here.

“This turns out to be a mistake, because if every neuron in the network computes the same output, then they will also all compute the same gradients during backpropagation and undergo the exact same parameter updates. In other words, there is no source of asymmetry between neurons if their weights are initialized to be the same.”

Basically, improper initialization results in serious problems with learning features. This post is intended to provide some simple evidence of the importance of asymmetry in weight initialization.

Configuration of the neural network:
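I do not have the author's exact configuration here, so below is only an illustrative C++ sketch of the contrast the experiment relies on: a weight vector filled with small random (asymmetric) values versus one where every weight is the same constant 0.1. The function names and the scale of the random values are my assumptions.

#include <cstdlib>
#include <vector>

// Asymmetric initialization: every weight gets a small, distinct random value.
std::vector<float> random_init(int size)
{
    std::vector<float> w(size);
    for (int i = 0; i != size; ++i)
        w[i] = 0.1f * (static_cast<float>(std::rand()) / RAND_MAX - 0.5f);
    return w;
}

// Symmetric initialization: every neuron starts out identical,
// so all neurons receive identical gradients and identical updates.
std::vector<float> constant_init(int size)
{
    return std::vector<float>(size, 0.1f);
}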

Learning loop:

Predictions if initialized with asymmetrical weights:

Predictions if all weights are initialized with 0.1s:

After 10,000 iterations, the network failed to solve the simple XOR problem. Embarrassing, kind of.


Complete code:

Downloading more than 20 years of The New York Times

Articles for the period from 1987 to the present are available without a subscription. Their copyright notice is web-scraping friendly:

“… you may download material from The New York Times on the Web (one machine readable copy and one print copy per page) for your personal, noncommercial use only.”

Why waste the opportunity to download these articles then?


Please read their terms of service here.
Please subscribe to The New York Times here.

Next time, I’ll modify the code so you can download articles from some other major online newspaper.

A Neural Network in 10 lines of C++ Code

Purpose: For educational purposes only. The code demonstrates a supervised learning task using a very simple neural network. In my next post, I am going to replace the vast majority of subroutines with CUDA kernels.

Reference: Andrew Trask’s post.

The core component of the code, the learning algorithm, is only 10 lines:
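A minimal sketch of what the loop could look like, assuming the subroutines defined later in the post (dot, sigmoid, d_sigmoid, transpose, and the overloaded vector operators); not necessarily the author's exact code:

for (unsigned i = 0; i != 50; ++i) {
    std::vector<float> pred = sigmoid(dot(X, W, 4, 4, 1));        // line 3
    std::vector<float> pred_error = y - pred;                     // line 4
    std::vector<float> pred_delta = pred_error * d_sigmoid(pred); // line 5
    std::vector<float> W_delta =
        dot(transpose(X, 4, 4), pred_delta, 4, 4, 1);             // line 6
    W = W + W_delta;                                              // line 7
}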

The loop above runs for 50 iterations (epochs) and fits the vector of attributes X to the vector of classes y through the vector of weights W. I am going to use 4 records from the Iris flower dataset. The attributes (X) are sepal length, sepal width, petal length, and petal width. In my example, I use 2 of the 3 classes found in the original dataset: Iris setosa (0) and Iris virginica (1). Predictions are stored in the vector pred.

Neural network architecture. Values of vectors W and pred change over the course of training the network, while vectors X and y must not be changed:

The size of matrix X is the batch size by the number of attributes.

Line 3. Make predictions:

In order to calculate predictions, first of all, we will need to multiply a 4 x 4 matrix X by a 4 x 1 matrix W. Then, we will need to apply an activation function; in this case, we will use a sigmoid function.

A subroutine for matrix multiplication:
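A sketch of a naive triple-loop implementation (the signature is my assumption):

#include <vector>

// Returns C = A * B, where A is (m x n) and B is (n x k), stored row-major.
std::vector<float> dot(const std::vector<float>& A, const std::vector<float>& B,
                       int m, int n, int k)
{
    std::vector<float> C(m * k, 0.0f);
    for (int row = 0; row != m; ++row)
        for (int col = 0; col != k; ++col)
            for (int i = 0; i != n; ++i)
                C[row * k + col] += A[row * n + i] * B[i * k + col];
    return C;
}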

A subroutine for the sigmoid function:
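A sketch (signature assumed):

#include <cmath>
#include <vector>

// Elementwise sigmoid activation: 1 / (1 + exp(-z)).
std::vector<float> sigmoid(const std::vector<float>& z)
{
    std::vector<float> out(z.size());
    for (std::size_t i = 0; i != z.size(); ++i)
        out[i] = 1.0f / (1.0f + std::exp(-z[i]));
    return out;
}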

Sigmoid function (red) and its first derivative (blue):

Line 4. Calculate pred_error; it is simply the difference between the predictions and the truth:

In order to subtract one vector from another, we will need to overload the “-” operator:
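A sketch of the overload (assuming vectors of equal length):

#include <vector>

// Elementwise difference of two vectors.
std::vector<float> operator-(const std::vector<float>& a, const std::vector<float>& b)
{
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i != a.size(); ++i)
        out[i] = a[i] - b[i];
    return out;
}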

Line 5. Determine the vector of deltas pred_delta:

In order to perform the elementwise multiplication of two vectors, we will need to overload the “*” operator:
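A sketch of the overload (assuming vectors of equal length):

#include <vector>

// Elementwise product of two vectors.
std::vector<float> operator*(const std::vector<float>& a, const std::vector<float>& b)
{
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i != a.size(); ++i)
        out[i] = a[i] * b[i];
    return out;
}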

A subroutine for the derivative of the sigmoid function (d_sigmoid):
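A sketch (signature assumed). Note that it takes the activation a = sigmoid(x) as input, so the derivative is simply a * (1 - a):

#include <vector>

// Derivative of the sigmoid in terms of the activation.
std::vector<float> d_sigmoid(const std::vector<float>& a)
{
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i != a.size(); ++i)
        out[i] = a[i] * (1.0f - a[i]);
    return out;
}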

Basically, we use the first derivative to find the slope of the line tangent to the graph of the sigmoid function. At x = 0 the slope equals 0.25. The further the prediction is from 0, the closer the slope is to 0: at x = ±10 the slope equals 0.000045. Hence, the deltas will be small if either the error is small or the network is very confident about its prediction (i.e., abs(x) is greater than 4).

Line 6. Calculate W_delta:

This line computes the weight updates. In order to do that, we need to perform the matrix multiplication of the transposed matrix X by the matrix pred_delta.

The subroutine that transposes matrices:
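A sketch (signature assumed):

#include <vector>

// Transposes an (m x n) row-major matrix into an (n x m) one.
std::vector<float> transpose(const std::vector<float>& A, int m, int n)
{
    std::vector<float> T(n * m);
    for (int row = 0; row != m; ++row)
        for (int col = 0; col != n; ++col)
            T[col * m + row] = A[row * n + col];
    return T;
}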

Line 7. Update the weights W:

In order to perform the matrix addition operation, we need to overload the “+” operator:
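A sketch of the overload (assuming vectors of equal length):

#include <vector>

// Elementwise sum of two vectors.
std::vector<float> operator+(const std::vector<float>& a, const std::vector<float>& b)
{
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i != a.size(); ++i)
        out[i] = a[i] + b[i];
    return out;
}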

Complete code:

Output:

How to install NVIDIA CUDA 8.0, cuDNN 5.1, TensorFlow, and Keras on Ubuntu 16.04

Please follow the instructions below and you will be rewarded with Keras on a TensorFlow backend and, most importantly, GPU support.

You can download the latest version of the CUDA Toolkit from here. That page also makes clear which versions of Ubuntu are supported.


You can download the latest version of cuDNN from here. TensorFlow, however, requires cuDNN 5.1 and a GPU card with CUDA Compute Capability 3.0 or higher.

Step 1. Linux

Update the apt repositories and install the linux-image-extra-virtual package.
This package includes the kernel module that's required by the NVIDIA drivers.

sudo apt-get update
sudo apt-get install -y linux-image-extra-virtual

Install the version of the headers that matches the freshly installed kernel from the previous step.

sudo apt-get install linux-source linux-headers-`uname -r`
sudo reboot

Step 2. Python

Download (from here) and install Anaconda Python 3.6, 64-bit.

chmod +x Anaconda3-4.4.0-Linux-x86_64.sh
sudo ./Anaconda3-4.4.0-Linux-x86_64.sh

Step 3. NVIDIA Drivers and CUDA

Blacklist Nouveau, which conflicts with the NVIDIA drivers

echo -e "blacklist nouveau\nblacklist lbm-nouveau\noptions nouveau modeset=0\nalias nouveau off\nalias lbm-nouveau off\n" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf

Disable kernel mode setting for Nouveau

echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
sudo reboot

Download the Installer and make it executable

chmod +x cuda_8.0.61_375.26_linux.run

Hit Ctrl + Alt + F1 to switch to a text console

Kill the X server

sudo systemctl stop lightdm.service
sudo init 3

Run the installer, accept the license agreement, and install the samples

sudo sh cuda_8.0.61_375.26_linux.run

Load the NVIDIA kernel module

sudo modprobe nvidia

Restart the X server

sudo service lightdm restart

Compile and run the deviceQuery sample from the CUDA distribution to validate that the NVIDIA driver installation was successful.

cd /home/evg/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/
make
./deviceQuery


Step 4. cuDNN v5.1 for CUDA 8.0

Download cuDNN

Extract the .tgz archive

tar -xzf cudnn-8.0-linux-x64-v5.1.tgz

Copy the cuDNN libraries and header file to the CUDA folders

sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

Add some environment variables

gedit ~/.bashrc

Insert the following lines and save the changes

export CUDA_HOME="/usr/local/cuda"
export LD_LIBRARY_PATH="/usr/local/cuda-8.0/lib64"
export PATH="/usr/local/cuda-8.0/bin:$PATH"

Apply the changes from .bashrc to the current shell

source ~/.bashrc

Check if the environment variables contain the paths from the previous step

echo $CUDA_HOME
echo $PATH
echo $LD_LIBRARY_PATH

So that sudo sees the updated PATH as well, add an alias:

alias sudo='sudo env PATH=$PATH'

Step 5. Tensorflow

Create a conda environment named tensorflow by invoking the following command:

conda create -n tensorflow

Activate the conda environment by issuing the following command:

source activate tensorflow

Issue a command of the following format to install TensorFlow inside your conda environment:

sudo pip install --ignore-installed --upgrade TF_PYTHON_URL, where TF_PYTHON_URL is the URL of the TensorFlow Python package. For example, the following command installs the GPU-enabled version of TensorFlow 1.1.0 for Python 3.6:

sudo pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp36-cp36m-linux_x86_64.whl

Test TensorFlow

Invoke python from your shell as follows:

python

Enter the following short program inside the Python interactive shell:

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

Exit the Python interactive shell

exit()

Step 6. Keras

sudo pip install keras