
Accelerate Neural Networks on GPUs with PyTorch and Autograd

Updated: Aug 26, 2020



Objective


To introduce PyTorch as a framework for developing neural network code on GPUs; to describe the autograd functionality, show how to use it, and explain how it simplifies neural network computations; and to share a simple experiment done using PyTorch. The whole exercise is meant for learning purposes.


Background


It is well known that the Graphics Processing Unit (GPU) architecture accelerates AI workloads. Consider NVIDIA GPUs: one can develop C/C++/Fortran programs using the CUDA (Compute Unified Device Architecture) framework or the OpenACC specification to exploit them. But is there an equivalent framework for AI/Python programs on GPU systems? This question led me to PyTorch, tensors and autograd, and by the end of the exercise I was thoroughly convinced by the autograd functionality.
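Though not part of the original post, a minimal sketch may make this concrete: PyTorch code can select a GPU when one is available and move data onto it. The names device and x below are introduced just for this illustration.

    import torch

    # Use the GPU (CUDA) if one is available, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Create a tensor and move it onto the selected device; subsequent
    # operations on it then run on the GPU.
    x = torch.rand(3, 2).to(device)
    print(x.device)   # prints e.g. cuda:0 on a GPU system such as Colab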


Victor’s Introduction to Neural Networks blog is our base reference. There, a neural network (NN) is trained on the simple dataset given in the table, with the coding done in Python and NumPy. The reference lucidly explains NN basics, the mathematical derivations, the gradient descent algorithm and the Python code; readers are encouraged to review it first.


The Data and model


The weight, height and gender of Alice, Bob, Charlie and Diana are given in the table. The task is to train the neural network on this data and build a model that, given a new person's weight and height, predicts their gender.


The NN consists of an input layer (a person's weight and height), one hidden layer of two neurons with weights and biases, and an output layer of one neuron that predicts the gender. The NN is trained on the above data by iterating the feed-forward and back-propagation (back-prop) phases until the model is fully trained.
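To make that structure concrete, here is a minimal sketch of the 2-2-1 network in PyTorch. It follows the layout implied by the loss function below (six weights w1–w6, three biases b1–b3) and the sigmoid activations used in the reference, but the class name TwoLayerNN and the method names are choices made for this illustration, not the original code.

    import torch

    class TwoLayerNN:
        def __init__(self):
            # Six weights and three biases, all tracked by autograd.
            (self.w1, self.w2, self.w3, self.w4, self.w5, self.w6,
             self.b1, self.b2, self.b3) = (
                torch.rand(1, requires_grad=True) for _ in range(9))

        def feedforward(self, x):
            # Hidden layer: two neurons, each fed the weight and height inputs.
            h1 = torch.sigmoid(self.w1 * x[0] + self.w2 * x[1] + self.b1)
            h2 = torch.sigmoid(self.w3 * x[0] + self.w4 * x[1] + self.b2)
            # Output layer: one neuron producing the gender prediction.
            return torch.sigmoid(self.w5 * h1 + self.w6 * h2 + self.b3)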


Autograd package


Instead of requiring users to calculate derivatives explicitly, the autograd functionality calculates them automatically; this is known as automatic differentiation. As the feed-forward progresses, the operations needed for the gradient are tracked and stored. The loss L, here the mean squared error between the actual and predicted values, is a multi-variable function of all the weights and biases, as shown below:


L(w1,w2,w3,w4,w5,w6,b1,b2,b3)


What are the impacts of w1, w2, … on L? For what values of the weights (w) and biases (b) will the loss L be minimum? To answer these questions, we need the partial derivatives ∂L/∂w1, ∂L/∂w2, …, ∂L/∂b3. In the traditional approach, one derives each of these by hand; with autograd, the user gets all the required derivatives automatically. Except for the introduction of autograd, there are no changes to the NN algorithm whatsoever.
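A tiny self-contained example (not from the post) shows autograd producing exactly such partial derivatives; the toy loss and the parameter values below are chosen only for illustration.

    import torch

    # Two scalar parameters, flagged so autograd tracks operations on them.
    w1 = torch.tensor(2.0, requires_grad=True)
    b1 = torch.tensor(0.5, requires_grad=True)

    # A toy multi-variable loss built from the tracked parameters.
    L = (3.0 * w1 + b1) ** 2

    # One call computes all partial derivatives of L.
    L.backward()
    print(w1.grad)   # dL/dw1 = 2*(3*w1 + b1)*3 = 39.0
    print(b1.grad)   # dL/db1 = 2*(3*w1 + b1)   = 13.0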


PyTorch is a Python-based scientific computing package optimized for GPUs. Where plain Python code relies on NumPy arrays, PyTorch uses tensors, which can be placed on the GPU to accelerate computation. torch.Tensor is the central class of the package. If the user sets its attribute .requires_grad to True, as in

	self.w1 = torch.rand(1, requires_grad=True)

PyTorch begins to track all operations on that tensor. When the computation is finished, the user can call .backward(), as in

        mseloss.backward()

and have all the gradients computed automatically. The gradient for each tracked tensor is accumulated into its .grad attribute.
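Putting these pieces together, one training step might look like the following sketch. The name mseloss mirrors the call shown above, while the one-neuron model, the learning rate and the sample values are simplifications assumed for this illustration.

    import torch

    learn_rate = 0.1

    # A one-neuron stand-in for the full network, enough to show the pattern.
    w1 = torch.rand(1, requires_grad=True)
    w2 = torch.rand(1, requires_grad=True)
    b1 = torch.rand(1, requires_grad=True)

    x = torch.tensor([-2.0, -1.0])   # illustrative [weight, height] inputs
    y_true = torch.tensor([1.0])     # illustrative gender label

    # Feed-forward and mean squared error loss.
    y_pred = torch.sigmoid(w1 * x[0] + w2 * x[1] + b1)
    mseloss = (y_true - y_pred) ** 2

    # Back-prop: a single call fills every .grad attribute.
    mseloss.backward()

    # Gradient-descent update, done outside autograd tracking.
    with torch.no_grad():
        for p in (w1, w2, b1):
            p -= learn_rate * p.grad
            p.grad.zero_()   # reset accumulated gradients for the next step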


Links to the PyTorch code, which was tested on Google Colab using a GPU, and to its outputs are provided.


Key Takeaways


In addition to reducing the number of lines of code, autograd eliminates a major source of error: hand-derived gradient calculations. It simplifies the NN back-prop phase, and the automation becomes all the more valuable as the network grows larger. More broadly, PyTorch is a robust framework for developing AI/Python applications that run efficiently on GPUs.
