PyTorch on Google Colab (1)

Vaibhav Kumar Chaudhary
8 min read · May 25, 2019


This article is part of a series that I am writing on PyTorch. If you are starting out with Deep Learning, these articles will be of great benefit to you.

The basic outline for this article is:

  • What is PyTorch
  • What are tensors
  • Working with tensors
  • Numpy and PyTorch interfacing
  • GPU support on PyTorch
  • Speed Comparisons between Numpy, PyTorch, and PyTorch on GPU
  • Autograd concepts and writing a basic learning loop using autograd

This is “The Long Night” and we GOT a lot to cover

1. What is PyTorch

PyTorch is an open-source Machine Learning framework for Python based on Torch. Torch is also a Machine Learning framework, but it is based on the Lua programming language; PyTorch brings it to the Python world. PyTorch was originally developed at Facebook and is now open source.

PyTorch provides two high-level features that are important for us

  • Tensor computation with strong GPU acceleration
  • Deep neural networks built on a tape-based autograd system

To learn more about PyTorch you can visit its official website

2. What are tensors

Tensors can be a little tricky to understand, but let’s give it a shot. We know that a vector describes magnitude and direction, and that we can take its projections along the x, y, and z directions. If we take the projections of many vectors along these directions and represent them as a multidimensional array, the result is called a tensor.
This is not a very rigorous explanation of a tensor, but we don’t need to understand it in further detail for Deep Learning. All we need to know is that tensor computations are much faster than the equivalent operations on plain Python matrices, especially on a GPU.
In this article, tensor refers to a PyTorch tensor.

3. Working with tensors

First, we import the torch library, which is required to use all of PyTorch’s functions; along with it we will also import numpy and matplotlib.
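The import cell looks like this (matplotlib is only needed for the plots later on):

```python
# torch for tensors, numpy for ndarrays, matplotlib for plotting later on
import torch
import numpy as np
import matplotlib.pyplot as plt
```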

Now let’s see how we can initialise tensors

Here we are initialising x with different tensors. The first x is a tensor with 3 rows and 2 columns containing the value 1, the second is a 3×2 tensor containing the value 0, and the third is a 3×2 tensor of random numbers between 0 and 1.
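A minimal sketch of these three initialisations:

```python
import torch

x = torch.ones(3, 2)   # 3x2 tensor filled with the value 1
print(x)
x = torch.zeros(3, 2)  # 3x2 tensor filled with the value 0
print(x)
x = torch.rand(3, 2)   # 3x2 tensor of uniform random values in [0, 1)
print(x)
```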

You can also initialise an empty tensor by writing x=torch.empty(3,2). If you try to print x you will see garbage values, because the memory for the tensor is allocated but not initialised. If you want to create a tensor y with the same number of rows and columns as x, you can write y=torch.zeros_like(x).
There are a few other ways to create tensors, like torch.linspace(0,1,steps=5), which gives 5 values between 0 and 1 at equal distance.
And obviously, you can create your own tensor with torch.tensor([[1, 2],[3,4]]).
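All of these constructors together, as a short sketch:

```python
import torch

x = torch.empty(3, 2)               # allocated but uninitialised (garbage values)
y = torch.zeros_like(x)             # zeros with the same shape as x
z = torch.linspace(0, 1, steps=5)   # 5 evenly spaced values from 0 to 1
t = torch.tensor([[1, 2], [3, 4]])  # tensor built from a Python list
print(z)  # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
```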

You can find more operations on Torch here

4. Numpy and PyTorch interfacing

You can create a NumPy ndarray and convert it into a tensor. To do that, we call the from_numpy function and pass it a numpy.ndarray, which gets converted into a torch.Tensor. But remember that this is a bridge, not a copy, in the sense that if we make any changes to the numpy.ndarray, our tensor will change accordingly.
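A minimal sketch of the bridge (the array values here are my own choice):

```python
import numpy as np
import torch

a = np.ones(5)
b = torch.from_numpy(a)  # b shares memory with a: a bridge, not a copy
a[0] = 99                # changing the ndarray...
print(b)                 # ...changes the tensor as well
```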

Now let’s compare the time taken to compute with NumPy versus tensors. You can run this experiment yourself by taking large arrays and doing some basic computation with them. Here I am attaching pictures of my experiment; note that I didn’t use a GPU.

You can see that in the first cell we take two 100×100 ndarrays and multiply them. It takes 115 ms, whereas the same operation on tensors takes 87.3 ms. Not a large difference, right? But if we increase the size of our ndarray to 10000×10000, it takes 1 min 30 s with NumPy and only 19 s with tensors. So if we have a fairly large amount of data, we should use tensors. By the way, %%time is a magic command which displays the time taken by a cell to execute.
There is a performance gain when we use NumPy rather than plain Python matrices, and it can be increased further by using tensors. Now we will see how we can improve the performance of tensors even more.
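%%time only works inside a notebook cell; in a plain script the same comparison can be sketched with the time module (the size here is smaller than in my screenshots so it finishes quickly; bump n up to see the gap grow):

```python
import time
import numpy as np
import torch

n = 300  # increase (e.g. to 10000) to see a bigger difference
a_np, b_np = np.random.rand(n, n), np.random.rand(n, n)
a_t, b_t = torch.rand(n, n), torch.rand(n, n)

start = time.time()
c_np = np.matmul(a_np, b_np)          # NumPy matrix multiplication
print("numpy :", time.time() - start, "s")

start = time.time()
c_t = torch.matmul(a_t, b_t)          # same operation on tensors
print("torch :", time.time() - start, "s")
```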

5. GPU support on PyTorch and comparisons

Before talking about the GPU we must talk about CUDA. CUDA is the parallel computing platform and programming model provided by NVIDIA to support programming on GPUs. On Colab, you first need to enable a GPU runtime (Runtime → Change runtime type → GPU); PyTorch can then use it. If you want to check the number of GPUs on your machine you can do print(torch.cuda.device_count()); it will return 1 on Colab. You can query some more things, like
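A few of these queries, guarded so the sketch also runs on a CPU-only machine:

```python
import torch

print(torch.cuda.is_available())          # True when a GPU runtime is enabled
print(torch.cuda.device_count())          # 1 on Colab's single-GPU runtime
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. 'Tesla T4'
```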

Now, usually we store the device in a variable, for example cuda0=torch.device('cuda:0'). Here cuda:0 refers to the 0th GPU; if you have multiple GPUs you can use cuda:1, cuda:2, and so on.

Now cuda0 is an important variable

Earlier, the tensors we created lived on the CPU; now we create them on the GPU by passing the argument device=cuda0. In the output we can see that a, b, and c are all on the GPU.
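A sketch of this step, with a CPU fallback so it runs anywhere:

```python
import torch

# Fall back to the CPU so the sketch also runs without a GPU.
cuda0 = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

a = torch.ones(3, 2, device=cuda0)
b = torch.ones(3, 2, device=cuda0)
c = a + b
print(c)  # on a GPU runtime the printout includes device='cuda:0'
```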

Now let’s compare speed between numpy, tensor and tensor on GPU

I hope it is legible

Now let's discuss these cells and the time taken to execute them; for clarity, I’ll refer to the cell numbers written on the left side of each cell. In cells 3, 4, and 7, I take 1000×1000 multi-dimensional matrices with NumPy, tensors, and tensors on GPU respectively, and the operation we perform is addition. It takes 1 min 32 s for NumPy, 18.9 s for torch.Tensor, and only 6.43 s for tensors on GPU. Huge difference, right? By the way, b.add_(a) is called in-place addition, similar to b += a in any other language. Now let’s make it harder and change our operation to matrix multiplication to increase the complexity, and check how all three perform. That is done in cells 8, 9, and 10. Here NumPy takes 10 min (yeah, a hell of a lot), tensors on CPU take 4 min 26 s, but tensors on GPU take **drumroll** only 30.9 s, and that is what we need in Deep Learning.
That is the power of the GPU.

Now that I have convinced you that tensors on GPU perform much better, let’s discuss one of the most important features of PyTorch.

6. Autograd concepts and writing a basic learning loop using autograd

One of the important features of PyTorch is the ability to do the automatic computation of gradients. This is how we do it
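The cell is tiny; a sketch:

```python
import torch

# requires_grad=True tells PyTorch to track operations on x
x = torch.ones(3, 2, requires_grad=True)
print(x)  # the printout ends with requires_grad=True
```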

We are initialising the tensor as we usually do, but we add this new parameter requires_grad=True. By doing so we are telling PyTorch that x is something we will want to differentiate with respect to. Let’s see an example: first we will do a number of operations on x, and then we’ll find its gradient.
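A sketch of such a chain of operations (the constant 5 here is my assumption for the missing cell):

```python
import torch

x = torch.ones(3, 2, requires_grad=True)
y = x + 5      # add a constant to x
z = y * y      # square y
t = z.sum()    # sum all elements of z into a scalar
t.backward()   # compute dt/dx and store it in x.grad
print(x.grad)  # every entry is 2 * (1 + 5) = 12
```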

We add something to x, then square y to get z, then sum all the elements of the matrix z and store the result in t. Then we call t.backward(). This call does not print anything; instead it computes the derivative of t with respect to x and stores it in x.grad.

Here is how it happens: t = sum((x + 5)²), so dt/dx = 2·(x + 5). Since x is all ones, every element of the gradient is 2 × 6 = 12, and that is what printing x.grad shows.

You can check this example by printing values of x,y,z, and t for better understanding.

Let’s take another example; try it out yourself.

Let’s see another method to achieve this, where you don’t have to sum all the values like we do in line number 5. Instead, you can create a tensor a of the same size as r and call r.backward(a); autograd does a pointwise multiplication with a and gives the same result. Here is the example code
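A sketch of this variant, using the same chain of operations with my assumed constants:

```python
import torch

x = torch.ones(3, 2, requires_grad=True)
y = x + 5
r = y * y             # r is not a scalar, so backward() needs an argument
a = torch.ones(3, 2)  # plays the role of the upstream gradient
r.backward(a)         # equivalent to (r * a).sum().backward()
print(x.grad)         # 12 everywhere, the same result as summing first
```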

How to use autograd of PyTorch

Here I will take some data and relate it with output. Then I will propose a model and see if we can find the actual values and loss.

Here we are creating the data x and y; our true values for w and b are 3 and -2 respectively.
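A sketch of such data (the shapes and x values here are my assumption, not the notebook’s):

```python
import torch

# Hypothetical dataset with true parameters w = 3 and b = -2
x = torch.linspace(-1, 1, 20).reshape(-1, 1)
y = 3 * x - 2
```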

Now we will propose a model and try to find the value of w and b to match with our original value of w and b

Here the proposed model is w*x+b, and we use a squared-error loss function to calculate the loss. One thing to notice is that we initialise w and b to 1, while their actual values are 3 and -2 respectively. So when we calculate w’s gradient, it should be negative, to increase w from 1 to 3 according to the update w = w - w.grad; and b’s gradient should be positive, to decrease b from 1 to -2. So after calling loss.backward() we inspect the gradients of w and b.

As you can see, our assumptions about these gradients were true.
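A sketch of this single forward/backward check, using my hypothetical data with true w = 3 and b = -2:

```python
import torch

x = torch.linspace(-1, 1, 20).reshape(-1, 1)  # assumed data
y = 3 * x - 2                                 # true w = 3, b = -2

w = torch.tensor([1.0], requires_grad=True)   # both initialised to 1
b = torch.tensor([1.0], requires_grad=True)

y_hat = w * x + b                   # proposed model
loss = torch.sum((y_hat - y) ** 2)  # squared-error loss
loss.backward()                     # populates w.grad and b.grad
print(w.grad, b.grad)               # w.grad is negative, b.grad is positive
```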

Writing learning loop in PyTorch

Here we do everything in one piece of code; the only difference is that we repeat it 10 times so that our model can learn the parameters, and we can see from the output that it almost reaches the right values of w and b.

When we call loss.backward(), PyTorch tracks every operation performed on tensors that require gradients. To exclude the parameter update from this tracking, we wrap it in torch.no_grad(); by doing so PyTorch understands that those operations are not part of backpropagation. Do not forget to reset the gradients to 0 by writing w.grad.zero_() (and likewise for b) so that fresh gradients can be computed in the next iteration.
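Putting the whole loop together, a sketch under the same assumptions (my hypothetical data; the learning rate and number of epochs are also my choice):

```python
import torch

x = torch.linspace(-1, 1, 20).reshape(-1, 1)  # assumed data
y = 3 * x - 2                                 # true w = 3, b = -2

w = torch.tensor([1.0], requires_grad=True)
b = torch.tensor([1.0], requires_grad=True)
lr = 0.01  # assumed learning rate

for epoch in range(100):
    y_hat = w * x + b
    loss = torch.sum((y_hat - y) ** 2)
    loss.backward()
    with torch.no_grad():   # the update itself must not be tracked
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()      # reset gradients before the next pass
        b.grad.zero_()

print(w.item(), b.item())  # close to w = 3 and b = -2
```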

Compare the learning loop on CPU and GPU

In the first and second code snippets, the only difference is that in the second we put the data on the GPU by using the argument device=cuda0, and by doing so the performance increase is very satisfactory.
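A minimal sketch of that change (with a CPU fallback so it runs anywhere); the rest of the loop stays exactly the same:

```python
import torch

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Same setup as before, but every tensor is created directly on the
# chosen device, so all computation in the loop happens there.
x = torch.linspace(-1, 1, 20, device=device).reshape(-1, 1)
y = 3 * x - 2
w = torch.ones(1, device=device, requires_grad=True)
b = torch.ones(1, device=device, requires_grad=True)
```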
This is what we require in Deep Learning.

You can find the full notebook here.

This article is part of a series that I am writing. If you wish to receive more, connect with me on the social media links mentioned below.

I hope you find this article useful, and I could use some claps to boost my confidence for the upcoming articles. If we are meeting for the first time: hi, I am Vaibhav, and if you wish to connect with me I am active on LinkedIn and Twitter.

Poka Poka :)
