PyTorch — Getting started

Introduction

Srija Neogi
6 min read · May 28, 2020

PyTorch is an open-source machine learning library based on the Torch library, used for developing and training neural-network-based deep learning models. It is built to be deeply integrated into Python. PyTorch is known for providing two high-level features:

1. Tensor computation that uses the power of Graphics Processing Units (GPUs), and

2. Building deep neural networks on a tape-based autograd system.

In the past few years, PyTorch has helped accelerate deep learning research by making models computationally faster and by using dynamic computation graphs, which allow greater flexibility in building complex architectures. It is especially useful in applications such as Natural Language Processing (NLP). At present, 62 companies reportedly use PyTorch, including Walmart, Hepsiburada and ABEJA.

Tensor in PyTorch

A tensor is a generic n-dimensional array used for numeric computation. It is essentially the same as a NumPy array; however, a tensor can run on either device, i.e. CPU or GPU. To run operations on a GPU, just cast the tensor to a CUDA datatype.

PyTorch supports multiple tensor types:

1. FloatTensor — 32-bit float

2. DoubleTensor — 64-bit float

3. HalfTensor — 16-bit float

4. IntTensor — 32-bit int

5. LongTensor — 64-bit int

Here we will go through tensor and some basic functions of PyTorch.

  • torch.randn()
  • torch.reshape()
  • torch.pow()
  • torch.eq()
  • torch.var_mean()

Function 1 — torch.randn()

The torch.randn() function generates a tensor filled with random numbers from a normal distribution with mean 0 and variance 1.

Signature: torch.randn(*size, out=None, dtype=None, device=None, requires_grad=False)

size is the only mandatory parameter for randn(); the rest are optional. Let's see how this function and its arguments work.

In this example we create a random tensor using the randn() function. We pass size (int, int), a sequence of integers defining the shape of the output tensor, and dtype, the desired datatype of the output tensor. The default dtype is torch.float32; it can be checked using torch.get_default_dtype().
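The original notebook cell is not shown here; a minimal sketch of this example (the shape and values are illustrative assumptions):

```python
import torch

# A 2x4 tensor of samples from the standard normal distribution.
t = torch.randn((2, 4), dtype=torch.float32)
print(t)
print(t.dtype)                    # torch.float32
print(torch.get_default_dtype())  # torch.float32
```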

Let’s check another example:

In this example we set another parameter, requires_grad, which takes a boolean value (True/False). It is needed when autograd should record operations on the returned tensor. By default it is set to False.
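A sketch of what this cell might look like (the shape and the toy computation are assumptions for illustration):

```python
import torch

# requires_grad=True asks autograd to record operations on this tensor.
w = torch.randn(2, 2, dtype=torch.float64, requires_grad=True)
print(w.requires_grad)   # True

loss = (w * 3).sum()     # a toy computation so we can backpropagate
loss.backward()
print(w.grad)            # d(loss)/dw: a 2x2 tensor filled with 3s
```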

Now let’s see when the function breaks.

Note that torch.randn() always creates a tensor with random numbers drawn from a normal distribution with mean 0 and variance 1, which is a continuous distribution. So we cannot create an integer-datatype tensor here, and we get a RuntimeError.
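A sketch of the breaking case (the shape is an illustrative assumption):

```python
import torch

# randn() samples from a continuous distribution, so an integer dtype
# is rejected with a RuntimeError.
try:
    torch.randn(2, 2, dtype=torch.int32)
except RuntimeError as e:
    print("RuntimeError:", e)
```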

randn() is used in developing neural network models for initialising weight matrices with random values; these weights are multiplied with the input before an activation function is applied.

Function 2 — torch.reshape()

Using the reshape() function, we can change the shape of a tensor to a desired shape. The output tensor has the same data and number of elements as the input.

Signature: torch.reshape(input, shape)

input is the tensor whose shape we want to change, and shape is the size of the desired output tensor. Both arguments are mandatory for reshape().

In this example I initialised a tensor r as a 3×2 matrix. In the output tensor we reshaped it to a 2×3 matrix. If we want the output in a single dimension, i.e. a vector, the shape can be -1: torch.reshape(r, (-1,)) or r.reshape(-1).
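A sketch of this example (the element values are illustrative assumptions):

```python
import torch

r = torch.tensor([[1., 2.],
                  [3., 4.],
                  [5., 6.]])      # a 3x2 matrix
print(torch.reshape(r, (2, 3)))   # the same six values as a 2x3 matrix
print(r.reshape(-1))              # flattened to a 1-D vector
```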

Let's see how a vector (1-D matrix) is converted to a multidimensional matrix.

In this example I initialised a vector r1 using the arange() function, which fills a vector with a range of numbers from 0 (or an optional start value) up to, but not including, the value passed. Using the vector r1, a 4×2 matrix r2 is created. Note that the datatypes of the input and result tensors will always be the same.
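A sketch of this cell (the range end of 8 is an illustrative assumption):

```python
import torch

r1 = torch.arange(8)       # 1-D vector: 0, 1, ..., 7 (end value excluded)
r2 = r1.reshape(4, 2)      # reshaped into a 4x2 matrix
print(r2)
print(r1.dtype, r2.dtype)  # both int64: reshape keeps the input's dtype
```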

Again, Let’s check when reshape() breaks.

In this example, I initialised a matrix r3 with 4×6 = 24 elements and asked for an output matrix r4 of 6×3 = 18 elements. Note that reshape() always returns an output tensor with the same elements as the input: the values, number of elements and datatype remain unchanged; only the dimensions change. So to use reshape(), the input m×n must equal the output p×q in element count. Hence it throws a RuntimeError for the invalid shape.
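A sketch of this breaking case, matching the 4×6 and 6×3 shapes described above:

```python
import torch

r3 = torch.randn(4, 6)              # 24 elements
try:
    r4 = torch.reshape(r3, (6, 3))  # 18 elements: counts don't match
except RuntimeError as e:
    print("RuntimeError:", e)
```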

reshape() is used where we want to add or remove dimensions, or change the number of elements in each dimension, in deep learning models. reshape(-1) converts a multidimensional array to a single-dimensional array; this is what the flatten layer of Convolutional Neural Network models does.

Function 3 — torch.pow()

The torch.pow() function raises each element of the input to the power of the given exponent. The exponent can be a single number or a tensor.

Signature: torch.pow(input, exponent, out=None)

input and exponent are mandatory arguments.

When the exponent is a single number, the returned tensor has the same dtype as the input. When the exponent is a tensor, the output dtype follows PyTorch's type-promotion rules.

In this example we have p as the input tensor of datatype float32 and exp as the exponent tensor of datatype float64. We obtain the output tensor p1 with values (3.)⁴, (4.)⁶, (5.)². Note that when the input and exponent tensors have different floating-point dtypes, the output is promoted to the higher-precision one, i.e. float64.
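A sketch of this cell, using the bases and exponents described above:

```python
import torch

p = torch.tensor([3., 4., 5.])                         # float32 input
exp = torch.tensor([4., 6., 2.], dtype=torch.float64)  # float64 exponent
p1 = torch.pow(p, exp)
print(p1)        # 3^4, 4^6, 5^2 -> 81., 4096., 25.
print(p1.dtype)  # promoted to float64
```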

Now we will see the output with a scalar base.

Here we have the exponent as a tensor e with datatype float64 and a scalar base. Note that the output tensor has the same shape and datatype as the exponent tensor.
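A sketch of the scalar-base case (base 2 and the exponent values are illustrative assumptions):

```python
import torch

e = torch.tensor([2., 3., 4.], dtype=torch.float64)
out = torch.pow(2, e)        # scalar base, tensor exponent
print(out)                   # 2^2, 2^3, 2^4 -> 4., 8., 16.
print(out.shape, out.dtype)  # same shape and dtype as the exponent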

Let’s check where pow() breaks.

Note that when the exponent is a tensor, the shapes of input and exponent must be broadcastable. Here the input is a 1-D tensor and the exponent is a 2×2 matrix, so they aren't broadcastable. To learn more about broadcasting, see https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics
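A sketch of this breaking case, matching the 1-D input and 2×2 exponent described above:

```python
import torch

base = torch.randn(3)     # shape (3,)
exp = torch.randn(2, 2)   # shape (2, 2): not broadcastable with (3,)
try:
    torch.pow(base, exp)
except RuntimeError as e:
    print("RuntimeError:", e)
```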

pow() has many applications in day-to-day programming, wherever exponential operations like those of Python's math module are needed.

Function 4 — torch.eq()

We use torch.eq() for comparing element-wise equality of different tensors or a number and a tensor.

Signature: torch.eq(input, other, out=None)

The output is a tensor of boolean values: True at each location where the comparison finds equal values. We will see how eq() works with some examples.

In this example, input1 is a 3×3 tensor and input2 is a float. torch.eq() compares each item of input1 with input2 and prints True in the output wherever it finds a match; everywhere else it is False. Note that the datatype of input1 is int64 while input2 is a float. While comparing, PyTorch promotes both operands to a common type, so int 3 and float 3.0 are considered equal.
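A sketch of this cell (the element values are illustrative assumptions):

```python
import torch

input1 = torch.tensor([[1, 2, 3],
                       [3, 4, 5],
                       [3, 3, 3]])  # int64
input2 = 3.0                        # a float scalar
print(torch.eq(input1, input2))
# int 3 and float 3.0 compare equal after type promotion
```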

Let's check with two multidimensional matrices.

Here we passed two tensors as input. Each item of input1 is compared with the item at the same position in input2, and the output is True where the values match. Again the datatypes of the tensors differ: the first is float64 and the second is float32.
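A sketch of this cell (the values are illustrative assumptions; the dtypes match the description above):

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]], dtype=torch.float64)
b = torch.tensor([[1., 0.], [3., 5.]])  # float32
print(torch.eq(a, b))  # True where the values at the same position match
```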

Now let’s check where .eq() breaks

Note that when two tensors are compared, their shapes have to be broadcastable. Here the second argument's shape does not broadcast with the first argument's, so the prerequisites aren't met and it throws a RuntimeError.
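A sketch of this breaking case (the shapes are illustrative assumptions):

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(4)      # shape (4,) is not broadcastable with (2, 3)
try:
    torch.eq(a, b)
except RuntimeError as e:
    print("RuntimeError:", e)
```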

Neural network models generally use huge datasets. eq() can be used to validate data before feeding it to the model. It can also be used to measure prediction accuracy by comparing predictions with targets: preds.eq(tar)

Function 5 — torch.var_mean()

As the name suggests, PyTorch's var_mean() is used to find the variance and mean of the items in a tensor.

Signature: torch.var_mean(input, dim, keepdim=False, unbiased=True)

input is the tensor whose variance and mean we want to find. It is the only mandatory argument.

dim is the dimension along which the variance and mean are computed.

If unbiased is True, the variance is calculated with Bessel's correction; otherwise, the biased estimator is used.

If keepdim is True, the output tensor is of the same size as input except in the dimension dim, where it is of size 1. Otherwise, dim is squeezed, resulting in output tensors with 1 (or len(dim)) fewer dimension(s). We will explore some examples.

In this example, only the input tensor is passed, so it calculates the variance and mean of all the elements together. By default it uses Bessel's correction.
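A sketch of this cell (the tensor values are illustrative assumptions):

```python
import torch

t = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])
var, mean = torch.var_mean(t)  # over all six elements
print(var, mean)               # unbiased variance 3.5, mean 3.5
```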

As we saw var_mean() supports various arguments so we will explore the other arguments now.

Now I have passed all the arguments and want to calculate var_mean along dimension 0, i.e. down each column. The result is two tensors: the first contains the variances of the first, second and third columns respectively; similarly, the second contains the corresponding means.
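A sketch of this cell, reusing an assumed 2×3 tensor:

```python
import torch

t = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])
var, mean = torch.var_mean(t, dim=0, keepdim=False, unbiased=True)
print(var)   # tensor([4.5, 4.5, 4.5]): per-column variance
print(mean)  # tensor([2.5, 3.5, 4.5]): per-column mean
```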

Now it’s time to see the breaking case of var_mean().

One important thing to keep in mind while using var_mean() is the dimension. Here I passed a 4-D tensor but asked for the mean and variance along a fifth dimension. Since indexing starts at 0, the valid dims for a 4-D tensor are 0 to 3, so it throws an IndexError.
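A sketch of this breaking case (the 4-D shape is an illustrative assumption):

```python
import torch

t = torch.randn(2, 3, 4, 5)      # a 4-D tensor: valid dims are 0..3
try:
    torch.var_mean(t, dim=4)     # dim 4 is out of range
except IndexError as e:
    print("IndexError:", e)
```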

var_mean() can be used when working with activation functions, and for statistical analysis where we need a threshold value or a measure of how far a set of (random) numbers spreads out from its average.

Conclusion

Here some basic PyTorch functions and the datatypes used in PyTorch have been covered. Torch defines nine CPU tensor types and nine GPU tensor types; torch.Tensor is an alias for the default tensor type (torch.FloatTensor). There are many more PyTorch functions that can be explored in the official documentation while working with tensors.
