[PyTorch] 2. Model(x) vs Forward(x), Load pre-trained Model, Finetuning, Length of the DataLoader, How to Send model to GPU


1. model(x) vs model.forward(x)

Figure 1. Two ways to feed input to the network (model)

In PyTorch, to define our own model, a class needs to inherit from ‘nn.Module’ and override two functions: (1) __init__() and (2) forward(input). Since forward() takes the input as its argument, one might assume that we feed input to the network by calling forward() directly, as in model.forward(input).

However, contrary to this intuition, the recommended way is model(input), which actually invokes the function __call__().

The reason is that __call__() does not only call model.forward() but also performs some extra work (namely, the registered hooks). This means that if we feed input via model.forward(), that extra work in __call__() is skipped, which can cause unexpected outcomes.

Figure 2. The __call__() function from PyTorch

As shown above, the user-defined forward() is eventually called inside __call__(). Therefore, in order not to miss those extra operations (the hooks), model(input) is preferable to model.forward(input).
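Below is a minimal sketch (the TinyNet module and the hook are illustrative assumptions, not from the original post) showing that a registered forward hook fires with model(x) but is skipped with model.forward(x):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

model = TinyNet()

# Hooks are part of the extra work performed by __call__().
model.register_forward_hook(lambda module, inp, out: print("hook fired"))

x = torch.randn(1, 4)
y1 = model(x)          # goes through __call__(): the hook fires, then forward() runs
y2 = model.forward(x)  # calls forward() directly: the hook is silently skipped
```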

2. Loading a Pre-trained Model for Finetuning

PyTorch provides pre-structured and pre-trained models of representative architectures such as ResNet, VGG, and DenseNet, and we can easily load them with the help of the ‘torchvision.models’ package.

The following code shows one example of how to load a pre-trained model from ‘torchvision.models’. It is a minimal sketch using ResNet-18 (the choice of architecture is arbitrary); the helper set_parameter_requires_grad() mirrors the one in the official PyTorch finetuning tutorial.
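```python
import torchvision.models as models

# Load a ResNet-18 pre-trained on ImageNet.
model = models.resnet18(pretrained=True)

def set_parameter_requires_grad(model, feature_extracting):
    # When we only finetune the final layer, freeze every existing
    # parameter so that backpropagation leaves it untouched.
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

set_parameter_requires_grad(model, feature_extracting=True)
```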

Depending on what we want to do with the loaded model, we need to decide whether its parameters (weights) should be trainable or not.

For example, if we want to finetune the model, we only need to modify the final fully connected layer while keeping the values of all other parameters fixed during backpropagation. This is easily done in PyTorch by setting the attribute ‘.requires_grad’ of those parameters to False, as implemented in the function ‘set_parameter_requires_grad()’ in the code above.

In addition, since the pre-trained ResNet and VGG were trained to classify the 1,000 ImageNet classes, their final fully connected layer outputs 1,000 units. This has to be changed to match the number of classes we want to classify.
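Continuing the ResNet-18 sketch above (the 10-class target is an arbitrary assumption for illustration):

```python
import torch.nn as nn

num_classes = 10  # assumed number of target classes for this example

# Replace the 1000-way ImageNet classifier with a fresh fully connected
# layer. Newly created parameters have requires_grad=True by default,
# so only this layer is updated during training.
model.fc = nn.Linear(model.fc.in_features, num_classes)
```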

3. Length of the DataLoader

DataLoader is the class that generates mini-batches from a dataset, given arguments such as batch_size, shuffle, and num_workers. By iterating over all the mini-batches generated by the DataLoader, our model completes one epoch of training.

This means that the number of mini-batches (which equals len(data_loader)) depends on the batch size and the total number of samples in the dataset. Suppose we have 100 images in our dataset and set the batch size to 5; then the number of mini-batches is 20, since 20 × 5 = 100. As we use one mini-batch per iteration in training, one epoch consists of 20 iterations. If the batch size is 10 instead, the number of mini-batches is 10.
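A minimal sketch that checks this arithmetic with a toy dataset (the tensor shapes are arbitrary):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 100 samples of 3 features each, with dummy binary labels.
dataset = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

loader = DataLoader(dataset, batch_size=5, shuffle=True)
print(len(loader))   # 20 mini-batches per epoch, since 100 / 5 = 20

loader = DataLoader(dataset, batch_size=10)
print(len(loader))   # 10 mini-batches, since 100 / 10 = 10
```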

4. Sending Tensors and Models to the GPU

• Why do we need to send tensors and models (graphs) to the GPU?

: There could be several reasons, but it is mainly because the time required to share information between the CPU and GPU is huge. Keeping both the model and its tensors on the GPU shortens this communication time.

• How do we transfer tensors and models to the GPU?

: There are two ways: (1) the function ‘.to(device)’ or (2) ‘.cuda()’. However, the second way with ‘.cuda()’ is an older technique. The first way with ‘.to(device)’ is what is commonly used nowadays, as it is more flexible than (2).

Let’s take a look at the code below to get some idea of ‘.to(device)’. It is a minimal sketch (the model and tensor shapes here are arbitrary, not from the original post):
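```python
import torch
import torch.nn as nn

# Pick the GPU when one is available, otherwise fall back to the CPU;
# this fallback is what makes .to(device) more flexible than .cuda().
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)  # move the model's parameters to the device

# Inputs and labels must live on the same device as the model.
inputs = torch.randn(8, 4).to(device)
labels = torch.randint(0, 2, (8,)).to(device)

outputs = model(inputs)  # runs on the GPU when one is available
```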

• Note that if a model is moved to the GPU, then everything that is fed to the model (input data, labels) must also be transferred to the GPU. Otherwise, you are likely to encounter runtime errors.


Any corrections, suggestions, and comments are welcome

