[PyTorch] 3. Tensor vs Variable, zero_grad(), Retrieving value from Tensor

jun94 · Published in jun-devpBlog
Apr 4, 2020 · 3 min read

1. Tensor vs Variable

Tensor and Variable are classes provided by PyTorch. According to the official PyTorch documentation, both classes represent a multi-dimensional matrix containing elements of a single data type, and they share (almost) the same API: nearly every operation available on a Tensor can also be performed on a Variable. The difference between Tensor and Variable is that a Variable is a wrapper around a Tensor.

What the word ‘wrapper’ means here is that, in the past, a Tensor alone was not capable of recording operations for automatic differentiation with PyTorch’s autograd. In order to record those operations and compute derivatives, the Tensor had to be wrapped in the Variable class.

However, as shown in figure 1, the Variable class has since been deprecated, and we no longer need to distinguish between Variable and Tensor, since autograd now supports Tensors directly!

Figure 1. From the PyTorch documentation, description of the Variable class
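To see this in practice, here is a minimal sketch (the tensor values are chosen arbitrarily) showing that a plain Tensor with requires_grad=True is now enough for autograd, while wrapping it in the deprecated Variable simply gives back a Tensor.

```python
import torch

# Modern usage: a plain Tensor records operations for autograd
# as long as requires_grad=True; no Variable wrapper is needed.
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = x1^2 + x2^2
y.backward()         # autograd computes dy/dx
print(x.grad)        # tensor([4., 6.])

# Legacy usage (deprecated): Variable now simply returns a Tensor.
from torch.autograd import Variable
v = Variable(torch.tensor([2.0, 3.0]), requires_grad=True)
print(type(v))       # <class 'torch.Tensor'>
```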

2. Why zero_grad()?

In most models built with PyTorch, the line of code ‘optimizer.zero_grad()’ is commonly found. What this code does, according to the documentation linked here, is clear the gradients stored in Tensors (mostly model parameters, i.e., weights).

Figure 2. zero_grad()
  • But why do we need this code?

: It is simply because, by default, PyTorch accumulates the gradients with respect to parameters across successive backward passes (mini-batches). If we do not want the gradients from previous steps to influence the gradients of the current step, which in most cases we don’t, we need to call ‘zero_grad()’ to clear the gradient buffers of the parameters (the Tensors/Variables holding the weights).
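Here is a minimal sketch of this accumulation behavior (the tensor w and the toy loss w.sum() are made up for illustration): calling backward() twice without clearing doubles the stored gradient, while zeroing the .grad buffer, which is what optimizer.zero_grad() does for the parameters it manages, resets it.

```python
import torch

w = torch.ones(3, requires_grad=True)

# First backward pass: grad of w.sum() w.r.t. w is all ones.
w.sum().backward()
print(w.grad)          # tensor([1., 1., 1.])

# Second backward pass WITHOUT clearing: gradients accumulate.
w.sum().backward()
print(w.grad)          # tensor([2., 2., 2.])

# Clearing the buffer, then backpropagating again.
w.grad.zero_()
w.sum().backward()
print(w.grad)          # tensor([1., 1., 1.])
```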

3. Difference between zero_grad(), loss.backward(), optimizer.step()

Unlike some other deep learning frameworks, PyTorch deliberately splits the gradient update into two separate calls, loss.backward() and optimizer.step(), as illustrated in the sketch after the list below.

  • loss.backward(): performs backpropagation, i.e., computes the gradients (derivatives) of the loss function with respect to the parameters (more precisely, only for the parameters whose requires_grad attribute is set to True).
  • optimizer.step(): makes an update (a step) in parameter space, i.e., updates the parameters (weights) based on the gradients computed by loss.backward().
  • optimizer.zero_grad(): clears the old gradients from the last step.
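Putting the three calls together, here is a minimal sketch of a training loop (the toy data, the nn.Linear model, and the SGD learning rate are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

# Toy data and model, just to illustrate the order of the three calls.
x = torch.randn(16, 4)
target = torch.randn(16, 1)

model = nn.Linear(4, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(5):
    optimizer.zero_grad()              # clear old gradients
    loss = criterion(model(x), target) # forward pass + loss
    loss.backward()                    # backpropagation: compute gradients
    optimizer.step()                   # update parameters using those gradients
```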

4. Retrieve the data(value) from Tensor

  • .item(): used for a tensor with a single element; returns its value as a standard Python number.
Figure 3. From here, an example of .item()
  • .tolist(): used for a tensor with one element or with many elements; returns a (nested) Python list. For a zero-dimensional (scalar) tensor, it returns a Python number, just like .item(). A short sketch of both calls follows after the figure below.
Figure 4. From the PyTorch documentation, an example of .tolist()
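A short sketch of both calls (the example values are arbitrary):

```python
import torch

scalar = torch.tensor(3.5)
print(scalar.item())     # 3.5  (a Python float)

matrix = torch.tensor([[1, 2], [3, 4]])
print(matrix.tolist())   # [[1, 2], [3, 4]]  (nested Python lists)

one = torch.tensor([7])
print(one.tolist())      # [7]
print(one.item())        # 7  (works because the tensor has exactly one element)
```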

5. References

[1] Variable vs Tensor

[2] .item() example

Any corrections, suggestions, and comments are welcome.

The contents of this article are based on Bishop and Goodfellow.
