[PyTorch] 6. model.train() vs model.eval(), no_grad(), Hyperparameter tuning

jun94 · Published in jun-devpBlog · 2 min read · Apr 10, 2020

1. model.train() vs model.eval()

(code snippets for model.train() and model.eval(), shown as images in the original post)

As shown in the code above, model.train() puts the modules of the network into training mode. It tells the model that we are currently in the training phase, so layers such as dropout and batch normalization, whose behavior differs between phases, stay active. model.eval() does the opposite: once it has been called, such layers switch to their inference behavior, so the model produces the output expected at test time.
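A minimal sketch of how the mode flag affects a dropout layer (the toy network here is only for illustration):

```python
import torch
import torch.nn as nn

# A small network containing a dropout layer, whose behavior
# depends on whether the model is in training or evaluation mode.
model = nn.Sequential(
    nn.Linear(10, 10),
    nn.Dropout(p=0.5),
)

model.train()            # training mode: dropout randomly zeroes activations
assert model.training    # the flag is propagated to all submodules

model.eval()             # evaluation mode: dropout becomes a no-op
assert not model.training

# In eval mode two forward passes on the same input are identical,
# because dropout is inactive.
x = torch.ones(1, 10)
out1 = model(x)
out2 = model(x)
assert torch.equal(out1, out2)
```

Note that model.eval() only changes the mode flag; it does not disable gradient computation, which is what torch.no_grad() (next section) is for.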

2. no_grad()

The context manager "with torch.no_grad()" temporarily disables the autograd engine, which computes gradients with respect to the parameters: operations performed inside the block are not recorded, and their results have requires_grad set to False. It is recommended in the test phase, since gradients are not needed there; the parameter updates have already been done during training. Using torch.no_grad() in the test and validation phases yields faster inference (sped-up computation) and reduced memory usage, which in turn allows a larger batch size.
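A short illustration of the effect: inside the context, results of operations on a gradient-tracking tensor are not tracked, while the same operation outside the context is.

```python
import torch

# A leaf tensor that normally participates in autograd.
x = torch.randn(3, requires_grad=True)

# Inside torch.no_grad(), operations are not recorded by autograd,
# so the result carries requires_grad=False and no graph is built.
with torch.no_grad():
    y = x * 2
assert y.requires_grad is False

# Outside the context, the same operation is tracked as usual.
z = x * 2
assert z.requires_grad is True
```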

3. Hyperparameter Tuning

Hyperparameters, such as the learning rate, number of epochs, filter size, and momentum, are among the most important components defining the structure and behavior of a network.

However, unlike the parameters (weights), hyperparameters cannot be tuned while the model trains and optimizes its parameters. Tuning has therefore often been done manually: defining candidate values for each hyperparameter, training a model for each combination of values, and choosing the one that shows the best performance.

Since this approach requires considerable manual effort, several packages automate the hyperparameter tuning that used to be done by hand.

The advantage of these packages is that we usually only need to define the search space of the hyperparameters; the package then finds the values that optimize the model using techniques such as random search or Bayesian model-based optimization. Some packages even support visualization of the optimization process.

The list of such packages is as below.

  • ray
  • Ax
  • NNI
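To make the idea concrete, here is a minimal random-search sketch. It is not the API of any of the packages above; the search space and the train_and_evaluate function are hypothetical stand-ins (a real version would run an actual training loop and return a validation metric).

```python
import random

# Hypothetical search space: candidate values per hyperparameter.
search_space = {
    "lr": [1e-4, 1e-3, 1e-2],
    "batch_size": [16, 32, 64],
}

def train_and_evaluate(config):
    # Stand-in for a real training run returning a validation score.
    # This toy score is maximized at lr=1e-3, batch_size=32.
    return -abs(config["lr"] - 1e-3) - abs(config["batch_size"] - 32) / 100

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Sample one value per hyperparameter and evaluate it.
        config = {k: rng.choice(v) for k, v in search_space.items()}
        score = train_and_evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config

best = random_search(n_trials=20)
```

Packages like Ax or NNI replace the naive sampling loop with smarter strategies (e.g. Bayesian optimization) that use previous trial results to propose the next configuration.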


Any corrections, suggestions, and comments are welcome.
