TRUST REGION METHODS FOR DEEP REINFORCEMENT LEARNING

Published in Analytics Vidhya · 10 min read · Jul 4, 2021

TRUST REGION METHODS

The basic building block of machine learning is optimizing an objective function, for example minimizing the mean squared error or maximizing the log-likelihood. We adjust the weights and biases to reduce the loss, and repeating this iteratively yields a well-performing model.
The weights can be optimized in two broad ways: (i) line search methods and (ii) trust region methods. In line search, we first pick a direction (for example, the negative gradient) and then choose a step size along it (the learning rate) to update the weights. In a trust region method, we first fix a maximum step size, which defines a region around the current weights where we trust our local approximation of the objective, and then look for the best point of improvement inside that region. In other words, line search commits to a direction and then decides how far to go, whereas a trust region method bounds how far it can go and then decides where to go. Improving region by region tends to be more stable, because no single update can move the weights farther than the region allows.
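To make the contrast concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the update rule of any particular RL algorithm: the toy quadratic objective `f`, the matrix `A`, the vector `b`, and the helper names `fixed_step_update`, `cauchy_step`, and `trust_region_update` are all made up for this example. The trust-region step uses the textbook Cauchy point (the best point along the negative gradient that still lies inside the region), together with the common 0.25 / 0.75 rule of thumb for shrinking and growing the radius.

```python
import numpy as np

# Toy convex quadratic f(x) = 0.5 * x^T A x - b^T x (assumed for illustration).
A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])


def f(x):
    return 0.5 * x @ A @ x - b @ x


def grad(x):
    return A @ x - b


def fixed_step_update(x, learning_rate=0.1):
    """Fixed learning-rate update: direction first, then a preset step size."""
    return x - learning_rate * grad(x)


def cauchy_step(g, B, radius):
    """Minimize the local quadratic model along -g, subject to ||p|| <= radius."""
    g_norm = np.linalg.norm(g)
    if g_norm == 0.0:
        return np.zeros_like(g)
    curvature = g @ B @ g
    if curvature <= 0.0:
        tau = 1.0  # model keeps decreasing along -g, so go to the boundary
    else:
        tau = min(1.0, g_norm**3 / (radius * curvature))
    return -tau * (radius / g_norm) * g


def trust_region_update(x, radius, max_radius=2.0, accept_threshold=0.1):
    """One trust-region iteration: propose a step, then adapt the radius."""
    g = grad(x)
    p = cauchy_step(g, A, radius)
    predicted = -(g @ p + 0.5 * p @ A @ p)  # decrease promised by the model
    if predicted <= 0.0:
        return x, radius  # nothing to gain (e.g. zero gradient)
    actual = f(x) - f(x + p)  # decrease actually obtained
    rho = actual / predicted
    if rho < 0.25:
        radius *= 0.25  # model was unreliable: trust a smaller region
    elif rho > 0.75 and np.isclose(np.linalg.norm(p), radius):
        radius = min(2.0 * radius, max_radius)  # model was good: expand
    if rho > accept_threshold:
        x = x + p  # accept the step only if it really improved f
    return x, radius


x, radius = np.array([4.0, -3.0]), 0.5
for _ in range(50):
    x, radius = trust_region_update(x, radius)
print("trust-region result:", x, "radius:", radius)
```

Because this toy objective is exactly quadratic, the model's prediction always matches the real decrease, so the radius only grows. On a real RL objective the ratio `rho` can be poor, and that is when the shrinking branch earns its keep.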

Adaptive learning rates are a good alternative, but they are governed by a schedule or some other rule. A step-based schedule, for example, lowers the learning rate after a certain number of iterations. This brings its own problem of choosing when to reduce the learning rate and by how much, and RL algorithms are sensitive to hyperparameters.
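As a rough sketch of such a schedule, the snippet below implements a simple step decay in plain Python. The function name `step_decay` and the parameters `decay_every` and `decay_factor` are hypothetical names chosen for this example. They are exactly the kind of extra hyperparameters that have to be tuned.

```python
def step_decay(initial_lr, step, decay_every=1000, decay_factor=0.5):
    """Multiply the learning rate by decay_factor once every decay_every steps."""
    return initial_lr * decay_factor ** (step // decay_every)


# The learning rate stays flat within each window, then drops at the boundary.
for step in (0, 999, 1000, 2500):
    print(step, step_decay(0.1, step))
```

Note that the schedule depends only on the step count, not on how well the last update actually worked. That is the key difference from a trust region, whose size reacts to the observed improvement.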
To draw a parallel with real life, imagine carrying a flashlight with a fixed range while moving through an open area. So…