TRUST REGION METHODS FOR DEEP REINFORCEMENT LEARNING

Astarag Mohapatra
Published in Analytics Vidhya
10 min read, Jul 4, 2021

TRUST REGION METHODS

  • The basic building block of machine learning is optimizing an objective function, such as the mean squared error (which we minimize) or the log-likelihood (which we maximize). By minimizing the loss we optimize the weights and biases, and repeating this iteratively gives a well-performing model.
  • The weights can be optimized in two ways: i) line search methods and ii) trust region methods. In a line search, we first choose a direction (for example, the negative gradient) and then select a step length along it; in gradient descent this step length is the learning rate. In a trust region method, we first fix a region around the current point (its radius acts as a maximum step size) in which we trust a local approximation of the objective, and then find the point of greatest improvement within that region. The region is enlarged when the approximation predicts the actual improvement well and shrunk when it does not. Because every update stays inside a region where the approximation is reliable, this region-wise improvement tends to be more stable.
Figure: Line search vs. trust region method.
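To make the contrast between the two strategies concrete, here is a minimal sketch that fits a tiny linear model with mean squared error using each of them. Everything specific here is an illustrative assumption, not the author's implementation: the toy data, the Armijo backtracking rule for the line search, and the trust-region details. In particular, the trust-region step uses a first-order (linear) model of the loss for simplicity, whereas most practical trust-region methods use a quadratic model. The 0.25/0.75 thresholds for shrinking or expanding the radius and the acceptance threshold `eta` are common textbook-style defaults chosen here for illustration.

```python
import math

# Hypothetical toy data generated from y = 2x + 1 (noise-free, for illustration)
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [2.0 * x + 1.0 for x in XS]


def mse(params):
    """Mean squared error of the linear model y_hat = w * x + b."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def grad(params):
    """Analytic gradient of the MSE with respect to (w, b)."""
    w, b = params
    n = len(XS)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in zip(XS, YS)) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in zip(XS, YS)) / n
    return [gw, gb]


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def line_search_step(params, alpha=1.0, shrink=0.5, c=1e-4):
    """Line search: fix the direction (steepest descent, -gradient) first,
    then backtrack the step length until the Armijo condition holds."""
    g = grad(params)
    f0, g2 = mse(params), norm(g) ** 2
    while True:
        trial = [p - alpha * gi for p, gi in zip(params, g)]
        if mse(trial) <= f0 - c * alpha * g2 or alpha < 1e-12:
            return trial
        alpha *= shrink


def trust_region_step(params, radius, max_radius=10.0, eta=0.1):
    """Trust region (assumed first-order model): fix the maximum step size
    (radius) first, then take the step inside that region that minimizes a
    local linear model of the loss. Returns (params, updated radius)."""
    g = grad(params)
    gnorm = norm(g)
    if gnorm == 0.0:
        return params, radius
    # Minimizer of the linear model f + g.p over the ball ||p|| <= radius
    trial = [p - radius * gi / gnorm for p, gi in zip(params, g)]
    predicted = radius * gnorm            # decrease promised by the model
    if predicted <= 0.0:                  # radius underflowed: nothing to do
        return params, radius
    actual = mse(params) - mse(trial)     # decrease actually achieved
    rho = actual / predicted
    if rho < 0.25:
        radius *= 0.25                    # poor prediction: shrink the region
    elif rho > 0.75:
        radius = min(2.0 * radius, max_radius)  # good prediction: expand it
    # Accept the step only if it delivered enough of the predicted decrease
    return (trial if rho > eta else params), radius


ls_params = [0.0, 0.0]
tr_params, radius = [0.0, 0.0], 1.0
for _ in range(1000):
    ls_params = line_search_step(ls_params)
    tr_params, radius = trust_region_step(tr_params, radius)

print(f"line search:  loss={mse(ls_params):.2e}")
print(f"trust region: loss={mse(tr_params):.2e}")
```

The difference lies in what each step fixes first: `line_search_step` keeps the direction and searches over the step length, while `trust_region_step` bounds the step by `radius` and adapts that bound using `rho`, the ratio of actual to predicted improvement.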
  • Adaptive learning rates are a good alternative, but they are adjusted by some predefined rule, for example a time-step schedule that decreases the learning rate after a certain number of iterations. This brings its own problem of choosing when (and by how much) to reduce the learning rate, and RL algorithms are sensitive to such hyperparameters.
  • Drawing a parallel to real life, imagine moving through an open area with a fixed-range flashlight. So
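The time-step schedule mentioned above can be sketched as a step-decay rule. The function names, the toy objective f(w) = (w - 3)^2, and the values used (initial learning rate 0.1, halving factor 0.5) are hypothetical choices for illustration, and this is a deterministic toy problem rather than an RL setting. The point is only that the same schedule behaves very differently depending on when the learning rate is dropped.

```python
def step_decay_lr(initial_lr, iteration, drop_every, factor=0.5):
    """Step-decay schedule: multiply the learning rate by `factor`
    every `drop_every` iterations."""
    return initial_lr * factor ** (iteration // drop_every)


def minimize_quadratic(drop_every, steps=100, initial_lr=0.1):
    """Gradient descent on f(w) = (w - 3)^2 starting from w = 0, using the
    step-decay schedule. Returns the final distance from the minimizer 3."""
    w = 0.0
    for t in range(steps):
        lr = step_decay_lr(initial_lr, t, drop_every)
        w -= lr * 2.0 * (w - 3.0)   # gradient of (w - 3)^2 is 2(w - 3)
    return abs(w - 3.0)


# Same initial learning rate, different decay timing
print(f"drop every 2 steps:  error={minimize_quadratic(2):.3f}")
print(f"drop every 50 steps: error={minimize_quadratic(50):.2e}")
```

With these settings the first run decays the learning rate so quickly that it stalls well short of the minimum (its error stays above 1), while the second converges close to it, even though only `drop_every` changed.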
