Self-adaptive tuning of the neural network learning rate

James Wang
Jul 2, 2019

1. Introduction

According to [Brownlee Jason 2019], the learning rate is the most important hyper-parameter of a neural network. Finding a good learning rate usually takes a lot of trial and error.

The author has experience in adaptive control of industrial robotic manipulators. The purpose of this article is to show how control theory can be used to tune the learning rate of a neural network.

2. Adaptive control theory

Figure 2.1 adaptive control

As shown in Figure 2.1, the Plant is unknown and can be highly non-linear. However, we can use a reference model to stand in for it. A second-order model can be described by the following equation (ARMA model [Karam 1989]):

Equation 2.1 plant model

a1, a2 and b are the process parameters. They are identified in real time with Recursive Least Squares (RLS [Ljung 2001]).
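As a sketch, assuming the usual difference-equation form with output y, input u and a one-step input delay (the sign convention and the delay are assumptions, not taken from the original equation), a second-order model with parameters a1, a2 and b could be written as:

```latex
y(t) = -a_1\, y(t-1) - a_2\, y(t-2) + b\, u(t-1)
```

In this form the three unknowns a1, a2 and b multiply the lagged outputs and the lagged input, which is what makes the model linear in its parameters and suitable for recursive identification.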

Equation 2.2 RLS

Equation 2.2 uses the process parameter vector and the vector of lagged inputs and outputs. lambda is the forgetting factor, P is the covariance matrix, and I is the identity matrix (ones on the diagonal, zeros elsewhere).
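A minimal sketch of one common form of RLS with a forgetting factor, in plain Python. The function names, the initial covariance of 1000 times the identity, the value lambda = 0.98 and the simulated plant coefficients are all illustrative assumptions, not values from the experiment. The regressor layout [-y(t-1), -y(t-2), u(t-1)] assumes a second-order difference-equation model with a one-step input delay.

```python
import random


def rls_update(theta, P, phi, y, lam=0.98):
    """One RLS step with forgetting factor lam (illustrative sketch).

    theta: current parameter estimates, P: covariance matrix (list of rows),
    phi: lagged inputs/outputs, y: newly measured output.
    """
    n = len(theta)
    # P * phi (P stays symmetric, so this is also (phi^T P)^T)
    p_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * p_phi[i] for i in range(n))
    gain = [v / denom for v in p_phi]
    # Prediction error computed with the old estimates
    err = y - sum(phi[i] * theta[i] for i in range(n))
    theta = [theta[i] + gain[i] * err for i in range(n)]
    # Covariance update: P <- (P - gain * phi^T P) / lam
    P = [[(P[i][j] - gain[i] * p_phi[j]) / lam for j in range(n)]
         for i in range(n)]
    return theta, P


def identify_demo(steps=300, seed=0):
    """Identify a simulated, noise-free second-order plant (assumed values)."""
    a1, a2, b = -1.2, 0.5, 0.8          # hypothetical "true" parameters
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]             # initial estimates of [a1, a2, b]
    P = [[1000.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    y1 = y2 = u1 = 0.0                  # y(t-1), y(t-2), u(t-1)
    for _ in range(steps):
        y = -a1 * y1 - a2 * y2 + b * u1
        phi = [-y1, -y2, u1]
        theta, P = rls_update(theta, P, phi, y)
        y2, y1 = y1, y
        u1 = rng.uniform(-1.0, 1.0)     # random excitation signal
    return theta
```

Calling identify_demo() returns estimates of [a1, a2, b]. Because the simulated data are noise-free, the estimates should end up close to the assumed true values; a forgetting factor below 1 lets the estimates track parameters that drift over time, at the cost of more sensitivity to noise.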

The digital controller [Wellstead 1979]:

Equation 2.3 digital controller

The overall transfer function of Figure 2.1 can be expressed as:

Equation 2.4 transfer function

Choose the design parameters so that the poles of Equation 2.4 lie inside the circle in the right half of the z-plane with centre (0.5, 0) and radius 0.5. One example is t1 = -0.9, t2 = t3 = 0. The following control parameters can then be derived directly from Equation 2.4:
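The pole condition can be checked numerically. The sketch below assumes the desired closed-loop polynomial has the form 1 + t1 z^-1 + t2 z^-2 + t3 z^-3 (an assumption about the denominator of Equation 2.4). With t3 = 0, one pole sits at z = 0 and the remaining two are the roots of z^2 + t1 z + t2. The function names are illustrative, and poles on the boundary of the circle are counted as acceptable.

```python
import cmath


def poles_second_order(t1, t2):
    """Roots of z^2 + t1*z + t2 = 0, i.e. the poles of 1 + t1 z^-1 + t2 z^-2."""
    disc = cmath.sqrt(t1 * t1 - 4.0 * t2)
    return [(-t1 + disc) / 2.0, (-t1 - disc) / 2.0]


def in_design_region(pole, centre=0.5, radius=0.5, tol=1e-12):
    """True if the pole lies inside or on the circle |z - centre| <= radius."""
    return abs(pole - centre) <= radius + tol


# Example from the text: t1 = -0.9, t2 = t3 = 0
poles = poles_second_order(-0.9, 0.0)
all_ok = all(in_design_region(p) for p in poles)
```

For the example values, the two computed poles are approximately 0.9 and 0. The pole at 0.9 is inside the circle, while a pole at the origin lies exactly on its boundary.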

Equation 2.5

For a 1st order system, Equation 2.3 can further be simplified as:

Equation 2.6

The plant model reduces to first order in the same way.
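As a sketch, assuming a difference-equation form with output y, input u and a one-step input delay (the sign convention and the delay are assumptions), a first-order plant with parameters a1 and b could be written as:

```latex
y(t) = -a_1\, y(t-1) + b\, u(t-1)
```

Only two parameters then need to be identified, which keeps the recursive estimation cheap.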

3. Neural network for this experiment

The training data consist of 23k time series and the test data of 5k time series. The test data are entirely different from the training data.

Figure 3.1

For the architecture, we start with the simplest case, a single hidden layer:

Figure 3.2 NN architecture

I also tried to use as few nodes as possible and started with 8 hidden nodes, but the network failed to converge, so I increased the hidden layer to 16 nodes.

4. Test results

Figure 4.1

As can be seen from Figure 4.1, val_acc approached 100% after 80 epochs.

Figure 4.2

As can be seen from Figure 4.2, the learning rate (lrate) is not static and changes over time.

5. Conclusions and suggestions

This article showed that adaptive control can be used effectively to tune the learning rate of a neural network. I hope this is the first time the idea has been applied to neural networks. To the best of my knowledge, I have not come across anything similar so far.

Only a CNN was used in this test. The approach should be just as effective for RNNs and other architectures.

I plan to apply this to other applications as well (e.g. https://cpury.github.io/learning-math/).

6. References

Brownlee Jason 2019 https://machinelearningmastery.com/learning-rate-for-deep-learning-neural-networks/

Ljung 2001 Recursive Identification Algorithms http://www.control.isy.liu.se/research/reports/2001/2366.pdf

Karam 1989 A microprocessor based adaptive controller for robotic manipulators https://ieeexplore.ieee.org/document/198937

Wellstead 1979 Wellstead P E, Zarrop M B “Pole Assignment Self-tuning Regulator” Proc IEE vol 126 p781–787
