Self-adaptive tuning of the neural network learning rate
1. Introduction
According to [Brownlee Jason 2019], the learning rate is the most important hyper-parameter in neural networks. Finding the correct learning rate for a neural network typically involves much trial and error.
The author has experience in adaptive control of industrial robotic manipulators. The purpose of this article is to show how control theory can be used to tune the neural network learning rate.
2. Adaptive control theory
As shown in Figure 2.1, the plant is unknown and can be highly non-linear. However, we can use a reference model to stand for it. A second-order model can be described by the following equation (ARMA model [Karam 1989]):

  y(k) = a1 y(k-1) + a2 y(k-2) + b u(k-1)        (Equation 2.1)
a1, a2 and b are process parameters to be identified in real time with Recursive Least Squares (RLS [Ljung 2001]):

  K(k) = P(k-1) phi(k) / (lambda + phi(k)' P(k-1) phi(k))
  theta(k) = theta(k-1) + K(k) (y(k) - phi(k)' theta(k-1))
  P(k) = (P(k-1) - K(k) phi(k)' P(k-1)) / lambda        (Equation 2.2)

where theta = [a1, a2, b]' is the process parameter vector, phi(k) = [y(k-1), y(k-2), u(k-1)]' is the vector of lagged outputs and inputs, lambda is the forgetting factor, and P is the covariance matrix, initialised to a large multiple of the identity matrix I.
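As a hedged sketch (not the article's actual code), the RLS update can be implemented in a few lines of NumPy. The plant parameters, forgetting factor, and initial covariance below are illustrative choices:

```python
import numpy as np

def rls_step(theta, P, phi, y, lam=1.0):
    """One Recursive Least Squares update with forgetting factor lam.

    theta: current parameter estimate [a1, a2, b]
    P:     covariance matrix
    phi:   lagged data vector [y(k-1), y(k-2), u(k-1)]
    y:     newest plant output y(k)
    """
    Pphi = P @ phi
    K = Pphi / (lam + phi @ Pphi)           # gain vector K(k)
    theta = theta + K * (y - phi @ theta)   # parameter update
    P = (P - np.outer(K, Pphi)) / lam       # covariance update
    return theta, P

# Identify an illustrative plant y(k) = 1.2*y(k-1) - 0.35*y(k-2) + 0.5*u(k-1)
rng = np.random.default_rng(0)
u = rng.standard_normal(500)                # persistently exciting input
y = [0.0, 0.0]
theta, P = np.zeros(3), np.eye(3) * 1000.0  # large initial covariance
for k in range(2, 500):
    y.append(1.2 * y[k-1] - 0.35 * y[k-2] + 0.5 * u[k-1])
    phi = np.array([y[k-1], y[k-2], u[k-1]])
    theta, P = rls_step(theta, P, phi, y[k])
```

With noise-free data and an exciting input, theta converges to the true parameters [1.2, -0.35, 0.5].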
The digital controller [Wellstead 1979]:

  F(z^-1) u(k) = -G(z^-1) y(k)        (Equation 2.3)

with F(z^-1) = 1 + f1 z^-1 and G(z^-1) = g0 + g1 z^-1. Writing the plant as A(z^-1) y(k) = z^-1 B(z^-1) u(k), with A(z^-1) = 1 - a1 z^-1 - a2 z^-2 and B(z^-1) = b, the overall transfer function of Figure 2.1 has the closed-loop characteristic polynomial:

  T(z^-1) = A(z^-1) F(z^-1) + z^-1 B(z^-1) G(z^-1) = 1 + t1 z^-1 + t2 z^-2 + t3 z^-3        (Equation 2.4)
Ensure that Equation 2.4 has its poles inside the right-hand-side circle (centre (0.5, 0), radius 0.5). One example is t1 = -0.9, t2 = t3 = 0. From Equation 2.4, matching the coefficients of z^-1, z^-2 and z^-3 readily gives the following control parameters:

  f1 = -t3 / a2
  g0 = (t1 + a1 - f1) / b
  g1 = (t2 + a1 f1 + a2) / b
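A quick numerical check of the coefficient matching in Equation 2.4 can be sketched as follows, assuming the controller polynomials F = 1 + f1*z^-1 and G = g0 + g1*z^-1 (the plant values are illustrative, not identified ones):

```python
def pole_assignment_gains(a1, a2, b, t1=-0.9, t2=0.0, t3=0.0):
    """Controller gains from matching coefficients of
    T = A*F + z^-1*B*G (Equation 2.4), assuming F = 1 + f1*z^-1
    and G = g0 + g1*z^-1."""
    f1 = -t3 / a2
    g0 = (t1 + a1 - f1) / b
    g1 = (t2 + a1 * f1 + a2) / b
    return f1, g0, g1

# Illustrative plant parameters (open-loop poles at 0.7 and 0.5)
a1, a2, b = 1.2, -0.35, 0.5
f1, g0, g1 = pole_assignment_gains(a1, a2, b)

# Re-expand T's coefficients; they should equal t1, t2, t3
t1_check = f1 - a1 + b * g0        # coefficient of z^-1
t2_check = -a1 * f1 - a2 + b * g1  # coefficient of z^-2
t3_check = -a2 * f1                # coefficient of z^-3
```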
For a 1st-order system, Equation 2.3 can be further simplified to:

  u(k) = -g0 y(k),  with g0 = (t1 + a1) / b

and the plant model can be simplified to:

  y(k) = a1 y(k-1) + b u(k-1)
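A toy closed-loop simulation of this 1st-order case (the plant values a1 = 0.5 and b = 2.0 are illustrative, not identified ones): with t1 = -0.9, the gain g0 = (t1 + a1)/b places the closed-loop pole at 0.9, so the output decays by a factor of 0.9 per step.

```python
a1, b, t1 = 0.5, 2.0, -0.9      # illustrative plant and target pole
g0 = (t1 + a1) / b              # pole-assignment gain, here -0.2

y = 1.0                         # initial plant output
trajectory = []
for _ in range(5):
    u = -g0 * y                 # control law u(k) = -g0*y(k)
    y = a1 * y + b * u          # plant: y(k) = a1*y(k-1) + b*u(k-1)
    trajectory.append(y)
# each step multiplies y by a1 - b*g0 = 0.9
```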
3. Neural network for this experiment
The training data are a 23k-point time series and the test data a 5k-point time series. The test data are completely different from the training data.
Architecture: we start from the simplest design, a single hidden layer.
Also, I tried to use as few nodes as possible and started at 8 hidden nodes, but the network failed to converge, so I increased the number of nodes to 16.
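The article does not include the model code. As a hedged illustration (the article later mentions a CNN; this is the simplest dense version), a Keras sketch of a single-hidden-layer network with 16 nodes might look like the following; the input width of 32 and the sigmoid output are assumptions, not the experiment's actual data shapes:

```python
# Minimal Keras sketch: one hidden layer with 16 nodes.
# Input width (32) and binary output are illustrative assumptions.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(32,)),              # assumed input width
    keras.layers.Dense(16, activation="relu"),    # the single hidden layer
    keras.layers.Dense(1, activation="sigmoid"),  # assumed binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```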
4. Test results
As can be seen from Figure 4.1, the validation accuracy (val_acc) approached 100% after 80 epochs.
As can be seen from Figure 4.2, the learning rate (lrate) is not static and changes over time.
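The article does not give its scheduler code. As one plausible (hypothetical) wiring of the idea, treat the training loss as the plant output y(k) and the learning rate as the control input u(k), identify a 1st-order model with RLS once per epoch, and set the new rate by pole assignment. The class name, initial guesses, forgetting factor, and clipping bounds below are all illustrative:

```python
import numpy as np

class AdaptiveLR:
    """Hypothetical per-epoch learning-rate tuner: model the loss as a
    1st-order plant loss(k) = a1*loss(k-1) + b*lr(k-1), identify a1 and b
    with RLS, then set lr(k) = -g0*loss(k) with the pole-assignment
    gain g0 = (t1 + a1)/b."""

    def __init__(self, lr0=0.01, t1=-0.9, lam=0.98):
        self.theta = np.array([0.5, 1.0])  # initial guess for [a1, b]
        self.P = np.eye(2) * 100.0         # RLS covariance
        self.lam, self.t1 = lam, t1
        self.prev_loss = None
        self.lr = lr0

    def update(self, loss):
        """Call once per epoch with the latest training loss;
        returns the learning rate to use for the next epoch."""
        if self.prev_loss is not None:
            phi = np.array([self.prev_loss, self.lr])
            Pphi = self.P @ phi
            K = Pphi / (self.lam + phi @ Pphi)
            self.theta = self.theta + K * (loss - phi @ self.theta)
            self.P = (self.P - np.outer(K, Pphi)) / self.lam
        a1, b = self.theta
        if abs(b) > 1e-8:                  # avoid dividing by a tiny b
            g0 = (self.t1 + a1) / b        # pole-assignment gain
            self.lr = float(np.clip(-g0 * loss, 1e-5, 0.1))
        self.prev_loss = loss
        return self.lr
```

In a Keras training loop this would be called from an epoch-end callback that reads the loss and writes the optimizer's learning rate.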
5. Conclusions and suggestions
This article showed that adaptive control can be used effectively for neural network learning rate tuning. I believe this may be the first time the idea has been applied to neural networks; to the best of my knowledge, I have not seen anything similar so far.
Only a CNN was used in this test; the approach should be just as effective for RNNs and other architectures.
I shall try to apply this to other applications as well (e.g. https://cpury.github.io/learning-math/).
6. References
Brownlee, Jason (2019). https://machinelearningmastery.com/learning-rate-for-deep-learning-neural-networks/
Ljung (2001). Recursive Identification Algorithms. http://www.control.isy.liu.se/research/reports/2001/2366.pdf
Karam (1989). A microprocessor based adaptive controller for robotic manipulators. https://ieeexplore.ieee.org/document/198937
Wellstead, P. E. and Zarrop, M. B. (1979). "Pole Assignment Self-tuning Regulator". Proc. IEE, vol. 126, pp. 781-787.