Self-adaptive tuning of the neural network learning rate
1. Introduction
According to [Brownlee Jason 2019], the learning rate is the most important hyper-parameter in neural networks. Finding the correct learning rate for a neural network typically involves much trial and error.
The author has experience in adaptive control of industrial robotic manipulators. The purpose of this article is to show how control theory can be used to tune the neural network learning rate.
2. Adaptive control theory
As shown in Figure 2.1, the plant is unknown and can be highly non-linear. However, we can use a reference model to stand for it. A second-order model can be described by the following equation (ARMA model [Karam 1989]):

  y(k) = a1 y(k-1) + a2 y(k-2) + b u(k-1)        (Equation 2.1)
a1, a2 and b are process parameters to be identified in real time with Recursive Least Squares (RLS [Ljung 2001]):

  K(k) = P(k-1) phi(k) / (lambda + phi(k)' P(k-1) phi(k))
  theta(k) = theta(k-1) + K(k) (y(k) - phi(k)' theta(k-1))
  P(k) = (P(k-1) - K(k) phi(k)' P(k-1)) / lambda        (Equation 2.2)

where theta = [a1, a2, b]' is the process parameter vector, phi(k) = [y(k-1), y(k-2), u(k-1)]' is the vector of lagged outputs and inputs, lambda is the forgetting factor, and P is the covariance matrix, initialised to a large multiple of the identity matrix I.
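As a hedged sketch (not the article's actual code), the RLS update can be implemented in a few lines of NumPy. The plant parameters, forgetting factor, and initial covariance below are illustrative choices:

```python
import numpy as np

def rls_step(theta, P, phi, y, lam=1.0):
    """One Recursive Least Squares update with forgetting factor lam.

    theta: current parameter estimate [a1, a2, b]
    P:     covariance matrix
    phi:   lagged data vector [y(k-1), y(k-2), u(k-1)]
    y:     newest plant output y(k)
    """
    Pphi = P @ phi
    K = Pphi / (lam + phi @ Pphi)           # gain vector K(k)
    theta = theta + K * (y - phi @ theta)   # parameter update
    P = (P - np.outer(K, Pphi)) / lam       # covariance update
    return theta, P

# Identify an illustrative plant y(k) = 1.2*y(k-1) - 0.35*y(k-2) + 0.5*u(k-1)
rng = np.random.default_rng(0)
u = rng.standard_normal(500)                # persistently exciting input
y = [0.0, 0.0]
theta, P = np.zeros(3), np.eye(3) * 1000.0  # large initial covariance
for k in range(2, 500):
    y.append(1.2 * y[k-1] - 0.35 * y[k-2] + 0.5 * u[k-1])
    phi = np.array([y[k-1], y[k-2], u[k-1]])
    theta, P = rls_step(theta, P, phi, y[k])
```

With noise-free data and an exciting input, theta converges to the true parameters [1.2, -0.35, 0.5].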
The digital controller [Wellstead 1979]:

  F(z^-1) u(k) = -G(z^-1) y(k)        (Equation 2.3)

with F(z^-1) = 1 + f1 z^-1 and G(z^-1) = g0 + g1 z^-1. Writing the plant as A(z^-1) y(k) = z^-1 B(z^-1) u(k), with A(z^-1) = 1 - a1 z^-1 - a2 z^-2 and B(z^-1) = b, the overall transfer function of Figure 2.1 has the closed-loop characteristic polynomial:

  T(z^-1) = A(z^-1) F(z^-1) + z^-1 B(z^-1) G(z^-1) = 1 + t1 z^-1 + t2 z^-2 + t3 z^-3        (Equation 2.4)
Ensure that Equation 2.4 has its poles inside the right-hand-side circle (centre (0.5, 0), radius 0.5). One example is t1 = -0.9, t2 = t3 = 0. From Equation 2.4, matching the coefficients of z^-1, z^-2 and z^-3 readily gives the following control parameters:

  f1 = -t3 / a2
  g0 = (t1 + a1 - f1) / b
  g1 = (t2 + a1 f1 + a2) / b
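A quick numerical check of the coefficient matching in Equation 2.4 can be sketched as follows, assuming the controller polynomials F = 1 + f1*z^-1 and G = g0 + g1*z^-1 (the plant values are illustrative, not identified ones):

```python
def pole_assignment_gains(a1, a2, b, t1=-0.9, t2=0.0, t3=0.0):
    """Controller gains from matching coefficients of
    T = A*F + z^-1*B*G (Equation 2.4), assuming F = 1 + f1*z^-1
    and G = g0 + g1*z^-1."""
    f1 = -t3 / a2
    g0 = (t1 + a1 - f1) / b
    g1 = (t2 + a1 * f1 + a2) / b
    return f1, g0, g1

# Illustrative plant parameters (open-loop poles at 0.7 and 0.5)
a1, a2, b = 1.2, -0.35, 0.5
f1, g0, g1 = pole_assignment_gains(a1, a2, b)

# Re-expand T's coefficients; they should equal t1, t2, t3
t1_check = f1 - a1 + b * g0        # coefficient of z^-1
t2_check = -a1 * f1 - a2 + b * g1  # coefficient of z^-2
t3_check = -a2 * f1                # coefficient of z^-3
```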
For a 1st-order system, Equation 2.3 can be further simplified to:

  u(k) = -g0 y(k),  with g0 = (t1 + a1) / b

and the plant model can be simplified to:

  y(k) = a1 y(k-1) + b u(k-1)
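A toy closed-loop simulation of this 1st-order case (the plant values a1 = 0.5 and b = 2.0 are illustrative, not identified ones): with t1 = -0.9, the gain g0 = (t1 + a1)/b places the closed-loop pole at 0.9, so the output decays by a factor of 0.9 per step.

```python
a1, b, t1 = 0.5, 2.0, -0.9      # illustrative plant and target pole
g0 = (t1 + a1) / b              # pole-assignment gain, here -0.2

y = 1.0                         # initial plant output
trajectory = []
for _ in range(5):
    u = -g0 * y                 # control law u(k) = -g0*y(k)
    y = a1 * y + b * u          # plant: y(k) = a1*y(k-1) + b*u(k-1)
    trajectory.append(y)
# each step multiplies y by a1 - b*g0 = 0.9
```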
3. Neural network for this experiment
The training data are a 23k-point time series and the test data a 5k-point time series. The test data are completely different from the training data.
Architecture: we start from the simplest design, a single hidden layer.
Also, I tried to use as few nodes as possible and started at 8 hidden nodes, but the network failed to converge, so I increased the number of nodes to 16.
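The article does not include the model code. As a hedged illustration (the article later mentions a CNN; this is the simplest dense version), a Keras sketch of a single-hidden-layer network with 16 nodes might look like the following; the input width of 32 and the sigmoid output are assumptions, not the experiment's actual data shapes:

```python
# Minimal Keras sketch: one hidden layer with 16 nodes.
# Input width (32) and binary output are illustrative assumptions.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(32,)),              # assumed input width
    keras.layers.Dense(16, activation="relu"),    # the single hidden layer
    keras.layers.Dense(1, activation="sigmoid"),  # assumed binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```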
4. Test results
As can be seen from Figure 4.1, the validation accuracy (val_acc) approached 100% after 80 epochs.
As can be seen from Figure 4.2, the learning rate (lrate) is not static and changes over time.
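The article does not give its scheduler code. As one plausible (hypothetical) wiring of the idea, treat the training loss as the plant output y(k) and the learning rate as the control input u(k), identify a 1st-order model with RLS once per epoch, and set the new rate by pole assignment. The class name, initial guesses, forgetting factor, and clipping bounds below are all illustrative:

```python
import numpy as np

class AdaptiveLR:
    """Hypothetical per-epoch learning-rate tuner: model the loss as a
    1st-order plant loss(k) = a1*loss(k-1) + b*lr(k-1), identify a1 and b
    with RLS, then set lr(k) = -g0*loss(k) with the pole-assignment
    gain g0 = (t1 + a1)/b."""

    def __init__(self, lr0=0.01, t1=-0.9, lam=0.98):
        self.theta = np.array([0.5, 1.0])  # initial guess for [a1, b]
        self.P = np.eye(2) * 100.0         # RLS covariance
        self.lam, self.t1 = lam, t1
        self.prev_loss = None
        self.lr = lr0

    def update(self, loss):
        """Call once per epoch with the latest training loss;
        returns the learning rate to use for the next epoch."""
        if self.prev_loss is not None:
            phi = np.array([self.prev_loss, self.lr])
            Pphi = self.P @ phi
            K = Pphi / (self.lam + phi @ Pphi)
            self.theta = self.theta + K * (loss - phi @ self.theta)
            self.P = (self.P - np.outer(K, Pphi)) / self.lam
        a1, b = self.theta
        if abs(b) > 1e-8:                  # avoid dividing by a tiny b
            g0 = (self.t1 + a1) / b        # pole-assignment gain
            self.lr = float(np.clip(-g0 * loss, 1e-5, 0.1))
        self.prev_loss = loss
        return self.lr
```

In a Keras training loop this would be called from an epoch-end callback that reads the loss and writes the optimizer's learning rate.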
5. Conclusions and suggestions
This article showed that adaptive control can be used effectively for neural network learning rate tuning. I believe this may be the first time the idea has been applied to neural networks; to the best of my knowledge, I have not seen anything similar so far.
Only a CNN was used in this test; the approach should be just as effective for RNNs and other architectures.
I shall try to apply this to other applications as well (e.g. https://cpury.github.io/learning-math/).
6. References
Brownlee, Jason (2019). https://machinelearningmastery.com/learning-rate-for-deep-learning-neural-networks/
Ljung (2001). Recursive Identification Algorithms. http://www.control.isy.liu.se/research/reports/2001/2366.pdf
Karam (1989). A microprocessor based adaptive controller for robotic manipulators. https://ieeexplore.ieee.org/document/198937
Wellstead, P. E. and Zarrop, M. B. (1979). "Pole Assignment Self-tuning Regulator". Proc. IEE, vol. 126, pp. 781-787.