Continuing on Adaptive Method: ADADELTA and RMSProp
In our last post, we have discussed the difficulties of setting the learning rate hyper-parameter, which can be mitigated by using the Adagrad algorithm. However, this algorithm has a serious flaw, i.e., the effective learning rate…