An Introduction to AdaGrad
We have discussed several algorithms in the last two posts, and there is one hyper-parameter used in all of them: the learning rate (η). As a refresher, a hyper-parameter is a parameter that has to be chosen manually before training. The prefix ‘hyper-’ distinguishes these parameters from the model parameters, which are learned automatically during training.
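To make the role of the learning rate concrete, here is a minimal sketch (not from the post) of plain gradient descent on the toy function f(w) = w², where η is fixed by hand before training and shared by every update step. The function name and the toy objective are illustrative choices, not part of the original posts.

```python
# Minimal sketch: gradient descent with a fixed, manually chosen learning rate.
# The hyper-parameter eta must be picked before training begins; AdaGrad's
# motivation is to adapt this step size automatically per parameter.
def gradient_descent(grad, w0, eta=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)  # same fixed eta at every step
    return w

# Minimize f(w) = w^2, whose gradient is 2w; the minimum is at w = 0.
w_final = gradient_descent(grad=lambda w: 2 * w, w0=5.0)
print(w_final)
```

If η is too large the iterates diverge; too small and convergence crawls, which is exactly why choosing it well matters.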