Types of Optimization Algorithms used in Neural Networks and Ways to Optimize Gradient Descent
Anish Singh Walia

You note that the term Stochastic Gradient Descent is also used for mini-batch gradient descent. I would add that this usage is erroneous: strictly speaking, SGD updates the parameters using one data point at a time, whereas mini-batch gradient descent uses a small subset of the data for each update. Saying SGD when mini-batch gradient descent is meant is inaccurate and should be discouraged. Unfortunately, there is a lot of misuse of theory and terminology these days.
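To make the distinction concrete, here is a minimal NumPy sketch in which the only thing separating the variants is the number of examples used per update. The function name `train_linear`, the toy linear-regression data, and the learning rate are my own assumptions for illustration; nothing here is taken from the article.

```python
import numpy as np

def train_linear(X, y, batch_size, lr=0.05, epochs=50, seed=0):
    """Fit y ~ X @ w by gradient descent on mean squared error.

    batch_size == 1      -> SGD in the strict sense (one example per update)
    1 < batch_size < n   -> mini-batch gradient descent
    batch_size == n      -> full-batch gradient descent
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    updates = 0
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the data each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            # Gradient of the mean squared error over the current batch only
            grad = 2.0 * X[idx].T @ residual / len(idx)
            w -= lr * grad
            updates += 1
    return w, updates

# Hypothetical toy problem: 64 noiseless examples, 2 features
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(64, 2))
true_w = np.array([1.5, -2.0])
y = X @ true_w

w_sgd, n_sgd = train_linear(X, y, batch_size=1)     # 64 updates per epoch
w_mini, n_mini = train_linear(X, y, batch_size=16)  # 4 updates per epoch
w_full, n_full = train_linear(X, y, batch_size=64)  # 1 update per epoch

print(n_sgd, n_mini, n_full)
```

All three calls run the same update rule; they differ only in how many examples feed each gradient estimate, and therefore in how many (and how noisy) updates happen per epoch. That is the sense in which calling the `batch_size=16` run "SGD" blurs a real difference.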

