Linear Regression Simplified - Ordinary Least Square vs Gradient Descent

You say that a multivariate dataset, i.e. one containing multiple independent variables and a single dependent variable, requires us to use a machine learning algorithm called “Gradient Descent”.

Where does this come from? I’m not sure I’m following you here… It seems that you are confusing the model with its optimization: linear regression is the model, while OLS (the closed-form solution) and gradient descent are just two different ways to fit it.
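To make the model/optimizer distinction concrete, here is a minimal sketch on hypothetical synthetic data (NumPy only, all names made up): the same linear model fitted once via the OLS normal equations and once via gradient descent, arriving at the same coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 3 independent variables, 1 dependent variable.
n = 200
X = rng.normal(size=(n, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=n)

# Optimizer 1: OLS closed form, solving the normal equations X^T X w = X^T y.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Optimizer 2: gradient descent on the very same squared-error objective.
w_gd = np.zeros(3)
lr = 0.1
for _ in range(2000):
    grad = X.T @ (X @ w_gd - y) / n  # gradient of (1/2n) * ||y - Xw||^2
    w_gd -= lr * grad

# One model, two optimizers: the fitted coefficients agree.
print(np.allclose(w_ols, w_gd, atol=1e-6))
```

Gradient descent only becomes the pragmatic choice when the dataset is too large for the normal equations to be convenient; it is not required by the model itself.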

It is important to note that when you deal with large datasets, you often have to face sparsity: many of the independent variables are irrelevant. In this case you really should use what is called regularized regression. Lasso (L1) regularization will help you to “filter” the independent variables by driving the coefficients of the irrelevant ones to exactly zero, so that only the relevant ones are used. If you don’t regularize, both OLS and SGD can fail to provide a good predictive model.
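A small sketch of that filtering effect, again on made-up data (the problem setup and all names are assumptions): here Lasso is solved by ISTA, i.e. proximal gradient descent with soft-thresholding, which is one standard way to optimize the L1-penalized objective. Only 2 of the 5 features actually matter, and the other coefficients end up exactly zero.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sparse problem: only the first 2 of 5 features are relevant.
n = 100
X = rng.normal(size=(n, 5))
w_true = np.array([2.0, -3.0, 0.0, 0.0, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=n)

def soft_threshold(v, t):
    # Proximal operator of the L1 norm: values inside [-t, t] become exactly 0.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# ISTA for the Lasso objective: (1/2n) * ||y - Xw||^2 + lam * ||w||_1
lam = 0.1
step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()  # 1/L, L = Lipschitz const. of the gradient
w = np.zeros(5)
for _ in range(2000):
    grad = X.T @ (X @ w - y) / n
    w = soft_threshold(w - step * grad, step * lam)

print(w)  # the three irrelevant coefficients are exactly 0.0
```

The exact zeros are the point: unlike ridge (L2), which only shrinks coefficients, the L1 penalty performs feature selection for free.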

Beyond that, if you take a look at the probabilistic interpretation of OLS and SGD, you will discover that both are doing the same thing: MLE, Maximum Likelihood Estimation. Assuming Gaussian noise, both are optimizing the negative log likelihood of your dataset. You may take a look at this and this.
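To spell out that equivalence: under the standard assumption of i.i.d. Gaussian noise, the likelihood of the data factorizes and its negative log is, up to constants, the sum of squared errors.

```latex
% Model: y_i = x_i^\top w + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
p(y_i \mid x_i, w) = \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left(-\frac{(y_i - x_i^\top w)^2}{2\sigma^2}\right)

% Negative log likelihood of the whole dataset:
-\log L(w) = \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - x_i^\top w)^2
  + \frac{n}{2}\log\left(2\pi\sigma^2\right)

% The second term does not depend on w, so maximizing the likelihood
% is exactly minimizing the sum of squared errors, i.e. the OLS objective:
\hat{w}_{\mathrm{MLE}} = \arg\min_w \sum_{i=1}^{n} (y_i - x_i^\top w)^2 = \hat{w}_{\mathrm{OLS}}
```

Whether you then minimize that objective in closed form (OLS) or iteratively (SGD) is purely an implementation choice; the estimator being computed is the same.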