Statistical Learning vs Machine Learning
There is a subtle difference between statistical learning models and machine learning models.
Statistical learning involves forming a hypothesis before building a model. The hypothesis usually includes certain assumptions, which we validate after the model has been built.
For example, consider linear regression (LR), a classic statistical model. When building an LR model, we make three assumptions:
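- The residuals follow a normal distribution with mean zero.
- The attributes (predictors) in the dataset are independent of one another, i.e. there is no strong multicollinearity.
- The residuals are homoscedastic, i.e. their variance is constant across all values of the predictors.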
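Once a model has been fitted, each of these assumptions can be checked numerically. The sketch below uses simple numpy-only proxies on synthetic data with two predictors: sample skewness and excess kurtosis of the residuals for normality, the largest pairwise correlation between predictors for independence, and the ratio of residual variances in the upper and lower halves of the fitted values for homoscedasticity. The helper names and data are hypothetical, and these proxies are illustrations rather than formal statistical tests.

```python
import numpy as np

def skewness(r):
    # Sample skewness: close to 0 for symmetric (e.g. normal) residuals
    r = r - r.mean()
    return np.mean(r ** 3) / np.mean(r ** 2) ** 1.5

def excess_kurtosis(r):
    # Sample excess kurtosis: close to 0 for normally distributed residuals
    r = r - r.mean()
    return np.mean(r ** 4) / np.mean(r ** 2) ** 2 - 3.0

def max_abs_predictor_correlation(X):
    # Largest absolute pairwise correlation between predictor columns;
    # values near 1 suggest the predictors are not independent
    corr = np.corrcoef(X, rowvar=False)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return np.max(np.abs(off_diag))

def variance_ratio(fitted, r):
    # Crude homoscedasticity check: residual variance for the upper half of
    # fitted values divided by that for the lower half (about 1 if constant)
    order = np.argsort(fitted)
    half = len(r) // 2
    return r[order[half:]].var() / r[order[:half]].var()

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))  # two independent predictors (synthetic data)
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

# Ordinary least squares fit with an intercept column
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ coef
residuals = y - fitted

print(f"skewness:             {skewness(residuals):.3f}")
print(f"excess kurtosis:      {excess_kurtosis(residuals):.3f}")
print(f"max predictor corr:   {max_abs_predictor_correlation(X):.3f}")
print(f"residual var. ratio:  {variance_ratio(fitted, residuals):.3f}")
```

In practice, formal tools such as the Shapiro-Wilk test (normality), the Breusch-Pagan test (homoscedasticity) and variance inflation factors (multicollinearity) are the usual choices; the proxies above only show what each assumption means numerically.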
The model is assumed to take the form Y = b1 + b2X, so the fitted model is an equation of exactly this form, where b1 (the intercept) and b2 (the slope) are the unknown coefficients to be estimated.
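For this form, a common choice of cost function is the mean squared error over n observations (x_i, y_i). Setting its partial derivatives with respect to b1 and b2 to zero gives the familiar closed-form least-squares estimates:

```latex
J(b_1, b_2) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - b_1 - b_2 x_i \right)^2

\hat{b}_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{b}_1 = \bar{y} - \hat{b}_2 \, \bar{x}
```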
With the assumptions and the form of the equation fixed, we define a cost function (typically the sum or mean of squared residuals) and minimize it, either in closed form (ordinary least squares) or iteratively with a method such as gradient descent, to obtain the LR model. We then run diagnostics to check whether the data actually satisfy the assumptions we made. If they do not, we reject the initial hypothesis and start over.
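A minimal sketch of the minimization step, assuming the mean squared error as the cost and plain batch gradient descent. The learning rate, iteration count and synthetic data are illustrative choices, and the function name `fit_gradient_descent` is hypothetical. The closed-form least-squares estimates are computed alongside to show that both routes arrive at the same coefficients.

```python
import numpy as np

def fit_gradient_descent(x, y, lr=0.1, n_iters=2000):
    """Fit Y = b1 + b2*X by minimising the mean squared error with batch gradient descent."""
    b1, b2 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        error = (b1 + b2 * x) - y               # prediction minus target
        grad_b1 = 2.0 / n * error.sum()         # dJ/db1
        grad_b2 = 2.0 / n * (error * x).sum()   # dJ/db2
        b1 -= lr * grad_b1
        b2 -= lr * grad_b2
    return b1, b2

rng = np.random.default_rng(42)
x = rng.uniform(-2, 2, size=200)
y = 3.0 + 1.5 * x + rng.normal(scale=0.5, size=200)  # true b1 = 3.0, b2 = 1.5

b1_gd, b2_gd = fit_gradient_descent(x, y)

# Closed-form least-squares estimates for comparison
b2_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1_ols = y.mean() - b2_ols * x.mean()

print(f"gradient descent: b1={b1_gd:.4f}, b2={b2_gd:.4f}")
print(f"closed form:      b1={b1_ols:.4f}, b2={b2_ols:.4f}")
```

For a model this small the closed form is simpler; gradient descent becomes attractive when the number of features or observations makes solving the normal equations expensive.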
So the initial hypothesis plays a central role in statistical learning models.
In the case of machine learning (ML) models, by contrast, we run the ML algorithms directly on the data, letting the data speak for themselves instead of steering them in a particular direction with an initial hypothesis or set of assumptions.
For example, when building a decision tree or random forest, we make no assumptions about the distribution of the data or the form of the relationship and simply run the algorithm. The algorithm then reports which features matter most and how important each one is. Because no hypothesis is imposed up front, the final model is shaped by the data rather than by user-imposed conditions.
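The feature importances that tree algorithms report can be sketched from scratch. The hand-written regression tree below (hypothetical `best_split` and `grow` helpers, not any library's implementation) splits each node on whichever feature most reduces the squared error and credits that reduction to the feature. Normalised, these totals are the kind of impurity-based ("mean decrease in impurity") scores that libraries such as scikit-learn expose as `feature_importances_`. The data are synthetic: only features 0 and 1 influence the target, and feature 2 is pure noise.

```python
import numpy as np

def best_split(X, y):
    """Return (feature, threshold, gain) of the split that most reduces squared error."""
    n, d = X.shape
    parent_sse = y.var() * n                  # total squared error at this node
    best = (None, None, 0.0)
    for j in range(d):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
        for i in range(1, n):                 # left child = first i sorted samples
            if xs[i] == xs[i - 1]:
                continue                      # cannot split between equal values
            left_sse = csq[i - 1] - csum[i - 1] ** 2 / i
            right_sum = csum[-1] - csum[i - 1]
            right_sse = (csq[-1] - csq[i - 1]) - right_sum ** 2 / (n - i)
            gain = parent_sse - left_sse - right_sse
            if gain > best[2]:
                best = (j, (xs[i] + xs[i - 1]) / 2, gain)
    return best

def grow(X, y, depth, importances, min_samples=10):
    """Recursively split, adding each split's error reduction to its feature."""
    if depth == 0 or len(y) < min_samples:
        return
    j, threshold, gain = best_split(X, y)
    if j is None:
        return
    importances[j] += gain
    mask = X[:, j] <= threshold
    grow(X[mask], y[mask], depth - 1, importances, min_samples)
    grow(X[~mask], y[~mask], depth - 1, importances, min_samples)

rng = np.random.default_rng(1)
n = 300
X = rng.uniform(0, 1, size=(n, 3))
# Only features 0 and 1 matter; feature 2 is noise (synthetic data)
y = np.where(X[:, 0] > 0.5, 5.0, 0.0) + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

importances = np.zeros(X.shape[1])
grow(X, y, depth=3, importances=importances)
importances /= importances.sum()              # normalise so the scores sum to 1
print(np.round(importances, 3))
```

The depth limit and minimum node size are hyperparameters chosen for illustration; a random forest would average such scores over many trees grown on bootstrap samples.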
Machine learning models are therefore described as flexible: the user does not dictate the form of the equation or classifier, so the model can adapt to patterns in the data, including nonlinear ones, that a fixed equation would miss.