Machine Learning and Statistical Learning

Both Machine learning and Statistical learning solve similar problems but with different approaches. Let us look at each of the approaches to understand these concepts further.

Machine Learning

Machine Learning is a sub-domain of Computer Science that grew out of Artificial Intelligence. It is a new capability of computers to solve various kinds of problems in a generalized manner.

Modern definition:

“Field of study that gives computers the ability to learn without being explicitly programmed” — Arthur Samuel

Formal definition:

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.” — Tom M. Mitchell

Machine learning algorithms learn from data without any hypothesis or assumptions, allowing for more flexibility in solving the problem.

Statistical Learning

Statistical learning is a field of Mathematics which deals with formulating a relationship on certain assumptions or hypotheses to predict a target variable which is not easy to measure using one or more dependent variables.

Machine Learning and Statistical Learning are broadly classified as shown below. This diagram could be one of the limited versions of the classification and I am sure one can extend it further.

Supervised learning: The setting that involves building a model which learns from one or more inputs, whose outputs are known and which can be used for predicting or estimating an output variable for new inputs.

Unsupervised learning: The setting that involves building a model which understand relationships and structure from the data which has no supervised output.

Reinforcement learning: The setting that involves performing a certain goal by interacting with the dynamic environment under no supervision. For example, self driving cars.

There are multiple ways to solve a problem/answer a question about the population in Data Science. Below is my approach to solve a problem.

Depending on the type of problem, one can use multiple machine learning algorithms or statistical learning techniques to arrive at a solution. The question, however, “How to choose the right model?”, is a different discussion altogether.

“Essentially, all models are wrong, but some are useful” — George E. P. Box, (Mathematician and professor of statistics at the University of Wisconsin).

Linear Regression


Practical (Yet to be uploaded)

I will update this space with theory and a few mathematical proofs involved with more techniques in machine learning along with their practicals in python.

Data sets used to describe each technique could be either from UCI repository or kaggle.