To predict the future, look to the past
Say, you are given a basket with a lot of different kinds of fruits and you know nothing about fruits. What if I pick up a fruit and ask you which fruit it is? Will you be able to answer it?
Definitely not (Without Google Image Search, of course).
Now say, I take you to a orchard and show you many different kinds of fruits and tell you their names. Now will you be able to identify all of the fruits from the basket? I’m sure you’re a fast learner and will be able to identify most with good accuracy. Not only because you’re a fast learning but also because you’ve learnt under my supervision. 😉
What if I try to teach the same to a computer with access to some fancy algorithms? I definitely can’t show the orchard to the computer to make it understand so I give it a list of fruits from the orchard with some features (say color, shape, volume, texture, etc) and the name of the fruits, ask it to learn it through some algorithm and then identify fruits from the basket using whatever it has learnt. It’ll also be able to identify most of the fruits correctly.
Now this type of problem where you need to classify stuffs after learning through some labelled data is known as Classification Problem. You give the learning model a list of observations with classified labels and ask it to classify unseen observations.
Let’s take another example, say you have a bicycle rental shop and you want to estimate how many bicycles will you be able to rent today. Since you are in this business, you’ll be able to guess a rough estimate based on the day of the week, the current weather, temperature, humidity, the season, whether it is a holiday or not, etc, right?
Now if I give the quantified data and the number of bikes rented for each day in the last 2 years to a computer and ask it to give me the number of bikes that your shop will be able to rent for the next 7 days. Will it be able to do so? Sure, why not?
Now this type of problem where you need to predict a real numbered value after learning through some labelled data is known as Regression Problem. You give the learning model a list of observations with the observed output and ask it to find the output for unseen observations.
These types of problems (Classification and Regression) are solved using Supervised Learning models.
In supervised learning, the algorithm is given a list of observations with desired output (training set) and is asked to learn and create a generalized model from it such that it can predict the desired output for any unseen observation that may or may not be from the training set.
The primary difference between a regression problem and a classification problem is that in regression, you need to predict a value from a set of continuous values which can be literally infinite whereas in classification, you need to predict a value from a set of discrete finite values.
There are basically two different types of classification problems namely:
- Binary Classification (Classify in one of the two possible classes)
- Multiclass Classification (Classify in one of the many possible classes (>2))
There is another widely encountered type of classification problem where you need to classify an observation into multiple classes from a set of finite classes. This is known as Multi-Label classification.
To learn more about Binary Classification, go through my brief notes on it.
Stay tuned as I learn and share more of my learnings on Learning Machine Learning.