Supervised VS Unsupervised ML
Hello dear readers!
Today I thought I would post about supervised and unsupervised learning. In fact, I’ve had several discussions about ML lately, and as you may have already found out for yourself, things can easily get mixed up when talking about AI (BTW, that was the case for me too).
In this post I cover:
- Supervised ML
- Classification and Regression
- Unsupervised ML
- Clustering and Association
Learning types: supervised VS unsupervised
Why do we care?
Machine learning, as the name says, is about learning from data. But even though a machine can learn a lot from data, we would not be able to exploit it without data mining. In fact, data mining is what helps us understand the relationships within a set of data, such as finding patterns, recurrences…
Supervised:
So here it is: let’s say you have a set of data, which could be:
- A spreadsheet of Roses, Jasmine, Tulips and Lilacs, each with specific features such as petal length, depth, colour, etc.
or
- Thousands of pictures of oranges, apples, strawberries and plums, with information about what makes each of them that particular fruit.
Why this is supervised learning:
- A labelled set of data has been given to the algorithm.
- By doing this, you taught the algorithm to recognise Roses and fruits by giving it the features (and the correct name) of each of them. You gave the algorithm “training data”.
- The probability that the algorithm accurately recognises a Rose or a fruit is high.
- With new data (x), you can predict the output (Y) for that data.
Example: if you give it a picture of a wild (forest) strawberry, you can expect a fairly high probability that the result will be “strawberry”.
- By giving more pictures to the algorithm, you make it perform better, as its predictions become more accurate. This is how the algorithm reaches its best performance.
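To make the “training data → predict Y from a new x” idea concrete, here is a minimal sketch in Python. It uses a 1-nearest-neighbour classifier, which is just one simple supervised technique I picked for illustration; the fruit measurements (weight, diameter, a 0 to 1 “redness” score) are invented, and the names `TRAINING_DATA`, `TEST_DATA`, `predict` and `accuracy` are hypothetical.

```python
import math

# Hypothetical labelled training data: (features, label).
# Features are made-up measurements: (weight in g, diameter in cm, redness 0-1).
TRAINING_DATA = [
    ((150.0, 7.5, 0.2), "orange"),
    ((160.0, 8.0, 0.3), "orange"),
    ((180.0, 8.0, 0.8), "apple"),
    ((170.0, 7.5, 0.9), "apple"),
    ((12.0, 2.5, 0.95), "strawberry"),
    ((15.0, 3.0, 0.9), "strawberry"),
    ((60.0, 5.0, 0.6), "plum"),
    ((65.0, 5.5, 0.55), "plum"),
]

# Hypothetical labelled examples kept aside to measure accuracy.
TEST_DATA = [
    ((155.0, 7.8, 0.25), "orange"),
    ((175.0, 7.8, 0.85), "apple"),
    ((62.0, 5.2, 0.6), "plum"),
]


def predict(x, training_data=TRAINING_DATA):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, closest_label = min(training_data, key=lambda ex: math.dist(x, ex[0]))
    return closest_label


def accuracy(test_data):
    """Fraction of labelled test examples whose label is predicted correctly."""
    correct = sum(predict(x) == y for x, y in test_data)
    return correct / len(test_data)


# A new fruit the algorithm has never seen: it predicts Y from x.
print(predict((14.0, 2.8, 0.92)))  # strawberry
print(accuracy(TEST_DATA))  # 1.0
```

Note that the features are not rescaled here, so weight dominates the distance; a real setup would usually normalise the features first. Adding more labelled rows to `TRAINING_DATA` is the code equivalent of “giving more pictures to the algorithm”.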
Classification and Regression:
Supervised learning problems can be grouped into classification and regression.
Here are some examples, continuing with the flowers and the fruits.
Classification:
Classification is when the output variable is a category (a label), such as:
- Flower: “sharp”, “Dark Red”, “Red”, “large”, “showy”
- Fruit: “round”, “Green”, “Yellow”, “Malus”, “Citrus”, “Rosaceae”
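Here is a small sketch of classification, where the output is always one label from a fixed set. I use a nearest-centroid classifier (my choice for illustration, not something the post prescribes): it averages the measurements of each flower type, then gives a new flower the label of the closest average. The petal measurements and the names `FLOWER_DATA`, `fit_centroids` and `classify` are hypothetical.

```python
import math
from statistics import fmean

# Hypothetical training data: (petal length cm, petal width cm) -> flower label.
FLOWER_DATA = [
    ((4.0, 3.0), "Rose"),
    ((4.5, 3.5), "Rose"),
    ((1.0, 0.4), "Jasmine"),
    ((1.2, 0.6), "Jasmine"),
    ((6.5, 5.0), "Tulip"),
    ((7.0, 5.5), "Tulip"),
]


def fit_centroids(training_data):
    """Average the feature vectors of each label (nearest-centroid training)."""
    grouped = {}
    for features, label in training_data:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(x, centroids):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


centroids = fit_centroids(FLOWER_DATA)
print(classify((1.1, 0.5), centroids))  # Jasmine
print(classify((6.8, 5.1), centroids))  # Tulip
```

Whatever measurements you feed in, the answer is one of “Rose”, “Jasmine” or “Tulip”: that discrete output is what makes this a classification problem.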
Regression:
Regression is when the output is not a label but a continuous value, such as:
- Flower: “Petal Width value”, “Sepal Length value”
- Fruit: “Weight”, “sugar quantity”, “sodium”, “fat”, “Vitamin (X)”
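For regression, a minimal sketch is a straight line fitted with ordinary least squares, one common regression technique among many. Here it predicts a fruit’s weight (a number, not a label) from its diameter. The measurements and the names `DIAMETERS`, `WEIGHTS` and `fit_line` are hypothetical, and the data is deliberately perfectly linear so the result is easy to check by hand.

```python
from statistics import fmean

# Hypothetical training data: apple diameter (cm) -> weight (g).
DIAMETERS = [6.0, 7.0, 8.0, 9.0]
WEIGHTS = [120.0, 150.0, 180.0, 210.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(DIAMETERS, WEIGHTS)
# Predict the weight of a new apple with an 8.5 cm diameter.
print(slope * 8.5 + intercept)  # 195.0
```

The output can be any value on the line, such as 195.0 g, rather than one of a few categories; that is the key difference from classification.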
Unsupervised:
Things are a little different here. In unsupervised learning, there is no labelled set of data given to train the algorithm. In other words, you have the input (x) but no corresponding output (Y), so there is no known mapping for the algorithm to learn from.
Think of it like a newborn coming into the world with no knowledge of it. Everything has to be learned: there was no data to train the baby during the pregnancy, and no way to tell him the difference between an apple and an orange, or between cats and dogs… The baby will have to use his senses to learn about each characteristic of what is around him.
We can see unsupervised learning in a similar way: there is no known mapping from an input to an output, so the algorithm has to discover structure in the data by itself.
So how is unsupervised learning performed?
Clustering and Association:
Clustering:
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). (Wikipedia)
Example: apples separated from strawberries and bananas based on shape.
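One well-known clustering algorithm is k-means; the post does not name a specific algorithm, so treat this as one possible sketch. The fruit shape measurements below are hypothetical and carry no labels: the algorithm only sees numbers and groups them. To keep the example deterministic, the first k points serve as starting centroids, which is a simplification (real implementations usually pick them more carefully, e.g. at random or with k-means++).

```python
import math
from statistics import fmean

# Hypothetical, unlabelled shape measurements: (length / width ratio, length in cm).
# Nobody tells the algorithm which fruit each row is.
POINTS = [
    (1.0, 7.5), (1.3, 3.0), (4.0, 18.0),
    (1.1, 8.0), (1.2, 2.5), (3.8, 20.0),
    (0.9, 7.0), (1.4, 3.5), (4.2, 19.0),
]


def k_means(points, k, iterations=10):
    """Plain k-means; for simplicity the first k points are the initial centroids."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(fmean(col) for col in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters


for cluster in k_means(POINTS, k=3):
    print(cluster)
```

With this data, the three groups that come out correspond to the round, medium-sized points, the small points and the long, elongated points. The algorithm never learns the words “apple”, “strawberry” or “banana”; it only finds that these points belong together.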
Association:
Association rule learning is a method for discovering interesting relations between variables in large databases. (Wikipedia)
Online stores are a good example of association analysis. They usually suggest new items based on the items you have bought. They analyze online transactions to find patterns in the buyer’s behavior. (Source: https://www.packtpub.com/books/content/clustering-and-other-unsupervised-learning-methods )
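To give an idea of what an online store might compute, here is a simplified sketch using two standard association-rule measures: support (how often items appear together) and confidence (how often B is bought when A is bought). It only looks at rules between single items; real algorithms such as Apriori also handle larger item sets. The baskets, thresholds and function names are hypothetical.

```python
from itertools import permutations

# Hypothetical shopping baskets (one set of items per transaction).
TRANSACTIONS = [
    {"apples", "strawberries", "cream"},
    {"strawberries", "cream"},
    {"apples", "bananas"},
    {"strawberries", "cream", "bananas"},
    {"apples", "strawberries"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return rules 'A -> B' between single items that pass both thresholds."""
    items = set().union(*transactions)
    rules = []
    for a, b in permutations(sorted(items), 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue
        # Confidence: among baskets containing A, how many also contain B.
        confidence = pair_support / support({a}, transactions)
        if confidence >= min_confidence:
            rules.append((a, b, pair_support, confidence))
    return rules


for a, b, s, c in pair_rules(TRANSACTIONS):
    print(f"{a} -> {b}  support={s:.2f}  confidence={c:.2f}")
```

Here every basket with cream also contains strawberries, so a store using a rule like “cream -> strawberries” would suggest strawberries to someone who just put cream in their basket.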
Christelle