For Beginners — Supervised Learning and Unsupervised Learning Explained

Lakshmi Prakash
Design and Development
4 min read · Aug 23, 2022

Open any machine learning book or course for beginners, and pretty soon, you’ll come across “supervised learning” and “unsupervised learning”. What do these terms mean? Note that these are two of the most common types of machine learning techniques used. Why? Because they are highly effective and can be applied to a broad range of problems in different industries.

Supervised Learning: In supervised learning, you give the machine data along with the correct answer for each example. That is, you use “labels”, so the machine doesn’t have to work out the categories by itself while it is learning. Having learned from a large volume of labelled data, the machine can then make predictions for new, unlabelled data that you give it. That’s the theory of supervised learning.

But wait, in practice, how does a machine understand something so complicated? For example, given an image, how would a machine that has not yet been taught the subject you’re interested in tell what is what? That’s where features come into play. You describe each example as a feature vector, a list of values the machine can work with, and the machine’s job is to learn the mapping between weighted features and labels.

Teaching Machines to Learn

If you want to teach your machine to identify photos of dogs, you could use features like “a pair of eyes”, “a pair of ears”, “four legs”, “four paws”, “furry”, “tail”, “tongue”, “teeth”, “whiskers”, “fur colour” (black or brown or white), etc. After you’ve trained it to understand these features, if it looks at an image of a tree, it will probably not notice any of these features, so it would not label the tree a “dog”. If you gave it an image of a dolphin, it might notice a pair of eyes and maybe a tail, but not fur, legs, paws, whiskers, ears, and such, so it would not label the dolphin a “dog” either.
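To make the idea of "weighted features" concrete, here is a minimal sketch of a perceptron, one of the simplest supervised learners. It is only an illustration: the six yes/no features, the four hand-written training examples, and the function names are all assumptions made up for this sketch, and real image classifiers work on far richer inputs.

```python
# Hypothetical yes/no features, always listed in this order.
FEATURES = ["eyes", "ears", "four_legs", "fur", "tail", "whiskers"]

# Made-up labelled examples: (feature vector, label), where 1 = dog, 0 = not a dog.
training_data = [
    ([1, 1, 1, 1, 1, 1], 1),  # dog
    ([1, 1, 1, 1, 1, 0], 1),  # dog whose whiskers are not visible
    ([0, 0, 0, 0, 0, 0], 0),  # tree
    ([1, 0, 0, 0, 1, 0], 0),  # dolphin
]


def predict(weights, bias, features):
    """Return 1 ("dog") if the weighted sum of features is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(data, epochs=20):
    """Perceptron rule: nudge the weights whenever a prediction is wrong."""
    weights = [0.0] * len(FEATURES)
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in data:
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + error * x for w, x in zip(weights, features)]
                bias += error
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


weights, bias = train(training_data)

# New, unlabelled examples the model has not seen in exactly this form.
dog_with_hidden_tail = [1, 1, 1, 1, 0, 1]
another_dolphin = [1, 0, 0, 0, 1, 0]
print(predict(weights, bias, dog_with_hidden_tail))
print(predict(weights, bias, another_dolphin))
```

On this toy data the learned weights end up favouring ears, legs, fur, and whiskers, while eyes and tail (which the dolphin also has) get no weight. A cat would fool this model, which is a reminder that the choice of features matters.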

Supervised Learning Problems:

Two of the most common types of supervised learning problems are regression problems and classification problems.

Regression problems involve predicting an output value for a given input value when the input and output are related by a mathematical function. These usually involve what’s called a “continuous variable”, that is, a number that can take any value within a range. Examples include weather prediction (say, tomorrow’s temperature) and stock value prediction.
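As a rough illustration of a regression problem, the sketch below fits a straight line to a handful of points using ordinary least squares and then predicts a value for a new input. The data (hours of sunshine against temperature) is invented for this example and chosen to lie exactly on a line; `fit_line` and `predict_temperature` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up labelled data: the temperature is the "label" for each input.
hours_of_sun = [2, 4, 6, 8]
temperature_c = [14, 18, 22, 26]

slope, intercept = fit_line(hours_of_sun, temperature_c)


def predict_temperature(hours):
    """Use the fitted line to predict a continuous value for a new input."""
    return slope * hours + intercept


print(predict_temperature(5))
```

Because the output is a number on a continuous scale rather than a category, this counts as regression. Real data is noisy, so the fitted line would only approximate it.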

Classification problems, as the name itself suggests, involve sorting data into different categories based on the labels. Deciding whether a photo shows a dog or not, as in the example above, is a classification problem.
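Here is a small sketch of classification using a k-nearest-neighbours vote, which is just one of many possible classifiers. The fruit measurements and labels are invented for illustration, and `classify` is a hypothetical helper, not part of any library.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared straight-line distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(sample, training_data, k=3):
    """Label a sample by majority vote among its k closest labelled examples."""
    nearest = sorted(
        training_data, key=lambda item: squared_distance(sample, item[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up labelled data: ((weight in grams, diameter in cm), label).
labelled_fruit = [
    ((150, 7.0), "apple"),
    ((170, 7.5), "apple"),
    ((160, 7.2), "apple"),
    ((5, 1.5), "grape"),
    ((6, 1.8), "grape"),
    ((4, 1.4), "grape"),
]

print(classify((155, 7.1), labelled_fruit))
print(classify((5, 1.6), labelled_fruit))
```

The output here is a category ("apple" or "grape") rather than a number, which is what separates classification from regression.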

Machine grouping fruits into clusters

Unsupervised Learning: In unsupervised learning, you give a machine data without any labels and hope that it finds patterns on its own. You ask the machine to figure out possible relationships by itself, based on any sense of similarity it can find, which is why it’s called “unsupervised learning”. Because the labels we give in supervised learning are missing here, it’s the machine’s responsibility to find associations among the input values and recognize patterns.

For unsupervised learning, usually, the machine requires very large amounts of data.

Unsupervised Learning Problems:

Clustering problems are often solved by unsupervised machine learning. As mentioned earlier, for unsupervised learning, you don’t give the data any labels, so the machine analyzes all the data and tries to form clusters or groups based on common patterns it can find. The clusters need not always be based on a pattern you might have in mind.

For example, if you give the machine large volumes of customer information, the machine can divide the customers into different clusters based on a relationship between age and amount spent, or between time of activity and location, etc. The pattern is not for the machine learning engineer or data scientist to dictate; the machine finds and follows a pattern itself. This method can sometimes give you information you didn’t see coming.
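The customer example can be sketched with k-means, a common clustering algorithm, though not the only one. Everything below is an assumption for illustration: the six customers are made up, the number of clusters is fixed at two, and the starting centroids are simply the first and last customers (real implementations usually pick them more carefully and scale the features first).

```python
def squared_distance(a, b):
    """Squared straight-line distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Give each point the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the average of the points assigned to it."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:  # keep an empty cluster's centroid where it was
            new_centroids.append(old)
        else:
            new_centroids.append(
                tuple(sum(column) / len(members) for column in zip(*members))
            )
    return new_centroids


def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update until the clusters stop changing."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Made-up, unlabelled customers: (age, amount spent).
customers = [(22, 150), (25, 180), (27, 160), (52, 900), (55, 950), (58, 870)]

labels, centroids = k_means(customers, [customers[0], customers[-1]])
print(labels)
print(centroids)
```

Note that no labels were supplied: the cluster numbers are arbitrary names the algorithm assigns, and it is up to you to interpret what each group means.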

Summary:

The most basic difference between supervised learning and unsupervised learning is that in the former, you use labels, while in the latter, the data will not have labels.

Both of these techniques have their own advantages and disadvantages, and mistakes can happen in either case. There’s no guarantee that the results will never be wrong, so how would you know which type of machine learning you should be using?

You’d use supervised learning when your input is clearly labelled and you know what kind of output you expect.

You’d use unsupervised learning when you don’t know what output you expect, but you just have lots of recorded information.

Note: There is also semi-supervised learning, a method that combines supervised and unsupervised learning, typically by training on a small amount of labelled data together with a larger amount of unlabelled data. It is not as frequently used as either of the two above.

If you’re interested in checking out what reinforcement learning, also known as the “third paradigm of machine learning”, is, check out this post —

Lakshmi Prakash
Design and Development

A conversation designer and writer interested in technology, mental health, gender equality, behavioral sciences, and more.