Unsupervised Learning — Understand Patterns

Rohan Saha
Published in Samur.AI
3 min read, Feb 8, 2019
Photo by Benjamin Voros on Unsplash

I hope you had a refreshing and productive week. If you went through the previous two articles, the fundamentals and the idea of supervised learning should now be fresh in your memory.

Let’s not stall; let’s keep breaking the barriers and understand the concept of Unsupervised Learning.

Before moving on, keep in mind that unsupervised learning is commonly divided into two types:

1. Clustering

2. Association

To keep the article short and simple, only clustering will be discussed here. Association will be discussed in a separate article.

Wikipedia defines the term Unsupervised Learning as follows:

Unsupervised learning is a branch of machine learning that learns from test data that has not been labeled, classified or categorized.

If that’s difficult to understand, don’t worry, this blog exists for a reason.

Unsupervised Learning, as the name suggests, is not supervised. More specifically, nobody supervises the training by supplying the correct answers. It’s analogous to handing an algorithm a pile of data and asking it to find the patterns on its own. In supervised learning, you provide the learning algorithm (model) with labeled data, that is, examples paired with the right answers, and then run predictions on new data. Here, you provide the model with data that has no labels at all, and you expect it to discover structure without being told what to look for.
All in all, in supervised learning, the model is trained on labeled data, whereas in unsupervised learning, the model is trained on unlabeled data.
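To make the contrast concrete, here is a minimal sketch in plain Python. The toy numbers, the labels, and the helper names `predict_nearest` and `group_by_gap` are made up for illustration and do not come from any library. The supervised half uses the labels it was given to answer a question about a new value; the unsupervised half sees only raw numbers and has to find the groups itself.

```python
# Supervised: every training example comes with a label.
labeled = [(1.0, "small"), (1.2, "small"), (8.9, "large"), (9.3, "large")]

def predict_nearest(x, examples):
    """Return the label of the training example closest to x (1-nearest neighbour)."""
    _, label = min(examples, key=lambda ex: abs(ex[0] - x))
    return label

# Unsupervised: the same kind of data, but no labels at all.
unlabeled = [1.0, 9.3, 1.2, 8.9, 1.1]

def group_by_gap(values, gap=2.0):
    """Split sorted values into groups wherever two neighbours are more than `gap` apart."""
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for v in ordered[1:]:
        if v - groups[-1][-1] > gap:
            groups.append([v])      # large jump: start a new group
        else:
            groups[-1].append(v)    # close to the previous value: same group
    return groups

print(predict_nearest(1.5, labeled))   # relies on the labels it was given
print(group_by_gap(unlabeled))         # finds groups without any labels
```

The `gap` threshold is an arbitrary choice for this toy data; real clustering algorithms use more principled ways to decide where one group ends and another begins.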

To better understand the difference between supervised and unsupervised learning, try to observe the diagram below:

Supervised vs Unsupervised Learning

The image on the left shows a classic example of classification: given some data points on a two-dimensional plane, the algorithm separates the data points using a boundary. On the other hand, the image on the right shows the prime example of an unsupervised learning problem, known as clustering. Since there is already an article on supervised learning, let’s turn our attention to the unsupervised learning side.

Again, have a look at the picture below.

Imagine random data points with some degree of separation between them. After running an unsupervised learning algorithm on the dataset, the result is a set of beautifully segregated ‘clusters’ of data points. So what was the magic?

It turns out that the algorithm was able to identify the groups in the data. It used a metric, such as the distance between points, to measure the similarity among data points, and grouped together the ones that were close to each other. It then labeled (colored) each data point to indicate the group it belongs to. Intelligent, eh! This is called ‘clustering’. You basically try to group similar data using some metric.
It may seem like magic to you, but believe me, it’s just simple math. Wait, maybe not that simple, lol!
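As a rough illustration of what "using a metric" can mean, the sketch below uses Euclidean (straight-line) distance as the similarity measure and places each point in the group whose center is nearest. The points, the hand-picked centers, and the function names `euclidean` and `assign_to_nearest` are assumptions made for this example; the article does not say which metric or algorithm produced the picture.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign_to_nearest(points, centers):
    """Give each point the index of the center it is closest to."""
    return [min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
            for p in points]

points = [(1, 1), (1, 2), (8, 8), (9, 8)]
centers = [(1, 1.5), (8.5, 8)]   # hypothetical group centers, chosen by hand

print(euclidean((0, 0), (3, 4)))             # 5.0
print(assign_to_nearest(points, centers))    # [0, 0, 1, 1]
```

The smaller the distance, the more similar two points are considered to be. Here the centers are given up front; a real clustering algorithm has to work them out from the data.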

To give you some more information, here are two commonly used ways to cluster data:

1. K-Means Clustering.

2. Hierarchical Clustering.

They will be explored in later articles. However, you are free to learn more about them if you want to.
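If you want a head start on the first of these, here is a bare-bones sketch of the K-Means idea in plain Python. It is a simplified teaching version under stated assumptions: 2-D points only, a fixed number of iterations, and random starting centers picked from the data. The function name `kmeans`, its parameters, and the sample points are made up for this example.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Minimal k-means for 2-D tuples: returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda i: math.hypot(p[0] - centers[i][0],
                                                   p[1] - centers[i][1]))
            for p in points
        ]
        # Update step: each center moves to the mean of its points.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:                      # keep the old center if a cluster is empty
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centers, labels = kmeans(points, k=2)
print(labels)    # the first three points share one label, the last three the other
```

Notice that no labels are passed in anywhere: the algorithm only alternates between "assign each point to the nearest center" and "move each center to the middle of its points". Which group ends up as 0 and which as 1 depends on the random start.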

Until then, join the AI revolution, upgrade your mind, upgrade your life!

The next article will be on Reinforcement Learning — Concepts and Terminology!


