A Brief Introduction To Unsupervised Learning

Salman Ibne Eunus
Published in CodeX
3 min read · Sep 26, 2021

Unsupervised learning is a type of machine learning in which a model is trained without labelled examples to guide it. The model sees only the input values in the data set; unlike supervised learning, there is no target variable or output value. In supervised learning the data is labelled, but here it is unlabelled. The model uses the inputs alone to extract important patterns or to discover knowledge from the data. Unsupervised methods can also help with feature extraction.

Unsupervised learning can be divided into two types, both of which help uncover hidden information in the data: transformations of the data set and clustering.

Let us briefly explain each of the two types.

  1. Transformations of the data set: These are algorithms that create a new representation of the original data set, one that is easier for humans or other machine learning algorithms to work with. A common application is dimensionality reduction, which takes a high-dimensional representation of the data and reduces the number of dimensions by selecting or extracting only the most informative features. A smaller feature set can improve the accuracy of supervised learning models and usually shortens the computation time for prediction. Another application is finding the components that make up a data set, for example extracting topics from a collection of documents. Here the algorithm might find the different topics discussed in an online newspaper, such as politics, entertainment and sports. Latent Dirichlet Allocation is often used for topic extraction.
  2. Clustering: Clustering algorithms group data points with similar characteristics into clusters. This can be used to segment customers in a market according to the goods they choose and their preferences, with the aim of increasing customer satisfaction. It is also commonly used in recommendation systems on online platforms such as Netflix and YouTube to learn user preferences and suggest new videos accordingly. Other applications include image segmentation (grouping similar pixels into regions), grouping results in search engines and anomaly detection. Common clustering algorithms include K-means clustering, hierarchical clustering, Gaussian mixture models and density-based methods such as DBSCAN.
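To make the two types concrete, here is a minimal sketch in plain Python that first reduces a small data set to one dimension and then clusters the result. Everything in it is an assumption for illustration: the customer data is made up, the feature names are hypothetical, power iteration stands in for a full principal component analysis, and the K-means loop is a bare-bones version of the algorithm. In practice you would more likely reach for a library such as scikit-learn.

```python
import random

# Made-up customers: [monthly_spend, visits_per_month, items_per_basket].
customers = [
    [20, 2, 1], [22, 3, 1], [19, 2, 2], [21, 1, 1],   # occasional shoppers
    [80, 9, 5], [85, 10, 6], [78, 8, 5], [82, 9, 6],  # frequent shoppers
]


def first_principal_direction(rows, iterations=100):
    """Approximate the direction of largest variance with power iteration."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]
    # Sample covariance matrix (dims x dims).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return means, v


def project(rows, means, direction):
    """Map every row to a single coordinate along the given direction."""
    return [[sum((r[j] - means[j]) * direction[j] for j in range(len(direction)))]
            for r in rows]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Bare-bones K-means: assign points to the nearest centroid, then update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave the centroid of an empty cluster where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Transformation step: 3 features -> 1 feature.
means, direction = first_principal_direction(customers)
projected = project(customers, means, direction)

# Clustering step: group the customers using the reduced representation.
labels, centroids = kmeans(projected, k=2)

for row, z, lab in zip(customers, projected, labels):
    print(row, round(z[0], 2), "-> cluster", lab)
```

Note that the cluster numbers themselves carry no meaning: the algorithm only says which customers belong together, and it is up to us to interpret each group.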

Limitations of Unsupervised Learning —

One of the major limitations of unsupervised learning is that, without labelled data, there is no direct way to measure the performance of the model. We cannot build a loss function that compares actual and predicted values as we do in supervised learning, so we do not know exactly what the correct output should be or whether the model performed well or poorly. There is also no direct way to tell the algorithm what we are looking for in the data set. This creates considerable uncertainty, which is why unsupervised learning algorithms are often used in exploratory settings, where a data scientist wants more insight into the data, or as a pre-processing step for supervised learning. For example, grouping the data set into clusters can help when labelling a very large data set.
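As a rough illustration of how clusters might help with labelling, the sketch below picks one representative point per cluster, lets a person label only those, and proposes that label for the rest of the cluster. The points, the cluster ids (assumed to come from an earlier clustering step) and the label names are all invented for this example, and the proposed labels are only candidates that would still need spot checks.

```python
# Hypothetical 2-D points and the cluster id each one was assigned earlier.
points = [(1.0, 1.0), (0.6, 1.5), (1.4, 0.8), (5.0, 5.0), (4.5, 5.6), (5.5, 4.7)]
cluster_ids = [0, 0, 0, 1, 1, 1]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def representatives(points, cluster_ids):
    """Return {cluster id: index of the point closest to that cluster's mean}."""
    reps = {}
    for cid in sorted(set(cluster_ids)):
        idx = [i for i, c in enumerate(cluster_ids) if c == cid]
        centre = [sum(points[i][d] for i in idx) / len(idx)
                  for d in range(len(points[0]))]
        reps[cid] = min(idx, key=lambda i: squared_distance(points[i], centre))
    return reps


reps = representatives(points, cluster_ids)

# A person looks only at the representatives and names them (invented labels).
human_labels = {0: "occasional shopper", 1: "frequent shopper"}

# Every other point inherits the label of its cluster as a candidate label.
proposed = [human_labels[c] for c in cluster_ids]

for point, label in zip(points, proposed):
    print(point, "->", label)
```

Here two hand-written labels cover six points; on a large data set the saving grows with the size of each cluster, at the cost of mistakes wherever a cluster mixes different kinds of points.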

Salman Ibne Eunus, writer for CodeX
Data Scientist | Robotics Engineer | AI Researcher | Bioinformatics