Machine Learning Introduction: Supervised vs Unsupervised (Part 2)

In my previous blog post, I went over the fundamentals of supervised learning. As a reminder, supervised learning is a type of machine learning in which every data point in the training set comes with a predefined label, so the algorithm knows the correct answer it is trying to learn. For this blog post, I want to focus on the second type of machine learning, unsupervised learning. The key distinction is that in unsupervised learning the data points are unlabeled and come with no predefined relationships.

Unsupervised Learning Overview

Unlike supervised learning, unsupervised learning develops insights from unlabeled data. As a result, there are no known correct answers against which to measure the accuracy of the structure that the algorithm outputs. An unsupervised learning algorithm analyzes a set of data, groups data points based on perceived similarities, and derives conclusions from these similarities.

A common example of an unsupervised learning problem is the cocktail party problem. Suppose you have two people at a cocktail party speaking over each other, so that as a third-party listener you cannot tell what each person is saying. To make the example even simpler, suppose one speaker is speaking in English and the other in Spanish, or that there is loud music playing in the background. A machine learning technique called blind source separation can recognize the different patterns in the different sounds and separate the distinct sound sources. See this link for an example. And what is amazing is that the code to implement the appropriate algorithm can be written in one line!
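
To make the cocktail party example concrete, here is a rough, dependency-free sketch of blind source separation for two sources and two microphones. It is not the one-line version mentioned above and not a production algorithm: it whitens the two recordings and then searches for the rotation that makes the outputs as non-Gaussian as possible (measured by squared excess kurtosis), which is one common idea behind independent component analysis. The signals, the mixing weights, and every function name are assumptions made up for illustration.

```python
import math

def center(xs):
    """Subtract the mean so the signal averages to zero."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def correlation(a, b):
    """Pearson correlation, used here only to check the result."""
    a, b = center(a), center(b)
    num = sum(u * v for u, v in zip(a, b))
    return num / math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))

def rotate(a, b, angle):
    """Rotate the pair of signals (a, b) by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return ([c * u + s * v for u, v in zip(a, b)],
            [-s * u + c * v for u, v in zip(a, b)])

def whiten(x1, x2):
    """Make the two signals uncorrelated with unit variance."""
    x1, x2 = center(x1), center(x2)
    n = len(x1)
    var1 = sum(u * u for u in x1) / n
    var2 = sum(v * v for v in x2) / n
    cov = sum(u * v for u, v in zip(x1, x2)) / n
    # Rotating by this angle diagonalizes the 2x2 covariance matrix.
    p1, p2 = rotate(x1, x2, 0.5 * math.atan2(2 * cov, var1 - var2))
    sd1 = math.sqrt(sum(u * u for u in p1) / n)
    sd2 = math.sqrt(sum(v * v for v in p2) / n)
    return [u / sd1 for u in p1], [v / sd2 for v in p2]

def excess_kurtosis(ys):
    """Zero for a Gaussian; non-zero values indicate non-Gaussian shape."""
    n = len(ys)
    m2 = sum(y * y for y in ys) / n
    m4 = sum(y ** 4 for y in ys) / n
    return m4 / (m2 * m2) - 3.0

def separate(x1, x2, steps=90):
    """Grid-search the rotation of the whitened data that is least Gaussian."""
    z1, z2 = whiten(x1, x2)
    best_score, best = -1.0, None
    for k in range(steps):
        y1, y2 = rotate(z1, z2, (math.pi / 2) * k / steps)
        score = excess_kurtosis(y1) ** 2 + excess_kurtosis(y2) ** 2
        if score > best_score:
            best_score, best = score, (y1, y2)
    return best

# Two made-up "voices": a sine wave and a sawtooth wave.
n = 1000
voice_a = [math.sin(0.05 * t) for t in range(n)]
voice_b = [2.0 * ((t % 37) / 37.0) - 1.0 for t in range(n)]

# Each hypothetical microphone hears a different blend of both voices.
mic_1 = [0.6 * a + 0.4 * b for a, b in zip(voice_a, voice_b)]
mic_2 = [0.3 * a + 0.7 * b for a, b in zip(voice_a, voice_b)]

est_1, est_2 = separate(mic_1, mic_2)
for est in (est_1, est_2):
    # Each estimate should line up strongly with exactly one voice.
    print([round(abs(correlation(est, src)), 3) for src in (voice_a, voice_b)])
```

Note that the algorithm never sees voice_a or voice_b; they are used only at the end to check the result. The recovered signals can come back in either order and with flipped sign or different volume, which is a known limitation of this kind of separation.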

Types of Unsupervised Learning

Unlike supervised learning, which has two main types (regression and classification), unsupervised learning covers many methodologies, and the number of methods continues to grow with new discoveries. Below are some of the more common ones.

  1. Clustering: Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).
  2. Neural Networks: A neural network is a powerful computational model that is able to capture and represent complex input/output relationships. Neural networks require a large amount of good-quality training data. In the unsupervised setting that data is unlabeled: the network (an autoencoder, for example) identifies relationships between different objects in the training data and uses those relationships to analyze new sets of data.
  3. Anomaly Detection: Anomaly detection is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset.
  4. Blind Signal Separation: Blind signal separation is the separation of a set of source signals from a set of mixed signals, without the aid of information (or with very little information) about the source signals or the mixing process.
  5. Natural Language Processing: Natural Language Processing (NLP) is a way for computers to analyze, understand, and derive meaning from human language in a smart and useful way. NLP uses machine learning to automatically learn patterns by analyzing a set of examples (for instance, a collection of articles) and to make statistical inferences from them.
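
Two of the methods in the list, clustering and anomaly detection, are small enough to sketch in a few lines. The example below is a minimal illustration under stated assumptions, not a reference implementation: the clustering part is a basic k-means loop with a deterministic "farthest point" starting rule chosen here for repeatability, and the anomaly part flags values whose z-score exceeds a threshold of 2.0, a cutoff picked for this tiny sample. The data points and function names are made up.

```python
import math

def kmeans(points, k, max_iterations=20):
    """Group 2D points into k clusters by repeatedly assigning and averaging."""
    # Deterministic start: the first point, then the point farthest
    # from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels, centroids

def zscore_outliers(values, threshold=2.0):
    """Return the values that lie more than `threshold` std devs from the mean."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Clustering: two obvious groups, with no labels supplied.
points = [(1.0, 1.1), (0.9, 0.8), (1.2, 1.0),
          (8.0, 8.2), (7.9, 8.1), (8.3, 7.8)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]

# Anomaly detection: one reading that does not fit the pattern.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 25.0, 9.7]
print(zscore_outliers(readings))  # [25.0]
```

In both cases nobody tells the algorithm which group a point belongs to or which reading is wrong; the structure comes from the data alone. The number of clusters and the outlier threshold are still choices a person has to make.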

Conclusion

Unsupervised learning is a powerful tool that can make sense of unlabeled data sets using pattern recognition. With enough training data, these algorithms can surface insights and inform decisions across a multitude of data sets, allowing automation of many industry tasks. The rise of machine learning has truly changed the technology environment over the past few years and will continue to drive innovation across industries and economies.