Published in Thinking Fast

A High-Level View of Unsupervised Learning

Like Managing Kids in a Candy Store

Photo by Elizaveta Dushechkina on Unsplash

What happens when you leave a child unsupervised at Target (or any store with cool stuff)? You end up spending at least twice the amount you intended to spend.


Because they find things you weren’t there to buy in the first place.

In data science, unsupervised algorithms work in a similar way. We unleash them in stores (i.e., data sets) and wait for them to find things on the shelves that we didn’t know existed.

And with that segue we move into the next topic in this multi-part series on building a stronger data science-driven mindset: unsupervised algorithms. As a quick recap, I started this series with the idea that in order to be effective data scientists we need to be able to translate real problems into data science problems. And most data science problems can be reframed in one of five different ways.

Up to this point, we have covered regression problems, classification problems, and forecasting problems (as a special type of regression problem).

The next step in our data science journey is to uncover what is meant by unsupervised analytics and explore some of the most common algorithms associated with this type of data science.

As implied in the example above, unsupervised algorithms are a class of data science algorithms that look for unknown patterns or groupings in data. In our prior discussions, regression, classification, and forecasting all required that we know the outcome, or target, variable that we are trying to learn.

In the case of unsupervised analytics, we do not know what the target variable is or how to group the data. But this does not mean that we don’t have at least some idea or hypothesis surrounding the possible groupings inherent in a data set.

As we will learn, some unsupervised algorithms require that we set a number of possible groupings prior to running the analysis. Here are some of the more common tasks that unsupervised algorithms can be used to solve:

1. Clustering:

Probably the “poster child” of unsupervised algorithms, clustering is the process of finding natural groups in data. The underlying approach of most of these algorithms is to compare the distance between observations in some multi-dimensional (read, multi-variable) space. These distances are then used to group observations that are close together, separated from other groups of observations that are farther away. The challenge with clustering is that groups found in a space with too many dimensions are confusing to interpret and often not very useful. Thus, in practice, clustering is usually run on a limited number of variables (typically fewer than 5).

2. Dimensionality Reduction:

If clustering attempts to find groupings of observations (i.e., rows in a data frame), then dimensionality reduction techniques attempt to find groupings of variables (i.e., columns in a data frame). The goal of dimensionality reduction is to reduce the number of variables so that models can be trained with as much relevant information as possible, while removing the noise that comes from having too many features to train on.

3. Anomaly Detection:

Although anomaly detection can be both supervised and unsupervised, here the focus is on using unsupervised algorithms to detect anomalies in data. The general idea is to use these algorithms to learn the typical distribution of observations and identify observations that deviate from that distribution. In univariate space (i.e., one variable) we can use z-scores to identify these observations. In multivariate space this becomes much more challenging, and that is where unsupervised algorithms can help.

4. Associative learning:

Associative learning, also called association rule learning or simply association analysis, is a task focused on identifying the patterns that govern associations between things in data. For example, supermarkets often want to know how often people who buy product X also buy product Y. A simple approach for bivariate associations may simply be correlations. But in more complex spaces with hundreds or thousands of possible associations, associative algorithms help us identify the patterns of association much more quickly.
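To make one of these tasks concrete, here is a minimal sketch of the univariate z-score check mentioned under anomaly detection above. The data (`daily_orders`), the function name, and the cutoff of 3 are my own illustrative assumptions, not a standard.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return (value, z_score) pairs whose |z| exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    scored = [(v, (v - mu) / sigma) for v in values]
    return [(v, z) for v, z in scored if abs(z) > threshold]


# Hypothetical daily order counts with one unusual day at the end.
daily_orders = [52, 48, 50, 51, 49, 47, 53, 50, 49, 51, 50, 48, 52, 120]

flagged = zscore_outliers(daily_orders, threshold=3.0)
for value, z in flagged:
    print(f"{value} looks unusual (z = {z:.2f})")
```

Keep in mind that in a small sample a single extreme value also inflates the standard deviation, so the cutoff may need tuning for your data.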

Each of the above tasks is associated with different algorithms that use different mathematical approaches to accomplish it. Here are some of the more common algorithms associated with each task:


Clustering:

a. K-Means: probably the most common clustering algorithm. It attempts to learn clusters by minimizing the distance between observations within a cluster, which in turn keeps observations in different clusters far apart. K-Means requires that we enter the number of groups we want prior to running the analysis. Thus, it requires some experimentation if the optimal number of groups is not known, which it usually is not.

b. Hierarchical clustering: used to group observations into a hierarchy where the top level includes all observations in a single group and the lowest level places each observation in its own group of one. The goal is to find a solution in the middle of the hierarchy that provides a reasonable and understandable grouping of observations.

c. DBSCAN: a useful clustering algorithm that does not require us to suggest the number of groups to look for ahead of time. The algorithm groups observations purely by how densely packed they are in space. Data scientists can adjust the epsilon value, the radius within which the algorithm looks for “nearby” observations to include in the same group, to tune the sensitivity of the algorithm.
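The experimentation that K-Means requires can be sketched in a few lines of plain Python. This is a simplified illustration, not a production implementation: the toy `points`, the helper names, and the deterministic "farthest-first" starting centroids are all my own assumptions (real implementations usually use random or k-means++ starts). It runs K-Means for several values of k and records the within-cluster sum of squared distances (often called inertia) for each.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic start: each new centroid is the point farthest
    from the centroids chosen so far (a simplification)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)  # assignment step
        new_centroids = []
        for i in range(k):  # update step: move each centroid to its cluster mean
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Hypothetical 2-variable data with three visually obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]

inertia_by_k = {k: kmeans(points, k)[2] for k in range(1, 5)}
for k, inertia in inertia_by_k.items():
    print(f"k={k}: within-cluster sum of squares = {inertia:.2f}")
```

A common heuristic is to pick the k after which the within-cluster sum of squares stops dropping sharply.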

Dimensionality Reduction:

The two most common dimensionality reduction algorithms are Principal Components Analysis (PCA) and Exploratory Factor Analysis (EFA). PCA combines related features into a smaller set of components. The weight of each feature on a component tells us which features matter most to that group of related features, ultimately allowing us to keep the most important features for each grouping. EFA, on the other hand, shows us the features that tend to group together and gives us options for creating indexes, such as averages of the features that make up each factor, or group.
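Here is a rough sketch of the idea behind PCA, using power iteration to find only the first principal component. The feature names (`hours_studied`, `practice_tests`, `shoe_size`) and values are hypothetical, and the fixed starting vector is a simplification: power iteration would fail if the start happened to be orthogonal to the component. In practice you would reach for a library routine rather than this.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def covariance(rows):
    """Covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]


def mat_vec(matrix, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in matrix]


def first_component(cov, n_iter=200):
    """Power iteration: repeatedly multiply by the covariance matrix and
    renormalize until the vector settles on the top eigenvector."""
    v = [1.0 / (i + 1) for i in range(len(cov))]  # arbitrary, assumed start
    for _ in range(n_iter):
        w = mat_vec(cov, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(cov, v)))
    return v, eigenvalue


# Hypothetical columns: hours_studied, practice_tests, shoe_size
rows = [(1, 1, 8), (2, 3, 9), (3, 2, 7), (4, 5, 7), (5, 4, 9), (6, 6, 8)]

cov = covariance(standardize(rows))
loadings, eigenvalue = first_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

print("loadings:", [round(x, 3) for x in loadings])
print(f"share of variance explained: {explained:.1%}")
```

In this toy data the first two features rise and fall together, so they share the weight on the first component while the unrelated third feature gets almost none.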

Anomaly Detection:

One of the most widely used unsupervised anomaly detection algorithms is a tree-based algorithm known as an Isolation Forest. An Isolation Forest repeatedly partitions the data at random until each observation is isolated, and separates normal from abnormal data under the assumption that abnormal observations require fewer partitions to isolate.
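The "fewer partitions" assumption can be illustrated with a stripped-down sketch. This is not a full Isolation Forest: it skips subsampling, does not build reusable trees, and does not compute the normalized anomaly score. It simply counts, for one point at a time, how many random splits it takes to isolate that point, averaged over many trials. The data, function names, and parameters are hypothetical.

```python
import random


def isolation_depth(point, data, rng, max_depth=12):
    """Count random axis-aligned splits needed to isolate `point`."""
    depth = 0
    while len(data) > 1 and depth < max_depth:
        # Only split on variables that still vary among the remaining rows.
        dims = [d for d in range(len(point)) if len({row[d] for row in data}) > 1]
        if not dims:
            break  # only exact duplicates remain
        d = rng.choice(dims)
        lo = min(row[d] for row in data)
        hi = max(row[d] for row in data)
        split = rng.uniform(lo, hi)
        # Keep only the rows that fall on the same side of the split as the point.
        data = [row for row in data if (row[d] < split) == (point[d] < split)]
        depth += 1
    return depth


def average_depth(point, data, n_trials=200, seed=0):
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(n_trials)) / n_trials


# Hypothetical data: a tight 5x5 grid of "normal" points plus one far-away point.
normal = [(9 + 0.5 * i, 9 + 0.5 * j) for i in range(5) for j in range(5)]
outlier = (25.0, 25.0)
data = normal + [outlier]

scores = {p: average_depth(p, data) for p in data}
most_isolated = min(scores, key=scores.get)
print("easiest point to isolate:", most_isolated)
```

The far-away point is typically cut off by the very first split, while points inside the grid need several splits, which is the intuition the algorithm relies on.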

Associative Learning:

The best-known algorithm for association learning is the Apriori Algorithm, which is typically the algorithm used in a Market Basket Analysis. The goal of this algorithm is to learn the relationships between items and determine a “lift” metric that tells us how much more likely one item set (e.g. product, store, etc.) is to show up with another item set than we would expect by chance. For example, the algorithm helps us make a statement like people who buy X are 12 times more likely to also buy Y.
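To show what the lift metric measures, here is a small sketch that computes it directly for one pair of items. Note that this is not the Apriori search itself, which exists to find frequent item sets efficiently among thousands of candidates; it only shows the arithmetic behind the final statement. The baskets and function names are made up for illustration.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def lift(transactions, antecedent, consequent):
    """How much more often the two item sets appear together than
    we would expect if they were bought independently."""
    together = support(transactions, set(antecedent) | set(consequent))
    expected = support(transactions, antecedent) * support(transactions, consequent)
    if expected == 0:
        raise ValueError("one of the item sets never appears in the data")
    return together / expected


# Hypothetical shopping baskets.
baskets = [
    {"chips", "salsa", "soda"},
    {"chips", "salsa"},
    {"chips", "salsa", "beer"},
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"chips", "soda"},
    {"bread", "eggs"},
]

chips_salsa_lift = lift(baskets, {"chips"}, {"salsa"})
print(f"lift(chips -> salsa) = {chips_salsa_lift:.1f}")  # prints 2.0
```

Here chips appear in half of the baskets, salsa in three of eight, and both together in three of eight, so people who buy chips are twice as likely as the average shopper to also buy salsa. A lift near 1 would mean no association.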




Brandon Cosley
