Towards Machine Learning - Unsupervised Algorithms

Fatima Mubarak
Published in Tech Blog · 5 min read · Dec 27, 2022
Reference: scopeblog.stanford.edu

Unsupervised machine learning methods analyze and cluster unlabeled datasets. These algorithms uncover hidden patterns and data groupings without human intervention.

Unsupervised representation (Reference: javatpoint)

In reality, we do not always have input data with a corresponding output, so unsupervised learning is required to solve such problems.

This article explains two fundamental unsupervised learning algorithms: Principal Component Analysis (PCA) and K-Means clustering.

This is the third article in the “Towards Machine Learning” series. You can check the previously published articles at the following links:

Dimensionality reduction

Before explaining the PCA algorithm, it is worth illustrating what dimensionality reduction techniques are.

Dimensionality reduction is the transformation of data from a high-dimensional space into a low-dimensional space. We use it because not all features are useful, and some are redundant.

If x and y are correlated, we can remove one of these features.

Correlated data

If x and y are not correlated, both features carry useful information, and we cannot remove either of them.

Uncorrelated data
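To make this concrete, here is a minimal sketch (using NumPy; the data and names are my own, not from the article) that measures how strongly two features are correlated:

```python
# Minimal sketch: if two features are strongly correlated,
# one of them is nearly redundant and can be dropped.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.9 * x + rng.normal(scale=0.1, size=200)  # y is almost a linear function of x

r = np.corrcoef(x, y)[0, 1]            # Pearson correlation coefficient
print(f"correlation(x, y) = {r:.2f}")  # close to 1 -> one feature can be removed
```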

The principle of dimensionality reduction is to project the data onto its main direction, thereby reducing the number of dimensions. Dimensionality reduction means fewer features and, therefore, less complexity.

Dimensionality reduction

Principal Component Analysis (PCA)

PCA is an unsupervised learning technique that reduces data dimensionality by finding the directions that produce the smallest projection error. To do so, PCA uses eigenvectors and eigenvalues.

Direction of projection

What are Eigenvectors and Eigenvalues?

Multiplying a matrix by a vector yields another vector; this is the transformation of that vector in the given vector space with respect to the specific matrix. However, for a given matrix there are some vectors whose direction does not change after the transformation. These vectors are known as the eigenvectors of the matrix, and the factor by which such a vector is scaled is known as the eigenvalue corresponding to that eigenvector.
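A quick NumPy sketch (my own illustration, not from the article) makes this concrete: multiplying the matrix by one of its eigenvectors only scales the vector, it does not rotate it.

```python
# Minimal sketch: verify the eigenvector property A v = lambda v with NumPy.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)  # columns are the eigenvectors

v = eigenvectors[:, 0]               # first eigenvector
lam = eigenvalues[0]                 # its corresponding eigenvalue
print(np.allclose(A @ v, lam * v))   # True: the direction is unchanged, only scaled
```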

Method 1: Eigenvectors of the covariance matrix

Given a dataset X = {x1, …, xn} of samples xi in R^d:

First, the mean of the dataset is calculated:

μ = (1/n) Σᵢ xᵢ

Then, the covariance matrix of the data is computed:

C = (1/n) Σᵢ (xᵢ − μ)(xᵢ − μ)ᵀ

The covariance matrix shows how the data is distributed in space.

Next, the eigenvectors of the covariance matrix are computed by solving:

C v = λ v

Finally, the eigenvectors of the covariance matrix give the principal components: the eigenvector with the largest eigenvalue is the direction of highest variance, the eigenvector with the second-largest eigenvalue is the direction of second-highest variance, and so on. The data is projected onto these directions.

In other words, after obtaining the d x d covariance matrix, the matrix is decomposed into its eigenvalues and eigenvectors.

The entry in position (i, j) of a covariance matrix is the covariance between the i-th and j-th elements of a random vector (a random vector is a random variable with multiple dimensions).
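Putting the steps together, here is a minimal sketch of Method 1 in NumPy (function and variable names are my own; it uses the 1/n covariance convention from the formula above):

```python
# Minimal sketch of covariance-matrix PCA (Method 1).
import numpy as np

def pca_covariance(X, k):
    """Project the n x d data matrix X onto its top-k principal components."""
    mean = X.mean(axis=0)                            # step 1: mean of the dataset
    Xc = X - mean                                    # center the data
    cov = Xc.T @ Xc / len(X)                         # step 2: d x d covariance matrix (1/n convention)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # step 3: eigendecomposition (symmetric matrix)
    order = np.argsort(eigenvalues)[::-1]            # sort directions by variance, largest first
    components = eigenvectors[:, order[:k]]          # top-k directions of highest variance
    return Xc @ components                           # step 4: project the data onto those directions

X = np.random.default_rng(0).normal(size=(100, 5))
print(pca_covariance(X, 2).shape)                    # (100, 2)
```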

Method 2: Eigenvectors of the design matrix

The main intuition behind SVD is that a matrix transforms one set of orthogonal vectors into another set of orthogonal vectors, each scaled by a factor. These scaling factors are the singular values, and they correspond to the singular vectors u and v.

It turns out that the columns of V are the eigenvectors of the covariance matrix, while its eigenvalues are hidden in the singular values.

First, the Singular Value Decomposition (SVD) of the mean-centered data matrix is calculated:

X = U S Vᵀ

Then, the eigenvectors are given by the columns of V.

And the eigenvalues are given by the squared singular values, scaled by the number of samples: λᵢ = sᵢ² / n.
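The equivalence is easy to check numerically; a minimal NumPy sketch (my own, assuming the same 1/n covariance convention as above):

```python
# Minimal sketch of SVD-based PCA (Method 2): the columns of V are the
# eigenvectors of the covariance matrix, and the eigenvalues come from
# the squared singular values.
import numpy as np

X = np.random.default_rng(0).normal(size=(100, 5))
Xc = X - X.mean(axis=0)          # center the data first

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigenvectors = Vt.T              # columns of V = principal directions
eigenvalues = s**2 / len(X)      # eigenvalues hidden in the singular values

# Sanity check against Method 1: the covariance matrix maps each
# eigenvector to itself scaled by its eigenvalue.
cov = Xc.T @ Xc / len(X)
print(np.allclose(cov @ eigenvectors[:, 0], eigenvalues[0] * eigenvectors[:, 0]))  # True
```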

In short, PCA is a dimensionality reduction approach that converts the features of a dataset into a smaller set of features, known as principal components, while trying to retain as much of the information in the original dataset as possible.
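In practice, you rarely implement PCA by hand; scikit-learn's PCA (which uses the SVD route internally) does the whole job in two lines:

```python
# PCA with scikit-learn.
from sklearn.decomposition import PCA
import numpy as np

X = np.random.default_rng(0).normal(size=(100, 5))
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)          # 100 x 2 matrix of principal components
print(pca.explained_variance_ratio_)      # fraction of variance kept by each component
```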

K-means Clustering

K-Means clustering is an unsupervised learning technique that groups the data so that related items end up in the same cluster.

How does clustering work?

Each cluster is represented by a central vector called a centroid. The algorithm starts by selecting a centroid at random for each cluster; for example, if we specify a “k” of 2, it randomly selects two centroids.

Centroids

Centroids partition the input space: every data point in the dataset is assigned to the nearest centroid, meaning that a data point belongs to a specific cluster if it is closer to that cluster’s centroid than to any other centroid.

Partitioning
Assigning data

Then, each example is assigned to its nearest centroid.

Adding centroids

Now, the algorithm recomputes the centroid of each cluster as the mean (average) of all the points assigned to it. As the centroids move, the points are re-assigned to the closest centroid.

Updating the mean

The algorithm keeps iterating these two steps until the centroids become stable.
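The whole loop fits in a few lines of NumPy; here is a minimal sketch (names are my own, and for brevity it does not handle the rare case of an empty cluster):

```python
# Minimal sketch of the K-means loop: assign points to the nearest
# centroid, recompute centroids as cluster means, repeat until stable.
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(n_iter):
        # Assignment step: each point goes to its closest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # stable centroids -> done
            break
        centroids = new_centroids
    return labels, centroids

X = np.random.default_rng(1).normal(size=(200, 2))
labels, centroids = k_means(X, k=2)
print(centroids)
```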

In short, K-means uses an iterative technique to partition an unlabeled dataset into the number of clusters specified by the user.
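The same clustering can be done with scikit-learn's KMeans:

```python
# K-means with scikit-learn.
from sklearn.cluster import KMeans
import numpy as np

X = np.random.default_rng(1).normal(size=(200, 2))
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # final (stable) centroids
print(kmeans.labels_[:10])       # cluster assignment of the first ten points
```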

Summary

This article discussed PCA and K-means, two of the best-known unsupervised algorithms used to analyze and cluster unlabeled datasets.

Now that we’ve covered the fundamental unsupervised algorithms, the next article will look at how the cloud relates to machine learning.


Fatima Mubarak
Tech Blog

Data scientist @montymobile | In my writing, I explore the fields of data science, machine learning and related topics.