Dimensionality Reduction in Machine Learning

Sachin D N · Published in Analytics Vidhya · 6 min read · Jul 13, 2020


Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data.

Why do we use dimensionality reduction techniques?

Human beings can't visualize high-dimensional data, so we want to reduce it to a lower dimension. In real-world data analysis tasks we work with complex, multi-dimensional data: we plot it to find patterns, or use it to train machine learning models. One way to think about dimensions is this: if you treat a data point x as a physical object, then dimensions are merely a basis of view, such as where the point lies when observed along the horizontal axis or the vertical axis.

As the number of dimensions increases, so does the difficulty of visualizing the data and performing computations on it. So, how do we reduce the dimensions of a dataset?
* Remove the redundant dimensions
* Keep only the most important dimensions

What are the techniques for dimensionality reduction in machine learning?

In this article we use two fundamental techniques: PCA and t-SNE.

Principal Component Analysis (PCA): In machine learning, PCA is an unsupervised learning technique.

First, let's understand some terms.

Variance: It is a measure of variability; it simply measures how spread out the dataset is. Mathematically, it is the average squared deviation from the mean: var(x) = (1/n) Σᵢ (xᵢ − mean(x))².

Covariance: It is a measure of the extent to which corresponding elements from two sets of ordered data move in the same direction: cov(x, y) = (1/n) Σᵢ (xᵢ − mean(x))(yᵢ − mean(y)).

Column standardization: It means transforming the data points so that the mean vector sits at the origin and the variance along every axis becomes 1 (by either squishing or expanding) in the transformed space. This technique is often called mean centering and variance scaling.

Column standardization formula: x′ = (x − μ) / σ, where μ is the column mean and σ is the column standard deviation.
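
As a quick, minimal sketch of these three terms (the arrays x and y below are made-up toy data, not from the MNIST example used later):

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

# Variance: average squared deviation from the mean (same as np.var(x))
var_x = np.mean((x - x.mean()) ** 2)

# Covariance: do corresponding elements of x and y move in the same direction?
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Column standardization: zero mean and unit variance for every column
X = np.column_stack([x, y])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0), X_std.std(axis=0))  # approximately 0 and 1 per column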

How does PCA work? We want to preserve the direction with the maximal spread/variance in the data.

1. Standardize the given data.
2. Calculate the covariance matrix of the data points.
3. Calculate the eigenvectors and corresponding eigenvalues.
4. Sort the eigenvectors by their eigenvalues in decreasing order.
5. Choose the first k eigenvectors; these will be the new k dimensions.
6. Transform the original n-dimensional data points into k dimensions.

Eigenvectors: These vectors give the directions in which the maximal spread occurs in the data.

Eigenvalues: These values give the percentage of the spread along the corresponding direction.
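
Putting the six steps together, here is a rough from-scratch sketch on a small random matrix (purely illustrative, not the MNIST code that follows):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # 100 toy points in 3 dimensions

X_std = (X - X.mean(axis=0)) / X.std(axis=0)      # 1. standardize
cov = np.cov(X_std, rowvar=False)                 # 2. covariance matrix (3 x 3)
eig_vals, eig_vecs = np.linalg.eigh(cov)          # 3. eigenvalues/eigenvectors (ascending order)
order = np.argsort(eig_vals)[::-1]                # 4. sort in decreasing order of eigenvalue
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]
W = eig_vecs[:, :2]                               # 5. keep the top k = 2 eigenvectors
X_reduced = X_std @ W                             # 6. project to 2 dimensions, shape (100, 2)

print(eig_vals / eig_vals.sum())                  # each eigenvalue's share of the total spread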

In the example below, we implement PCA using Python and a few libraries.

In this example we use a real-world Kaggle dataset to perform PCA: the MNIST dataset, downloaded from Kaggle.

What is the MNIST dataset?

The MNIST dataset contains thousands of handwritten digit images; our task is to classify each handwritten image into one of the 10 numeric characters.

Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel value is an integer between 0 and 255, inclusive.

Handwritten Image

First we want to convert each image into a column vector. Represent the image as a matrix: if a pixel is fully black, assign it the value 0; if it is fully white, assign it the value 1; and if it is gray, assign it a value between 0 and 1.

After the matrix representation, flatten the matrix into a single column vector, so that the dimension of each image vector is (784 × 1).
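
For instance, with NumPy, flattening a single image might look like this (the random array just stands in for a real 28 × 28 image):

import numpy as np

image = np.random.rand(28, 28)          # placeholder grayscale image with values in [0, 1]
column_vector = image.reshape(784, 1)   # flatten row by row into a (784 x 1) column vector
print(column_vector.shape)              # (784, 1)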

Let's begin with Python.

# Load the Kaggle MNIST training data and separate the labels from the 784 pixel columns
import pandas as pd
data = pd.read_csv('train.csv')
labels, data = data['label'], data.drop('label', axis=1)

Always normalize your data before doing PCA, because if we use features on different scales, we get misleading components.

# Data-preprocessing: Standardizing the data
from sklearn.preprocessing import StandardScaler
standardized_data=StandardScaler().fit_transform(data)
print(standardized_data.shape)

Find the covariance matrix, which for column-standardized data A is proportional to A^T * A.

# covariance matrix via matrix multiplication using numpy
import numpy as np
covar_matrix = np.matmul(standardized_data.T, standardized_data)   # shape (784, 784)

# finding the top two eigen-values and corresponding eigen-vectors
# for projecting onto a 2-Dim space.

Calculate the eigenvectors and corresponding eigenvalues.

# eigh returns eigenvalues in ascending order; indices 782 and 783 are the two largest of the 784
from scipy.linalg import eigh
values, vectors = eigh(covar_matrix, subset_by_index=[782, 783])

Project the original data points onto this plane.
# vectors has shape (784, 2); project each standardized point onto the two eigenvectors
new_coordinates = np.matmul(vectors.T, standardized_data.T)   # shape (2, n)

After doing all this, plot the data; it looks like the figure below.

After applying PCA
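
The plotting step itself is not shown above; a minimal sketch using pandas and seaborn (assuming the labels column loaded earlier, with column names chosen just for this example) could look like this:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# stack the two principal components with the digit labels: shape (n, 3)
stacked = np.vstack((new_coordinates, labels)).T
df = pd.DataFrame(data=stacked, columns=("1st_principal", "2nd_principal", "label"))

# scatter plot of the 2-D projection, coloured by digit class
sns.FacetGrid(df, hue="label", height=6).map(plt.scatter, "1st_principal", "2nd_principal").add_legend()
plt.show()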

We can also find how much variance is explained along each direction by dividing each eigenvalue by the sum of all eigenvalues (the explained variance ratio). In the plot below, if we use about 300 dimensions, approximately 97% of the variance in the data is preserved.

Percentage of explained variance
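
One way to reproduce such a curve is to fit scikit-learn's PCA with all 784 components and take the cumulative sum of the explained variance ratios (a sketch under that assumption, not necessarily the exact code behind the plot above):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA(n_components=784)
pca.fit(standardized_data)

cum_var = np.cumsum(pca.explained_variance_ratio_)   # fraction of variance kept by the first k components
print(cum_var[299])                                   # variance preserved by the first 300 components

plt.plot(cum_var)
plt.xlabel("Number of components")
plt.ylabel("Cumulative explained variance")
plt.show()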

To see the detailed code, visit.

Limitations of PCA:

1. PCA tries to preserve only the global shape of the data.
2. When the data lies on a circle or hypersphere, we lose information when applying PCA.

t-SNE (t-distributed Stochastic Neighbor Embedding):

PCA preserves only the global shape of the data, whereas t-SNE focuses on preserving the local neighborhood structure of the data (and, to a lesser extent, some of the global shape). It is also one of the best visualization techniques.

Neighborhood: when the distance between two points is small, we say the points are in the same neighborhood.

Embedding: embedding is the technique of taking data points one by one from the high-dimensional space and placing them into a low-dimensional space.

How to apply t-SNE?

  1. t-SNE is an iterative algorithm.
  2. t-SNE is a neighborhood-preserving embedding technique.
  3. Every time we run the algorithm we get slightly different results, which is why t-SNE is called stochastic.
  4. Running more iterations usually gives a better-formed embedding.
  5. t-SNE tends to expand dense groups of points and shrink sparse clusters. The image below shows the result of applying t-SNE to the MNIST dataset.
t-SNE on MNIST data set
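
In practice, scikit-learn's TSNE is a common way to produce such a plot. Here is a minimal sketch on a subset of the standardized MNIST data (the subset size and perplexity are just illustrative choices):

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# t-SNE is slow on all 42,000 points, so use a smaller subset for illustration
subset = standardized_data[:5000]
subset_labels = labels[:5000]

# perplexity controls the effective neighborhood size; default iteration count is used here
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(subset)               # shape (5000, 2)

plt.scatter(embedding[:, 0], embedding[:, 1], c=subset_labels, cmap="tab10", s=5)
plt.colorbar()
plt.show()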

To play with t-SNE on the web without coding, visit.

To see the detailed code for t-SNE, visit.

There's a reason that t-SNE has become so popular: it's incredibly flexible, and can often find structure where other dimensionality-reduction algorithms cannot. Unfortunately, that very flexibility makes it tricky to interpret.

Let's discuss in the comments if you find anything wrong in the post or if you have anything to add.

Thanks for reading…
