Merging Principal Component Analysis (PCA) with Artificial Neural Networks

Advait Thergaonkar · Published in Analytics Vidhya · 5 min read · Sep 22, 2020

What is Principal Component Analysis (PCA)?

Principal Component Analysis is an unsupervised learning method often used to reduce the dimensionality of large datasets and simplify their complexity, by transforming a large set of variables into a smaller one while retaining most of the information in the original dataset.

Principal Component Analysis reduces data by geometrically projecting it onto lower dimensions, which are called Principal Components (PCs). The goal of this method is to find the best summary of our data using the fewest principal components. We choose each principal component to minimise the distance between the original data points and their projections onto it; minimising this distance is equivalent to maximising the variance of the projected points. We do the same for every subsequent principal component, with the constraint that each new component must be uncorrelated with the previous ones. Because of this constraint, the number of principal components can be at most the number of samples or the number of features in our dataset, whichever is smaller.

Projection of data points onto Principal Components. Source: Stack Exchange
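As a quick illustration (not from the original article), here is a minimal sketch using scikit-learn's PCA on synthetic correlated data; all variable names are illustrative. It shows that two components can retain nearly all of the variance of a five-feature dataset:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 100 samples of 5 features that really live in 2 dimensions
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(100, 5))

# Project onto the 2 directions of maximum variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (100, 2)
print(pca.explained_variance_ratio_.sum())  # close to 1.0: little information lost
```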

What are Artificial Neural Networks?

Artificial Neural Networks, also called Neural Networks, are a form of machine learning algorithm with a structure based on the human brain. Like other kinds of machine learning algorithms, they can solve problems through trial and error without being explicitly programmed with rules to follow.

In Neural Networks, the computer learns to perform a task by analysing training data that has been pre-labelled with the expected output before being fed into the model. As a neural network is structured on the human brain, it can have thousands of nodes interconnected by importance factors called weights. A typical neural network consists of an input layer, one or more hidden layers and an output layer. The input layer receives the training data the algorithm learns from and passes it on to the hidden layers, whose nodes are interconnected by weights; it is in these layers that the algorithm fine-tunes the weights until the margin of error is minimal. The output layer then produces the classifications that the algorithm maps the inputs to.

Image by Sabrina Jiang © Investopedia 2020
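To make weights and interconnected nodes concrete, here is a minimal NumPy sketch of a single forward pass through one hidden layer. The layer sizes, input values and sigmoid activation are illustrative choices, not taken from the article:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One training example with 4 input features
x = np.array([0.5, -1.2, 3.0, 0.7])

# Randomly initialised weights: 4 inputs -> 3 hidden nodes -> 1 output
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)

# Forward pass: training repeatedly adjusts W1 and W2 until the error is minimal
hidden = sigmoid(x @ W1 + b1)
output = sigmoid(hidden @ W2 + b2)
print(output)  # a value in (0, 1), interpreted as a class probability
```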

Why merge Principal Component Analysis and Neural Networks?

Before addressing this question, we should understand why we apply neural networks to a dataset in the first place. Suppose we have a huge dataset containing images of cats and dogs; we feed this data into our model and train it to classify the cat and dog images it will receive in the future. So to classify between two outcomes (binary) or many outcomes (categorical), we use the technique of neural networks. Now, coming to the question of merging PCA and NN: we use PCA to reduce the dimensions of our dataset so that, when the resulting dataset is fed to a machine learning algorithm, the time needed to train it decreases. As explained above, PCA can reduce the dimensions of our dataset to the number of samples or the number of features, whichever is lower.

Example: I have a training set consisting of 150 images of "me wearing glasses" and "me not wearing glasses", with 4096 features per image. If I apply a NN directly to this dataset, it takes a huge amount of time to train. But if I pre-process the data using PCA, the dimensions of the dataset shrink from the original (150, 4096) to (150, 150), so when I apply the NN to the resulting dataset, the training time drops considerably without a large loss in accuracy.

Time taken to train the NN with 5 hidden layers and 1000 epochs by reducing the dimensions with PCA
Time taken to train the NN with 5 hidden layers and 1000 epochs without reducing the dimensions with PCA
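The timings above come from the author's own run. As a rough way to reproduce the effect, the sketch below times a small scikit-learn MLP (a stand-in for the network trained later in this article) on random data shaped like the glasses dataset, with and without PCA; all sizes and settings here are assumptions:

```python
import time
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

# Random stand-in for the glasses data: 150 images x 4096 pixel features
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4096))
y = rng.integers(0, 2, size=150)

def fit_seconds(features):
    clf = MLPClassifier(hidden_layer_sizes=(64,) * 5, max_iter=200, random_state=0)
    start = time.perf_counter()
    clf.fit(features, y)
    return time.perf_counter() - start

print("raw pixels:", fit_seconds(X))
print("after PCA :", fit_seconds(PCA(n_components=150).fit_transform(X)))
```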

Classifying whether a person is wearing glasses using Neural Networks, with the data pre-processed using Principal Component Analysis

The dataset consisted of 100 images of a person wearing glasses and 100 images of a person not wearing glasses.

After loading the data, the pixel values of all images are scaled, each image is reshaped to 64×64 pixels, and both types of images are merged together.
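A sketch of this loading step might look as follows; the folder names, file extension and greyscale conversion are assumptions, since the article shows the code only as screenshots:

```python
import numpy as np
from pathlib import Path
from PIL import Image

def load_images(folder, size=(64, 64)):
    """Load every image in `folder`, convert to greyscale, resize, scale to [0, 1]."""
    images = []
    for path in sorted(Path(folder).glob("*.jpg")):
        img = Image.open(path).convert("L").resize(size)
        images.append(np.asarray(img, dtype=np.float32) / 255.0)
    return np.stack(images)

# Merge both classes: label 1 = wearing glasses, 0 = not wearing glasses
glasses = load_images("glasses")        # hypothetical folder; shape (100, 64, 64)
no_glasses = load_images("no_glasses")  # hypothetical folder; shape (100, 64, 64)
X = np.concatenate([glasses, no_glasses]).reshape(200, -1)  # (200, 4096)
y = np.array([1] * len(glasses) + [0] * len(no_glasses))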

Now the images are split into training and testing datasets.

200 Images and 4096 Features
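A typical way to perform this split is scikit-learn's train_test_split; the 75/25 ratio is inferred from the 150-image training set and 50-image testing set reported below:

```python
from sklearn.model_selection import train_test_split

# 75/25 split: 150 training images and 50 testing images
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

print(X_train.shape, X_test.shape)  # (150, 4096) (50, 4096)
```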

Applying Principal Component Analysis

Now we apply PCA on the training dataset and view the average image.

Average image of the dataset
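Continuing the sketch, fitting PCA and viewing the average image could look like this; scikit-learn stores the per-pixel mean in pca.mean_:

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# 150 components = min(150 training samples, 4096 features)
pca = PCA(n_components=150)
pca.fit(X_train)

# pca.mean_ is the per-pixel average of the training images: the "average face"
plt.imshow(pca.mean_.reshape(64, 64), cmap="gray")
plt.title("Average image of the training set")
plt.show()
```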

After we get the average face, we check the shape of the eigenfaces and view the first eigenface itself.

150 training examples and 4096 features
Eigenface of the PCA

In the eigenface we can clearly see the spectacles on the face, showing that they mark an area of high variability.
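In scikit-learn terms, the eigenfaces are the rows of pca.components_. Continuing from the sketch above, checking their shape and viewing the first one could look like this:

```python
# Each row of pca.components_ is one eigenface, flattened to 4096 pixels
print(pca.components_.shape)  # (150, 4096)

# The first eigenface: high-variance regions such as the spectacles stand out
plt.imshow(pca.components_[0].reshape(64, 64), cmap="gray")
plt.title("First eigenface")
plt.show()
```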

After verifying the eigenfaces, we reduce the dimensions of the data by projecting it onto the eigenfaces, producing the eigenspace representation, or "Omega", which in our case has dimensions of (150, 150). To check Omega, we plot one of the images of our dataset as a projected image.

First projected image of the dataset

After checking the projected image, we transform both our training and testing sets into the reduced dimensions, with the training set having dimensions of (150, 150) and the testing set (50, 150).
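A sketch of this step, continuing from the PCA fitted above: pca.transform produces the "Omega" representation, and inverse_transform reconstructs a projected image for inspection:

```python
# Project onto the eigenfaces: the reduced "Omega" representation
omega_train = pca.transform(X_train)  # shape (150, 150)
omega_test = pca.transform(X_test)    # shape (50, 150)
print(omega_train.shape, omega_test.shape)

# Reconstruct the first training image from its projection for inspection
reconstructed = pca.inverse_transform(omega_train[:1])[0]
plt.imshow(reconstructed.reshape(64, 64), cmap="gray")
plt.title("First projected image (reconstructed)")
plt.show()
```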

After we get the new training and testing sets, we use them in our neural network model for classification.

Applying Neural Networks

For the Neural Network, we built a model with 4 hidden layers.

4 hidden layers and 1 output layer with a sigmoid activation function
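The article does not list the layer widths or the framework, so the Keras sketch below is one plausible reading of the caption above: four hidden layers (the widths and ReLU activations are assumptions) and a single sigmoid output unit, trained for the 1000 epochs mentioned earlier:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Four hidden layers (widths/activations assumed) and one sigmoid output unit
model = keras.Sequential([
    keras.Input(shape=(150,)),              # the 150 PCA components per image
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # glasses vs. no glasses
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(omega_train, y_train, epochs=1000, verbose=0)
print(model.evaluate(omega_test, y_test))  # [loss, accuracy]
```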

After training the model and testing it on the new testing data, we got an accuracy of 100% on both the training set and the testing set.

Accuracy 100% and an execution time of 22s

Now we can save our PCA and NN models locally and use them in any future program to classify whether a person is wearing spectacles or not.
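One common way to do this, assuming the scikit-learn PCA and Keras model from the sketches above (the file names are illustrative):

```python
import joblib

# Persist both halves of the pipeline (file names are illustrative)
joblib.dump(pca, "pca_glasses.joblib")
model.save("glasses_classifier.keras")

# In a future program: reload both and classify a new flattened 4096-pixel image
# pca = joblib.load("pca_glasses.joblib")
# model = keras.models.load_model("glasses_classifier.keras")
# prob = model.predict(pca.transform(new_image.reshape(1, -1)))
```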

I hope all the readers try this and develop new ideas!
