PCA (Principal Components Analysis) applied to images of faces

Sebastian Norena
3 min read · Apr 26, 2018


PCA is very useful for reducing many dimensions into a smaller set of dimensions. Since humans cannot visualize data in more than 3 dimensions, it is usually helpful to reduce a multidimensional dataset to 2 or 3 dimensions and plot it in order to get a better understanding of the data.
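As a minimal sketch of that idea, here is PCA reducing a synthetic 9-dimensional dataset (a stand-in, not the Titanic data mentioned below) to 2 dimensions that could then be plotted:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))   # 200 samples, 9 features (synthetic stand-in)

pca = PCA(n_components=2)       # keep only the 2 strongest components
X_2d = pca.fit_transform(X)     # project each sample onto them

print(X_2d.shape)               # (200, 2) -- now plottable in 2-D
```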

In this blog post Kelvin explains how PCA can be used to reduce a Titanic dataset from 9 dimensions to 3, and plots the result on an interactive plot: https://medium.com/@kelfun5354/principal-component-analysis-and-what-is-it-really-using-python-8f04dbdb1600

Here I will explain how to use PCA to get the eigenvectors (the "eigenfaces") of a dataset of faces from AT&T Laboratories Cambridge.

First the dataset must be downloaded:

!wget http://www.cl.cam.ac.uk/Research/DTG/attarchive/pub/data/att_faces.zip -O att_faces.zip
!unzip att_faces.zip

Read one of the images to make sure the data was downloaded:

import numpy as np
from imageio import imread  # scipy.misc.imread is deprecated and has been removed
import matplotlib.pyplot as plt

img = imread('s1/1.pgm')
img = img / 255.0           # scale pixel values from [0, 255] to [0, 1]
plt.imshow(img, cmap='gray')

Read all the faces into a Pandas DataFrame, one flattened image per row, and display the first 100 of them as a 10×10 grid:

import pandas as pd
from glob import iglob

rows = []
for path in iglob('*/*.pgm'):
    img = imread(path)
    rows.append(pd.Series(img.flatten(), name=path))
faces = pd.DataFrame(rows)  # DataFrame.append is deprecated; build from a list instead

fig, axes = plt.subplots(10, 10, figsize=(9, 9),
                         subplot_kw={'xticks': [], 'yticks': []},
                         gridspec_kw=dict(hspace=0.01, wspace=0.01))
for i, ax in enumerate(axes.flat):
    ax.imshow(faces.iloc[i].values.reshape(112, 92), cmap='gray')
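The reshape calls above undo the flattening: each DataFrame row holds 112 × 92 = 10,304 pixel values, and reshaping a row restores the 2-D image. A quick sketch with a synthetic array standing in for a real face:

```python
import numpy as np

# Synthetic stand-in for one 112x92 AT&T face image
img = np.random.default_rng(0).integers(0, 256, size=(112, 92), dtype=np.uint8)

row = img.flatten()              # one DataFrame row: 10304 pixel values
restored = row.reshape(112, 92)  # what ax.imshow(...) receives

print(row.shape, np.array_equal(img, restored))  # (10304,) True
```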

Here comes the magic of doing PCA on the images:

from sklearn.decomposition import PCA

# n_components=0.8 keeps the smallest number of components (eigenvectors)
# that together explain 80% of the variance in the dataset
faces_pca = PCA(n_components=0.8)
faces_pca.fit(faces)

fig, axes = plt.subplots(2, 10, figsize=(9, 3),
                         subplot_kw={'xticks': [], 'yticks': []},
                         gridspec_kw=dict(hspace=0.01, wspace=0.01))
for i, ax in enumerate(axes.flat):
    ax.imshow(faces_pca.components_[i].reshape(112, 92), cmap='gray')
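You can check what a float `n_components` actually selected via `n_components_` and `explained_variance_ratio_`. A sketch on synthetic data (the real face dataset would give different numbers):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 50 features with uneven variance, so a few components dominate
X = rng.normal(size=(100, 50)) * np.linspace(10, 1, 50)

pca = PCA(n_components=0.8).fit(X)
cum = pca.explained_variance_ratio_.cumsum()

# Smallest number of components whose cumulative variance reaches 80%
print(pca.n_components_, cum[-1] >= 0.8)  # (some count < 50), True
```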

With these eigenvectors it is possible to reconstruct any face in the dataset: transform of the PCA object projects each face onto the components to get its weights, and inverse_transform maps those weights back to pixel space to approximate the original images:

components = faces_pca.transform(faces)
projected = faces_pca.inverse_transform(components)

fig, axes = plt.subplots(10, 10, figsize=(9, 9),
                         subplot_kw={'xticks': [], 'yticks': []},
                         gridspec_kw=dict(hspace=0.01, wspace=0.01))
for i, ax in enumerate(axes.flat):
    ax.imshow(projected[i].reshape(112, 92), cmap='gray')

As they were reconstructed from components capturing only 80% of the variance in the dataset, the remaining 20% is discarded, and the resulting images look slightly blurred compared to the originals.
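That loss can be measured directly: the transform → inverse_transform round trip keeps only the variance captured by the retained components, and the rest shows up as reconstruction error. A sketch with synthetic data in place of the face images:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 30))          # synthetic stand-in for the faces

pca = PCA(n_components=0.8).fit(X)      # keeps fewer than 30 components
X_rec = pca.inverse_transform(pca.transform(X))

mse = np.mean((X - X_rec) ** 2)         # nonzero: some detail is lost
print(X_rec.shape == X.shape, mse > 0)  # True True
```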

The dataset with faces was downloaded from AT&T Laboratories Cambridge: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
