A deep dive into Factor Analysis

Mitisha Agarwal
7 min read · Jan 24, 2022


Large datasets full of redundant attributes are a nightmare for any data scientist, since irrelevant attributes hurt the performance of machine learning algorithms. Dimensionality reduction techniques are therefore necessary to reduce the number of attributes before further analysis.

In this article, I will discuss Factor Analysis, a dimensionality reduction technique, and give an overview of this statistical method: how it works, and when and how to apply it to a dataset.

If you want to get an overview of Dimensionality Reduction, you can refer to this blog.

Introduction

Factor Analysis is an unsupervised, probabilistic machine learning algorithm used for dimensionality reduction. It regroups correlated variables into a smaller number of latent variables, called factors, that share a common variance. The aim is to explain the intercorrelations among the n observed variables through a set of common factors (fewer than n). In simple terms, it groups the variables into meaningful categories.

Factor Analysis is based on the idea that the latent factors live in a lower-dimensional space: each observation is modeled as a linear transformation of the latent variables plus Gaussian noise.
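Concretely, a standard way to write this generative model (the notation below is mine, not from the article) is:

$$x = \mu + \Lambda f + \epsilon, \qquad f \sim \mathcal{N}(0, I), \quad \epsilon \sim \mathcal{N}(0, \Psi)$$

where x is the p-dimensional observation, Λ is the p×k loading matrix with k < p, f holds the k latent factors, and Ψ is a diagonal noise covariance.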

Pre-requisites for factor analysis

To get accurate results, we should check the following pre-requisites before applying the factor analysis -

  • Factor analysis should be applied to large datasets to diminish the error in the final results; on a small dataset it is only advisable when the factor loading scores are high (>0.80).
  • The correlation between the factors and variables should be at least 0.30, as anything lower implies a weak relationship between the variables.
  • The SMC (Squared Multiple Correlation) of each variable should be checked, and variables with singularity issues (SMC close to 0) or multicollinearity issues (SMC close to 1.0) should be removed from the dataset.
  • Outliers should be removed from the dataset.
  • The dataset should be standard scaled, and categorical features should be converted to numerical features (see the sketch below).
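As a minimal sketch of this preprocessing, assuming a hypothetical CSV file data.csv and the usual pandas/scikit-learn stack:

```python
# A rough preprocessing sketch; "data.csv" is a hypothetical file,
# substitute your own dataset.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("data.csv")
df = pd.get_dummies(df, drop_first=True)   # categorical -> numerical
df = df.dropna()                           # factor analysis cannot handle NaNs

# Standard scale so every variable has mean 0 and variance 1
X = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)
```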

Assumptions

  1. No perfect multicollinearity: factor analysis is an interdependency technique, so there must be no perfect multicollinearity among the variables.
  2. Homoscedasticity: since factor analysis is a linear function of the measured variables, it does not require homoscedasticity between the variables.

Terminology

Before learning the methods to apply factor analysis, let’s briefly glimpse the basic terminologies used later in this article.

1. Factor

The factor is a latent (hidden or unobserved) variable representing the correlated variables that share a common variance. The maximum number of factors is equal to the number of variables.

2. Eigenvalues (Characteristic Roots)

Eigenvalues represent the total variance that a given principal component can explain. Variance cannot be negative, so negative eigenvalues imply an incorrect model, while eigenvalues close to zero indicate multicollinearity, since the first component can take up almost all the variance. E.g., an eigenvalue of 2.5 means that the factor explains as much variance as 2.5 variables.

3. Factor Loadings

Factor loading is the correlation coefficient between a variable and a factor. It measures how much the variable contributes to the factor, so a high factor loading score means the variable is well represented by that factor.

4. Communalities

Communalities are the sums of the squared loadings for each variable and indicate the proportion of each variable's variance that the factors explain. If the communality for a particular variable is low, say between 0 and 0.5, the variable will not load significantly on any factor. Rotations have no influence over the communalities of the variables.

Implementation of Factor Analysis

The various steps involved in factor analysis are:

  1. Checking the factorability of factor analysis
  2. Determining the number of factors
  3. Interpreting the factors

Let’s go through each step one by one in detail.

Factorability of factor analysis

Checking the factorability of the dataset means asking, 'can we find factors in this dataset?'. The methods to check factorability are:

  1. Correlation Matrix Check
  2. KMO MSA Check
  3. Bartlett’s Test of Sphericity

Correlation Matrix Check

Is the given dataset a combination of high and low correlations? If yes, then we can proceed with the factor analysis.
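One rough way to run this check, reusing the scaled DataFrame X from the preprocessing sketch above:

```python
# Quick factorability check on the correlation matrix: we want a mix
# of high and low correlations, not all-near-zero or all-near-one.
import numpy as np

corr = X.corr()
off_diag = corr.values[~np.eye(len(corr), dtype=bool)]  # drop the diagonal
print("share of |r| > 0.3:", (np.abs(off_diag) > 0.3).mean())
```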

KMO MSA Check

The Kaiser-Meyer-Olkin Measure of Sampling Adequacy tests whether the partial correlations among variables are small. It is a statistic that indicates the proportion of variance in the variables that may be caused by underlying factors. High values, i.e., close to 1, indicate that factor analysis is useful for the dataset. If the value is less than 0.5, one shouldn't proceed with the factor analysis.
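The factor_analyzer package (pip install factor_analyzer) exposes this test directly; a minimal sketch, again using the scaled DataFrame X:

```python
# KMO Measure of Sampling Adequacy via factor_analyzer
from factor_analyzer import calculate_kmo

kmo_per_variable, kmo_overall = calculate_kmo(X)
print("overall KMO:", kmo_overall)   # < 0.5 -> don't proceed with factor analysis
```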

Bartlett’s Test of Sphericity

It tests the null hypothesis that the correlation matrix is an identity matrix, which would mean the variables are unrelated and factor analysis is therefore not applicable. If the p-value is less than 0.05, you can go ahead with the factor analysis. Essentially, the test checks for a certain redundancy between the variables that we can summarize with a few factors. The null hypothesis of the test is that the variables are orthogonal, i.e., not correlated.
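The same package also implements Bartlett's test; a minimal sketch:

```python
# Bartlett's test of sphericity via factor_analyzer
from factor_analyzer import calculate_bartlett_sphericity

chi_square, p_value = calculate_bartlett_sphericity(X)
print("p-value:", p_value)   # p < 0.05 -> factor analysis is applicable
```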

Determining the number of factors

If we extract too many factors, undesirable error variance might creep in. On the other hand, extracting too few factors might leave out valuable common variance. So it's essential to select a sensible way to decide the number of factors to extract.

Mainly, the eigenvalues and the scree test (scree plot) determine the number of factors to retain: the eigenvalue rule is an analytical approach, while the scree plot is a graphical one. Let's see both methods in detail.

Analytical Approach

This approach is also known as Kaiser's Criterion. In this method, all the factors with an eigenvalue above 1 are retained. An eigenvalue greater than one means that the factor explains more variance than a single variable. The reason for this cutoff is simple: because the data is standard scaled, each feature's variance is 1, so we keep exactly the factors that explain more variance than a single observed variable.

It has been found that this criterion sometimes overestimates the number of factors to extract. So, a better approach is to use the scree test in conjunction with the analytical method.
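A sketch of Kaiser's criterion with factor_analyzer, continuing with the scaled DataFrame X from earlier:

```python
# Fit an unrotated model and count eigenvalues greater than 1
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(rotation=None)
fa.fit(X)
eigenvalues, _ = fa.get_eigenvalues()   # eigenvalues of the correlation matrix
n_factors = int((eigenvalues > 1).sum())
print("factors retained by Kaiser's criterion:", n_factors)
```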

Scree Test (Graphical Approach)

The graphical approach is based on the visual representation of factors’ eigenvalues, called scree plots.

[Scree plot: eigenvalues plotted against the factor number, with an "elbow" where the curve levels off]

The scree plot consists of eigenvalues and factors. The number of factors to be retained are the data points that are left to the “elbow” of the graph. The elbow of the graph is the point where the eigenvalues seem to level off.

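A minimal matplotlib sketch, reusing the eigenvalues computed in the Kaiser's criterion snippet above:

```python
# Scree plot: look for the "elbow" where the curve levels off and
# keep the factors to its left.
import matplotlib.pyplot as plt

plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, "o-")
plt.axhline(1, color="grey", linestyle="--")   # Kaiser cutoff for reference
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```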

Interpreting the Factors

After finding the optimal number of factors, we need to interpret the factors with the help of factor loadings, commonalities, and variance. Interpretation of factors is essential to determine the strength of the relationships among the variables in the factors.

We can identify the factors through their most significant loadings, while the zero and low loadings help confirm the identification. The signs of the loadings show the direction of the correlation and do not affect the interpretation of the magnitude of a factor loading or the number of factors to retain. Loading scores range from -1 to 1: values closer to -1 or 1 indicate that the factor strongly influences the variable, while values closer to 0 indicate a weaker influence.

If the factor analysis is unrotated, the variances will equal the eigenvalues; rotation changes the distribution of the proportional variance while keeping the cumulative variance the same.

Communality is the proportion of each variable's variance that the factors explain; rotations have no influence over it. Higher communality indicates that a larger share of the variance in the variable has been extracted by the factor solution. For a well-fitting factor analysis, communalities should be 0.4 or greater.
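Putting this together, a sketch of how these quantities can be pulled out of a fitted factor_analyzer model (reusing n_factors and X from the earlier snippets):

```python
# Refit with the chosen number of factors and inspect the outputs
fa = FactorAnalyzer(n_factors=n_factors, rotation=None)
fa.fit(X)

loadings = pd.DataFrame(fa.loadings_, index=X.columns)
print(loadings)                   # variable-factor correlations in [-1, 1]
print(fa.get_communalities())     # per-variable explained variance; ideally >= 0.4
print(fa.get_factor_variance())   # variance, proportional and cumulative variance
```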

Factor Analysis with Rotation

Applying rotation and factor analysis does not inherently improve the predictive value of the derived factors. Still, it does help in better visualization and interpretation of the factors since the unrotated factors are sometimes ambiguous.

There are two types of rotation: Orthogonal Rotation & Oblique Rotation.

Orthogonal rotation is when the factors are rotated 90° from each other. Two standard orthogonal techniques are Quartimax and Varimax rotation. Quartimax involves minimizing the number of factors needed to explain each variable. Varimax reduces the number of variables with high loadings on each factor and makes small loadings even smaller.

Oblique rotation is when the factors are not rotated 90° from each other. The standard oblique rotation techniques are Direct Oblimin and Promax. Direct Oblimin attempts to simplify the output structure, while Promax is expedient because of its speed on larger datasets. Promax involves raising the loadings to the power of four, which results in stronger correlations among the factors and achieves a simple structure. The trade-off of oblique rotation is that it leaves the factors correlated.
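A sketch of how a rotation can be specified in factor_analyzer (again reusing n_factors and X from earlier):

```python
# The same model with a varimax (orthogonal) rotation; swap in
# rotation="promax" or rotation="oblimin" for an oblique rotation.
fa_rot = FactorAnalyzer(n_factors=n_factors, rotation="varimax")
fa_rot.fit(X)
print(pd.DataFrame(fa_rot.loadings_, index=X.columns))
```

Varimax is the usual choice when the factors are assumed uncorrelated; switch to an oblique rotation such as promax when correlated factors are plausible.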

Additional Resources

  • In this article, I have omitted the mathematical model and derivation of factor analysis to keep it from getting too long, but if you are interested, check out the link below:

https://www.cs.princeton.edu/~bee/courses/scribe/lec_10_02_2013.pdf

  • Scikit learn has also explained the maths behind factor analysis in its user guide. Do check it out.
  • For the implementation of the factor analysis with rotation, you can check the scikit learn user guide.

Hello 👋

I am an aspiring researcher, who is deeply interested in Machine Learning & Deep Learning. Hope you find this article useful. 😇
