Feature Extraction Using Factor Analysis in R

Kelly Szutu · Published in Analytics Vidhya · Apr 15, 2020

What is Feature Extraction? A process to reduce the number of features in a dataset by creating new features from the existing ones. The new reduced subset is able to summarize most of the information contained in the original set of features.

Two common methods of feature extraction are factor analysis and principal component analysis. I’ll first talk about factor analysis in this post.

To deal with the correlation among a large number of variables, we use factor analysis to find the latent (root) factors that underlie the observed ones. As we simplify the data, we also want to retain as much information as possible. Here, I use the “Brand Ratings” data (provided by “R for Marketing Research and Analytics” by Christopher N. Chapman and Elea McDonnell Feit, Springer) as an example.

# load the brand ratings data and take a first look
rating <- read.csv("http://goo.gl/IQl8nc")
dim(rating)
head(rating)

There are 1000 rows and 10 columns in this data. Each row represents one consumer’s rating of a brand (column 10) on nine different aspects (columns 1 to 9).
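As a quick check (a minimal sketch, not in the original walk-through), we can count how many ratings each brand received; the brand column is the one used again later in this post.

# count how many ratings each brand received
table(rating$brand)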

To make the data more consistent and put the variables on similar scales, we first use the scale() function to standardize the ratings.

# standardize the nine rating columns; leave the brand column untouched
rating.sc <- rating
rating.sc[,1:9] <- scale(rating[,1:9])
head(rating.sc)
summary(rating.sc)
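As a sanity check (a minimal sketch, not part of the original analysis), each standardized column should now have a mean of roughly 0 and a standard deviation of 1:

# every scaled rating column should have mean ~0 and sd 1
round(colMeans(rating.sc[,1:9]), 2)
round(apply(rating.sc[,1:9], 2, sd), 2)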

Then, we run a statistical test to check the correlations among the rating variables. This gives us both a correlation matrix and a table of p-values.

# correlation matrix and p-values for the nine rating attributes
library(psych)
corr.test(as.matrix(rating.sc[,1:9]))

From the correlation matrix, we can see that some values are greater than 0.5 (some analysts use 0.6 as the cutoff), meaning there is a collinearity issue in this data.
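Rather than scanning the matrix by eye, we can also list the offending pairs directly. A minimal sketch, using the 0.5 cutoff mentioned above:

# list attribute pairs whose absolute correlation exceeds 0.5
r <- cor(rating.sc[,1:9])
idx <- which(abs(r) > 0.5 & upper.tri(r), arr.ind=TRUE)
data.frame(var1 = rownames(r)[idx[,1]],
           var2 = colnames(r)[idx[,2]],
           corr = round(r[idx], 2))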

Root factors

Given that some of the variables are similar, the next step is to estimate the number of factors using a scree test and an eigenvalue test. Let’s look at the results first.

# estimate the number of factors: scree indices and eigenvalues
library(nFactors)
nScree(rating.sc[,1:9])
eigen(cor(rating.sc[,1:9]))

The scree test reports four measurement indices: optimal coordinates (oc), acceleration factor (af), parallel analysis (parallel), and the Kaiser rule (kaiser). Each value suggests how many factors to retain; however, they do not always agree (just as this result shows). We can then take the mode (in this case, 3 factors) or cross-check with the eigenvalues.

The eigenvalue test is another way to determine the number of factors. Usually, we keep the factors with an eigenvalue > 1 (some adopt > 2). That also gives us 3 factors here.
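To see both tests at once, here is a minimal sketch that draws the eigenvalues as a scree plot and counts how many exceed the Kaiser cutoff of 1:

# scree plot of the eigenvalues with the Kaiser-rule reference line
ev <- eigen(cor(rating.sc[,1:9]))$values
plot(ev, type="b", xlab="Factor number", ylab="Eigenvalue")
abline(h=1, lty=2)  # eigenvalue = 1 cutoff
sum(ev > 1)         # number of factors to keep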

Factor Rotation

After determining the number of factors, we apply it (factors = 3) to the factor analysis. There are three rotation types we can try: varimax, oblique, and none.

# fit a 3-factor model with each of the three rotation options
library(GPArotation)
factanal(rating.sc[,1:9], factors=3, rotation="varimax")
factanal(rating.sc[,1:9], factors=3, rotation="oblimin")
factanal(rating.sc[,1:9], factors=3, rotation="none")

1. Varimax (default): an orthogonal rotation method that minimizes the number of variables with high loadings on each factor. It results in uncorrelated factors.

2. Oblique: the axes are not kept at right angles, which allows correlation between factors (and can sometimes simplify the factor pattern matrix).

3. None: factor analysis without rotation.

Comparing the three outputs, the same groups of attributes load onto factor 1, factor 2, and factor 3. Whether we rotate or not, in this case there are only small differences in the order of the factor attributes and in the cumulative % of variance explained; the overall patterns are similar.

Below, I adopt rotation="varimax" (the default) and scores="Bartlett" for the rest of the analysis.

factor_res <- factanal(rating.sc[,1:9], factors=3, scores="Bartlett")

Factor Loadings

Factor loadings are the weights that show how strongly each factor relates to each attribute. These loadings (taking their absolute values) help assign attributes to factors and suggest a name that best describes each factor.

By examining the loading matrix, we see that the attributes “bargain” and “value” contribute a lot to factor 1, as does “leader” to factor 2 and “latest” to factor 3. Thus, we can treat these as the variables that define the root factors. Additionally, the cumulative loading shows that about 57% of the variance in the data is explained by factors 1, 2 and 3. In other words, in this case, reducing 9 variables to 3 factors loses about 43% of the information.
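To make this grouping easier to read off in the console, a minimal sketch that reprints the loading matrix with small loadings hidden and the rows sorted by factor (the 0.3 cutoff is my own choice, not from the original output):

# hide loadings below 0.3 and sort rows so the three groups stand out
print(factor_res$loadings, cutoff=0.3, sort=TRUE)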

Factor Scores

Next, let’s get the factor scores and add matching brand names for each individual on each factor.

# per-respondent factor scores, with the brand label attached
brand.scores <- data.frame(factor_res$scores)
brand.scores$brand <- rating.sc$brand
head(brand.scores)

Lastly, we compute mean values for each factor across brands and do some descriptive analysis.

# mean factor score for each brand
(brand.mean <- aggregate(brand.scores[,1:3], list(brand.scores[,4]), mean))

We find that brands f and g show strong value for money (factor 1), while they score low on being perceived as the latest brands (factor 3). Moreover, brands b and c are perceived as the leading brands (factor 2).
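One optional way to visualize these brand positions is a simple heatmap of the mean scores. A sketch in base R (the grouping column name Group.1 is the default produced by the unnamed list() passed to aggregate above):

# heatmap of mean factor scores by brand
rownames(brand.mean) <- brand.mean$Group.1
heatmap(as.matrix(brand.mean[,2:4]), Colv=NA, scale="none",
        xlab="Factor", ylab="Brand")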

About me

Hey, I’m Kelly, a business analytics graduate student with a journalism and communication background who likes to share the journey of exploring data and interesting findings. If you have any questions, feel free to contact me at kelly.szutu@gmail.com.

