This is great. What if I have a mixed dataset with both numeric and categorical variables? What do I need to do? PCA only works on numbers. Is it OK to convert the categorical variables to factors and use each factor's underlying integer codes? The alternative is the dummy approach (as implemented in the dummies R package), which creates one binary column for each level. With that approach we can end up with a very high number of columns, and the weight of each principal component will be very low, so you need to keep a lot of components to avoid losing information. Does that make sense?
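
To make the dummy-column concern concrete, here is a minimal sketch in Python with NumPy (not the dummies R package itself). The toy data, the column names `height` and `colour`, and the helper `one_hot` are all made up for illustration. It shows how one categorical variable with three levels becomes three binary columns, and how the explained variance is then spread across the components.

```python
import numpy as np

def one_hot(labels):
    """Hypothetical helper: one binary column per distinct level."""
    levels = sorted(set(labels))
    matrix = np.array([[1.0 if v == lev else 0.0 for lev in levels]
                       for v in labels])
    return levels, matrix

# Made-up toy data: one numeric and one categorical variable.
height = np.array([1.62, 1.75, 1.80, 1.68, 1.90, 1.71])
colour = ["red", "green", "blue", "green", "red", "blue"]

levels, dummies = one_hot(colour)
X = np.column_stack([height, dummies])  # 1 numeric + 3 dummy columns

# Standardise each column, then run PCA via the singular value decomposition.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
_, s, _ = np.linalg.svd(Xs, full_matrices=False)
explained = s**2 / np.sum(s**2)  # share of variance per component

print("columns after encoding:", X.shape[1])
print("explained variance ratio:", np.round(explained, 3))
```

Note that the dummy columns of a single variable always sum to 1 in every row, so one of them is redundant and the last component carries essentially no variance. Dropping one level per variable would remove that redundancy, but it would not by itself solve the column-count problem for variables with many levels.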