How to Generate Correlated Data in R

Hands-on example of how you can generate correlated data in R

George Pipis
Geek Culture

Sometimes we need to generate correlated data for demonstrations, technical assessments, testing, and so on. We have provided a walk-through example of how to generate correlated data in Python using the scikit-learn library. In R, as far as I know, there is no library dedicated to generating correlated data. For that reason, we will simulate data from the Multivariate Normal Distribution. Before continuing, I suggest reviewing the variance-covariance matrix and the relationship between correlation and covariance, since the covariance matrix is what determines the correlations in the simulated data.
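
The relationship between correlation and covariance can be sketched in a few lines of base R. The standard deviations and the correlation value used here are illustrative assumptions, not values from this post:

```r
# Illustrative (assumed) standard deviations and correlation for two variables
sds <- c(2, 3)
R <- matrix(c(1.0, 0.5,
              0.5, 1.0), nrow = 2, byrow = TRUE)

# Covariance is the correlation scaled by the standard deviations:
# Sigma[i, j] = R[i, j] * sds[i] * sds[j]
Sigma <- diag(sds) %*% R %*% diag(sds)
print(Sigma)

# cov2cor() from the stats package goes the other way,
# recovering the correlation matrix from the covariance matrix
R_back <- cov2cor(Sigma)
print(R_back)
```

When every standard deviation is 1, the covariance matrix and the correlation matrix coincide, which is the case for the three variables simulated in the next section.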

Generate Correlated Data

We will generate 1000 observations from a Multivariate Normal Distribution of 3 Gaussian variables, specified as follows:

  • V1~N(10,1), V2~N(5,1), V3~N(2,1)
  • The correlation of V1 vs V2 is around -0.8, the correlation of V1 vs V3 is around -0.7, and the correlation of V2 vs V3 is around 0.9
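
library(MASS)       # mvrnorm() for sampling from the multivariate normal
library(tidyverse)  # data manipulation and plotting
library(GGally)     # ggpairs() for pairwise plots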
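
A minimal sketch of the sampling step, assuming `mvrnorm()` from MASS is used with the means and correlations listed above. The seed and the variable names `mu`, `Sigma`, and `df` are my own choices for illustration:

```r
library(MASS)  # provides mvrnorm()

set.seed(5)  # arbitrary seed, only for reproducibility

# Means of V1, V2, V3
mu <- c(10, 5, 2)

# All variances are 1, so the covariance matrix equals the correlation matrix
Sigma <- matrix(c( 1.0, -0.8, -0.7,
                  -0.8,  1.0,  0.9,
                  -0.7,  0.9,  1.0), nrow = 3, byrow = TRUE)

# Draw 1000 observations; as.data.frame() names the columns V1, V2, V3
df <- as.data.frame(mvrnorm(n = 1000, mu = mu, Sigma = Sigma))

print(colMeans(df))  # sample means, expected to be close to 10, 5, 2
print(cor(df))       # sample correlations, expected to be close to the targets
```

The sample correlations will only be approximately -0.8, -0.7, and 0.9, because the data are random draws. Passing `empirical = TRUE` to `mvrnorm()` makes the sample mean and covariance match `mu` and `Sigma` exactly. With GGally loaded, `ggpairs(df)` gives a quick visual check of the pairwise relationships.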

George Pipis

Sr. Director, Data Scientist @ Persado | Co-founder of the Data Science blog: https://predictivehacks.com/