How to Generate Correlated Data in R
Hands-on example of how you can generate correlated data in R
Sometimes we need to generate correlated data for demonstrations, technical assessments, testing, and so on. We have provided a walk-through example of how to generate correlated data in Python using the scikit-learn library. In R, as far as I know, there is no equivalent ready-made function for generating correlated data. For that reason, we will simulate the data from a multivariate normal distribution. Before continuing, I would suggest having a look at the variance-covariance matrix and the relationship between correlation and covariance.
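As a quick reminder of that relationship, here is a minimal sketch in base R. The standard deviations and the correlation below are illustrative values chosen for this example, not values used later in the post. The covariance matrix is obtained by scaling the correlation matrix on both sides by the diagonal matrix of standard deviations, and `cov2cor()` converts it back.

```r
# Illustrative values only (assumed for this sketch, not from the post)
sds <- c(2, 3)                         # standard deviations of two variables
R <- matrix(c(1.0, 0.5,
              0.5, 1.0), nrow = 2)     # their correlation matrix

# Covariance = D %*% R %*% D, where D is the diagonal matrix of standard deviations
Sigma <- diag(sds) %*% R %*% diag(sds)
Sigma

# cov2cor() (stats package, loaded by default) goes the other way
cov2cor(Sigma)
```

When every variance equals 1, as in the example that follows, the covariance matrix and the correlation matrix coincide.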
Generate Correlated Data
We will generate 1000 observations from a multivariate normal distribution with three Gaussian variables, as follows:
- V1~N(10,1), V2~N(5,1), V3~N(2,1)
- The correlation of V1 vs V2 is around -0.8, the correlation of V1 vs V3 is around -0.7 and the correlation of V2 vs V3 is around 0.9
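Not every set of pairwise correlations can occur together, so before sampling it can be worth checking that the target matrix is a valid one. The sketch below uses base R only and assumes the -0.7 refers to V1 vs V3; the names `R` and `eig` are just for this example.

```r
# Target correlation matrix (assumes the -0.7 applies to V1 vs V3).
# All variances are 1, so this is also the covariance matrix.
vars <- c("V1", "V2", "V3")
R <- matrix(c( 1.0, -0.8, -0.7,
              -0.8,  1.0,  0.9,
              -0.7,  0.9,  1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(vars, vars))

# A symmetric matrix is positive definite when all its eigenvalues are positive
eig <- eigen(R, symmetric = TRUE)$values
eig
all(eig > 0)
```

If the check fails, the requested correlations are mutually inconsistent and need to be adjusted before a multivariate normal sample can be drawn from them.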
library(MASS)
library(tidyverse)
library(GGally)