How to split a dataframe into 3 (or more) representative subsets in R

Paul Vella
Published in Ones & Zeroes
Jun 18, 2021 · 4 min read


Consider this scenario. You have a list of customers or members and you want to test two versions of a promotional piece (e.g. one copy with high urgency and another emphasising a discount) to see which messaging works best for this segment. To be confident that any difference in results is due to the copy, the split needs to satisfy the following:

  • 10% of the customers are set aside as a control group who receive nothing
  • the two groups receiving campaign materials are approximately equal in size
  • no customer appears in more than one group, and
  • all three groups are representative of the population

How, then, do you split a list into three groups, two of them equally sized, and ensure that all three are representative?
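
One common way to meet all four requirements is a stratified random split: shuffle the rows within each combination of the variables you care about, then deal out 10% to the control group and alternate the rest between the two test groups. The following is a minimal base R sketch of that idea on made-up data, not the code from this article. The helper name `split_stratified()`, the `members` data frame, and its columns are hypothetical and exist only for illustration.

```r
# Hypothetical helper (not from the article or any package): assigns each row
# to "control", "A" or "B" within every combination of the strata columns.
# Assumes the strata columns contain no missing values; rows with NA in a
# strata column would be left unassigned.
split_stratified <- function(df, strata_cols, control_frac = 0.10) {
  # One stratum per observed combination of the stratification columns
  stratum <- interaction(df[strata_cols], drop = TRUE)
  group <- character(nrow(df))
  for (idx in split(seq_len(nrow(df)), stratum)) {
    n <- length(idx)
    idx <- idx[sample.int(n)]            # shuffle the rows within this stratum
    n_ctrl <- round(n * control_frac)    # control share of this stratum
    # Alternate A/B from a random starting label so neither group
    # systematically receives the odd row out
    ab <- rep(sample(c("A", "B")), length.out = n - n_ctrl)
    group[idx] <- c(rep("control", n_ctrl), ab)
  }
  df$group <- group
  df
}

# Synthetic example data (invented for this sketch)
set.seed(1)
members <- data.frame(
  id          = 1:2000,
  member_type = sample(c("low", "middle", "high"), 2000,
                       replace = TRUE, prob = c(0.6, 0.3, 0.1)),
  gender      = sample(c("F", "M"), 2000, replace = TRUE)
)

out <- split_stratified(members, c("member_type", "gender"))

# Group sizes, then the member-type mix within each group (rows sum to 1)
print(table(out$group))
print(round(prop.table(table(out$group, out$member_type), margin = 1), 3))
```

Because every row receives exactly one label, the groups cannot overlap, and because the assignment happens inside each stratum, the mix of member type and gender should be close to identical across the three groups. With many stratification variables some strata become very small, and rounding then pulls the control share away from 10%, so it is worth checking the final group sizes.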

Background

I had a list of 23,805 members of a loyalty program that I needed to split into three groups that were similar in member type (low, middle, high), gender, location, and age group. Starting with sample sizes, I expected my 10% control group to be around n=2,380 and each of the other two groups to be around n=10,712. Knowing this, I could then trial a few methods in R to achieve my goal. My final code is shown below, based on the solution provided to creating a…
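
As a quick sanity check on the expected group sizes quoted above (this is only arithmetic, not the author's final code, and the variable names are illustrative):

```r
n_total   <- 23805
n_control <- n_total %/% 10                       # 10% control, rounded down: 2380
n_each    <- (n_total - n_control) %/% 2          # each test group: 10712
leftover  <- n_total - n_control - 2 * n_each     # 1 member left to assign to any group

print(c(control = n_control, each_test_group = n_each, leftover = leftover))
```

The single leftover member is why the two test groups can only be approximately, rather than exactly, equal in size.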
