How to split a dataframe into 3 (or more) representative subsets in R
Consider this scenario. You have a list of customers or members and you want to test two versions of a promotional piece (e.g. one copy with high urgency and another emphasising a discount) to see which messaging works best for this segment. To ensure the results actually reflect differences due to the copy, you need the following:
- a control group of 10% of the customers, set aside to receive nothing
- two groups receiving campaign materials that are approximately equal in size
- no customer overlap between the groups, and
- three groups that are each representative of the population
How, then, do you split a list into three groups, two of them equal in size, AND ensure that all three are representative?
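Before settling on a method, it can help to see the four requirements listed above written as concrete checks. The following is only a minimal sketch in base R on made-up data, not necessarily the approach used later in this post: the data frame `customers`, its columns `id` and `member_type`, and the helper `assign_groups()` are all hypothetical names introduced for illustration. It shuffles customers within each member type, labels roughly 10% of each type as control, and divides the rest between groups A and B.

```r
# Toy data (hypothetical): 1,000 customers with one stratifying variable.
set.seed(42)
customers <- data.frame(
  id = 1:1000,
  member_type = sample(c("low", "middle", "high"), 1000,
                       replace = TRUE, prob = c(0.5, 0.3, 0.2)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build n group labels (about 10% control, the rest
# split as evenly as possible between A and B) and return them shuffled.
assign_groups <- function(n, control_share = 0.10) {
  n_control <- round(n * control_share)
  n_a <- ceiling((n - n_control) / 2)
  labels <- c(rep("control", n_control),
              rep("A", n_a),
              rep("B", n - n_control - n_a))
  sample(labels)
}

# Assign labels separately within each member type (the stratum), so every
# group receives a similar mix of low, middle and high members.
customers$group <- NA_character_
for (s in unique(customers$member_type)) {
  idx <- which(customers$member_type == s)
  customers$group[idx] <- assign_groups(length(idx))
}

# Checks against the requirements:
table(customers$group)               # control near 10%, A and B near equal
anyDuplicated(customers$id) == 0     # each customer appears once: no overlap
# Share of each member type within each group (rows sum to 1); the rows
# should look alike if the groups are representative on this variable.
round(prop.table(table(customers$group, customers$member_type), margin = 1), 3)
```

With several stratifying variables (say member type, gender, location and age group), the same idea applies to their combinations, which base R's `interaction()` can build, although very small strata make an exact 10% share impossible within every cell.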
Background
I had a list of 23,805 members of a loyalty program that I needed to split into three groups that were similar in terms of member type (low, middle, high), gender, location, and age group. Starting with sample sizes, I expected my 10% control group to be around n=2,380 and the other two groups to be around n=10,712 each. Knowing this, I could then trial a few methods in R to achieve my goal. My final code is shown below, based on the solution provided to creating a…