How to Clean Your Market Research Survey Data
Congratulations! You’re done fielding your market research survey.
For weeks, you wrote and edited your survey questions. You bought a SurveyGizmo license. You spent hours learning how to program your survey. You engaged tons of your industry connections to get survey responses — a hard-earned set of 500 respondents.
Getting here wasn’t easy.
But unfortunately, you’re not done. Before drawing findings from your survey, you need to clean your data. This is absolutely essential for maintaining the quality of your research. Here are the three most important things to look for when cleaning your survey data.
Speeders
These are people who took your survey too fast. To identify these kinds of respondents, first calculate the median time spent taking your survey.
The general rule here is to discard responses from people who finished your survey in less than half the median time. There are a few exceptions, such as when your survey contains a logic branch that showed certain people only a few questions. But generally, anyone who answers questions more than twice as fast as the median respondent likely sped through the survey without giving the questions much thought.
You can identify speeders by downloading your survey data into Excel, then subtracting the “time started” from the “time completed.” Every survey platform I’ve used records these timestamps. If yours doesn’t, you might have to skip this flag, but be sure to check for the following two.
In general, not more than 10% of your survey sample should be disqualified for speeding.
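If you’d rather script this step than do it in Excel, here is a minimal Python sketch of the half-the-median rule. It assumes a hypothetical export where each respondent has an `id` plus `time_started` and `time_completed` columns in `YYYY-MM-DD HH:MM:SS` format; your platform’s column names and timestamp format will likely differ.

```python
from datetime import datetime
from statistics import median

# Assumed timestamp format; adjust to match your platform's export.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def duration_seconds(row):
    """Time spent on the survey: time completed minus time started."""
    started = datetime.strptime(row["time_started"], TIMESTAMP_FORMAT)
    completed = datetime.strptime(row["time_completed"], TIMESTAMP_FORMAT)
    return (completed - started).total_seconds()


def flag_speeders(rows):
    """Return IDs of respondents who finished in under half the median time."""
    durations = {row["id"]: duration_seconds(row) for row in rows}
    cutoff = median(durations.values()) / 2
    return {rid for rid, secs in durations.items() if secs < cutoff}


# Hypothetical sample data (column names are assumptions).
responses = [
    {"id": 1, "time_started": "2024-03-01 09:00:00", "time_completed": "2024-03-01 09:12:00"},
    {"id": 2, "time_started": "2024-03-01 09:05:00", "time_completed": "2024-03-01 09:15:00"},
    {"id": 3, "time_started": "2024-03-01 09:10:00", "time_completed": "2024-03-01 09:14:00"},
    {"id": 4, "time_started": "2024-03-01 09:20:00", "time_completed": "2024-03-01 09:31:00"},
]

speeders = flag_speeders(responses)
share = len(speeders) / len(responses)
print(sorted(speeders), f"{share:.0%} of respondents flagged")

# Rule of thumb from above: no more than about 10% should be flagged.
if share > 0.10:
    print("More than 10% flagged: check for logic branches or revisit the cutoff")
```

With a sample this small, any single flag exceeds 10%, so the warning is expected here. If your survey has logic branches, one option is to run `flag_speeders` separately on each branch so short-path respondents are compared against people who saw the same questions.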
Flatliners
These are respondents who give the same answer to every (or nearly every) question in a set, usually a block of multiple-choice questions. The pattern can show up in open-ended questions too. For example, imagine that you asked four open-ended questions about price (like a Van Westendorp question set). A flatliner gives the same answer to all four questions (say, $10).
If more than 10% of your survey respondents are flagged as flatliners, you may want to take a closer look at the questions you’re including in your flatliner scan. Some respondents may only appear to be flatlining when they are, in fact, giving honest answers. This usually comes down to how you ordered your answer options (for example, if “student,” “18–22 years,” “unmarried,” and “no kids” were each the first option in questions about employment, age, marital status, and children).
You can only determine this by looking at respondents’ individual answers — but don’t bother with this if less than 10% of your survey sample is flatlining.
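Here is a minimal Python sketch of a flatliner scan. It assumes a hypothetical layout where each respondent is a dict and one block of multiple-choice questions (`q5` to `q8`, made-up names) stores the position of the option the respondent picked. The `min_share` parameter is an illustrative knob for the “every (or nearly every)” distinction, not a standard threshold.

```python
from collections import Counter

# Hypothetical question block to scan; choose questions where identical
# answers would be implausible for an attentive respondent.
QUESTION_BLOCK = ["q5", "q6", "q7", "q8"]


def is_flatliner(row, questions, min_share=1.0):
    """True if one answer accounts for at least `min_share` of the block.

    min_share=1.0 catches people who gave the same answer to every question;
    a lower value (e.g. 0.75) also catches people who did so for most of them.
    """
    answers = [row[q] for q in questions]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers) >= min_share


# Hypothetical sample data: 1 = first option listed, 2 = second, and so on.
responses = [
    {"id": 1, "q5": 1, "q6": 1, "q7": 1, "q8": 1},
    {"id": 2, "q5": 2, "q6": 4, "q7": 1, "q8": 3},
    {"id": 3, "q5": 3, "q6": 3, "q7": 3, "q8": 2},
    {"id": 4, "q5": 5, "q6": 2, "q7": 2, "q8": 4},
]

strict = {r["id"] for r in responses if is_flatliner(r, QUESTION_BLOCK)}
loose = {r["id"] for r in responses if is_flatliner(r, QUESTION_BLOCK, min_share=0.75)}
share = len(strict) / len(responses)

print("Every answer identical:", sorted(strict))
print("Most answers identical:", sorted(loose))
if share > 0.10:
    print(f"{share:.0%} flagged: check whether answer-option order explains it")
```

Because `Counter` works on any hashable value, the same function also covers the Van Westendorp example: four answers of `"$10"` count as a flatline.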
Gibberish and Contradictory Answers
These types of responses are harder to spot. They require you to look, line by line, at answers to open-ended questions to identify ones that 1) are gibberish (e.g., dk3i8sw) and/or 2) contradict the respondent’s other answers. For example, if someone says they are single at the beginning of the survey, then mentions their “wife” or “husband” in a later open-ended question, disqualify that respondent. They are not being honest, and you want data that you can stand on.
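A script can’t replace reading the answers, but it can help you decide which rows to read first. Below is a rough Python sketch under stated assumptions: the column names `marital_status` and `open_text` are hypothetical, and the gibberish test (flag any token that mixes letters and digits, like dk3i8sw) is a crude heuristic of my own choosing that will miss some gibberish and wrongly flag things like “iPhone15.”

```python
import re

# Whole-word match so "midwife" does not count as a spouse mention.
SPOUSE_WORDS = re.compile(r"\b(wife|husband)\b", re.IGNORECASE)
# A token made only of letters and digits that contains at least one of each.
MIXED_TOKEN = re.compile(r"^(?=.*[a-z])(?=.*\d)[a-z\d]+$", re.IGNORECASE)


def looks_like_gibberish(text):
    """Crude heuristic: flag answers containing a letters-and-digits token."""
    return any(MIXED_TOKEN.match(token) for token in text.split())


def contradicts_marital_status(row):
    """Flag respondents who said they are single but later mention a spouse."""
    return row["marital_status"] == "Single" and bool(SPOUSE_WORDS.search(row["open_text"]))


# Hypothetical sample data (column names are assumptions).
responses = [
    {"id": 1, "marital_status": "Single", "open_text": "I usually shop with my wife on weekends."},
    {"id": 2, "marital_status": "Married", "open_text": "My husband handles the budget."},
    {"id": 3, "marital_status": "Single", "open_text": "dk3i8sw"},
    {"id": 4, "marital_status": "Single", "open_text": "Price matters most to me."},
]

gibberish = {r["id"] for r in responses if looks_like_gibberish(r["open_text"])}
contradictions = {r["id"] for r in responses if contradicts_marital_status(r)}

print("Read these for gibberish:", sorted(gibberish))
print("Read these for contradictions:", sorted(contradictions))
```

Treat both lists as a reading queue rather than an automatic disqualification; the final call still comes from looking at each row.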
If you designed your survey well, data cleaning shouldn’t lead you to discard more than 15% of your responses. If you’re worried you’re throwing out too many, take a closer look at the ones you’re throwing out. Consider keeping a few that show other signs of being good, honest answers, or those that were flagged on only one of the three criteria above.
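To check the 15% guideline and find respondents who tripped only one criterion, you can tally the flags per respondent. This Python sketch assumes you already have one set of flagged IDs per criterion (the criterion names and IDs below are made up for illustration).

```python
from collections import Counter


def summarize_flags(all_ids, flags):
    """Tally flags per respondent.

    `flags` maps a criterion name to the set of respondent IDs it flagged.
    Returns (all flagged IDs, IDs flagged on exactly one criterion, discard rate).
    """
    counts = Counter()
    for flagged_ids in flags.values():
        counts.update(flagged_ids)
    flagged = set(counts)
    single_flag = {rid for rid, n in counts.items() if n == 1}
    rate = len(flagged) / len(all_ids)
    return flagged, single_flag, rate


# Hypothetical results for a 20-person sample.
all_ids = list(range(1, 21))
flags = {
    "speeding": {3, 7},
    "flatlining": {3, 12},
    "gibberish_or_contradiction": {3, 7, 18},
}

discarded, single_flag, rate = summarize_flags(all_ids, flags)
print(f"Discard rate: {rate:.0%}")
if rate > 0.15:
    print("Above 15%: review these single-flag respondents:", sorted(single_flag))
```

Here respondent 3 trips all three checks and is an easy cut, while 12 and 18 are the kind of borderline cases worth a second look before you drop them.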
(Also note that some firms, like PeopleFish, will both help you field your survey and clean your survey data for you!)
And finally — don’t actually delete disqualified respondents. Just remove them from your analysis. This is to make sure you can go back and look at who you disqualified later on — to both prove to others that you cleaned your data diligently, and to allow other analysts to double-check your work.
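One way to follow this advice in a script is to add a flag column instead of dropping rows. In this Python sketch, the `disqualified` and `dq_reasons` column names are hypothetical, and the export goes to an in-memory buffer so the example runs anywhere; in practice you would write to a file instead.

```python
import csv
import io


def mark_disqualified(rows, flags):
    """Return copies of `rows` with 'disqualified' and 'dq_reasons' fields added.

    No row is removed; `flags` maps a criterion name to a set of flagged IDs.
    """
    marked = []
    for row in rows:
        reasons = sorted(name for name, ids in flags.items() if row["id"] in ids)
        marked.append({**row, "disqualified": bool(reasons), "dq_reasons": ";".join(reasons)})
    return marked


# Hypothetical data and flags.
rows = [{"id": 1, "q1": "Yes"}, {"id": 2, "q1": "No"}, {"id": 3, "q1": "Yes"}]
flags = {"speeding": {2}, "flatlining": {2}, "gibberish": set()}
marked = mark_disqualified(rows, flags)

# Analysis uses only qualified respondents...
analysis_rows = [r for r in marked if not r["disqualified"]]

# ...but the export keeps everyone, so others can audit your cleaning.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(marked[0]))
writer.writeheader()
writer.writerows(marked)
print(buffer.getvalue())
```

Recording the reasons alongside the flag makes it easy for another analyst to see not just who you disqualified, but why.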