Surveys: exploring statistical significance
WARNING: Some stats and math ahead. Mostly based on this lovely post: T-test explained: what they mean for survey analysis
Who doesn’t like surveys?
Well, most people. And yet, we love using them in organizational contexts for various purposes.
One big challenge in using them in that context is that they are a one-sided exchange of information. That makes sense in many other contexts, for example, when asking customers for feedback about a product. Inside the organization, however, what we’re really trying to create is dialogue, since survey “takers” have a big part to play in addressing any insights that come up from the survey. But that’s a topic for a different post. Today, I want to focus on something a lot more concrete.
We like using surveys because they can provide us with a quantitative assessment of a situation. For example, to measure “how are we doing?” in a particular area and to track it over time or across different organizational demographics. But sometimes, if we’re not analyzing the data carefully enough, over-reliance on surveys can lead us to over-react.
Let’s say that we ran an inclusion survey in which participants were asked to respond to the following statement using our beloved 5-point Likert scale: “When I speak up, my opinion is valued”. When analyzing the survey results, we discovered that women scored 4.5 on average, while men scored 4.3 on average. Can we say, based on the survey data, that men and women in our organization are not given an equal voice?
The answer, as always, is: “it depends”. Depends on what? Glad you asked! It depends on the following things:
- The size of our organization and the participation rate in our survey
- The confidence level we want to have in our answer. At the standard 95% confidence level, if there were truly no difference between the groups, we would mistakenly conclude that there is one only about 5% of the time.
- The difference in the means between the two groups
- The standard deviation of the responses in each group
The first three are fairly straightforward. The standard deviation is the least intuitive of the bunch, so we’ll focus on it and say that: assuming an organization of a certain size, in order for a certain difference in means to be statistically significant at a certain confidence level, the standard deviation of the results in each group must fall below a certain maximum threshold.
What’s more: the smaller the org (or the lower the participation rate), the smaller the difference in means, and the higher the confidence level required, the lower the standard deviation threshold will be.
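To see how these four ingredients combine, here is a minimal sketch in Python. It is not the calculation from the post linked above: it uses a normal approximation in place of a full t-test (so it is slightly optimistic for small groups), the function name `two_group_test` is made up for illustration, and the group sizes and standard deviations are hypothetical, since the example survey only gave us the means.

```python
from math import sqrt
from statistics import NormalDist

def two_group_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b, confidence=0.95):
    """Compare two group means from summary statistics (normal approximation)."""
    # Standard error of the difference in means: grows with the standard
    # deviations and shrinks as the groups get larger.
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_a - mean_b) / se
    # Two-sided p-value under the normal approximation (an assumption here;
    # a t-distribution is the more careful choice for small groups).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < (1 - confidence)

# Hypothetical numbers: 50 women averaging 4.5 and 50 men averaging 4.3,
# with an assumed standard deviation of 0.7 in both groups.
z, p, significant = two_group_test(4.5, 0.7, 50, 4.3, 0.7, 50)
print(f"z = {z:.2f}, p = {p:.3f}, significant at 95%: {significant}")
```

With these made-up inputs the 0.2 gap is not significant at 95%. Rerunning with a standard deviation of 0.4 in both groups flips the answer, which is exactly the sensitivity described above.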
Let’s make this a bit more concrete. Assuming a best-case scenario in which there’s full participation in the survey and the groups are of equal size, these would be the standard deviation thresholds for various combinations of org size (n), confidence level, and difference in means:
So in a 100-person organization, in order for a 0.1 difference in means to be statistically significant at a 95% confidence level, the standard deviation of both groups must be below 0.25. Keep in mind that this is the best-case scenario, so if participation is lower or the groups are not equal in size, that threshold will be even lower.
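For anyone who wants to reproduce thresholds like this for their own org size, here is a rough sketch. It assumes two equal-sized groups sharing the same standard deviation and, again, a normal approximation rather than the t-distribution, so its numbers come out slightly higher than exact t-based values. The helper name `max_sd_threshold` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def max_sd_threshold(n_total, diff, confidence=0.95):
    """Largest shared standard deviation at which `diff` is still significant.

    Assumes full participation, two equal-sized groups, equal standard
    deviations, and a two-sided test under a normal approximation.
    """
    # Critical value for a two-sided test, e.g. about 1.96 at 95%.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_per_group = n_total / 2
    # Significance requires diff / (sd * sqrt(2 / n_per_group)) > z_crit;
    # solving that inequality for sd gives the threshold.
    return diff / (z_crit * sqrt(2 / n_per_group))

for n_total in (50, 100, 500):
    for diff in (0.1, 0.2):
        print(n_total, diff, round(max_sd_threshold(n_total, diff), 3))
```

For a 100-person org and a 0.1 difference at 95%, this gives a threshold of roughly 0.255, in line with the 0.25 figure above. The small gap is what the normal approximation costs us.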
Which leads us to the next question: what does a 0.25 standard deviation look like? Sure, we can do the math and crunch the numbers, but for those of us (yours truly included) who don’t have a strong statistical intuition, this may help:
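One way to build that intuition is to compute the standard deviation of a few response patterns directly. The sketch below uses entirely made-up groups of 50 respondents on a 5-point scale, and the helper name `summarize` is hypothetical. It is an illustration, not data from any real survey.

```python
from statistics import mean, stdev

def summarize(counts):
    """Expand {score: number_of_respondents} into responses; return (mean, sample SD)."""
    responses = [score for score, k in counts.items() for _ in range(k)]
    return mean(responses), stdev(responses)

# Hypothetical groups of 50 respondents each, for illustration only.
examples = {
    "near-unanimous": {4: 47, 5: 3},
    "even split": {4: 25, 5: 25},
    "spread out": {3: 10, 4: 20, 5: 20},
}
for label, counts in examples.items():
    m, sd = summarize(counts)
    print(f"{label:15s} mean = {m:.2f}  sd = {sd:.2f}")
```

In this made-up data, only the near-unanimous group, where 47 of 50 people give the same answer, lands under 0.25 (about 0.24). An even split between two adjacent answers already produces a standard deviation of about 0.5, and a modest spread across three answers pushes it to roughly 0.76.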
The next time I’m running a survey, before jumping to action simply by looking at the means, I plan to look up my standard deviations in the table above and figure out whether action is truly needed. I’d encourage you to do the same :-)