Sample size selection in business experimentation needs a mind-shift

Gravity Ideas · Published in Gravityblog · Nov 7, 2017

Business versus academics

The selection of a sample size for experiments that require human participants is a much more complicated dilemma in business settings than it is in academic research. In general, the answer is simple: the more, the better. Fortunately, academic researchers typically have a huge population of undergraduate students at their disposal. Convenience sampling (where subjects are selected because of their convenient accessibility and proximity to the researcher) is common practice in academic research, and unsurprisingly so: large amounts of data can be gathered, and low (if any) incentives are required to motivate the student participants. In some cases, researchers might offer a few additional points towards students’ term marks, which I’m sure is more than enough incentive for the average undergrad. Since the reliability of an experiment’s results is always dependent on the sample size, having a large pool of ‘cheap’ participants makes it far easier to select a sample large enough to keep the risk of type I and type II errors acceptably low.

In business settings, however, there is a distinct need for concrete numbers and quick, actionable results. There are budgets and egos on the line, and with these comes a deep need for those signing off on the experiment to feel in a position of absolute knowledge and control. Whether in academia or business, the experimenter has to provide an accurate estimate of the sample size required to achieve an adequate level of confidence, which can then be brought to those funding the study. This makes absolute sense; budgets are budgets. However, unlike student participants, recruiting participants who have full-time jobs and day-to-day responsibilities is generally more costly, at least for companies whose consumer population is not primarily made up of undergraduate university students.

Certain concessions have to be made in order to obtain a sample large enough to produce results that, with some confidence, accurately represent how the target audience would respond to an intervention, while still meeting stringent budget and time constraints. Typically, the focus of academic research has been scientific inquiry and the pursuit of knowledge. In contrast, business experiments are concerned with the application of academic research, with the aim of answering key strategic questions that give decision-makers more certainty when making a strategic change to their products or services. These answers also need to be quick, effective and feasible for the company to stay ahead in competitive markets; there generally isn’t time for experiments to span more than a month or two. Whilst a 95% level of confidence is the standard in academia, if a business experiment revealed with 80% confidence that a low-cost intervention would increase profits over the next one to five years, that is a beneficial result, and one that should not be dismissed!

Shouldn’t we be more open-minded about levels of confidence?

This is the first concession that needs to be made. Coming from an academic background myself, I’ll be the first to admit I was anchored to the ‘all-powerful’ 0.05 alpha level (corresponding to a 95% level of confidence). It’s amazing how much power this number has in academic writing. Reading a journal article and coming across a p-value of 0.1 immediately shuts off the part of the reader’s brain that considers anything else in the article to be statistically relevant. The problem is that, under the budget, time and sampling constraints that characterise the typical business context, it is almost impossible to achieve these levels of confidence once sample sizes fall below a certain level. Conforming to this rigid academic mindset is damaging when the results of a business experiment are assumed to be less reliable simply because they fail to reach statistical significance at an alpha level of 0.05 or less. In this context, the chance of committing a type II error (failing to reject the null hypothesis when it is false) is far higher, and when that happens, the results of the experiment lose their value as a support to decision-making. Business needs results, and the truth is that an 80% level of confidence, although not as strong as 95%, is still very high.
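To make the trade-off concrete, here is a minimal sketch (not taken from the original article) of how the required sample size changes when the confidence level is relaxed from 95% to 80%. It assumes a two-sided comparison of two group means, a standardized effect size (Cohen’s d) of 0.3, 80% power and the usual normal approximation; every number is an illustrative assumption.

```python
# A minimal sketch, assuming a two-sided comparison of two group means with a
# standardized effect size (Cohen's d) of 0.3, 80% power and the usual normal
# approximation. All numbers are illustrative, not taken from the article.
from scipy.stats import norm


def n_per_group(effect_size, alpha, power):
    """Approximate sample size per group for a two-sided, two-sample test."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the chosen confidence level
    z_beta = norm.ppf(power)           # critical value for the desired power
    return 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2


for confidence, alpha in [(0.95, 0.05), (0.80, 0.20)]:
    n = n_per_group(effect_size=0.3, alpha=alpha, power=0.80)
    print(f"{confidence:.0%} confidence: about {n:.0f} participants per group")
```

Under these assumptions, relaxing the confidence level from 95% to 80% cuts the requirement from about 174 to about 100 participants per group.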

We have to become accustomed to small samples

Secondly, attempting to provide huge estimates for sample sizes is impractical. Make no mistake: if it is possible to manage a large pool of participants, incentivise them, keep the sample representative and conduct the experiment in a short time frame, then it should be done. But this is rarely the case in experiments that involve humans. Sample sizes are inevitably going to be smaller than we would like, which is a huge blow to those who want to rely on inferential statistics for a statistical level of certainty when making a business decision. To do business experiments, we have to be okay with smaller samples and adjust our expectations to meet the realities.

If a business decision requires a 95% confidence level or more before it can be implemented, then the sample size must increase, and consequently the budget must increase! If decision-makers understand this, they will also understand that extra funding is needed to achieve these levels of certainty. If they suggest simply paying participants less, you run the risk of reducing the effort that participants put into their responses, and therefore the reliability of the results. Incentives can be lowered only as long as they continue to motivate participants to answer in accordance with their own true preferences.

Can’t we just cut down on the monetary incentives?

The debate about whether incentives should or should not be used is ongoing between experimental economics and psychology researchers; in business experimentation, however, more consideration must be given to the type of people in the sample. If monetary incentives are required to motivate people to participate, they must be at a level that makes taking a half day off work or spending 30 minutes filling out a survey (or several) worthwhile, whether it’s a lab experiment, a field experiment or an online survey. Motivating participants to answer in accordance with their true preferences does not always require money; in fact, psychology experimenters argue strongly against it. However, additional thought should be given to providing participants with an incentive that matches the effort they are likely to exert taking part in your experiment.

Sample size checklist

Before attempting to calculate a sample size estimate to present to decision-makers, ask these questions:

  • With what level of confidence do they need to make their decision? Do they really need 95%, or would an 80% or even 70% chance that implementing a change will lead to the desired behavioural outcome be an acceptable level of confidence?
  • Consider the size of the financial impact that will result from implementing a change. Changes that are expected to have a large impact will probably require higher levels of certainty before implementing them than those expected to have a smaller impact.
  • Do you expect the effect of your intervention to be subtle or strong? This may seem similar to the point above, but that point concerns financial impact, whereas this one concerns the expected effect size of the change. For instance, an intervention expected to produce only a small change in behaviour (a small effect size) may still have a large financial impact on the business. The two should be considered separately, and small expected effects require larger samples in order to provide the same certainty to the decision-making process.
  • How large is the total population that is in the market for the company’s product or service? Larger populations call for larger samples to remain representative, although the required sample grows far more slowly than the population and levels off once the population reaches the thousands. If the sample does not represent the wider population, the chance of sampling error and bias distorting the results increases.
  • Are you comparing subsets of participants, or will you be analysing results as a whole? Subsets of a population should be compared against other subsets. Splitting the sample this way reduces the number of participants in each comparison, so you will likely need more participants overall to reach the same level of confidence.
  • How homogeneous is your market? Do people tend to think alike, or are there many differences in preferences?
  • How will your sample be selected? Through random sampling? Convenience sampling?
  • If you are making use of monetary incentives, what is the maximum budget that can be assigned to recruiting participants, and how many participants can be recruited within this budget at an incentive level that motivates them to reveal their true preferences? (A rough version of this calculation is sketched after this list.)
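
To tie several of these questions together, here is a minimal sketch of how the answers could be turned into a sample-size estimate and a recruitment cost, using the power-analysis utilities in Python’s statsmodels package. Every figure in it (confidence level, effect size, power, incentive and budget cap) is a hypothetical assumption chosen for illustration, not a recommendation.

```python
# A minimal sketch: turn assumed answers to the checklist into a sample-size
# estimate and recruitment cost with statsmodels' power utilities. Every figure
# below is a hypothetical assumption for illustration, not a recommendation.
from statsmodels.stats.power import TTestIndPower

confidence = 0.80      # level of confidence the decision-makers say they need
alpha = 1 - confidence
effect_size = 0.3      # expected standardized effect (Cohen's d) -- assumed
power = 0.80           # desired chance of detecting a real effect
incentive = 10.0       # hypothetical payment per participant
budget_cap = 3000.0    # hypothetical maximum recruitment budget

n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=alpha, power=power,
    ratio=1.0, alternative="two-sided",
)
n_total = 2 * n_per_group          # one treatment group plus one control group
cost = n_total * incentive

print(f"About {n_total:.0f} participants needed, costing about {cost:.0f}")
if cost <= budget_cap:
    print("This fits within the assumed budget.")
else:
    print("Over budget: relax the confidence level, plan for a larger effect, or increase funding.")
```

Under these assumptions, roughly 200 participants (around 2,000 in incentives) would be enough, whereas demanding 95% confidence would push the requirement to roughly 350 participants and blow past the hypothetical budget.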

Because small samples are sometimes unavoidable in business settings, it is also the role of the experimenter to provide support, guidance and context to decision-makers based on the results. Smaller samples increase the standard error, and lowering the confidence level reduces the certainty with which we can say “changing X has a Y probability of leading to a change in Z”. This is a real limitation of business experimentation of this kind. Assessing the opportunity costs of implementing one change over another, in conjunction with the results of the experiment, is how experimenters inform effective decisions; this becomes all the more important when interpreting results that are less statistically certain.
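
As a small illustration of that trade-off, the sketch below (with assumed numbers, not taken from the article) shows how the standard error of an estimated proportion, such as a conversion rate, shrinks as the sample grows, and how the half-width of the resulting interval depends on the chosen confidence level.

```python
# A small illustration with assumed numbers: how the standard error of an
# estimated proportion (e.g. a conversion rate) shrinks as the sample grows,
# and how the interval's half-width depends on the chosen confidence level.
import math

from scipy.stats import norm

p = 0.5  # worst-case proportion, which gives the widest possible interval

for n in (50, 200, 800):
    se = math.sqrt(p * (1 - p) / n)              # standard error of the proportion
    for confidence in (0.80, 0.95):
        z = norm.ppf(1 - (1 - confidence) / 2)   # critical value for this confidence level
        print(f"n = {n:3d}, {confidence:.0%} interval half-width: +/- {z * se:.3f}")
```

Quadrupling the sample roughly halves the standard error, which is why every extra point of precision or confidence has to be paid for in participants.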

Conclusion

I should make it clear that I am not saying we should deliberately drop the reliability of the results below the level required to implement a strategic change, particularly if the costs of failure are high. What I am saying is that if we want to avoid missed opportunities due to type II errors, then we have to consider adjusting our sample size expectations to better fit the business context and the constraints that come with it. Even though the academic community may scoff at your 80% confidence level, asking these questions first will help you adjust your sample size to a level that is more practical for business experimentation.

Get in contact

If you are interested in exploring the world of business experimentation within your own business or just out of curiosity, send us an email at experiments@gravityideas.com.

Originally published at www.gravityideas.com.
