Understanding the amount of data you need for analysis

Using the right statistical parameters can better inform the design of experiments

Kuan Rong Chan, Ph.D.

Published in

Omics Diary

2 min read · Dec 31, 2023

Formulas to calculate the statistical parameters to estimate average and variation of data

Why do you need to do your experiments at least in triplicate? Should you do more replicates?

These are questions I often get from students. While adding more replicates can improve your confidence in the data obtained, it will also prolong your experiments, which may introduce other confounding factors. For instance, having more replicates may prolong sample processing time, which may affect the quality of the measurements. Handling too many replicates may also cause experimenter fatigue, which may in turn compromise the quality of the data.

The purpose of doing replicates is to estimate the mean and variance of the measurements (formulas shown in the figure). Generally, increasing replicates will allow better estimation of the mean and standard deviation, especially if the error of your measurements/samples is large.
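For reference, the usual textbook definitions for n replicate measurements can be written as below. The notation (x_i for a single measurement, \bar{x} for the sample mean, s for the sample standard deviation) is an assumption made here and may differ from the symbols used in the original figure.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2,
\qquad
s = \sqrt{s^2}
```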

Note that the denominator in the standard deviation formula is n-1 (see the figure). With a single measurement the standard deviation is undefined, and with duplicates it rests on a single degree of freedom, so you will need at least triplicates to get a reasonable estimate. However, doing more replicates will not systematically shrink the standard deviation, or any error bar that displays it. Expecting it to shrink is a common misconception.
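A small simulation can make this point concrete. The sketch below is only an illustration under stated assumptions: the measurements are drawn from a normal distribution with a hypothetical true mean of 10 and true standard deviation of 2, and the function name `average_sample_sd` is made up for this example. It averages the sample standard deviation over many simulated experiments for several replicate numbers.

```python
import random
import statistics


def average_sample_sd(n, true_mean=10.0, true_sd=2.0, repeats=2000, seed=0):
    """Average the sample SD (n-1 denominator) over many simulated experiments.

    The normal distribution and its parameters are hypothetical values
    chosen purely for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    sds = []
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        sds.append(statistics.stdev(sample))  # stdev() divides by n-1
    return statistics.mean(sds)


results = {n: average_sample_sd(n) for n in (3, 10, 30, 100)}
for n, sd in results.items():
    print(f"n={n:>3}: average sample SD = {sd:.2f} (true SD = 2.00)")
```

Under these assumptions, the average sample standard deviation should settle near the true value of 2 as n grows, rather than shrinking towards zero. With triplicates the estimate is noisier and tends to come out somewhat low on average.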

While the standard deviation does not shrink with more replicates, standard errors and confidence intervals do. This is because these parameters describe how precisely the mean has been estimated. In their formulas, the standard deviation is divided by the square root of the sample size (see the figure). Thus, increasing replicates will reduce the standard error and narrow the confidence interval.
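To see the size of this effect, here is a minimal sketch that assumes the standard error is computed as SD / sqrt(n) and the 95% confidence interval as mean ± t × SE, where t is the two-sided critical value of Student's t with n-1 degrees of freedom. The standard deviation of 2.0 and the helper names `sem` and `ci_half_width` are hypothetical. The t values are standard table values, hard-coded for the three replicate numbers used.

```python
import math

# Two-sided 95% critical values of Student's t, keyed by sample size n
# (degrees of freedom = n - 1). Standard table values.
T_CRIT_95 = {3: 4.303, 6: 2.571, 12: 2.201}


def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


def ci_half_width(sd, n):
    """Half-width of the 95% confidence interval for the mean."""
    return T_CRIT_95[n] * sem(sd, n)


sd = 2.0  # hypothetical SD, held fixed to isolate the effect of n
for n in (3, 6, 12):
    print(f"n={n:>2}: SEM = {sem(sd, n):.2f}, "
          f"95% CI = mean ± {ci_half_width(sd, n):.2f}")
```

With the standard deviation held fixed, going from 3 to 12 replicates halves the standard error, because the square root of 4 is 2. The confidence interval narrows by more than that, roughly fourfold here, since the t critical value also drops as the degrees of freedom increase.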

In summary, it is critical to know your research question and choose the correct parameter to display. For instance, if you are doing an experiment and want to show the variability of your measurements, then the standard deviation is the correct choice. In that case, you do not need an excessive number of replicates, as the standard deviation does not shrink when replicates are added. However, if you are doing a meta-analysis, estimating the true mean is crucial, so presenting the mean with confidence intervals is more suitable. In that case, the more datasets you can analyse the better, as this will narrow the confidence interval and improve the estimate of the mean.