Omics Diary

A platform to share knowledge and information on how to use Python for systems vaccinology and omics analysis. Also covers interesting scientific literature related to infectious diseases and vaccines. Interested in contributing? Contact me to collaborate!

Understanding the amount of data you need for analysis

Kuan Rong Chan, Ph.D.
Published in Omics Diary
2 min read · Dec 31, 2023

Formulas to calculate the statistical parameters to estimate average and variation of data

Why do you need to run your experiments in at least triplicate? Should you do more replicates?

These are questions I often get from students. While adding more replicates can improve your confidence in the data, it also prolongs your experiments, which may introduce confounding factors. For instance, more replicates may lengthen sample processing time, which can affect the quality of the measurements. Handling too many replicates may also cause operator fatigue, which can in turn compromise data quality.

The purpose of doing replicates is to estimate the mean and variance of the measurements (formula shown in the picture below). Generally, increasing replicates allows better estimation of the mean and standard deviation, especially if the error of your measurements/samples is large.
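For reference, the standard textbook definitions of the sample mean and the sample standard deviation for n replicate measurements can be written as follows. The symbols (x_i for an individual measurement, x-bar for the mean, s for the standard deviation) are notation chosen here for illustration and may differ from the original figure.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```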

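Note that the denominator in the standard deviation formula is n-1 (see below). With a single measurement the standard deviation is undefined, and with duplicates it rests on a single degree of freedom, so you need at least triplicates to get a reasonable estimate. However, doing more replicates will not systematically reduce the standard deviation, or any error bar based on it; expecting it to shrink is a common misconception. More replicates only make the estimate of the standard deviation more stable.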
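The point that the standard deviation does not shrink with more replicates can be checked with a small simulation. The sketch below is only an illustration: it assumes normally distributed measurements with a made-up true mean of 100 and true standard deviation of 10, and the function name `simulate_sd` is hypothetical.

```python
import random
import statistics


def simulate_sd(n_replicates, true_mean=100.0, true_sd=10.0,
                n_experiments=2000, seed=0):
    """Average sample SD over many simulated experiments of a given size.

    The true mean and SD are arbitrary values assumed for illustration.
    """
    rng = random.Random(seed)
    sds = []
    for _ in range(n_experiments):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n_replicates)]
        # statistics.stdev uses the n-1 denominator (sample SD)
        sds.append(statistics.stdev(sample))
    return statistics.mean(sds)


for n in (3, 10, 30, 100):
    print(f"n = {n:>3}: average sample SD = {simulate_sd(n):.2f}")
```

Under these assumptions, the average sample standard deviation stays close to the true value of 10 at every sample size, and does not drift toward zero as replicates are added.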

While the standard deviation does not shrink with more replicates, standard errors and confidence intervals do. This is because these parameters describe how precisely the mean has been estimated, not how variable the individual measurements are. In their formulas, the standard deviation is divided by the square root of the sample size (see below). Thus, increasing replicates will reduce the standard error and narrow the confidence interval.
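To make the contrast concrete, here is a minimal sketch that computes the standard deviation, the standard error of the mean (SD divided by the square root of n) and a 95% confidence interval for two sample sizes. The replicate values are invented for illustration, the helper name `summarize` is hypothetical, and the small lookup table holds standard two-sided 95% critical values of Student's t distribution; in practice a statistics library such as SciPy could supply these instead.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of
# freedom (n - 1). Standard table values, limited here to n <= 10.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def summarize(values):
    """Return n, mean, sample SD, SEM and 95% CI for a list of replicates."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)       # n-1 denominator
    sem = sd / math.sqrt(n)             # standard error of the mean
    half_width = T_CRIT_95[n - 1] * sem
    return {"n": n, "mean": mean, "sd": sd, "sem": sem,
            "ci95": (mean - half_width, mean + half_width)}


# Hypothetical measurements, made up for this example
triplicate = [9.8, 10.4, 10.1]
six_replicates = [9.8, 10.4, 10.1, 10.3, 9.7, 10.0]

for data in (triplicate, six_replicates):
    s = summarize(data)
    print(f"n={s['n']}: SD={s['sd']:.3f}, SEM={s['sem']:.3f}, "
          f"95% CI=({s['ci95'][0]:.2f}, {s['ci95'][1]:.2f})")
```

With these made-up numbers the standard deviation is similar for both sample sizes, while the standard error and the width of the confidence interval are clearly smaller with six replicates than with three.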

In summary, it is critical to know your research question and choose the correct parameter to display. For instance, if you are doing an experiment and want to show the variability of your measurements, the standard deviation is the right choice. In that case, you do not need an excessive number of replicates, as the standard deviation does not shrink as replicates are added. However, if you are doing a meta-analysis, estimating the true mean is crucial, so presenting the mean with confidence intervals is more suitable. In that case, the more datasets you can analyse, the better, as this will narrow the confidence interval and improve the estimate of the mean.

Written by Kuan Rong Chan, Ph.D.

Kuan Rong Chan, PhD, Senior Principal Research Scientist at Duke-NUS Medical School. Virologist | Data Scientist | Loves mahjong | Website: kuanrongchan.com
