5 most important considerations for biological data analysis

Ignoring these fundamentals could lead to misinterpretation of your data and complicate your machine learning models

Kuan Rong Chan, Ph.D.
Published in Omics Diary
5 min read · Feb 6, 2022


Finishing the analysis of a dataset brings a great sense of achievement. However, it is almost always a good habit to check through your analysis and, at times, even re-analyse the data in a different way to understand it better and strengthen the analysis. In this blog entry, I will highlight some key considerations to take into account when checking through your data analysis.

1) Could your variables be categorised differently, or expressed as continuous variables for data analysis?

Figure: Two different ways of showing the association of age with antibody responses

In some instances, the explanatory variable can be expressed as either a categorical or a continuous variable. If so, I would recommend analysing the data both ways. For example, consider a research question on the effect of age on vaccine antibody response. Based on the literature, you may have classified subjects into two groups: (I) elderly subjects, aged 65 years and above; (II) young subjects, aged below 65 years. The antibody responses are then compared between these 2 groups.

However, this method of analysis assumes that a cutoff of 65 years of age effectively separates the young from the elderly. Hence, the analysis should also be done in a different way: plot antibody response against age as a scatter plot, and evaluate whether the correlation is significant. The two methods of analysis ask the research question in slightly different ways (see figure above for better clarity), and together they provide deeper insights into the association between antibodies and age.
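To make the two approaches concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the ages and antibody titers are synthetic, the helper names (`pearson_r`, `group_mean_diff`, `permutation_p`) are hypothetical, and the permutation test is simply one convenient way to get a p-value for both analyses, not the method used for the figure above.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def group_mean_diff(age, titer, cutoff=65):
    """Mean titer of young subjects minus mean titer of elderly subjects.

    The cutoff of 65 years is the assumption being examined.
    """
    young = [t for a, t in zip(age, titer) if a < cutoff]
    elderly = [t for a, t in zip(age, titer) if a >= cutoff]
    return mean(young) - mean(elderly)


def permutation_p(stat_fn, x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value: shuffle y to break any x-y association."""
    rng = random.Random(seed)
    observed = abs(stat_fn(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(stat_fn(x, y_perm)) >= observed:
            hits += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0
    return (hits + 1) / (n_perm + 1)


# Synthetic data (assumed): titers decline linearly with age, plus noise
data_rng = random.Random(42)
ages = [data_rng.randint(20, 90) for _ in range(80)]
titers = [10.0 - 0.05 * a + data_rng.gauss(0, 1.0) for a in ages]

# Analysis 1: age as a categorical variable (young vs elderly)
diff = group_mean_diff(ages, titers)
p_group = permutation_p(group_mean_diff, ages, titers)

# Analysis 2: age as a continuous variable (correlation)
r = pearson_r(ages, titers)
p_corr = permutation_p(pearson_r, ages, titers)

print(f"Young minus elderly mean titer: {diff:.2f} (p = {p_group:.4f})")
print(f"Pearson r (titer vs age): {r:.2f} (p = {p_corr:.4f})")
```

If both analyses point in the same direction, you can be more confident that the association does not hinge on the choice of 65 years as the cutoff. If they disagree, that is a cue to look more closely at how the groups were defined.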

2) Are the cutoff values in your data analysis justified?

We often choose default values as cutoffs for assigning categories or filtering data. However, the default values may not be applicable to every dataset. This is where a deeper knowledge of the analysis tools is helpful, as you will better appreciate the assumptions applied in the analyses. For instance, let’s consider the scenario where you are interested in finding out the molecular pathways…


Kuan Rong Chan, PhD, Senior Principal Research Scientist in Duke-NUS Medical School. Virologist | Data Scientist | Loves mahjong | Website: kuanrongchan.com