When should you bin your continuous variables?

Binning data can give you new perspectives. This article includes a secret formula to get the best of both worlds.

Kuan Rong Chan, Ph.D.
Omics Diary

Analysing data by binning continuous variables allows you to identify measurements with large effect sizes. Data are from our recent publication (Ong et al., npj Vaccines, 2022).

Did you know that continuous variables can be converted to discrete variables by binning? For instance, to look at genes associated with age, one can visualise the correlation between gene abundance and age. Alternatively, age can be binned into categories such as adults (aged 65 and below) and the elderly (aged above 65). In this case, the effect of age on gene abundance can be visualised using bar charts.
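To make the two views concrete, here is a minimal Python sketch using only the standard library. The toy data, the helper names `pearson_r` and `age_group`, and the abundance units are assumptions made up for illustration; they are not taken from the publication. The only detail borrowed from the text is the age > 65 cutoff.

```python
from statistics import mean

# Hypothetical toy data: (age in years, abundance of one gene in arbitrary units).
samples = [
    (25, 5.1), (34, 5.4), (47, 5.9), (58, 6.3),
    (66, 7.8), (71, 8.1), (79, 8.6), (84, 9.0),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def age_group(age, cutoff=65):
    """Bin age: 'elderly' if above the cutoff, otherwise 'adult'."""
    return "elderly" if age > cutoff else "adult"


ages = [age for age, _ in samples]
abundance = [value for _, value in samples]

# Continuous view: one correlation coefficient across the whole age gradient.
r = pearson_r(ages, abundance)

# Binned view: mean abundance per age group, i.e. the bar heights in a bar chart.
groups = {}
for age, value in samples:
    groups.setdefault(age_group(age), []).append(value)
group_means = {name: mean(values) for name, values in groups.items()}

print(f"Pearson r = {r:.2f}")
print(group_means)
```

The correlation summarises the trend across all ages in a single number, while the group means are what you would plot as bars to compare adults with the elderly.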

A common question then arises: when should you treat your variable as continuous, and when as discrete? Interestingly, the two types of analysis can provide different perspectives on the data.

The advantage of analysing data as continuous variables is that you don’t have to set arbitrary cutoffs, which allows you to evaluate trends across the full gradient of data points. On the other hand, binning variables can allow better visualisation of the magnitude of differences between categories. Whether to bin your variables will largely depend on your research question, sample size and the distribution of the data.
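One quick way to see how sample size and distribution affect binning is to count how many samples land in each bin before plotting anything. The sketch below is illustrative only: the cohort ages are made up, `bin_by_cutoff` is a hypothetical helper, and the median split is just one possible data-driven alternative to a fixed cutoff, not a recommendation from the publication.

```python
from collections import Counter
from statistics import median

# Hypothetical cohort ages, skewed towards younger participants.
ages = [22, 24, 27, 29, 31, 33, 36, 38, 41, 44, 52, 68]


def bin_by_cutoff(values, cutoff):
    """Label each value 'high' if above the cutoff, otherwise 'low'."""
    return ["high" if v > cutoff else "low" for v in values]


# Fixed cutoff (age > 65): easy to interpret, but the bins may be unbalanced.
fixed_counts = Counter(bin_by_cutoff(ages, 65))

# Median split: a data-driven cutoff that gives roughly equal bin sizes.
median_counts = Counter(bin_by_cutoff(ages, median(ages)))

print("fixed cutoff:", dict(fixed_counts))
print("median split:", dict(median_counts))
```

If one bin ends up with only a handful of samples, as the fixed cutoff does in this skewed toy cohort, a comparison between bins will be unreliable, and keeping the variable continuous (or choosing a different cutoff) may serve the research question better.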

Kuan Rong Chan, PhD, Senior Principal Research Scientist in Duke-NUS Medical School. Virologist | Data Scientist | Loves mahjong | Website: kuanrongchan.com