R for Biological Data Analysis

Published in

Women Techmakers Nagpur

3 min read · Feb 15, 2021

In the decades since the birth of the Human Genome Project, leaps in genomic research have produced massive amounts of biological data. Beyond genomics, other omics fields such as proteomics, metabolomics, and transcriptomics have also generated huge datasets that contribute to novel discoveries in healthcare.

R is widely used by academics and domain experts to analyze this data and extract meaningful results from it. Biomarker discovery is one example: the markers that routine blood tests rely on come out of analyzing this kind of biological data. Such data can be obtained using an LC-MS/MS (liquid chromatography with tandem mass spectrometry) setup.

Genetics, drug discovery, bioinformatics, and epidemiology are some of the healthcare fields that use R heavily. It is used to analyze and predict the spread of diseases, to analyze genetic sequences and drug-safety data, and to study different combinations of drugs and chemicals. The Bioconductor project provides R packages for analyzing genomic data. Lastly, R is a godsend for analyzing data from pre-clinical trials of new drugs and medical techniques.
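As a small illustration of sequence analysis, here is a minimal sketch in base R that computes the GC content of a DNA string. The function name `gc_content` and the example sequence are made up for this post; real projects would more likely reach for Bioconductor packages.

```r
# Hypothetical helper: fraction of G and C bases in a DNA sequence.
gc_content <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]   # split the string into single letters
  sum(bases %in% c("G", "C")) / length(bases)
}

example_seq <- "ATGCGCTA"        # made-up sequence: 4 of its 8 bases are G or C
print(gc_content(example_seq))   # 0.5
```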

Let’s understand the R environment:

“R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing” as defined by the R Core Team.

Being open source, R plays a significant role in research because it is accessible to non-professionals as well, but it comes with the disadvantage of having ‘NO WARRANTY.’
R is highly extensible through user-contributed packages for specific functions or specific areas of study. R is not the only option: academics also use other languages such as Python and MATLAB, choosing whichever gives the best results for the task.

R can be installed by downloading it from this site: http://www.cran.r-project.org/. On Windows, it is also advisable to install the Rtools toolchain (a set of build tools rather than an R library) so that packages which need to be compiled from source install without issues. R works well with third-party environments such as RStudio.
Many packages are available in the R ecosystem for analyzing different types of data. Graphing packages such as “ggplot2” give R an edge over other languages in data visualization. Packages often depend on one another, so a package's prerequisites must be installed before it can be loaded; a missing dependency leads to errors at load time even when the package itself is installed.
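One way to avoid dependency surprises is to check what is missing before installing. The sketch below assumes a hypothetical helper, `missing_packages`, built on the base function `requireNamespace()`; the package names are only placeholders. By default, `install.packages()` also fetches the packages that the requested ones require.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("stats", "utils")   # these ship with R, so normally nothing is missing
to_install <- missing_packages(needed)
print(to_install)

# Install only what is missing; required dependencies are pulled in by default.
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

Replace `needed` with the packages your analysis uses, for example `c("ggplot2")`.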
R has five basic or “atomic” classes of objects: character, numeric, integer, complex, and logical. The most basic object is a vector. A vector can only contain objects of the same class; the one exception is a list, which is represented as a vector but can contain objects of different classes. R objects can also have attributes, such as names and dimensions.
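The classes above can be sketched in a few lines of base R. The variable names and values are invented for illustration only.

```r
# One vector per atomic class
bases    <- c("A", "C", "G", "T")   # character
weights  <- c(1.5, 2, 3.25)         # numeric
counts   <- c(10L, 20L, 30L)        # integer (note the L suffix)
waves    <- c(1+2i, 3-1i)           # complex
detected <- c(TRUE, FALSE, TRUE)    # logical

# Mixing classes in c() silently coerces everything to one class
mixed <- c(1, "a", TRUE)
print(class(mixed))                 # "character"

# A list can hold objects of different classes side by side
sample_info <- list(id = "S1", reads = 1200L, passed = TRUE)
print(sapply(sample_info, class))

# Attributes: names are stored as an attribute of the vector
names(weights) <- c("geneA", "geneB", "geneC")
print(attributes(weights))
```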
It is very important to estimate memory requirements before analyzing big data: R generally holds objects in memory, so a dataset that does not fit in RAM will make the analysis very slow or cause it to fail.
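A rough back-of-the-envelope estimate is often enough. The sketch below assumes a numeric matrix, where each value takes 8 bytes; the helper name `estimate_matrix_gib` and the matrix sizes are hypothetical.

```r
# Hypothetical helper: approximate size in GiB of a numeric matrix.
# Assumes 8 bytes per value and ignores the small per-object overhead.
estimate_matrix_gib <- function(n_rows, n_cols, bytes_per_value = 8) {
  n_rows * n_cols * bytes_per_value / 2^30
}

# Example: an expression matrix with 20,000 genes and 1,000 samples
print(estimate_matrix_gib(20000, 1000))   # roughly 0.15 GiB

# Compare the estimate with what R reports for a small real object
small <- matrix(0, nrow = 1000, ncol = 100)
print(object.size(small), units = "Kb")
```

If the estimate approaches the available RAM, it may be worth loading only the needed columns or processing the data in chunks.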
Lexical scoping is one of the features that differentiates R from S, its parent language. Lexical scoping means the values of free variables are searched for in the environment in which the function was defined. An environment is a collection of (symbol, value) pairs. This matters because in R a function can be defined inside another function and returned from it, and the returned function still sees the variables of the function that created it. When a function is defined and called in the same environment, the defining and calling environments coincide, which can give the appearance of dynamic scoping. Other languages that support lexical scoping include Scheme, Perl, Python, and Common Lisp.
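A short sketch makes the lookup rule concrete. All function and variable names here (`make_power`, `f`, `g`, `y`) are illustrative only.

```r
# A function defined inside another function remembers where it was defined
make_power <- function(n) {
  function(x) x^n        # n is a free variable, found in make_power's environment
}
square <- make_power(2)
cube   <- make_power(3)
print(square(3))         # 9
print(cube(2))           # 8

# Free variables are looked up where the function was defined, not where it is called
y <- 10
g <- function(x) x * y   # y is free here; it resolves to the y defined next to g
f <- function(x) {
  y <- 2                 # local to f, invisible to g
  y^2 + g(x)
}
print(f(3))              # 2^2 + 3 * 10 = 34
```

Under dynamic scoping, `g` would have picked up the `y <- 2` from its caller `f`, and `f(3)` would have returned 10 instead.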

All in all, R has become one of the most preferred programming languages among academics for data analysis and visualization, owing to its graphics packages, its open-source nature, and its large user base across various fields.

Written by SaloniG