Published in The Startup

Bioinformatics: Biology, Computer Science and Statistics

What is Bioinformatics and what are people using it for?

Before switching careers to Data Science, I studied Biochemistry. I had heard a lot about Bioinformatics and it interested me, but I had never looked into the specifics. Now that I have enough Computer Science knowledge to understand the field better, I decided to look into it and share my findings with others who might be interested.

What is Bioinformatics?


Bioinformatics is a subdiscipline of biology and computer science concerned with the acquisition, storage, analysis, and dissemination of biological data, most often DNA and amino acid sequences. Bioinformatics uses computer programs for a variety of applications, including determining gene and protein functions, establishing evolutionary relationships, and predicting the three-dimensional shapes of proteins.

DNA and RNA carry the genetic information of living organisms, including the instructions for building the molecular machines called proteins. DNA, RNA and proteins are all macromolecules: large molecules made up of smaller parts. For DNA and RNA the variable parts of interest are nucleotides, and for proteins they are amino acids. These three macromolecules are the foundation of the study of genetic information, and the order of their building blocks provides an enormous landscape of data.

Image of the Primary Structure of a protein which shows the individual amino acid sequence of the protein
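
To make the relationship between the three macromolecules concrete, here is a minimal Python sketch that treats a DNA sequence as a string, transcribes it to RNA, and translates it into an amino acid sequence. The codon table is deliberately a small subset of the standard genetic code, and the example sequence is made up for illustration; it is not taken from a real gene.

```python
# Illustrative subset of the standard genetic code (RNA codon -> amino acid).
# A real table has 64 entries; unknown codons are reported as "X" below.
CODON_TABLE = {
    "AUG": "M",                                       # methionine (usual start codon)
    "UUU": "F", "UUC": "F",                           # phenylalanine
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",   # glycine
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",   # alanine
    "AAA": "K", "AAG": "K",                           # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",               # stop codons
}


def transcribe(dna: str) -> str:
    """Convert a coding-strand DNA sequence to RNA by replacing T with U."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Read the RNA three bases at a time and stop at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE.get(rna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example sequence, chosen only to exercise the table above.
dna = "ATGTTTGGCGCAAAATAA"
rna = transcribe(dna)
print(rna)             # AUGUUUGGCGCAAAAUAA
print(translate(rna))  # MFGAK
```

The resulting string of one-letter codes is the primary structure of the (hypothetical) protein, which is the kind of sequence data much of the field works with.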

As a society, we have mapped out the genetic material of Homo sapiens through the Human Genome Project (HGP), which was declared complete in 2003. This project, along with many others working on other species' genomes, has produced enormous amounts of genetic data, far too much to analyze and interpret by hand. Luckily, with advances in computer technology, we no longer need to process all of this information manually. Data Science automates the processing of this wealth of genetic information through algorithms that sort, analyze and interpret it.

What defines Bioinformatics is the use of Computer Science to process large, complex sets of genetic information and draw insights from the data.
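
As a small illustration of what "analyzing" sequence data can mean, the sketch below computes two simple summaries of a DNA string: its GC content (the fraction of bases that are G or C) and the counts of each short subsequence of length k (often called k-mers). The function names and the example sequence are my own choices for illustration, not part of any particular tool.

```python
from collections import Counter


def gc_content(sequence: str) -> float:
    """Return the fraction of bases in the sequence that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    gc_count = sum(1 for base in sequence if base in "GC")
    return gc_count / len(sequence)


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping subsequence of length k."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Hypothetical sequence used only to demonstrate the two functions.
sequence = "ATGCGCATTA"
print(gc_content(sequence))
print(count_kmers(sequence, 2).most_common(3))
```

Doing this by hand for one short sequence is easy; the point of automation is that the same few lines work unchanged on sequences that are millions of bases long.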

Application of Bioinformatics

As discussed above, one of the main functions of those in the field is to analyze and interpret genome data. Researchers in the field are able to use this genomic information to further investigate gene variation and expression and thus predict the structures and functions of the resulting proteins. Having access to this knowledge is a powerful tool in epidemiology and in clinical applications for the treatment of patients.
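
A very simplified way to picture the study of gene variation is to compare a sample sequence against a reference and list the positions where they differ. The sketch below assumes the two sequences are already aligned and of equal length, which real data rarely is; actual variant analysis relies on dedicated alignment and variant-calling software. The function name and sequences are hypothetical.

```python
def find_substitutions(reference: str, sample: str):
    """List (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Assumes both sequences are already aligned
    and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length (already aligned)")
    pairs = zip(reference.upper(), sample.upper())
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(pairs, start=1)
        if ref_base != sample_base
    ]


# Hypothetical reference and sample that differ at a single position.
reference = "ATGCGTAC"
sample = "ATGAGTAC"
print(find_substitutions(reference, sample))  # [(4, 'C', 'A')]
```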

Bioinformatic analysis and the Human Genome Project have together allowed researchers to identify and better understand disease genes. Drugs that target some of those specific genes have been designed, and this knowledge opens the opportunity to research and design many more new drugs and therapies.

The application of these drugs and therapies has expanded the field of personalized medicine (also known as precision medicine), in which treatment and prevention are tailored to each patient, taking into account individual variability in genes, environment, and lifestyle.

What software tools are used in Bioinformatics?

Aside from an understanding of the principles of Biology, and more specifically Genetics, what computer science skills are in demand for those wanting to work in Bioinformatics?
As it turns out, they are very similar to the skills used in general Data Science. Proficiency in at least one programming language, most commonly Python, R or Bash, is in demand. So is the ability to work with relational databases and cloud platforms.

These tools will be used for data acquisition, data cleaning, exploratory data analysis, data pipelining and statistical modeling to draw insights for research projects.
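
As a rough sketch of the data acquisition and cleaning steps, the example below reads sequences in FASTA format, a common plain-text format in which each record starts with a ">" header line followed by one or more lines of sequence. It uses only the Python standard library, and the records are made up; in practice many people would reach for an established library such as Biopython rather than writing their own parser.

```python
import io


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())  # normalise case while reading
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical FASTA content; io.StringIO stands in for an opened file.
fasta_text = """>seq1 hypothetical example
atgc
GGTA
>seq2
TTAA
"""

for header, sequence in parse_fasta(io.StringIO(fasta_text)):
    print(header, len(sequence), sequence)
```

Once the records are in memory as plain strings, they can be passed on to whatever exploratory analysis or statistical modeling the project calls for.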

Concluding Remarks

Bioinformatics is a relatively new field that is changing rapidly. It holds great promise for the prevention and treatment of many different diseases, and those working in it may find value in contributing to the improvement of many people's lives. If you have a background in, or are interested in, computer science and biology, perhaps this is a career path worth exploring further.




John Kundycki


Data Scientist | Health Enthusiast | Learner
