The Future of Computational Biology

Outside Two Standard Deviations
4 min read · Feb 9, 2018

One of the things I enjoy is introspecting, synthesizing and projecting my thoughts to make an educated guess about the future. I recommend this activity to everyone; it helps a lot to frame things and to keep in mind where we are going. Add a nice glass of wine and some smooth jazz to the process and you get an evening well spent.

Gazing into the future! One expectation is that in the future there will be more women in science :)

Since I am on the job market these days, I decided to think more about the continuing evolution of computational biology and to extrapolate to where we might be in a few years.

My summary is that we are going to see three things: 1) the line between computational and bench researchers will blur and ultimately dissolve, 2) cutting-edge computer science research (especially in machine learning) will be incorporated immediately into medical research, and 3) big institutions such as journals and universities will put increasing emphasis on open code and reproducible research.

Personnel

The trend I currently observe in education around me is an increase in the number of classes that have a ‘computational’ component to them. Let me give you an example. In my first year of graduate school, I had to take a course in which we analyzed old research papers, pointing out the strengths, weaknesses and potential improvements for each of them.

This was on the strictly scientific side. I have previously written about some of the challenges and inadvertent miscommunication that arise when computational and wet lab researchers work together. So it is evident that these silos are creating problems in advancing science. My expectation is that with the advent of more computationally literate biologists and more biologically literate computational folks, these problems will get solved and we will move towards a discipline without such artificial boundaries. To some degree, some of the older sciences have already gone through this. I have come across both theoretical and experimental physicists, but it's rare for either type not to have a decent background in coding and mathematics. My prediction is that this is the direction in which biology will move.

Science

In the last 20 years or so, we have published the first draft of the complete human genome and gone from strength to strength, dropping the cost exponentially. In 1999, it cost $3 billion to sequence the human genome, and today it can be done for $1000, i.e. roughly 3 × 10⁶ times cheaper, a true exponential decline. In between, we have learnt the proper ways to normalize and analyze newer and newer types of data sets (microarrays → RNA-Seq → single cell RNA-Seq) and made leaps of progress in epigenomics, getting a more complete picture of biology. Today, we are actively using machine learning methods to ‘learn’ more about biology, often reducing the need for costly and time-consuming experiments.
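The cost figures quoted above can be checked with a few lines of arithmetic. The sketch below is a rough back-of-the-envelope calculation, not a precise cost model: it takes the two dollar amounts from this post at face value, assumes ‘today’ means 2018 (the year of writing), and assumes the decline was a smooth exponential. The function names are just illustrative.

```python
import math

# Figures quoted in the post; treat them as order-of-magnitude values.
COST_1999_USD = 3_000_000_000
COST_TODAY_USD = 1_000
# Assumption: "today" is 2018, the year this post was written.
YEARS_ELAPSED = 2018 - 1999


def fold_reduction(old_cost, new_cost):
    """How many times cheaper the new cost is than the old one."""
    return old_cost / new_cost


def halving_time_years(old_cost, new_cost, years):
    """Years per halving of cost, assuming a smooth exponential decline."""
    halvings = math.log2(old_cost / new_cost)
    return years / halvings


if __name__ == "__main__":
    fold = fold_reduction(COST_1999_USD, COST_TODAY_USD)
    halving = halving_time_years(COST_1999_USD, COST_TODAY_USD, YEARS_ELAPSED)
    print(f"fold reduction: {fold:.1e}")
    print(f"implied halving time: {halving * 12:.1f} months")
```

Under these assumptions the cost falls by a factor of about three million, which works out to the price halving in a little under a year.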

In the future, I am confident we will keep pushing the boundaries of biology using computer science, and newer, better machine learning techniques will help us understand biology without too many costly experiments. Computers and AI will slowly but surely become completely integrated into biology and medicine in general. And I think it will not just be machine learning that crosses over rapidly: even advances in data structures will cross over much more quickly than they have historically, as medical data increases in size and complexity. For all this to happen, though, I think we will need biologists who are computer scientists and computer scientists who are biologists. We will need to break the existing silos and ‘specializations’ and truly merge these fields.

Institutions

A few months ago, this tweet from one of my colleagues went viral and received mostly positive feedback (hey! it’s twitter. some negative comments are to be expected). Every reader of scientific literature has come across this line at one point or another, gotten tremendously frustrated trying to figure out the underlying code and ultimately given up. This is bad for two reasons: a) confidence in the paper’s claims is harmed, and b) it is harder for a new ‘entrant’ to gain access to the field. At the same time, there is a big debate brewing in the scientific community about reproducibility in general, with many high-profile papers unable to be replicated.

I believe that the situation has reached a breaking point and future papers will be much more reproducible than past papers. Important institutions in the ecosystem, namely publishers and promoters (i.e. universities/departments), are already trying to solve the problem. Many journals now encourage submission of R Markdown files or similar documents to ensure every figure is reproducible. The problem is also much easier to solve for computational research than for wet lab biology. So with the trends of increased incorporation of computational techniques and the increasing computational savvy of everyone in the research system, I expect that in the future, the standard practice will be to provide, along with the publication, a code document which reproduces all the figures panel by panel.
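To make the idea of a panel-by-panel code document a bit more concrete, here is a minimal sketch in Python. It is not taken from any journal's actual requirements; the panel names, the simulated data and the seed are all hypothetical. The point is only the shape: each panel has its own function, every run uses a fixed seed, and a checksum lets a reader confirm they regenerated exactly the same numbers.

```python
import hashlib
import json
import random

SEED = 2018  # hypothetical fixed seed so every run draws the same numbers


def panel_a(rng):
    """Hypothetical panel A: 50 simulated expression values."""
    return [round(rng.gauss(0.0, 1.0), 6) for _ in range(50)]


def panel_b(rng):
    """Hypothetical panel B: running mean of 50 simulated values."""
    running_total = 0.0
    means = []
    for count in range(1, 51):
        running_total += rng.gauss(5.0, 2.0)
        means.append(round(running_total / count, 6))
    return means


# One entry per figure panel; the names are placeholders.
PANELS = {"figure1_panel_a": panel_a, "figure1_panel_b": panel_b}


def build_all_panels(seed=SEED):
    """Recompute every panel from scratch and fingerprint the result."""
    results = {}
    for name, make_panel in PANELS.items():
        # A separate, deterministically seeded generator per panel means
        # adding or reordering panels does not change existing ones.
        rng = random.Random(f"{seed}:{name}")
        data = make_panel(rng)
        digest = hashlib.sha256(json.dumps(data).encode("utf-8")).hexdigest()
        results[name] = {"data": data, "sha256": digest}
    return results


if __name__ == "__main__":
    for name, panel in build_all_panels().items():
        print(name, panel["sha256"][:12])
```

In a real paper the panel functions would load the deposited data and draw the plots, but the same structure applies: anyone running the script twice with the same seed should get identical checksums.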

Outside Two Standard Deviations

A blog about things in AI, healthcare and biotechnology. Things outside two standard deviations :)