Five tips for biologists that need to learn about computing
Computers can be powerful allies. They are ideal for automating repetitive
tasks. Furthermore, they can perform calculations and analyses that are too
complex for the human brain to process.
These days many parts of the biological sciences are becoming increasingly data-driven. Technological advances have led to a huge increase in the generation of biological data, and data analysis is required to extract biological insight from it. To a large extent, the rate-limiting factor in generating insight is the lack of appropriate data analysis tools.
However, as a biologist you may prefer to work in the lab over sitting in front of the computer. You may even ask yourself if learning to program has any benefit to you whatsoever.
As with learning a new spoken language, there are several benefits to learning
computer programming. Importantly, it gives you a new perspective. In fact,
professional programmers are encouraged to learn a new programming language every year so that they experience new problem-solving techniques.
More concretely, learning to program will help you get a better understanding of data analysis. It may even open your eyes to possibilities that you had not considered.
Furthermore, as you become more familiar with computing, your interactions with programmers and data analysis folk will improve. So even if you discover that you don’t like programming, learning to code will help you express your ideas to people who do.
Tip 1: Start with something simple
A good starting point for learning anything, including computing, is to start
with something simple.
For example, you could try to learn how to think like a computer. These days
computers can perform complex tasks. However, at their core, computers only have a relatively small set of basic capabilities. Information is stored as zeros
and ones, and Boolean logic is used to perform calculations. By reading up on how computers work you can start to understand what can be achieved using computers and what their inherent limitations are.
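To make the zeros, ones and Boolean logic a little more concrete, here is a minimal Python sketch. The numbers and the helper name half_adder are chosen purely for illustration. It shows a whole number as a pattern of bits, combines bits with AND, OR and XOR, and ends with one well-known limitation: decimal fractions such as 0.1 cannot be stored exactly in binary.

```python
# Whole numbers are stored as patterns of zeros and ones (bits).
number = 10
bits = format(number, "08b")  # the number 10 written as eight bits
print(bits)  # 00001010

# Boolean logic combines those bits: AND (&), OR (|) and XOR (^).
a = 0b1100
b = 0b1010
print(format(a & b, "04b"))  # 1000
print(format(a | b, "04b"))  # 1110
print(format(a ^ b, "04b"))  # 0110


def half_adder(x, y):
    """Add two single bits using only Boolean logic.

    Returns (sum_bit, carry_bit).
    """
    return x ^ y, x & y


print(half_adder(1, 1))  # (0, 1), that is 1 + 1 = binary 10

# An inherent limitation: 0.1, 0.2 and 0.3 have no exact binary
# representation, so this comparison is False.
tenths_match = (0.1 + 0.2 == 0.3)
print(tenths_match)  # False
```

The half adder is the kind of building block from which the arithmetic inside a computer is assembled, which is why a small set of basic capabilities can add up to complex tasks.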
Another good starting point is to launch a terminal and start interacting with
your computer from the command line. This is the first step towards learning how to automate your data analysis.
A simple, but effective, way of getting into coding is to find friends and
colleagues that enjoy computing and start sharing your endeavours with them. They will no doubt be more than happy to help you along.
Tip 2: Experiment with Python and R
Python and R are good starting points for learning to code in a scientific
context. I would lean towards Python if you want to crunch numbers and towards R if you want to plot figures.
If you have a particular data analysis problem in mind, don’t be afraid to
start experimenting. No doubt you will find yourself stuck. Don’t despair; this is normal. There is plenty of material online to help you get unstuck.
However, in the beginning it is difficult to know what to search for. This is
when it is useful to have peers that are into computing. Go speak to them.
If you don’t have a particular problem in mind, you can have a look at the
problems listed in Project Euler. Again, at first you are more likely to be fighting with obscure error messages than solving the problem at hand. As mentioned above, this is normal; don’t despair.
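To give a flavour of what such an exercise looks like, here is a minimal Python sketch based on the first Project Euler problem as I recall it: summing the natural numbers below a limit that are multiples of 3 or 5. The function name sum_of_multiples is my own choice, and this is only one of many ways to write it.

```python
def sum_of_multiples(limit):
    """Sum the natural numbers below `limit` that are multiples of 3 or 5."""
    return sum(n for n in range(1, limit) if n % 3 == 0 or n % 5 == 0)


# Check against a case small enough to work out by hand.
print(sum_of_multiples(10))  # 23, because 3 + 5 + 6 + 9 = 23
```

Checking the code against a case that you can work out by hand, before trusting it on a larger input, is a habit that carries over directly to data analysis.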
Tip 3: Use version control
Once you find yourself writing your own data analysis scripts, one of the
simplest ways of increasing your productivity is to start using version
control. It reduces your fear of changing existing code, as you can always roll back to a previously working state. One of the tell-tale signs that you need version control is a project directory containing files named along the lines of: data_analysis.py, new_data_analysis.py, old_data_analysis.py, data_analysis_160916.py, etc.
One of the most popular version control systems is Git, and there are many
online sources for learning about it. I would recommend having a look at
A Quick Introduction to Version Control with Git and GitHub by John D. Blischak, Emily R. Davenport and Greg Wilson, published in PLOS Computational Biology.
Tip 4: Automate your analysis
Once you are familiar with running scripts from the command line, it is time to start automating things. The more you automate, the more reproducible the analysis becomes.
One way to automate things is simply to write down all the commands that you need to run in a text file. You can then execute all the commands within that file by calling it with your shell. The “shell” is the program that runs inside your terminal and allows you to interact with the operating system’s programs and services.
However, if part of your analysis has interdependencies, i.e. your second
command depends on output from the first, you may want to look into using
a tool such as Make. Make is a program that has built-in functionality for
building and resolving dependency graphs.
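The core idea behind a tool like Make can be sketched in a few lines of Python. This is an illustration only, not how Make itself is implemented: the file names raw.txt, clean.txt and summary.txt, the two analysis steps and the helper functions are all hypothetical. Each output file lists the files it depends on, and a step is rerun only when its output is missing or older than one of its inputs.

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """True if target is missing or older than any of its dependencies."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)


def build(target, rules, log):
    """Bring target up to date, building whatever it depends on first."""
    if target not in rules:
        return  # a raw input file: there is nothing to build
    dependencies, action = rules[target]
    for dep in dependencies:
        build(dep, rules, log)
    if needs_rebuild(target, dependencies):
        action(target, dependencies)
        log.append(os.path.basename(target))


def clean(target, dependencies):
    """Hypothetical step 1: strip stray whitespace from each line."""
    with open(dependencies[0]) as src, open(target, "w") as out:
        for line in src:
            out.write(line.strip() + "\n")


def summarise(target, dependencies):
    """Hypothetical step 2: count the lines in the cleaned data."""
    with open(dependencies[0]) as src, open(target, "w") as out:
        out.write(str(sum(1 for _ in src)) + "\n")


# Set up a throwaway project directory with one raw input file.
workdir = tempfile.mkdtemp()
raw = os.path.join(workdir, "raw.txt")
cleaned = os.path.join(workdir, "clean.txt")
summary = os.path.join(workdir, "summary.txt")
with open(raw, "w") as handle:
    handle.write("  ACGT \n TTGA\n")

# Each output maps to (the files it depends on, the step that creates it).
rules = {
    cleaned: ([raw], clean),
    summary: ([cleaned], summarise),
}

first_run = []
build(summary, rules, first_run)
print(first_run)  # ['clean.txt', 'summary.txt']

second_run = []
build(summary, rules, second_run)
print(second_run)  # [] because everything is already up to date
```

Asking for summary.txt is enough: the sketch works out that clean.txt must be created first, and on the second run it does nothing because no input has changed. Make applies the same reasoning to the commands in your analysis.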
Tip 5: Take your time
Finally, let’s be honest: learning about computing and coding takes time and effort. However, although you may not be able to learn it all overnight, you will be able to benefit from what you’ve learnt straight away, even if it is as simple as being able to ask more informed questions.
So be patient and have a read of Peter Norvig’s essay: Teach Yourself
Programming in Ten Years. For those of you that don’t know, Peter Norvig is the Director of Research at Google and knows more than most about computing.
Learning about computing and coding takes time and effort. However, even if you only learn a little it will help you improve your interactions with data
analysis folk. These are the people that you rely on to extract knowledge from all the biological data that you are generating. So it is well worth spending time on.
If you have found this article useful, you can find more in-depth material in
the Biologist’s Guide to Computing.