Resources to become a computational biologist outside of academia

A shortlist

Deena Blumenkrantz
Deena Does Data Science
8 min read · Sep 7, 2018


Choosing someone to be your teacher is to bestow a great honor: you are giving them your time, which has a high opportunity cost, and your trust. I trust my teachers to tell me what they think is important given their knowledge and experience. Of course, that does not relieve me of responsibility; doing everything your teacher says is foolish. In academia it is often said that your boss, the principal investigator (PI), is not the expert on your project. Instead, you, the PhD student or postdoc, are the expert, because you hold specific, relevant knowledge while your PI holds broader and more varied knowledge. This makes sense: your PI has to connect more disparate, far-reaching stories to direct you and the rest of the research team, while you focus on one (or two) “local” stories. Good working relationships should reflect this dynamic.

I am writing this blog post because I have spent the last few months going around like the newly hatched bird in P.D. Eastman’s Are You My Mother?, asking “Are you my teacher?”, and I needed an outlet for my findings. Below is a list of resources that I have brought to the surface after sifting through a much longer list. The longer list can be found in this Google Doc. The doc is open to edits, and additions of resources that have helped you learn are welcome and appreciated. To help you gauge whether my filtering is relevant to you, here is a little about me.

I graduated from UC Davis with a BS in Biochemistry and Molecular Biology, got my PhD from Imperial College London, and did a postdoc at Johns Hopkins. During my PhD and postdoc I collaborated with bioinformaticians and used Geneious software to visualize phylogenetic analyses (closely related to hierarchical clustering), which gave me fascinating insights into influenza virus evolution. When I started this transition, the most coding I had done was a short script for ImageJ and a few IF statements in Excel, which basically means I did not know how to code!

Below I have outlined the resources that I think are worth looking into on my path to becoming a bioinformatician. Writing all this down helps me see that there is a finite amount of material that would get me to an intermediate level. (Hopefully, I will be employable before I get through all of it.) Writing it down also lets me crowdsource input from all of you, so please comment below to add anything I missed!

To become a bioinformatician, I figure that I need a five-pronged approach to learning. I plan to interweave my study time between these topics:

  • (1) Coding: I need to learn to use Python and some of its libraries. When Python is not enough, I need to be able to use R, SQL, CSS, and HTML to process and present big data
  • (2) Genetics and proteomics: I need to be able to design primers and barcodes with code. I need to understand the difference between de novo and reference-guided (template) assembly of NGS reads and be able to perform both, as well as analyze microarray data. I want to understand and be able to perform protein docking predictions. I also want to understand and be able to produce protein network diagrams. Generally, I want to predict how molecular and cellular interactions affect health
  • (3) Math and Machine Learning: I took calculus and basic statistics in college and I used some stats during my research, but I need to brush up on them. Plus I need to become strong in linear algebra and probability in order to build predictive models
  • (4) Projects!!! I can’t overemphasize this: projects are what push me to learn. They are what get me out of bed in the morning. The desire to know the answer gives me drive and keeps me going
  • (5) Blog: write once a week to reflect on what I learned, what I liked, and what I didn’t; to note any new resources that might be worth using; and to ask the community questions
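Prong (2) mentions designing primers with code. As a rough illustration of what that can look like in Python, here is a minimal sketch that screens a candidate primer against a few common rules of thumb. These choices are my assumptions, not a standard: the function names are hypothetical, the melting temperature uses the simple Wallace rule (2 °C per A/T plus 4 °C per G/C), which is only a rough estimate for short oligos, and the 40 to 60% GC window and 3' “GC clamp” are typical guidelines rather than fixed thresholds. Real primer-design tools also check hairpins and primer dimers, which this skips.

```python
"""Minimal primer sanity checks: a learning sketch, not a primer-design tool.

Assumptions: Tm uses the Wallace rule, Tm = 2*(A+T) + 4*(G+C), a rough
estimate for short oligos; the 40-60% GC window and the 3' GC clamp are
common rules of thumb, not fixed standards. Function names are hypothetical.
"""

# Translation table mapping each DNA base to its complement (A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C via the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def check_primer(seq: str, gc_min: float = 0.40, gc_max: float = 0.60) -> dict:
    """Summarize a candidate primer against simple rules of thumb."""
    seq = seq.upper()
    gc = gc_content(seq)
    return {
        "length": len(seq),
        "gc": round(gc, 2),
        "tm_wallace": wallace_tm(seq),
        "gc_ok": gc_min <= gc <= gc_max,
        # GC clamp: a G or C at the 3' end is thought to help the primer anneal.
        "gc_clamp": seq[-1] in "GC",
        "reverse_complement": reverse_complement(seq),
    }


if __name__ == "__main__":
    # Made-up 20-nt sequence, used only to show the output format.
    print(check_primer("AGCTTGCAGGTACCGATCGC"))
```

For that made-up sequence, the summary reports 60% GC and a Wallace Tm of 64 °C, with a C at the 3' end. Libraries such as Biopython offer more careful Tm models; this sketch just shows how a few lines of code can replace checking primers by hand.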

Coding resources

Genetics, proteomics, and systems-biology resources

Math and machine learning resources

Bioinformatics Conferences

Projects

  • Kaggle: search with keywords such as cancer, protein, seq, genome, gene expression, microarray, single cell, RNA, medicine, or virus
  • Google Dataset Search: search with the same terms as above

Blog

  • Medium: a good place to start
  • GitHub Pages: I might switch to this when I want to show my code

Disclaimer: I don’t claim to be perfect. If you catch a mistake or if you have better links, summaries or important points about any of these items, please comment below.

Bootcamps: I would be remiss not to mention bootcamps. Many bootcamps offer free seminars, and I have taken some of those, but I haven’t taken any bootcamp series longer than two days. From talking to people about their bootcamp experiences, I gather that the trick is to sign up at the right time: you want to arrive with some competencies and some questions. Bootcamps seem to provide a network of people who might connect you with a job, so the most important thing to ask the admissions office is where their graduates work, and the most important thing to ask yourself is whether you want to work at those companies. It is also important to know that it can take up to 6 months after a bootcamp to find a job if you are among the 80% of students who came into it with significant competencies. This was different three years ago, when you didn’t need to know any programming to enter a bootcamp and it was easy to find an entry-level job afterward, but now there are many more Computer Science majors to compete against. (If anyone wants to post a graph, or even ideas on how to gather data to support this point, I would very much appreciate it!)

Udemy: I haven’t listed any Udemy courses here because I haven’t heard any rave reviews; Zero to Hero Python 3 is the only course that stood out. I prefer to learn from PyCon videos. If you loved a Udemy course, please let the rest of us know which one by leaving a comment.

Thanks for reading!


I’m a molecular virologist training to become a computational biologist / bioinformatician