Bioinformatics for the masses?

It’s safe to say that anyone who has had to use a bioinformatics pipeline has had a rough ride at some point.

At InSilico DB we run bioinformatics pipelines at scale, on 100,000s of samples. Even though those pipelines are open source and freely available, there is significant overhead in running, configuring, documenting, and updating them.

The folks who gather, for example, at the Genome Informatics conference at CSHL get all the new data and heroically write scripts to crunch it for the first time. Luckily, bioinformaticians are a very open crowd and publish their scripts on GitHub. But as those scripts trickle down from the developers of the pipelines to their users, a big challenge remains: compiling these tools and getting them to run in your own environment. A recent review of the travails involved was compiled from the “hackathons and workshops of the EU COST action SeqAhead”.

So, when we started hearing the words “bioinformatics” and “Docker” together, we started to get really excited.
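The appeal is easy to show with a hedged sketch (the image name, tag, and file names below are illustrative, not InSilico DB’s actual setup): a tool packaged as a Docker image runs with a single command, with its whole compiled environment bundled inside the container.

```shell
# Count the reads in a BAM file using a containerized samtools,
# without installing or compiling samtools locally.
# Requires a running Docker daemon; sample.bam is a placeholder input.
docker run --rm -v "$PWD:/data" biocontainers/samtools:v1.9-4-deb_cv1 \
  samtools view -c /data/sample.bam
```

The same pattern applies to any published pipeline image: pull, mount your data, run. No dependency hunting, and the exact tool version is pinned in the image tag.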

We started dreaming about:

  • supporting any type of data for which a pipeline is available, which would make so many of our users happy
  • being completely transparent about the algorithms that were run
  • allowing the bioinformatics-inclined user to tweak the pipelines without having to go into the guts of our cloud platform

What do you think: is this the moment when bioinformatics can be distributed to the masses?

My feeling is that we’ll learn a lot from the upcoming Bio in Docker Symposium, 9–10 November 2015, Wellcome Building, London. If this resonates, I would love to meet you there.

Please recommend this post to reach more people!
