BOSC 2017, Day 2, part 1 #BOSC2017 #ISMBECCB

For a BOSC Intro, see my Day 1 post

BOSC talks today were more varied, with themes running from citizen science to Open Data, a great panel towards the end of the day, and an entertaining and inspiring keynote from Nick Loman. The keynote was nearly titled “Talky McTalkface” after he failed to turn in an appropriate title to the conference organisers.

Beer photo by Landfeldt. Creative Commons CC BY-ND 2.0 via https://www.flickr.com/photos/landfeldt/2400107273

Community building and citizen science sessions

Beer Decoded

Day 2 of BOSC opened with a subject close to many a person’s heart: beer. Jonathan Sobel introduced us to Beer Decoded, a citizen science beer metagenome sequencing project. He discussed some of the barriers to citizen science, particularly funding: it’s harder to get grants for scientific projects that have no academic body backing them. Eventually the team turned to Kickstarter, allowing beer enthusiasts worldwide to get their inner geek on and fund delicious science. In total, 39 beer samples were analysed (via science and probably a tasting or two, too) and open sourced on GitHub.

Community Curation & Apollo Project

Jokingly lamenting how hard it is to follow on from a talk about beer, Monica Munoz-Torres introduced Apollo as a curation tool, describing it as a “social network for curators” where the social objects are genomes and their annotations. Following a web-based re-release and a community engagement plan, Apollo’s use has soared over the last few years, and it now boasts around 150 servers and 1,200 unique users. The large open source community has helped by feeding new developments back into the project, helping Apollo grow even further.

JOSS: the Journal of Open Source Software

Getting credit for your academic software without having to write papers that may be ill-suited to your needs: every scientific software engineer’s dream. JOSS aims to enable this with a completely GitHub-based submission system, the only real requirements being little more than a paragraph or two of description alongside good quality open source code. It’s also free!

Reviewers, submitters, and editors all manage new submissions on GitHub, keeping the process transparent and speedy. Naturally, for a geeky project, the JOSS GitHub bot is named, pleasingly, “Whedon”.

Galaxy Training

Bérénice Batut provided an overview of Galaxy’s training network. The material is built in Markdown and centralised neatly on GitHub, with metadata, Zenodo DOIs, and credit for the creators of the content. The training network addresses the strong need for formal computing training amongst biologists: too many are forced to pick up skills on the job, either self-taught or with the help of a colleague. Training packages are also automatically exported to ELIXIR’s TeSS training portal!

Lightning talks

Next came two short talks, with Dexter Pratt discussing NDEx: infrastructure for biological-network community sourced content. In plainer English, it allows communities to be created around tools for biological networks of any kind.

Bioinformatics training in Africa

Nicola Mulder shared the inspiring story of H3ABioNet, which brings bioinformatics training to places across Africa, including rural areas with poor or no internet connection. Lectures were pre-recorded and also delivered live online, with the recordings available offline in case of connectivity problems.

Galaxy updates

Martin Cech brought us up to date with new Galaxy developments, including a nice Jupyter notebook integration (paper) and a gradual transition to using Conda for package management.

Continuous Analysis

Brett Beaulieu-Jones discussed the importance of reproducibility, citing the example of Custom CDF, where the version used will affect results significantly, yet the version number is rarely specified in papers! Continuous analysis is a solution for automated computational reproducibility.

GATK4

GATK4, introduced by Kate Voss, was an exciting moment for open source, as previous versions of GATK (Genome Analysis Toolkit, released by the Broad Institute) had been closed source. GATK4 is not only open source, but also runs up to 5 times faster than its predecessor, GATK3! There were some interesting pronunciation discussions, too…

More on the lightning talks, open data panel, and keynote in part 2!

Disclaimer: Any views expressed are my own, not necessarily those of PLOS.