BOSC 2017, Day 1 #BOSC2017 #ISMBECCB

yoyehudi
Jul 23, 2017

Open Source, Common Workflow Language, Containers, & War of the Package Managers

BOSC, the Bioinformatics Open Source Conference, kicked off this morning as one of the COSIs (Communities of Special Interest) associated with ISMB/ECCB. Overall, popular themes included open source (unsurprisingly!), reproducibility, containerisation, package management, and a love/hate relationship with Docker.

Morning Sessions

Common Workflow Language

Global Alliance for Genomics and Health

Morning Lightning Talks

Reproducible workflows for single-cell epigenomics are apparently another area where little work has been done, but Kieran O’Neill introduced Epigenomics-SCREW as a possible solution.

Next up came a talk on BioThings Explorer, which integrates genomic data on the fly using APIs and JSON-LD. Integrating disparate data via identifiers that may be labelled differently between data sources is a tricky problem, so this short talk was very impressive.
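To make the identifier problem concrete, here is a minimal sketch in Python. It is not BioThings Explorer's API and not a real JSON-LD processor: the records, the field names and the `to_iris` and `same_entity` helpers are all hypothetical, made up for illustration. It only shows the underlying idea, which is that a JSON-LD-style context maps each source's local field name onto a shared IRI prefix, so two records can be matched even when they label the same identifier differently.

```python
# Hypothetical records from two data sources. Both refer to the same gene,
# but each source uses its own field name for the NCBI gene identifier.
source_a = {"entrezgene": "1017", "symbol": "CDK2"}
source_b = {"ncbi_gene_id": "1017", "pathway": "Cell cycle"}

# Hypothetical JSON-LD-style contexts: each maps a local field name
# to a shared IRI prefix (the prefixes here are illustrative assumptions).
context_a = {
    "entrezgene": "http://identifiers.org/ncbigene/",
    "symbol": "http://identifiers.org/hgnc.symbol/",
}
context_b = {"ncbi_gene_id": "http://identifiers.org/ncbigene/"}


def to_iris(record, context):
    """Expand every field the context knows about into a full IRI."""
    return {
        context[key] + str(value)
        for key, value in record.items()
        if key in context
    }


def same_entity(rec_a, ctx_a, rec_b, ctx_b):
    """Two records match if they share at least one expanded IRI."""
    return bool(to_iris(rec_a, ctx_a) & to_iris(rec_b, ctx_b))


if __name__ == "__main__":
    print(same_entity(source_a, context_a, source_b, context_b))
```

Fields without a context entry (such as `pathway` above) are simply ignored when matching, which is why the contexts, rather than the field names, carry the meaning.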

Anil Thanki introduced discovery of homology and gene families using Galaxy tools such as GeneSeqToFamily and Aequatus. The last talk of the morning was on YAMP, by Alessia Visconti. “Yet Another Metagenomic Pipeline” addresses the usability and difficult-setup issues of other tools, using Nextflow and Docker to get there.

Lunch and BoFs

Afternoon Sessions

Developer tools and libraries for open science and reproducibility

More lightning talks

Screenshot of the NGL codepen gallery

Alexander S Rose introduced us to NGL, a beautiful web-based molecular visualiser. The gallery (pictured above) is stored as a set of CodePen instances. CodePen is an online JavaScript source code editor that lets you interact with the visualisation or tweak the code and see the results in real time, which is a really nice way to present such gorgeous content!

Aditya Bharadwaj talked about a network visualiser called GraphSpace, Timothy Booth discussed detecting well-hopping duplicate reads, and Monther Alhamdoosh discussed ways to detect relevant genes when you have really big datasets to explore, using the EGSEA R package.

Brad Chapman eloquently introduced bcbio, designed to take analyses to the data files, rather than copying data around.

Before coffee break, Kees van Bochove described The Hyve’s worldwide open source work. The Hyve is also a BOSC sponsor.

Data Science and Visualisation: The package manager wars

Björn Grüning introduced us to the package manager Conda, and its biologically-oriented sibling, Bioconda. Conda describes itself as “Package, dependency and environment management for any language: Python, R, Ruby, Lua, Scala, Java, Javascript, C/ C++, FORTRAN”. Bioconda has only been around for about a year, but has over 2400 packages available and an active contributor community. Docker images are generated automatically whenever a Bioconda package is updated, saving the effort of maintaining separate Dockerfiles. Browse the registry to get an idea of the packages it offers.

Guix, which I learned is pronounced “geeks”, was introduced by Ricardo Wurmus as a competing package manager to Bioconda, one that doesn’t require Docker (but can use it if you really wish).

Kei Ono discussed Cytoscape, which grew from an early single Java visualisation package to become an entire ecosystem of related tools, including NDEx, which he described as a “GitHub for biological networks”.

Olga Vrousgou introduced the SPOT ontology toolkit, which covers all your biological ontology lookup and data mapping needs. When Olga asked who knew about and used ontologies, hands went up across the room. It was definitely a popular subject!

The last talk before the keynote was a BioPython update. As shown above, they have a great new logo, created to loosely mimic the Python Software Foundation colours with the foundation’s permission. Project updates include the plan to move from their old custom licence to the 3-clause BSD licence (a long-winded task, as all past contributors must be contacted to confirm their acceptance) and the plan to discontinue support for Python 2.7 by 2020.

Open Source Yourself: the keynote we were all waiting for

Nonetheless, a surprising number of people do share anyway. Open Humans makes sure that people who share their personal health data know what they’re getting into when they share it, making people complete a quiz that proves they understand the potential risks.

She also shared the impressive story of Dana Lewis, who hacked her own artificial pancreas system in order to create a more effective glucose monitor alarm — one she couldn’t accidentally sleep through when the built-in one was too quiet! She went on to build a community around this as well: OpenAPS.

If Open Humans sounds exciting to you, consider applying to their small grant project for ideas up to $5000, or applying for their software developer vacancy.

That’s it for tonight folks! More after tomorrow’s talks.

Thirsty for more? Check out BOSC Day 2

Disclaimer: Any views expressed are my own, not necessarily those of PLOS.

PLOS Comp Biol Field Reports Blog

A collaborative blog with posts from researchers attending Computational Biology events all over the world. Interested in blogging for us? Email: ploscompbiol@plos.org

Written by

yoyehudi

Software Engineer at InterMine, in the Dept of Genetics @Cambridge_Uni. Fond of UIs, open source, veggies, running & sci-fi. http://yo-yehudi.com
