A Run on the Bank

William L. Weaver
Published in TL;DR Innovation · 4 min read · Apr 17, 2018

Crowdsourcing Open Source Software for Biobanking

In the fall of 2008, software developers Jeff Atwood and Joel Spolsky launched an online forum encouraging programmers to post questions to be answered by fellow visitors to the site. Called Stack Overflow, it was not the first programmers’ forum to elicit expert advice from visitors; however, it was one of the first to be totally free to its users. Stack Overflow was soon followed by sister sites Server Fault, for use by system administrators, and Super User, for use by expert “power” users. Each is described as a hybrid wiki, blog, forum, and social bookmarking site, and to date [2011] these websites have a combined membership of 700,000+ users who have posted over 2.5 million questions. These sites are excellent examples of crowdsourcing, the term introduced by journalist Jeff Howe in a June 2006 Wired magazine article titled “The Rise of Crowdsourcing”.

Photo by Benny Jackson on Unsplash

At the time it was coined, crowdsourcing specifically described a client outsourcing a task to the general public. The older software development term “open source”, which dates to the late 1990s, refers to the free availability of an end product’s source materials. Recently, crowdsourcing and open source have proven to be of great utility in the development of bioinformatics, a large, complex area of research and development that relies heavily on the collaborative efforts of many international teams of interdisciplinary scientists and engineers.

Physical biobanks continue to amass a growing archive of cryogenically preserved biological tissue along with associated handling and temperature histories, location, and ownership records. Managing such a database of biobank records is an ongoing challenge; however, an even greater challenge lies in developing the bioinformatics software used to analyze biobank samples during the study of disease and treatments. Of foremost value to this analysis is the continued development of the open access database known as GenBank, a collection of nearly 200 million nucleotide sequences and their protein translations that is maintained by the National Center for Biotechnology Information (NCBI) of the United States National Institutes of Health (NIH). GenBank also provides access to a growing toolkit of algorithms used to analyze and understand the sequence information it contains.

Such a large amount of freely available information is best served by an equally large, open, collaborative effort in the research and development of bioinformatics software. December 2010 witnessed the arrival of a new peer-reviewed journal titled Open Research Computation, one of more than 220 peer-reviewed “open access” journals produced by BioMed Central, a property of Springer Science+Business Media. Like most open access journals, Open Research Computation levies a publication charge on the authors to cover the costs of facilitating the peer review and online publication; however, the resulting articles require no subscription and are freely available online. Open Research Computation requires that all software source code used or described in the article be made available in a public repository “under an Open Source Initiative compliant license” and also “welcomes submissions that review or describe developments relating to software based tools for research.” Open access and open source journals such as Open Research Computation facilitate the rapid development of much-needed bioinformatics tools, but their effectiveness can be enhanced through the use of crowdsourcing.

In February 2010, Jeff Atwood and Joel Spolsky decided to make the website engine underlying Stack Overflow and its sister sites available to users wishing to create collaborative, crowdsourced question-and-answer forums on their own topics. After they secured the necessary venture capital in April 2010, the product, called Stack Exchange 2.0, was launched under a Creative Commons licence and is free of charge to its users. Among the 25+ sites in the Stack Exchange Network is “BioStar”, a Stack Exchange site on the topic of bioinformatics, computational genomics, and biological data analysis. This bioinformatics community currently enjoys a computationally poetic 2048 members and has answered over 4000 questions.

Combining the open source character of Open Research Computation and the crowdsourcing nature of BioStar, Dr. Michael D. Barton, while a post-doc at Northern Kentucky University, submitted a question to BioStar asking its members to suggest improvements to software he had developed to join neighboring sequence regions together and fill gaps found in short-read sequencing data. A copy of the paper submitted to Open Research Computation was available for review on the Nature Precedings website, while the source code was available on GitHub, a web-based hosting service for software development projects that use the Git revision control system.
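
For readers unfamiliar with the task, the idea of joining neighboring sequence regions and filling gaps can be pictured with a toy sketch. This is not Dr. Barton's software or its method; the function names `scaffold` and `fill_gap`, the example sequences, and the use of “N” as a placeholder base are all assumptions made purely for illustration.

```python
def scaffold(contigs, gaps, fill="N"):
    """Join ordered contigs, padding each estimated gap with placeholder bases.

    Hypothetical helper for illustration only: `contigs` is a list of
    sequence strings in their assumed order, and `gaps` holds the assumed
    number of unknown bases between each neighboring pair.
    """
    if len(gaps) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gaps, contigs[1:]):
        parts.append(fill * gap)  # unknown bases between neighbors
        parts.append(contig)
    return "".join(parts)


def fill_gap(scaffold_seq, start, patch):
    """Replace placeholder bases beginning at `start` with a known sequence."""
    end = start + len(patch)
    if set(scaffold_seq[start:end]) != {"N"}:
        raise ValueError("target region is not an unfilled gap")
    return scaffold_seq[:start] + patch + scaffold_seq[end:]


draft = scaffold(["ACGT", "GGCC", "TTAA"], gaps=[3, 2])
print(draft)    # ACGTNNNGGCCNNTTAA
patched = fill_gap(draft, 4, "CAT")
print(patched)  # ACGTCATGGCCNNTTAA
```

Real scaffolding tools must also infer the order, orientation, and spacing of the regions from the sequencing data, which is the hard part this sketch takes as given.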

In addition to developing much-needed bioinformatics software to assist in the sequence scaffolding process, Dr. Barton has provided an excellent example of good collaborative practice: one that provides a copy of the software for review, solicits comments and suggestions for its improvement in an open forum, and posts a draft of the publication with plans to ultimately publish the entire project in a peer-reviewed open source journal. As research and development groups adopt and evolve this practice of open collaboration, not only could we witness rapid advances in bioinformatics and genetic research, but this practice could also serve as a model for all areas of scientific computing.

________

This material originally appeared as a Contributed Editorial in Scientific Computing, February 2012, pg. 5.

William L. Weaver is an Associate Professor in the Department of Integrated Science, Business, and Technology at La Salle University in Philadelphia, PA USA. He holds a B.S. Degree with Double Majors in Chemistry and Physics and earned his Ph.D. in Analytical Chemistry with expertise in Ultrafast LASER Spectroscopy. He teaches, writes, and speaks on the application of Systems Thinking to the development of New Products and Innovation.
