This open-source tool will visualize your massive biological datasets
by Yo Yehudi | A spotlight on Phinch, a 2018 Global Sprint project
Holly Bik (@hollybik) is an Assistant Professor at the University of California, Riverside. Holly was selected to join the current round of Mozilla Open Leaders with their project Phinch, which creates beautiful and insightful visualisations for biological data.
I interviewed Holly to learn more about Phinch and how you can help at Mozilla’s Global Sprint 2018.
What is Phinch?
Phinch is an interactive, exploratory data visualization framework for massive biological datasets. Our project team represents an interdisciplinary collaboration between research scientists (e.g. myself, a Computational Biologist and Assistant Professor at UC Riverside) and Pitch Interactive (a data visualization studio based in Oakland, CA), and the project is currently funded by a 3-year grant from the Alfred P. Sloan Foundation. The Phinch framework itself is designed to promote rapid and routine visualization of complex genomic datasets — for example, millions of DNA sequences representing bacteria in the human microbiome or single-celled protists in a drop of seawater. These are datasets which scientists are now generating every day. However, to filter and analyze these datasets currently requires serious programming skills and often weeks or months of coding in order to visualize patterns from gigantic plain text files. Phinch is aimed at increasing scientific efficiency in this part of the data analysis workflow — enabling researchers to upload standard text files of biological data (HDF5 or JSON) and be able to immediately manipulate the data and metadata in a visual format. We also want end users to be able to share and export these visualized datasets, and we’re working on features that will enable download, storage, and export of publication-ready images.
The prototype Phinch framework currently exists as a web-based portal (live at http://phinch.org), but as part of the Mozilla Sprint we are working towards a 2.0 software release that has been completely refactored and revamped. Phinch 2.0 will be released as an Electron desktop app, the same kind of software framework that powers Slack and Skype.
Why did you start Phinch?
Science has a bad reputation for terrible graphics (think: PowerPoint slides crammed with text and grainy clip art images). Scientific software isn’t much better: programs are usually designed for power and computational efficiency, not aesthetics. Commercial software is sometimes an option for better graphics, but these programs can cost thousands of dollars. And researchers aren’t typically trained in Illustrator or Photoshop; even if we were, scientists shouldn’t have to manually edit every single image in one of these programs. Of course, programming languages like R and Python are powerful for visualizing data and generating graphics, but you have to invest quite a lot of time to learn these languages and then tweak the code to get graphical elements just right. As a researcher myself, I am well aware of the frustrations that end users face in visualizing their data. Even basic visualizations can be very labor-intensive.
We started Phinch because the underlying technology exists for making rapid, research-driven data visualization tools, but frameworks such as D3.js have been slow to make their way over to the scientific community. Scientists should be able to visualize their data just like the awesome interactive features on the New York Times website!
Do you have a favorite visualization in Phinch?
Bubble charts! This is probably one of the bog standard features in the data visualization community, but I love the soothing undulations of the bubbles in D3.js. I’m a marine biologist and so this visualization also reminds me of the ocean (and provides a small consolation when I’m stuck indoors analyzing DNA data for months at a time!).
What challenges have you faced working on this project?
The most challenging aspect of the project is also the most fun: serving as a “translator” between scientists and the data visualization team at Pitch Interactive. The Phinch framework is being built by software engineers who are experts at telling stories with big datasets, which means making sure that everyone understands what the underlying data is, and what parts of it are most useful for visualizing. This usually involves explaining a lot of scientific jargon and trying to summarize technical visualization requests from scientists in more general language. This “translation” goes both ways though: the team at Pitch Interactive has been very patient explaining the technical side of the software engineering to me, especially our recent transition over to the Electron app framework. The technological landscape changes so quickly that it is often hard for academic researchers to keep track of what is considered “cutting edge” in the data visualization community.
What kind of skills do I need to help you?
You can help with your ideas, datasets, or coding skills! Most generally, we’re looking for feedback from researchers and end users who work with large biological -Omic datasets: what type of visuals do you want to quickly generate? What frustrates you most about existing tools? What is difficult to do in R/Python that could be made easier with the Phinch framework? Do you have a dataset that you would like to “donate” for testing out Phinch 2.0 features?
If you have specific skills in Electron app frameworks or the QIIME software package for biological data (we’re hoping to develop Phinch 2.0 as a plugin tool for QIIME2), we would especially love to hear from you.
How can others join your project at #mozsprint 2018?
Keep an eye on our GitHub account to join in. We will have a variety of GitHub issues for all different types of contributors (and we’ll be adding lots of new issues, bug reports, and discussion topics right up until the Global Sprint): https://github.com/PhinchApp/Phinch/issues
What meme or gif best represents your project?
Join us wherever you are May 10–11 at Mozilla’s Global Sprint to work on many amazing open projects! Join a diverse network of scientists, educators, artists, engineers and others in person and online to hack and build projects for a healthy Internet. Register today
This post by Yo Yehudi is licensed under a Creative Commons Attribution 4.0 International License.