Using open source to visualize gene relationships

by Yo Yehudi | A spotlight on PhyloProfile, a 2018 Global Sprint project

Mozilla Open Leaders
Read, Write, Participate
May 4, 2018

PhyloProfile workflow, from Ngoc-Vinh Tran

Ngoc-Vinh Tran, @trvinh, is a PhD student at Goethe University. Vinh was selected to join the current round of Mozilla Open Leaders with their project PhyloProfile, a tool that creates a visual way to understand relationships between genes.

I interviewed Vinh to learn more about PhyloProfile and how you can help at Mozilla’s Global Sprint 2018.

What is PhyloProfile?

PhyloProfile is a visualization tool for analyzing complex phylogenetic profiles. In evolutionary biology, the presence/absence pattern of a gene over a set of species is called its “phylogenetic profile”. Phylogenetic profiles are commonly used to trace the functional and evolutionary history of genes across species and time. However, just finding a given gene in different species is not very informative, and a gene predicted to be present by computational methods may be a false positive. Therefore, the genetic sequences need to be investigated for their conservation and for how they have evolved.
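As a rough illustration of the idea (this is not PhyloProfile's own code, which is written in R), a phylogenetic profile can be sketched in Python as one presence/absence vector per gene. The gene names, species names and search hits below are made up for the example, and `phylogenetic_profile` is a hypothetical helper.

```python
# Hypothetical search results: gene -> set of species in which it was found.
hits = {
    "geneA": {"human", "mouse", "yeast"},
    "geneB": {"human", "mouse"},
    "geneC": {"yeast"},
}

# The order of this list fixes the order of the entries in each profile.
species = ["human", "mouse", "fly", "yeast"]


def phylogenetic_profile(gene, hits, species):
    """Return a 1/0 presence/absence vector for `gene` across `species`."""
    found = hits.get(gene, set())
    return [1 if s in found else 0 for s in species]


profiles = {gene: phylogenetic_profile(gene, hits, species) for gene in hits}

for gene, profile in profiles.items():
    print(gene, profile)
```

Stacking these vectors row by row gives the presence/absence matrix that is usually drawn as a heatmap.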

With PhyloProfile, I am developing a visualization method that allows you to enrich regular phylogenetic profiles with further data, to make phylogenetic profiling more meaningful. The additional information integrated into phylogenetic profiles can be, for example, protein sequence similarity, protein domain architecture similarity, or semantic similarity of Gene Ontology term descriptions. PhyloProfile enhances the analysis of phylogenetic profiles with interactive visualisation. Furthermore, the tool provides several functions for gaining insights, such as gene age estimation or core gene identification.
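To make the idea of an enriched profile a little more concrete, here is a minimal Python sketch (not PhyloProfile's R code). Each detected gene carries an extra value, here a sequence similarity score between 0 and 1, and a gene is called a "core gene" when it is found in every species with a score at or above a cutoff. The data, the function name `core_genes`, the cutoff and this simple definition are all assumptions made for illustration; PhyloProfile's actual criteria may differ.

```python
# Hypothetical enriched profiles: gene -> {species: similarity score}.
# A species missing from the inner dict means the gene was not detected there.
enriched = {
    "geneA": {"human": 0.95, "mouse": 0.90, "yeast": 0.70},
    "geneB": {"human": 0.85, "mouse": 0.40, "yeast": 0.65},
    "geneC": {"human": 0.80, "yeast": 0.75},
}
species = ["human", "mouse", "yeast"]


def core_genes(enriched, species, cutoff=0.5):
    """Return genes present in every species with score >= cutoff (assumed definition)."""
    core = []
    for gene, scores in enriched.items():
        # An absent gene is treated as score 0.0, so it never passes a positive cutoff.
        if all(scores.get(s, 0.0) >= cutoff for s in species):
            core.append(gene)
    return sorted(core)


strict = core_genes(enriched, species)               # default cutoff of 0.5
relaxed = core_genes(enriched, species, cutoff=0.3)  # more permissive cutoff
print(strict)
print(relaxed)
```

The extra score is what separates this from a plain presence/absence call: with the cutoff at 0.5, geneB is excluded because of its weak mouse hit, even though it was detected in all three species.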

PhyloProfile is open source and written in R, making heavy use of the Shiny library.

Why did you start PhyloProfile?

A while ago, during my PhD research, I had to show thousands of phylogenetic profiles of microsporidia species, interesting parasites that infect a wide range of species, including humans. The common method for representing presence/absence profiles is a static heatmap.

It is, however, very difficult to perform a detailed analysis this way. For example, when you are interested in particular points in the heatmap, you have to zoom in on the map and identify the coordinates to get the proteins and the corresponding species. Then you have to open several other files to find the information you need for those proteins. So, a static heatmap is not a smart way to analyze such a large number of profiles. It is not informative enough, and it is not useful for connecting different kinds of information together in one place.

Thanks to Bastian Greshake Tzovaras, the guy who knows everything, I got the idea to use Shiny, an R library, to create interactive plots. I then created the first simple version of PhyloProfile. The interactive heatmap makes my work much easier. With just a few clicks, I can get what I want. PhyloProfile has also proved useful for other research groups in my lab. I got many suggestions for expanding the functionality of the tool and comments on how to make it better.

What challenges have you faced working on this project?

My first challenge was that I had no idea how cool the R language is! To be honest, I started creating PhyloProfile with almost no knowledge of R.

Secondly, at the beginning I was almost the only person developing the tool. I could not anticipate all the requirements for making the tool usable for different purposes. Giving PhyloProfile to different users in different research fields is the best way to adapt the tool to their needs. This is also the main reason I joined the Mozilla Open Leaders program. I want to introduce PhyloProfile to the community and collect feedback as well as contributions, to improve not only the tool but also myself in an open working environment.

What kind of skills do I need to help you?

It would be great if you can program in R, so you can help us solve some technical issues, improve the source code or implement new features in the program.

However, coding is not a must. If you are interested in using the tool for your work, it would be really nice if you could give us feedback whenever you run into problems with it. Your ideas for improving the tool are also very much appreciated. We also need a hand improving our documentation (e.g. the manual).

How can others join your project at #mozsprint 2018?

You can find our project on GitHub at https://github.com/BIONF/PhyloProfile or via the project website for #mozsprint, https://www.mozillapulse.org/entry/657. On the project’s GitHub page, you can find instructions for contributing in CONTRIBUTING.md, as well as the tasks for #mozsprint on the “issues” tab.

We will be in the office during the two days of #mozsprint. It would be cool if we could meet and work offline together on those days ;-)

Join us wherever you are May 10–11 at Mozilla’s Global Sprint to work on many amazing open projects! Join a diverse network of scientists, educators, artists, engineers and others in person and online to hack and build projects for a healthy Internet. Register today.

This post by Yo Yehudi is licensed under a Creative Commons Attribution 4.0 International License.

A cohort of Open Leaders fueling the #internethealth movement through mentorship & training on working open. Work Open, Lead Open #WOLO mzl.la/openleaders