Shiny Diversity — Visualizing Bacterial Diversity
Several weeks ago, I participated in my first hackathon. For the uninitiated, hackathons are software/hardware development events usually held over the course of a weekend. In these events, you work in a team to build a solution to a problem.
The hackathon I attended was called Hackseq and was held here in Vancouver. Hackseq is a genomics-focused hackathon whose core mandate is fostering a culture of Open Science. This focus was very evident in the projects that participants could sign up for. These projects ranged from predicting optimal guide sequences for CRISPR all the way to generating reproducible workflows for methylation data. For any of you that zoned out, the tl;dr is that Hackseq had a lot of team projects that get bio nerds really excited :)
The project I chose to work on dealt with alpha and beta diversity in microbial populations. Before I get into what the problem was, let’s go over alpha and beta diversity.
In the map above, I’ve highlighted six areas of water around Vancouver. If we took a sample of water from just one of those squares, alpha diversity would describe the diversity of bacteria within that single sample: how many different kinds are present, and how evenly they are represented. If we took a water sample from each of the squares, then beta diversity would describe how the bacterial communities differ between the six samples.
As you can imagine, there are a variety of different metrics that can be used to calculate alpha and beta diversity. So, given some microbial data, it’s difficult to determine which metric would be the “best” to use. Additionally, if there are several metrics that might work, how can you see their effect on your data? Those two questions became the main problem our group was trying to solve.
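To make the distinction concrete, here is a toy sketch in base R. The three "water samples" and their counts are invented purely for illustration. Alpha diversity is summarised per sample as observed richness (the number of taxa with a non-zero count), and beta diversity as the Bray-Curtis dissimilarity between each pair of samples, which is just one common choice among the many metrics discussed next.

```r
# Toy count table (invented numbers): rows are taxa, columns are samples
counts <- matrix(
  c(10, 0, 5, 0,    # site_A
     8, 2, 3, 1,    # site_B
     0, 0, 1, 20),  # site_C
  nrow = 4,
  dimnames = list(paste0("taxon_", 1:4), c("site_A", "site_B", "site_C"))
)

# Alpha diversity (within one sample): observed richness = taxa present
observed_richness <- colSums(counts > 0)

# Beta diversity (between two samples): Bray-Curtis dissimilarity
# BC = sum(|x - y|) / sum(x + y); 0 = identical, 1 = no shared taxa
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Pairwise Bray-Curtis matrix across all samples
n <- ncol(counts)
beta <- matrix(0, n, n, dimnames = list(colnames(counts), colnames(counts)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    beta[i, j] <- bray_curtis(counts[, i], counts[, j])
  }
}

print(observed_richness)
print(round(beta, 3))
```

site_B has the highest alpha diversity (four taxa present), while site_A and site_C, which barely overlap, have the largest beta diversity between them.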
So on a very rainy and windy Friday morning, our team met for the first time and got to work brainstorming our ideas.
Our solution was to create an interactive web application where the user could select different alpha and beta diversity metrics and compare how those metrics affected their data.
The backend of our application was built using a great R package called phyloseq. phyloseq offered several benefits, key among them being data import from many different pipelines and built-in alpha and beta diversity metrics. The interactivity and UI of our application were handled by Shiny, an R package that makes it easy to create interactive web apps. That made it perfect for our use case.
With the details sorted out, the team got to work. My main focus for Friday and Saturday was the alpha diversity page. During our initial brainstorming session, we determined that the alpha diversity page should focus on three things.
- The user should be able to select from a variety of common alpha diversity metrics
- The page should let the user view a plot of their original data side by side with plots of the metrics they selected
- It should give a top-level view of the data while also letting the user look at individual samples if needed
I started by figuring out how to generate the plots we needed. The first step was to get a dataset. Luckily, phyloseq comes with several datasets, so I went with the “Global Patterns” dataset. Global Patterns features 26 samples taken from nine types of environments, such as skin and freshwater. The very handy plot_richness function was used to produce a plot of this dataset.
plot_richness is great for two reasons.
- It supports the common alpha diversity indices, including the five we wanted to offer: ACE, Shannon, Simpson, InvSimpson, and Fisher (it also provides Observed and Chao1).
- plot_richness can take any or all of these metrics as inputs and generate side-by-side plots of those indices.
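For a feel of what three of these indices actually measure, here is a minimal base R sketch. It uses the usual textbook definitions (Shannon with the natural log, Simpson as 1 − Σp², inverse Simpson as 1/Σp²), which I believe match what phyloseq reports, but treat that as an assumption rather than a description of phyloseq's internals. ACE and Fisher are left out because they need considerably more machinery.

```r
# Proportion of each taxon in one sample (zeros dropped so log() is safe)
proportions <- function(counts) {
  counts <- counts[counts > 0]
  counts / sum(counts)
}

# Shannon index: H = -sum(p * ln p)
shannon <- function(counts) {
  p <- proportions(counts)
  -sum(p * log(p))
}

# Simpson index (Gini-Simpson form): 1 - sum(p^2)
simpson <- function(counts) 1 - sum(proportions(counts)^2)

# Inverse Simpson: 1 / sum(p^2)
inv_simpson <- function(counts) 1 / sum(proportions(counts)^2)

even   <- c(25, 25, 25, 25)  # four taxa, perfectly even
skewed <- c(97, 1, 1, 1)     # same four taxa, one dominates

sapply(list(even = even, skewed = skewed),
       function(x) c(Shannon = shannon(x),
                     Simpson = simpson(x),
                     InvSimpson = inv_simpson(x)))
```

Both samples contain the same four taxa, yet the skewed one scores much lower on all three indices. That is exactly why seeing several metrics side by side is useful: they weigh richness and evenness differently.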
After all that, I took another coffee break. And then a long walk. Followed by more coffee. I think I summed up how the plots were generated pretty concisely, but figuring all that out took all morning and a good chunk of the afternoon!
With the plot_richness function answering two of the criteria for the alpha diversity page, I got to work translating the plots into a Shiny app. I have an upcoming post about how to make a basic Shiny application, so I’ll simply provide a brief overview here.
plot_richness(dataset, measures = alpha_diversity_metrics)
From the example call above, we see the two inputs to plot_richness that matter here. The first is a dataset and the second (measures) is a vector that contains all the alpha diversity metrics the user wants to see.
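The app's plot_richness call builds that second input by prepending "Observed" to whatever the user ticked. As a sketch of that step in isolation, the hypothetical helper build_measures() below (not part of phyloseq or our app) also rejects names that phyloseq does not recognise. The list of accepted names is, to my knowledge, the set of measures plot_richness/estimate_richness understand.

```r
# Measure names accepted by phyloseq's plot_richness()/estimate_richness()
supported_measures <- c("Observed", "Chao1", "ACE", "Shannon",
                        "Simpson", "InvSimpson", "Fisher")

# Hypothetical helper: turn the checkbox selection into the `measures`
# vector, always keeping "Observed" as a baseline panel
build_measures <- function(selected) {
  selected <- as.character(selected)  # NULL (nothing ticked) -> character(0)
  unknown <- setdiff(selected, supported_measures)
  if (length(unknown) > 0) {
    stop("Unsupported measure(s): ", paste(unknown, collapse = ", "))
  }
  unique(c("Observed", selected))     # avoid a duplicate "Observed" panel
}

build_measures(c("Shannon", "Simpson"))  # "Observed" "Shannon" "Simpson"
build_measures(NULL)                     # "Observed"
```

Inside the app this would be used as plot_richness(physeq, measures = build_measures(input$groupSelection)), so an empty checkbox group still yields a valid plot.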
Shiny features several UI inputs that made it easy to show users their options for datasets and alpha diversity metrics. I used the selectInput() function to generate a dropdown list of possible datasets. The checkboxGroupInput() function was added to display the five alpha diversity metrics and provide a way to select/unselect them.
Every time the user changes their selection, Shiny dynamically updates and saves what the user has chosen. Because each input is given an inputId (e.g. dataSelection), that saved selection can then be referenced in other functions as input$dataSelection.
# UI: ns() namespaces the input IDs because this page lives inside a Shiny module
selectInput(
  inputId = ns("dataSelection"),
  label = "Choose a Dataset",
  choices = c("Global Patterns" = "GlobalPatterns", "Esophagus" = "esophagus", "GP3" = "GP3"),
  selected = "GlobalPatterns"
),
checkboxGroupInput(
  inputId = ns("groupSelection"),
  label = "Select Metrics",
  choiceNames = c("ACE", "Shannon", "Simpson", "InvSimpson", "Fisher"),
  choiceValues = c("ACE", "Shannon", "Simpson", "InvSimpson", "Fisher")
)

# Server (inside renderPlot()): look up the chosen dataset by name and
# plot "Observed" plus the ticked metrics, coloured by sample type
plot_richness(get(input$dataSelection), measures = c("Observed", input$groupSelection), color = "SampleType")
Passing those inputs (input$dataSelection and input$groupSelection) to plot_richness meant that every time the user selected a different dataset or changed the alpha diversity indices, the plots were regenerated to reflect those selections!
While I was working on the alpha diversity page, the rest of the team were hard at work on their own sections. Our team leader, Eric, was responsible for documenting the different indices featured on the alpha and beta diversity pages. Ali worked on the beta diversity page and Hakif was in charge of putting together all our code into a single cohesive application.
Most of Saturday went into optimizing the UI and working on the third focus of the alpha diversity page: providing a top-level view of the data. I wasn’t quite sure how to tackle this, but being able to talk to everyone else on the team right away was fantastic. We were able to discuss the pros and cons of potential solutions and narrow it down to the best one.
Based on our team discussion, I went with a heatmap for a top-level overview of the data. Each column of the heatmap represented a single sample, and the counts were encoded as color, with higher counts shown in more intense shades. I really wanted users to be able to take one look and see exactly what was going on.
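This is not the code we used in the app, but a rough base R stand-in shows the idea. The hypothetical helper draw_count_heatmap() puts samples on the x-axis so each sample is one column, and maps counts to a white-to-dark-red ramp. The log1p() scaling is a choice made for this sketch so that one very abundant taxon doesn't wash out everything else. The toy matrix is simulated.

```r
# Hypothetical helper: samples as columns, higher counts = darker colour
draw_count_heatmap <- function(counts) {
  scaled <- log1p(counts)  # log(1 + count) keeps zeros at 0
  palette <- colorRampPalette(c("white", "darkred"))(50)
  image(
    x = seq_len(ncol(scaled)), y = seq_len(nrow(scaled)),
    z = t(scaled),          # image() maps rows of z to the x-axis
    col = palette, axes = FALSE,
    xlab = "Sample", ylab = "Taxon"
  )
  axis(1, at = seq_len(ncol(scaled)), labels = colnames(scaled), las = 2)
  axis(2, at = seq_len(nrow(scaled)), labels = rownames(scaled), las = 1)
  invisible(scaled)
}

# Simulated toy data: 5 taxa x 6 samples, with one very abundant taxon
set.seed(1)
toy <- matrix(rpois(5 * 6, lambda = 20), nrow = 5,
              dimnames = list(paste0("taxon_", 1:5), paste0("sample_", 1:6)))
toy[1, ] <- toy[1, ] * 50

draw_count_heatmap(toy)
```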
Additionally, if users noticed an interesting sample on the heatmap, it was important that they were able to “zoom” into that sample for a closer look. To achieve this, I put a bar plot right beside the heatmap. The bar plot displayed the counts for a single sample and by using a slider, users could move to specific samples.
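The "zoom" step boils down to picking one column of the count table by position and plotting it. A minimal base R sketch, assuming a hypothetical helper draw_sample_barplot() whose sample_index would, in the app, come from a slider running from 1 to the number of samples (e.g. Shiny's sliderInput()):

```r
# Hypothetical helper: bar plot of one sample's counts, chosen by position
draw_sample_barplot <- function(counts, sample_index) {
  # Guard against slider values outside the available samples
  stopifnot(sample_index >= 1, sample_index <= ncol(counts))
  one_sample <- sort(counts[, sample_index], decreasing = TRUE)
  barplot(one_sample, las = 2,
          main = colnames(counts)[sample_index], ylab = "Count")
  invisible(one_sample)
}

# Toy data (invented): 4 taxa x 2 samples
toy <- matrix(c(5, 40, 12, 0,
                30, 2, 8, 15),
              nrow = 4,
              dimnames = list(paste0("taxon_", 1:4), c("sample_1", "sample_2")))

draw_sample_barplot(toy, 2)  # bars: taxon_1, taxon_4, taxon_3, taxon_2
```

Sorting the bars in decreasing order makes the dominant taxa in the chosen sample stand out immediately, which complements the heatmap's overview.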
The last step was to add that code to our Shiny web application to make the slider responsive!
Sunday being the last day of the hackathon, I mainly worked with Ali to convert his beta diversity code into our Shiny framework. As a group, we all worked together to debug small issues and optimize our UI to complete our web application!
Hackseq concluded with all the participants presenting their projects. It was awesome to talk about our experiences over the last three days with other participants and to demo our app live! You can check that out in the video below.
Hackseq was an absolute blast and in addition to learning a lot, I met some great people. Our team is still working on optimizing ShinyDiversity, but you can check out the current version by following the link below.
Or check out the paper we published about this nifty software tool!
You can also fork the code from Eric’s GitHub.
Lastly, if you liked what you read, feel free to check out my bio and skills in the link below.
Thanks for reading!
PS. Thanks to Jonathan Lee for making the gifs and for tirelessly proofreading the article :) No demerit badges were awarded.