Progress Report: First Half of My Outreachy Internship

Sakshi Srivastava
Published in InterMine
Jul 12, 2020 · 5 min read

Time flies! I’ve completed 7 weeks of my Outreachy internship with InterMine, which makes this a good time to look back and reflect on my progress so far. If you want to know what Outreachy is and how to get involved, head over to this blog, where I describe my experience and the flow of the application process.

The goal of my project is to develop BlueGenes-compatible visualisation tools that help biologists gain better insight into their data, particularly data about genes and proteins. Let’s see what I’ve developed so far:

GO Concept Relation

This tool visualises the association between genes and their corresponding GO terms. I contributed to this existing project during the contribution phase: I improved the node positioning and added node colours, tooltips and a data filter based on the selected ontology term. This made me familiar with the fundamentals of Cytoscape.js, a graph theory library for visualising and analysing networks such as molecular interaction graphs.

The main purpose of this visualisation was to make it work for both single and multiple entities. Later, I worked on feedback to enhance the UI and make it much easier to get insights at a glance:

  • Changed the shape of parent nodes to a rounded square and child nodes to an ellipse, so that a gene is easy to distinguish from its ontology terms
  • Created pie nodes for terms that have multiple parents
  • Created a toggle button to show only the shared GO term nodes
  • Wrote unit tests
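
As a rough illustration of the changes listed above, here is a minimal sketch of how such nodes could be described for Cytoscape.js. The input shape, the `type` and `parents` data fields, the `shared` class, the colours and the helper names are assumptions made for this example and are not taken from the pull requests; `shape`, `pie-size` and the `pie-1-background-*` / `pie-2-background-*` entries are standard Cytoscape.js style properties.

```javascript
// Hypothetical input: each gene mapped to the GO terms annotated to it.
const annotationsByGene = {
  geneA: ["term1", "term2"],
  geneB: ["term2", "term3"],
};

// Build Cytoscape.js-style element JSON: one node per gene, one node per
// GO term, and one edge per annotation. Terms with more than one parent
// gene get the (hypothetical) class "shared".
function buildElements(annotations) {
  const nodes = new Map();
  const edges = [];
  for (const [gene, terms] of Object.entries(annotations)) {
    nodes.set(gene, { data: { id: gene, type: "gene" } });
    for (const term of terms) {
      if (!nodes.has(term)) {
        nodes.set(term, { data: { id: term, type: "term", parents: [] } });
      }
      nodes.get(term).data.parents.push(gene);
      edges.push({ data: { id: `${gene}-${term}`, source: gene, target: term } });
    }
  }
  for (const node of nodes.values()) {
    if (node.data.type === "term" && node.data.parents.length > 1) {
      node.classes = "shared";
    }
  }
  return { nodes: [...nodes.values()], edges };
}

// Toggle helper: keep gene nodes, shared term nodes, and only the edges
// whose two endpoints are both kept.
function sharedOnly({ nodes, edges }) {
  const keptNodes = nodes.filter(
    (n) => n.data.type === "gene" || n.classes === "shared"
  );
  const keptIds = new Set(keptNodes.map((n) => n.data.id));
  return {
    nodes: keptNodes,
    edges: edges.filter(
      (e) => keptIds.has(e.data.source) && keptIds.has(e.data.target)
    ),
  };
}

// Stylesheet sketch. Colours are placeholders, and the fixed 50/50 pie only
// covers the two-parent case.
const style = [
  { selector: 'node[type = "gene"]', style: { shape: "round-rectangle", label: "data(id)" } },
  { selector: 'node[type = "term"]', style: { shape: "ellipse", label: "data(id)" } },
  {
    selector: "node.shared",
    style: {
      "pie-size": "100%",
      "pie-1-background-color": "#e07a5f",
      "pie-1-background-size": "50%",
      "pie-2-background-color": "#3d85c6",
      "pie-2-background-size": "50%",
    },
  },
];

const elements = buildElements(annotationsByGene);
console.log(sharedOnly(elements).nodes.map((n) => n.data.id));
```

In a browser, `elements` (or the result of `sharedOnly(elements)` when the toggle is on) and `style` could then be passed to `cytoscape({ container, elements, style })`.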

Pull Requests:

  1. https://github.com/intermine/bluegenes-go-concept-relation/pull/21
  2. https://github.com/intermine/bluegenes-go-concept-relation/pull/16

Gene Pathway Visualiser

This visualisation shows the association between genes and their corresponding pathways. Pathways play important roles in the development of complex diseases such as cancers, which are multifactorial and are usually caused by a combination of disorders, gene mutations or pathway disruptions. Analysing pathways by combining multiple types of high-throughput data, such as genomics and proteomics, has therefore become one of the most important ways to understand the mechanisms of complex diseases.

Here, the relationship is currently inferred by linking pathway annotations that have genes in common. I’ve added filters based on the distinct pathway names, so users can view only the pathways selected from the list and can also toggle the view to show only the shared pathway nodes.
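
The inference and the two filters described above could be sketched roughly as follows. The input data, the function names and the option names (`selected`, `sharedOnly`) are hypothetical and only illustrate the idea; the real tool may structure this differently.

```javascript
// Hypothetical input: pathway names annotated to each gene.
const pathwaysByGene = {
  gene1: ["Pathway A", "Pathway B"],
  gene2: ["Pathway B", "Pathway C"],
  gene3: ["Pathway B"],
};

// Invert the annotations: pathway name -> set of genes annotated to it.
// Genes end up linked to each other through the pathways they have in common.
function genesByPathway(annotations) {
  const index = new Map();
  for (const [gene, pathways] of Object.entries(annotations)) {
    for (const pathway of pathways) {
      if (!index.has(pathway)) index.set(pathway, new Set());
      index.get(pathway).add(gene);
    }
  }
  return index;
}

// Distinct pathway names, e.g. to populate the filter list.
function pathwayNames(index) {
  return [...index.keys()].sort();
}

// Apply both filters: an optional list of selected pathway names, and a
// toggle that keeps only pathways shared by more than one gene.
function visiblePathways(index, { selected = null, sharedOnly = false } = {}) {
  return [...index.entries()]
    .filter(([name]) => selected === null || selected.includes(name))
    .filter(([, genes]) => !sharedOnly || genes.size > 1)
    .map(([name, genes]) => ({ pathway: name, genes: [...genes] }));
}

const pathwayIndex = genesByPathway(pathwaysByGene);
console.log(pathwayNames(pathwayIndex));
console.log(visiblePathways(pathwayIndex, { sharedOnly: true }));
```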

PR: https://github.com/intermine/bluegenes-pathway-visualizer/pull/1

Pathway Visualisation

Gene Interaction Network

This visualisation lets us view InterMine gene interaction networks. Interactions between genes are fundamentally important for understanding the structure and function of the genetic pathways of complex genetic systems.

I migrated all of the existing code to React and added a side panel that shows additional information about a node when it is clicked. Genes can also be filtered by interaction type. The UI enhancements are the same as those described for the GO Concept Relation tool above.
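
A minimal sketch of the interaction-type filter, assuming each interaction is stored as an edge with a `type` field. The field names, the sample type values and the function name are assumptions for this example, not the tool's actual data model.

```javascript
// Hypothetical interaction records between genes.
const interactions = [
  { source: "gene1", target: "gene2", type: "physical" },
  { source: "gene1", target: "gene3", type: "genetic" },
  { source: "gene2", target: "gene3", type: "physical" },
];

// Keep only the edges of the chosen type, plus the genes that still take
// part in at least one of those edges.
function filterByInteractionType(edges, type) {
  const keptEdges = edges.filter((edge) => edge.type === type);
  const genes = new Set(keptEdges.flatMap((edge) => [edge.source, edge.target]));
  return { edges: keptEdges, genes: [...genes].sort() };
}

console.log(filterByInteractionType(interactions, "genetic"));
```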

PR: https://github.com/intermine/bluegenes-interaction-network/pull/1


Tissue Expression Graphs

Three expression graphs needed to be developed. Each of them is shown as a heatmap in which the data is displayed in a grid, where each row represents a gene and each column represents a tissue. The colour and intensity of each box represent the expression level of the gene in that tissue. I also provided filters to view the heatmap for the tissues selected from the list and for the chosen expression levels (low, medium, high or not detected). I used nivo/heatmap, a high-level React charting library built on D3.js.
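
To make the grid and the two filters more concrete, here is a small sketch of the data preparation step. The thresholds, the record fields and the function names are hypothetical, since the cut-offs used by the tool are not stated here. The output uses one `{ id, data: [{ x, y }] }` row per gene, which is the series shape I would expect recent nivo heatmap releases to accept; older releases use a different shape, so treat that as an assumption to check against the installed version.

```javascript
// Hypothetical thresholds for bucketing an expression value.
function expressionLevel(value) {
  if (value === null || value === undefined || value === 0) return "not detected";
  if (value < 10) return "low";
  if (value < 100) return "medium";
  return "high";
}

// Turn flat records into one row per gene and one cell per selected tissue.
// Cells whose level is filtered out are kept with y = null so that the grid
// stays rectangular.
function toHeatmapRows(records, { tissues, levels }) {
  const rows = new Map();
  for (const { gene, tissue, value } of records) {
    if (!rows.has(gene)) rows.set(gene, { id: gene, data: [] });
    if (!tissues.includes(tissue)) continue;
    const visible = levels.includes(expressionLevel(value));
    rows.get(gene).data.push({ x: tissue, y: visible ? value : null });
  }
  return [...rows.values()];
}

// Hypothetical expression records.
const expressionRecords = [
  { gene: "gene1", tissue: "liver", value: 120 },
  { gene: "gene1", tissue: "lung", value: 4 },
  { gene: "gene2", tissue: "liver", value: 0 },
  { gene: "gene2", tissue: "lung", value: 35 },
];

const heatmapRows = toHeatmapRows(expressionRecords, {
  tissues: ["liver", "lung"],
  levels: ["medium", "high"],
});
console.log(JSON.stringify(heatmapRows, null, 2));
```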

Tissue expression - Illumina Body Map:

This visualisation shows the tissue expression according to the Illumina Body Map data for a gene or a specified list of genes. This can be useful for identifying genes that are commonly regulated or biological signatures associated with a particular condition.

Tissue expression - GTEx data:

Genotype-Tissue Expression (GTEx) data helps in studying tissue-specific gene expression. This visualisation shows the tissue expression levels according to the GTEx RNA-seq data for a gene or a specified list of genes. It helps to study the relationship between genetic variants (inherited changes in DNA sequence) and gene expression (how genes are turned on and off) in multiple human tissues and across individuals.

Tissue expression - RNA-seq data:

This visualisation shows the tissue expression levels according to the Protein Atlas RNA-seq data for a gene or specified list of genes.

Note: The above expression graphs are merged into a single tool, since only the tissue names differ between the datasets. This also provides a nicer way to compare the heatmaps by simply toggling the dataset.

PR: https://github.com/intermine/bluegenes-human-tissue-expression-visualizer/pull/2

Protein Tissue Localisation:

This visualisation shows the tissues in which the corresponding protein has been identified for a specified human gene. It helps to examine the distribution and localisation of specific proteins within individual cells or tissues, as detected by immunostaining. Here, we have multiple heatmaps, with tissue as the category and tissue cell type as the sub-category.
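
The category and sub-category structure could be prepared along these lines, with one group per tissue that can then be rendered as its own heatmap. The record fields, the sample values and the function name are hypothetical.

```javascript
// Hypothetical localisation records: one per tissue and cell type.
const localisationRecords = [
  { tissue: "tissue1", cellType: "cell type A", level: "High" },
  { tissue: "tissue1", cellType: "cell type B", level: "Low" },
  { tissue: "tissue2", cellType: "cell type C", level: "Not detected" },
];

// Group records by tissue (category); each group lists its cell types
// (sub-category) with the reported level.
function groupByTissue(records) {
  const byTissue = new Map();
  for (const { tissue, cellType, level } of records) {
    if (!byTissue.has(tissue)) byTissue.set(tissue, []);
    byTissue.get(tissue).push({ cellType, level });
  }
  return [...byTissue].map(([tissue, cells]) => ({ tissue, cells }));
}

const tissueGroups = groupByTissue(localisationRecords);
console.log(JSON.stringify(tissueGroups, null, 2));
```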

Protein Localisation

InterMine Visualisation Gallery

Currently, we have around 12–15 visualisations, and the number will increase. To view any of them, you need to clone its repository and run it on your local machine, which becomes tedious when you want to look at several visualisations. So, I suggested an additional project that showcases all our BlueGenes visualisation tools in one place for instant demos, without anybody needing to run the tools locally. I used Storybook, which enables structured UI development, testing and documentation for every major view layer. Below is the link to the hosted visualisation gallery, which uses the npm modules of all the published BlueGenes tools. Updates are ongoing, and the gallery will include more tools in future.

Demo: http://viz-gallery.surge.sh

The coding phase took an agile approach: each visualisation went through multiple iterations, with testing carried out alongside development, which led to better code and clearer visualisations. Requirements and solutions evolved continually based on priority and discipline. I have weekly Zoom meetings with my mentors, where we discuss the progress made and the changes required, and re-prioritise the tasks based on feedback. Looking back at the halfway point, I am amazed at how much I’ve had the privilege to learn. I am thankful to my mentors for their patience and for helping me with good suggestions.

Follow Ups:

I have implemented a few visualisations and opened pull requests for them. Next, I will work on the suggested improvements to get those PRs merged and published. After that, I will develop more visualisations. Thanks for reading.

Sakshi Srivastava
InterMine

Outreachy’20 InterMine Intern | Full Stack Javascript Developer | Problem Solver