MetaOmGraph: An interactive data analysis and visualization tool for large expression datasets

Kumara Sri Harsha Vajjhala
5 min read · Jul 30, 2020

Nemulus S-N evaluating the expression of the MutT gene in human cancer data. Alleles of this gene are likely a risk factor for cancer (https://www.mdpi.com/2072-6694/11/9/1354/htm)

MetaOmGraph Philosophy

In recent years, the omics data available in the public domain have ballooned to terabytes. Most existing tools either require programming or cannot handle big datasets, which keeps researchers from valuable insights. Moreover, domain experts trained in wet-lab and clinical techniques, such as genomic researchers, cell biologists, virologists, and medical professionals, often have limited programming skills and frequently face challenges when analyzing big, complex datasets. These experts should be able to look at the data effortlessly from many different perspectives and mine biologically and clinically relevant knowledge from it.

MetaOmGraph (MOG) was developed to address this challenge. MOG is a free, open-source workbench for interactive, exploratory data analysis. Unlike other existing tools, MOG allows interactive exploratory analysis of large datasets without requiring any programming skills. It is designed with special attention to omics expression data, but can be used for any numerical data.

MOG provides a number of pre-compiled, well-vetted projects made with RNA-Seq expression data of Humans, Arabidopsis, Maize, and Yeast, which are readily available for download and analysis.

What features does it offer?

1. Fully interactive analysis and visualizations: MOG provides a simple and intuitive GUI to many types of statistical analysis and visualizations. Interaction with data visualizations is critical for large datasets. By adding interactivity for visualizations and statistical analyses, researchers with little or no programming experience are able to directly explore the underlying, often complex and multidimensional, data themselves.

Researchers can visualize data via line charts, histograms, box plots, volcano plots, scatter plots, and bar charts. Each plot is programmed to allow real-time interaction with the data and the metadata.

A few examples of interactive visualizations of >7,000 RNA-Seq samples from non-diseased and tumor tissues:

Interactions with a line chart. The expression of two human genes (CUTA and SEPT4) is visualized across a 7,142-sample dataset consisting of multiple non-tumor and tumor tissues.
Box Plots
Examining coexpression with a scatter plot

2. Fast and memory-efficient: MOG is extremely memory-efficient and works smoothly even with large datasets. It achieves computational efficiency via two complementary approaches. First, it indexes the data file rather than loading the whole dataset into main memory. Second, it speeds up computations using multithreading, making better use of multi-core processors. Users can work on a regular laptop or desktop even when analyzing several gigabytes of data.
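
To make the indexing idea concrete, here is a minimal Python sketch of the general technique. It is not MOG's actual code. It assumes a tab-delimited file with one feature per row, and the helper names (`build_row_index`, `read_row`) and the expression values are invented for illustration. The file is scanned once to record where each row starts, and after that a single row can be fetched with a seek instead of holding the whole table in memory.

```python
import io

def build_row_index(f):
    """Scan a binary file once and record the byte offset where each row starts."""
    offsets = []
    f.seek(0)
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            break
        offsets.append(pos)
    return offsets

def read_row(f, offsets, row):
    """Jump straight to one row and parse it, without reading any other rows."""
    f.seek(offsets[row])
    fields = f.readline().decode("utf-8").rstrip("\r\n").split("\t")
    return fields[0], [float(x) for x in fields[1:]]

# Tiny in-memory stand-in for a large expression file (values are made up).
data = io.BytesIO(b"CUTA\t1.0\t2.5\t3.0\nSEPT4\t0.5\t0.0\t4.0\n")
index = build_row_index(data)
name, values = read_row(data, index, 1)
print(name, values)
```

Only the list of offsets stays in memory, so its size grows with the number of rows rather than with the size of the file. Multithreading, the second approach mentioned above, is not shown here.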

3. Easy to install and get started: Setting up MOG is as simple as just downloading the executable and double-clicking it. You can download it from http://metnetweb.gdcb.iastate.edu/MetNet_MetaOmGraph.htm

4. Works with all major operating systems: MOG supports Mac, Windows, and Linux operating systems. Be assured that your MOG projects can be opened irrespective of the operating system used.

5. Reproducibility Logging and Playback: MOG meticulously logs all user actions and ensures that every action a user performs is easily reproducible with just a single click. MOG allows replaying (re-executing) the historical actions of a project with all the parameters that were used. The log files and MOG project files can be easily shared or included with publications.

MOG’s interface to examine the user’s action history and reproduce previous results and visualizations.
Playback (reproducing results) is as simple as selecting the action and clicking the play button

6. Datatype Agnostic: Although MOG was developed for analyzing omics data, it can be used for analyzing any kind of data. A user can supplement a MOG project with any type of metadata about the features, and about the studies. Thus, a MOG user can interactively analyze and visualize voluminous data on any topic.

7. Interface to R: R is a popular tool for data analysis and provides a great number of packages to perform a variety of data analyses. MOG provides a simple interface to facilitate the execution of any R code. MOG’s GUI enables a user to interactively select or filter data using MOG; these data are then passed to R. This avoids the need to constantly write new R code to specify different genes and samples for analyses.

8. One-Click bug reporting and feedback: If you encounter any bugs in MOG, or have a feature request, or feedback, you can report it to the MOG development team with just a click, saving you the hassle of emailing different people.

9. Offline usage: The tool can be used offline. Researchers do not need to rely on internet accessibility for computations on MOG and are not slowed down by the data transfer latency. Furthermore, the data in a project is secure, remaining on the researcher’s computer, which is particularly important for confidential data.

What statistical analyses can MOG perform?

  1. Correlation, mutual information, and relatedness: MOG can compute pairwise Pearson and Spearman correlation for pairs of selected features across all samples or conversely, between selected samples across all features. MOG can also compute pairwise mutual information (MI) between selected features across samples. MI quantifies the amount of information shared between two random variables.
  2. Meta-analysis of correlation coefficients: MOG can perform a meta-analysis of Pearson correlations. Studies using microarray data showed that meta-analysis and analysis of pooled normalized samples each bring out meaningful, but different, relationships among genes [2].
  3. Differential expression between groups: Determining differentially expressed features from aggregated datasets provides direction for further data exploration. In MOG, we have incorporated several popular statistical methods to evaluate differential expression between two groups of samples. The user can choose the features and the samples to compare from among the data.
  4. Differential correlation between groups: Features whose correlation with other features is significantly different only under particular environmental, genetic, or developmental conditions are designated as differentially correlated. MOG can find the features whose correlation to a user-selected feature differs significantly between two groups of samples.
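
For readers who want to see what the correlation measures in item 1 compute, here is a small pure-Python sketch of the textbook Pearson and Spearman formulas. It is not taken from MOG, the expression vectors are made up, and mutual information is left out for brevity.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance scaled by the two standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical expression of two features across five samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.0, 4.0, 9.0, 16.0, 25.0]
print(pearson(gene_a, gene_b), spearman(gene_a, gene_b))
```

Because `gene_b` rises monotonically but not linearly with `gene_a`, the Spearman value is exactly 1 while the Pearson value is slightly lower. This is why the two measures can rank feature pairs differently.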

To read the full description of how MOG performs these statistical analyses, along with case studies, you can check out the published paper [1] and the user manual.
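
As a rough illustration of the differential correlation idea in item 4, one common way to test whether two correlation coefficients differ is Fisher's z-transformation. The sketch below shows that standard test under the assumption of independent sample groups. Whether MOG's implementation matches it in every detail is a question for the paper, and the function names, correlation values, and group sizes here are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher transformation: maps a correlation in (-1, 1) to an approximately normal scale."""
    return math.atanh(r)

def diff_correlation_z(r1, n1, r2, n2):
    """z statistic for the difference between correlations from two independent groups."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se

def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical case: a feature pair correlates at 0.8 in one group of 53 samples
# and at 0.2 in another group of 53 samples.
z = diff_correlation_z(0.8, 53, 0.2, 53)
p = two_sided_p(z)
print(z, p)
```

A large absolute z (small p) suggests the correlation differs between the two groups. In practice such a test is repeated over many features, so p-values would normally be adjusted for multiple testing.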

How to get started with MOG?

The MOG executable is freely available at http://metnetweb.gdcb.iastate.edu/MetNet_MetaOmGraph.htm. A user manual is available at https://github.com/urmi-21/MetaOmGraph/tree/master/manual. Use one of our pre-compiled projects and start exploring data.

How can you contribute to MOG?

We value any kind of contribution to MOG development, be it code, additional documentation, bug reporting, or just giving a star to our GitHub repository: https://github.com/urmi-21/MetaOmGraph

Acknowledgment:

Funded in part by National Science Foundation awards IOS 1546858, DBI-0520267, and MCB-0951170 and the Center for Metabolic Biology at Iowa State University.

References:

[1] Urminder Singh, Manhoi Hur, Karin Dorman, Eve Syrkin Wurtele, MetaOmGraph: a workbench for interactive exploratory data analysis of large expression datasets, Nucleic Acids Research, Volume 48, Issue 4, 28 February 2020, Page e23, https://doi.org/10.1093/nar/gkz1209

[2] Almeida-de Macedo, M.M., Ransom, N., Feng, Y., Hurst, J. and Wurtele, E.S. (2013) Comprehensive analysis of correlation coefficients estimated from pooling heterogeneous microarray data. BMC Bioinformatics, 14, 214.

***This article was written with the help of Dr. Eve Syrkin Wurtele and Urminder Singh***

Kumara Sri Harsha Vajjhala

MS Computer Science student at Iowa State University interested in the fascinating areas of ML, Data Science, and Algorithms. GitHub: https://github.com/vksriharsha