TBP #10: Gene Expression Data Used Graphically

stay trying.
The Bioinformatics Press
Oct 29, 2019

Biological systems are extremely complex, and researchers are only scratching the surface when it comes to developing methods that properly represent these systems.

There has been growing interest in gene interaction graphs, which let us understand relationships between nodes both visually and computationally. Graphs can also alleviate some of the curse of dimensionality by abstracting away some of the features and expressing them as distances between nodes.

In this paper, the team looked at gene interaction graphs built from gene expression data, which can represent interesting dependencies between genes. They posit that these graphs can be used in conjunction with machine learning models to guide feature selection, a sort of pre-processing step that should provide a good bias for the features selected.

A good bias (as depicted below from their paper) would be when a generated gene interaction graph covers the true causal graph of gene interactions. However, the generated graph could also contain spurious connections that are not contained in the true causal graph.

Figure 1 from paper
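To make the idea of "covering" concrete, here is a minimal sketch of how one could compare a generated graph against a causal graph if the causal graph were known. In practice it is not known, so this is only a toy illustration. The gene names and the `compare_graphs` helper are hypothetical and not taken from the paper.

```python
def as_undirected(edges):
    """Normalize edges so (a, b) and (b, a) count as the same connection."""
    return {frozenset(edge) for edge in edges}


def compare_graphs(generated_edges, causal_edges):
    """Report whether the generated graph covers the causal graph,
    plus which edges are missing and which are spurious."""
    generated = as_undirected(generated_edges)
    causal = as_undirected(causal_edges)
    return {
        "covers_causal": causal <= generated,  # every causal edge is present
        "missing": causal - generated,         # causal edges the graph lacks
        "spurious": generated - causal,        # extra edges not in the causal graph
    }


# Hypothetical toy graphs; the gene names are placeholders only.
causal = [("G1", "G2"), ("G2", "G3")]
generated = [("G2", "G1"), ("G2", "G3"), ("G3", "G4")]
report = compare_graphs(generated, causal)
```

Here the generated graph is a "good bias" in the sense above: it contains every causal edge, with one spurious connection (G3 to G4) on top.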

Since one cannot test this property directly (the true causal graph is unknown), they modeled the expression of each gene given the other genes using a neural network, a setup that can be called a single gene inference task. In essence, they are using deep learning to assess whether the assumption behind these graphs holds.
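A rough sketch of what a single gene inference task could look like is below. It is not the paper's setup: a plain least-squares linear model stands in for the neural network, the expression matrix is synthetic, and the function name `single_gene_inference` is made up for illustration.

```python
import numpy as np


def single_gene_inference(expr, target, inputs, train_frac=0.8):
    """Predict the expression of gene `target` from the genes in `inputs`.

    Assumption: a linear least-squares fit replaces the neural network.
    Returns R^2 on the held-out samples.
    """
    X = expr[:, inputs]
    y = expr[:, target]
    n_train = int(train_frac * len(y))
    # Append a column of ones so the model can learn an intercept.
    X1 = np.hstack([X, np.ones((len(y), 1))])
    weights, *_ = np.linalg.lstsq(X1[:n_train], y[:n_train], rcond=None)
    pred = X1[n_train:] @ weights
    ss_res = np.sum((y[n_train:] - pred) ** 2)
    ss_tot = np.sum((y[n_train:] - y[n_train:].mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data (samples x genes), not real expression values.
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 20
expr = rng.normal(size=(n_samples, n_genes))
# By construction, gene 0 depends on genes 1 and 2 plus a little noise.
expr[:, 0] = 0.8 * expr[:, 1] - 0.5 * expr[:, 2] + 0.1 * rng.normal(size=n_samples)

others = [g for g in range(n_genes) if g != 0]
score = single_gene_inference(expr, target=0, inputs=others)
```

The `inputs` argument is where a gene interaction graph would come in: instead of passing every other gene, one could pass only the target's neighbors.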

They test graphs generated from a variety of datasets and compare them against simulated random graphs. These random baselines select varying numbers of random genes as inputs to the deep learning model.
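The two ways of choosing input genes might be sketched like this. The adjacency dictionary, the gene names, and both helper functions are hypothetical, and the paper's actual sampling procedure may differ.

```python
import random


def neighbor_inputs(graph, target):
    """Input genes taken from the target's neighbors in an interaction graph."""
    return sorted(graph.get(target, set()) - {target})


def random_inputs(all_genes, target, k, seed=0):
    """Baseline: k input genes drawn at random, never including the target."""
    candidates = [g for g in all_genes if g != target]
    rng = random.Random(seed)
    return sorted(rng.sample(candidates, min(k, len(candidates))))


# Hypothetical toy graph stored as an adjacency dictionary.
graph = {"G1": {"G2", "G3"}, "G2": {"G1"}, "G3": {"G1"}, "G4": set()}
genes = sorted(graph)

from_graph = neighbor_inputs(graph, "G1")
from_random = random_inputs(genes, "G1", k=2)
```

Either list can then be fed to the same model, so any difference in performance comes from how the inputs were chosen and not from the model itself.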

They found that randomly selecting 500+ genes as inputs for a target gene can perform as well as using that gene's neighbors from a generated graph.

This could imply that genes, no matter how far from or close to one another, can impact each other's expression, which makes for an extremely dynamic system that is hard to pin down.

Future work includes figuring out whether hand-crafted graphs are valuable for specific subgroups or domains of genes, but the jury is still out.

Thanks for reading!
