TBP #11: Deep Learning in Genetics Review

stay trying.
The Bioinformatics Press
Nov 6, 2019
Photo by Jacky Lo on Unsplash

Machine learning and deep learning have opened the door to better performance on a wide range of genomics-related tasks. This review from Nature sums up many of the interesting applications, such as:

  • Intron-splicing prediction
  • Biomarker discovery for patient stratification
  • Gene function prediction
  • Genomic region importance & classification

Tabular data, the classic spreadsheet-like arrangement of rows and columns, is well suited to machine learning models. In fact, on such data logistic regression models can perform as well as or better than some of the neural networks out there.

However, data with spatial or longitudinal dependencies has an internal structure that may require different types of architectures. Merely flattening such data into rows and columns could discard this structural information.
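To make the point about lost structure concrete, here is a minimal sketch (not from the review) comparing two encodings of a DNA sequence. The two sequences and the helper names `base_counts` and `one_hot` are hypothetical, chosen only for illustration: base counts stand in for a simple tabular summary, while one-hot encoding keeps the order of the bases.

```python
from collections import Counter

BASES = "ACGT"

def base_counts(seq):
    """Tabular-style features: one column per base, so order is discarded."""
    counts = Counter(seq)
    return [counts.get(base, 0) for base in BASES]

def one_hot(seq):
    """Position-preserving encoding: one row per position, one column per base."""
    return [[1 if base == b else 0 for b in BASES] for base in seq]

# Hypothetical sequences: same bases, different order.
seq_a = "TATAGC"
seq_b = "GCATTA"

same_counts = base_counts(seq_a) == base_counts(seq_b)
same_one_hot = one_hot(seq_a) == one_hot(seq_b)

print(same_counts, same_one_hot)  # prints: True False
```

The count-based rows are identical, so a model fed only those columns could not tell the two sequences apart, whereas the one-hot matrices differ because position is retained.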

One application of convolutional neural networks (CNNs) that the authors highlight, shown in the figure below, is scanning a sequence for transcription factor binding motifs. A series of convolutional, activation, and max-pooling layers (very common in the computer vision world) can be applied to genetic data. Here, the network is able to detect a motif of interest.

Figure 2 from Paper
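The convolution, activation, and max-pooling scan described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the paper's model: the filter is set by hand from a hypothetical motif (`TATAAA`) instead of being learned from data, the example sequence is made up, and a single global max-pool stands in for the stacked pooling layers a real network would use.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length x 4) matrix of 0.0/1.0 values."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]

def motif_filter(motif, mismatch=-1.0):
    """Hand-built filter: +1 for the motif base at each position, `mismatch` otherwise.
    In a trained CNN these weights would be learned, not specified."""
    return [[1.0 if b == base else mismatch for b in BASES] for base in motif]

def conv1d(encoded, kernel):
    """Slide the filter along the sequence (no padding, stride 1)."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(len(BASES)):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(total)
    return scores

def relu(values):
    """Activation: clip negative scores to zero."""
    return [max(0.0, v) for v in values]

def global_max_pool(values):
    """Return the strongest activation and the position where it occurs."""
    best = max(range(len(values)), key=lambda i: values[i])
    return values[best], best

sequence = "GGCATATAAAGGC"  # hypothetical input sequence
motif = "TATAAA"            # hypothetical motif of interest

activations = relu(conv1d(one_hot(sequence), motif_filter(motif)))
score, position = global_max_pool(activations)

print(score, position)  # prints: 6.0 4
```

The peak activation equals the motif length and sits at the window where the motif occurs, which is the sense in which a convolutional filter "detects" a motif. Deep learning libraries provide optimized, trainable versions of each of these steps.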

CNNs are now commonly used to predict various phenotypes from genetic sequence alone. Their application is limited only by the data type and one’s imagination.

The paper goes on to discuss recurrent neural networks (RNNs), graph convolutional networks, and transfer learning approaches. In this article, I just wanted to highlight that many researchers are pushing the boundaries of what this new wave of models can predict.

As always, it is an interesting time to be in the field of artificial intelligence, bioinformatics and the like.

Thanks for reading.

