TBP #8: Automatic Annotation of sRNAs with ML

stay trying.
The Bioinformatics Press
Oct 15, 2019
Photo by JR Korpa on Unsplash

Metadata annotation of genetic data is an important part of understanding the population characteristics of a set of samples. Whether the data are blood samples or genomic reads, the more we know about each subject, the more statistical analyses we can run as researchers.

Sometimes, missing metadata can be a major hurdle, and recovering it can be even harder.

In this paper, the authors compare a deep learning model and a random forest model for annotating small RNA (sRNA) expression profiles. The three multi-class classification tasks were tissue, age and sex annotation.

They found that the deep learning model outperformed the random forest on all three tasks, with the highest accuracy (96.5%) on the tissue annotation task.
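To make the comparison concrete, here is a minimal sketch of how a random forest and a small feed-forward neural network could be compared on a multi-class task with scikit-learn. The data come from `make_classification` as a stand-in for a samples × sRNAs expression matrix; the function name `compare_models`, the network size and all hyperparameters are assumptions for illustration, not the authors' setup.

```python
# Toy comparison of a random forest and a small neural network on
# synthetic data. The data are simulated, NOT real sRNA expression profiles.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(n_samples=300, n_features=50, n_classes=3, seed=0):
    """Return mean 5-fold cross-validated accuracy for each model (hypothetical helper)."""
    # Rows stand in for samples, columns for per-sRNA expression levels.
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=10,
        n_classes=n_classes,
        random_state=seed,
    )
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        # Neural networks are sensitive to feature scale, so standardize first.
        "mlp": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=seed),
        ),
    }
    return {
        name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    for name, acc in compare_models().items():
        print(f"{name}: {acc:.3f}")
```

On synthetic data either model may come out ahead; the point is the evaluation pattern (same folds, same metric for both models), not the numbers.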

The authors then used DeepLIFT to peer into the deep learning “black box”. DeepLIFT compares each neuron's activation with its activation on a reference input and propagates the resulting contribution scores backward, from the output layer to the input layer. This way, the user can better understand which features (in this case, which sRNA expression levels) were important for determining the annotation.
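The backward pass described above can be sketched in NumPy for a tiny one-hidden-layer ReLU network using DeepLIFT's Rescale rule, which is one of the rules defined by DeepLIFT and not necessarily the configuration the authors used. The all-zeros reference input, the helper name `deeplift_rescale` and the random weights are assumptions. The final check illustrates DeepLIFT's summation-to-delta property: the contributions for each output add up to the change in that output relative to the reference.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def deeplift_rescale(x, x_ref, W1, b1, W2, b2):
    """DeepLIFT contributions (Rescale rule) for out = W2 @ relu(W1 @ x + b1) + b2.

    Hypothetical helper. Returns C with shape (n_outputs, n_inputs), where
    C[k, i] is the contribution of input i to the change in output k
    relative to the reference input x_ref.
    """
    h, h_ref = W1 @ x + b1, W1 @ x_ref + b1
    dh = h - h_ref
    da = relu(h) - relu(h_ref)
    # Rescale multiplier for each ReLU: delta-output / delta-input.
    # Where the pre-activation barely changed, fall back to the gradient.
    changed = np.abs(dh) > 1e-12
    m_relu = np.where(changed, da / np.where(changed, dh, 1.0), (h > 0).astype(float))
    # Chain rule for multipliers: output <- hidden (W2), hidden <- input (W1).
    m_input = (W2 * m_relu) @ W1          # shape (n_outputs, n_inputs)
    return m_input * (x - x_ref)          # contribution = multiplier * delta-x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(8, 5)), rng.normal(size=8)
    W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)
    x, x_ref = rng.normal(size=5), np.zeros(5)  # all-zeros reference is an assumption

    def forward(v):
        return W2 @ relu(W1 @ v + b1) + b2

    C = deeplift_rescale(x, x_ref, W1, b1, W2, b2)
    print(np.allclose(C.sum(axis=1), forward(x) - forward(x_ref)))  # True
```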

They were able to show which sRNAs “voted” for each class. For example, they found that hsa-miR-142-5p contributed strongly to the “blood” class in the tissue classification task.
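One simple way to turn per-sample attributions into per-class “votes” is to average them within each class and rank the features. The sketch below assumes an attribution matrix is already available (for example, DeepLIFT scores toward each sample's own class); the function name, feature names, class labels and numbers in the usage example are made up for illustration.

```python
import numpy as np


def top_features_per_class(attributions, labels, feature_names, k=3):
    """Rank features by mean attribution within each class (hypothetical helper).

    attributions: (n_samples, n_features) scores toward each sample's own class.
    labels: length-n_samples sequence of class names.
    """
    attributions = np.asarray(attributions, dtype=float)
    labels = np.asarray(labels)
    ranking = {}
    for cls in np.unique(labels):
        mean_score = attributions[labels == cls].mean(axis=0)
        order = np.argsort(mean_score)[::-1][:k]  # strongest positive "votes" first
        ranking[str(cls)] = [feature_names[i] for i in order]
    return ranking


if __name__ == "__main__":
    # Toy attribution scores, not real results.
    attr = [[0.9, 0.1, -0.2], [0.7, 0.0, 0.1], [-0.1, 0.8, 0.3], [0.0, 0.6, 0.5]]
    labels = ["blood", "blood", "liver", "liver"]
    names = ["sRNA_A", "sRNA_B", "sRNA_C"]
    print(top_features_per_class(attr, labels, names, k=2))
    # {'blood': ['sRNA_A', 'sRNA_B'], 'liver': ['sRNA_B', 'sRNA_C']}
```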

In my opinion, attribution methods like this should be part of most post hoc deep learning analyses. It is similar to inspecting the slopes (coefficients) of a linear regression model to better understand the relationships between the inputs and outputs.
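The regression analogy becomes exact for a model with no nonlinearity: for a linear model `f(x) = w @ x + b`, DeepLIFT's contribution for feature i is `w[i] * (x[i] - x_ref[i])`, the slope scaled by how far that input moved from the reference. A minimal NumPy sketch on simulated, noise-free data; the helper name and the coefficients are hypothetical.

```python
import numpy as np


def linear_contributions(X, y, x, x_ref):
    """Fit y ~ X by least squares; return slopes, intercept and per-feature contributions.

    Hypothetical helper. For a linear model, each contribution is the slope
    times the change in that input relative to the reference.
    """
    X1 = np.column_stack([X, np.ones(len(X))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    w, b = coef[:-1], coef[-1]
    return w, b, w * (np.asarray(x, dtype=float) - np.asarray(x_ref, dtype=float))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + 3.0  # simulated, noise-free
    w, b, contrib = linear_contributions(X, y, x=np.ones(3), x_ref=np.zeros(3))
    print(np.round(w, 3), round(float(b), 3), np.round(contrib, 3))
```

With an all-zeros reference and an all-ones input, the contributions simply equal the fitted slopes, which is the sense in which attribution scores generalize regression coefficients.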

Future work includes using hierarchical classification to better capture the relationships among the tissue classes. This could help the authors move toward an “ontological” hierarchy that may reveal some of the structure behind the sRNAs.
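As a rough illustration of what hierarchical classification can look like, here is one simple top-down scheme: sum the leaf-class probabilities into their parent groups, pick the most probable parent, then pick the best leaf within it. The two-level tissue hierarchy and the probabilities are hypothetical, and this is not necessarily the approach the authors plan to take.

```python
# Hypothetical two-level tissue hierarchy, not the paper's ontology.
HIERARCHY = {
    "circulatory": ["blood", "serum"],
    "digestive": ["liver", "colon"],
}


def top_down_predict(leaf_probs, hierarchy=HIERARCHY):
    """Greedy top-down decoding over a class hierarchy.

    leaf_probs: dict mapping each leaf class to its predicted probability.
    A parent's probability is the sum of its children's probabilities.
    """
    parent_probs = {
        parent: sum(leaf_probs[c] for c in children)
        for parent, children in hierarchy.items()
    }
    parent = max(parent_probs, key=parent_probs.get)
    leaf = max(hierarchy[parent], key=leaf_probs.get)
    return parent, leaf


if __name__ == "__main__":
    probs = {"blood": 0.30, "serum": 0.25, "liver": 0.35, "colon": 0.10}
    print(top_down_predict(probs))  # ('circulatory', 'blood')
```

With these made-up numbers a flat argmax would pick "liver", while the top-down rule picks "circulatory" and then "blood", because the two circulatory classes together carry more probability. That kind of disagreement is one way a hierarchy can expose structure a flat classifier hides.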

If you’d like me to write about a certain article, please email thebioinformaticspress@gmail.com!

Thanks for reading.


My life and brain in word-form ~||~ Views expressed are my own