Unlearn.AI releases genemunge!

Graham Siegel
Unlearn.AI
Apr 12, 2018 · 1 min read

Yesterday we released genemunge, a Python toolkit that lets you answer “Who is this gene and what does it do?” Now your machine learning algorithms in Python can make sense of gene expression data. No R necessary!

Here’s the abstract from our white paper on bioRxiv:

Transcriptional regulation is extremely complicated. Unfortunately, so is working with transcriptional data. Genes can be referred to using a multitude of different identifiers and are assigned to an ever increasing number of categories. Gene expression data may be available in a variety of units (e.g., counts, RPKMs, TPMs). Batch effects dominate signal, but metadata may not be available. Most of the tools are written in R. Here, we introduce a library, genemunge, that makes it easier to work with transcriptional data in Python. This includes translating between various types of gene names, accessing Gene Ontology (GO) information, obtaining expression levels of genes in healthy tissue, correcting for batch effects, and using prior knowledge to select sets of genes for further analysis.
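To make the units problem from the abstract concrete, here is a minimal sketch of how counts, RPKMs, and TPMs relate to each other. It is plain Python and does not use genemunge's API; the function names and the three-gene toy sample (made-up counts and lengths) are assumptions for illustration only.

```python
# Illustrative only: standard unit conversions in plain Python.
# None of these functions come from genemunge; names and data are hypothetical.

def counts_to_rpkm(counts, lengths_bp):
    """Reads per kilobase of gene length per million mapped reads."""
    millions_of_reads = sum(counts) / 1e6
    return [c / (length / 1e3) / millions_of_reads
            for c, length in zip(counts, lengths_bp)]

def counts_to_tpm(counts, lengths_bp):
    """Transcripts per million: normalize by length first, then rescale to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]

def rpkm_to_tpm(rpkm):
    """TPM is RPKM rescaled so that the sample sums to one million."""
    total = sum(rpkm)
    return [r / total * 1e6 for r in rpkm]

# Toy sample: three genes with made-up read counts and lengths in base pairs.
counts = [100, 300, 600]
lengths_bp = [1000, 2000, 3000]

rpkm = counts_to_rpkm(counts, lengths_bp)
tpm = counts_to_tpm(counts, lengths_bp)
tpm_from_rpkm = rpkm_to_tpm(rpkm)
```

Within a single sample, TPMs always sum to one million while RPKMs do not, which is one reason values in different units cannot be compared directly without conversion. The sketch skips edge cases such as samples with zero total reads.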

Code for genemunge is freely available on GitHub.

We released genemunge under the MIT license to engage with the computational biology community and to spur further development of computational biology tools in Python. If you find the library useful, drop us a line. We would love to hear how you integrated genemunge into your project, and we welcome feedback on how the library could be improved.
