Kipoi: utilizing machine learning models for genomics

Number of publications per year containing the keywords ‘deep learning’ and ‘genomics’. Credit: Gokcen Eraslan. Source: app.dimensions.ai.

Making model predictions also involves data-loading and pre-processing

  1. obtain model parameters
  2. install all the required software packages
  3. extract and pre-process the relevant information from the raw files
  4. run model predictions
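To make step 3 concrete, here is a minimal sketch of what "extract and pre-process" can look like for a DNA sequence-based model: cut an interval out of a FASTA genome and one-hot encode it. The helper names (`read_fasta`, `one_hot`, `extract_and_encode`) and the toy genome are assumptions made for illustration; they are not part of Kipoi, and real dataloaders typically rely on indexed FASTA readers rather than loading a whole genome into memory.

```python
import io

ALPHABET = "ACGT"


def read_fasta(handle):
    """Parse FASTA text into a dict mapping sequence name -> sequence string."""
    sequences, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def one_hot(sequence):
    """Encode a DNA string as rows of (A, C, G, T); unknown bases such as N become all zeros."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in sequence]


def extract_and_encode(genome, chrom, start, end):
    """Cut the half-open interval [start, end) out of a chromosome and one-hot encode it."""
    return one_hot(genome[chrom][start:end])


# Toy genome standing in for a real .fa file (hypothetical data).
toy_fasta = io.StringIO(">chr1 toy\nACGTN\nGGCA\n")
genome = read_fasta(toy_fasta)
encoded = extract_and_encode(genome, "chr1", 2, 6)  # bases "GTNG"
```

Even in this toy form, the pre-processing carries several decisions (coordinate convention, handling of N, channel order) that must match what the model was trained on, which is why shipping the dataloader together with the model matters.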

Main ingredients of Kipoi

1. Standardization of trained models

  • dataloader.yaml description (example)
  • dataloader.py implementation (example)
  • dataloader_files/ (optional) directory with further required files
  • model.yaml description (example)
  • model.py (optional) class implementing predict_on_batch(x)
  • model_files/ directory with required files like model parameters
  • example_files/ directory with small test files
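As an illustration of the optional model.py entry, the sketch below shows the shape of a class that implements predict_on_batch(x). It is a dependency-free toy: the class name `ToyModel`, its weights, and the scoring rule are hypothetical, and in Kipoi itself such a class would build on the library's own model base class and load its parameters from model_files/.

```python
import math


class ToyModel:
    """Hypothetical stand-in for a model.py class; not part of Kipoi."""

    def __init__(self, weights=(0.0, 1.0, 1.0, 0.0), bias=-0.5):
        # In a real model these parameters would be loaded from model_files/.
        self.weights = weights  # one weight per channel (A, C, G, T)
        self.bias = bias

    def predict_on_batch(self, x):
        """Return one score in (0, 1) per one-hot encoded sequence in the batch."""
        scores = []
        for sequence in x:
            # Mean weighted signal per position; the default weights reward G and C.
            total = sum(w * v for row in sequence for w, v in zip(self.weights, row))
            mean = total / len(sequence)
            scores.append(1.0 / (1.0 + math.exp(-(mean + self.bias))))
        return scores


batch = [
    [[0, 1, 0, 0], [0, 0, 1, 0]],  # "CG"
    [[1, 0, 0, 0], [0, 0, 0, 1]],  # "AT"
]
predictions = ToyModel().predict_on_batch(batch)
```

The point of the single predict_on_batch(x) entry point is that callers do not need to know which framework sits behind the model.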

3. API for accessing and using the models

Python API. All components (model, dataloader, dependencies) can be accessed and used separately.
Command-line API. Each model makes predictions from standard file formats. In bioinformatics, BED (.bed) and FASTA (.fa) are two such formats: the BED file specifies the queried intervals, and the FASTA file holds the genome sequence. Output can be written sequentially to a compressed binary format (HDF5, .h5 extension) or as plain text (tab-separated values, .tsv).
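The file formats mentioned above are simple enough to sketch with the standard library. The example below reads intervals from BED text and writes one prediction per interval as tab-separated values. The function names, the output column names, and the prediction values are assumptions for illustration only; Kipoi's actual output layout may differ.

```python
import csv
import io


def read_bed(handle):
    """Yield (chrom, start, end) from BED text; starts are 0-based and ends are exclusive."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip comments, header lines, and anything without three columns.
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        yield fields[0], int(fields[1]), int(fields[2])


def write_predictions_tsv(handle, intervals, predictions):
    """Write one row per interval with its prediction (hypothetical column layout)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["chrom", "start", "end", "prediction"])
    for (chrom, start, end), score in zip(intervals, predictions):
        writer.writerow([chrom, start, end, f"{score:.3f}"])


bed_text = io.StringIO("# toy intervals\nchr1\t0\t4\nchr1\t10\t14\n")
intervals = list(read_bed(bed_text))

placeholder_predictions = [0.25, 0.75]  # made-up numbers, not model output
out = io.StringIO()
write_predictions_tsv(out, intervals, placeholder_predictions)
tsv_text = out.getvalue()
```

Plain text output like this is easy to inspect, while HDF5 is the better fit when predictions are large arrays written batch by batch.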

4. Plugins

To assess the impact of genetic mutations on molecular phenotypes, two model predictions can be compared: one where the sequence doesn’t contain any mutations (reference) and one where the sequence contains the mutation of interest (alternative). The Kipoi-veff package reads the mutations from a VCF file, makes model predictions, and writes the obtained differences back to the VCF file as additional information.
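The reference-versus-alternative comparison can be sketched in a few lines. This is not the Kipoi-veff implementation: `toy_model` is a hypothetical stand-in that scores GC content, the variant position is given relative to the start of the sequence rather than the chromosome, and only single-nucleotide variants are handled.

```python
def apply_snv(sequence, pos, ref, alt):
    """Return the sequence with a single-nucleotide variant applied; pos is 1-based, as in VCF."""
    index = pos - 1
    if sequence[index] != ref:
        raise ValueError(f"reference allele mismatch at position {pos}")
    return sequence[:index] + alt + sequence[index + 1:]


def toy_model(sequence):
    """Hypothetical stand-in model: fraction of G/C bases in the sequence."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def variant_effect(sequence, pos, ref, alt, model=toy_model):
    """Alternative-minus-reference difference in the model prediction."""
    return model(apply_snv(sequence, pos, ref, alt)) - model(sequence)


reference = "ACGTACGT"
effect = variant_effect(reference, pos=1, ref="A", alt="G")
```

Checking the reference allele against the sequence before scoring is a cheap guard against mismatched genome builds or off-by-one coordinates.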
Importance scores visualization. (notebook) Importance scores highlight the parts of the input that were most important for making the prediction. For DNA sequence-based models, the most important regions are typically those bound by proteins.
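One simple way to obtain per-position importance scores is in-silico mutagenesis: mutate each base in turn and record how much the prediction drops. The text above does not say which method is used, so treat this as one possible technique rather than a description of the plugin. `toy_model` is a hypothetical stand-in that counts occurrences of the motif "GATA", playing the role of a protein binding site.

```python
def toy_model(sequence):
    """Hypothetical stand-in model: number of occurrences of the motif "GATA"."""
    return float(sum(sequence[i:i + 4] == "GATA" for i in range(len(sequence) - 3)))


def mutagenesis_scores(sequence, model=toy_model, alphabet="ACGT"):
    """Per-position importance: largest drop in prediction from any single-base substitution."""
    baseline = model(sequence)
    scores = []
    for i, base in enumerate(sequence):
        drops = [
            baseline - model(sequence[:i] + other + sequence[i + 1:])
            for other in alphabet
            if other != base
        ]
        scores.append(max(drops))
    return scores


sequence = "CCGATACC"
scores = mutagenesis_scores(sequence)
```

In this toy case only the four motif positions receive a non-zero score, which mirrors the observation that important regions tend to coincide with binding sites.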

How to get started?

Concluding remarks

Žiga Avsec
