We are very excited to announce a major update to Kipoi and its model specification format. Here are the updates:
Hosting models on zenodo/figshare instead of Git-LFS
For better scalability, easier setup and easier contribution, we decided to abandon Git-LFS and use external services like zenodo or figshare to host model parameters and example files. These allow you to store 50GB of data per project for free and give you a citable digital object identifier (DOI). To contribute a model to Kipoi’s model repository, you now have to upload the model parameters to one of these services and provide a download link in the model.yaml:
You can obtain the md5 hash of the file either on zenodo’s website or by running md5sum <file> on Linux or md5 <file> on macOS.
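If you would rather compute the hash from a script that prepares the upload, a minimal sketch using only the Python standard library could look like the following. The helper name md5_of_file and the chunk size are illustrative choices and not part of Kipoi.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large parameter files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string should be the same value that md5sum or md5 prints for the file.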
Kipoiseq — standard dataloaders for sequence-based models
We now provide a fast implementation of common dataloaders in kipoiseq. If your model takes as input DNA sequence (either one-hot-encoded numpy array or a string), you can simply use the kipoiseq dataloader in model.yaml:
Even if your model has a different ordering of the letters (say ATCG) or requires a different order of the axes than (batch, sequence position, letter), you can use default_args to specify these.
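To make concrete what a different letter ordering or axis order means for the encoded input, here is a small pure-Python sketch. It is not the kipoiseq implementation: the functions one_hot and swap_axes are hypothetical, and encoding unknown letters such as N as all-zero rows is an assumption of this sketch that may differ from what kipoiseq does.

```python
def one_hot(seq, alphabet="ACGT"):
    """One-hot encode a DNA string as a (position, letter) nested list.

    Letters missing from the alphabet (e.g. N) become all-zero rows
    (an assumption of this sketch).
    """
    index = {letter: i for i, letter in enumerate(alphabet)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


def swap_axes(matrix):
    """Turn a (position, letter) matrix into a (letter, position) matrix."""
    return [list(column) for column in zip(*matrix)]


acgt = one_hot("AC")                   # columns ordered A, C, G, T
atcg = one_hot("AC", alphabet="ATCG")  # same sequence, columns ordered A, T, C, G
letters_first = swap_axes(acgt)        # letter axis before the position axis
```

The same sequence yields different arrays under the two alphabets, which is why a model trained with one ordering needs its dataloader configured to match.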
The package structure was inspired by torchvision and provides three kinds of objects:
- dataloaders — Final object used to train models and make predictions. Example: SeqIntervalDl, MMSpliceDl.
- transforms — simple functions or callable classes that for example resize the genomic intervals or one-hot-encode the DNA sequence
- extractors — given a genomic interval, extract the values from genome-wide files like FASTA or BigWig. See also genomelake for more extractors.
These building blocks allow you to write new dataloaders for your own models. See our colab notebook on how to use kipoiseq dataloaders to train a Keras model.
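How the three kinds of objects could fit together can be sketched in plain Python. All class names below (ResizeInterval, DictSequenceExtractor, ToyIntervalDataloader) are hypothetical and only mirror the roles described above; the real kipoiseq classes have their own names and signatures, and a real extractor would read from a FASTA file rather than a dict.

```python
class ResizeInterval:
    """Transform: resize a (chrom, start, end) interval to a fixed width around its center."""

    def __init__(self, width):
        self.width = width

    def __call__(self, interval):
        chrom, start, end = interval
        center = (start + end) // 2
        new_start = center - self.width // 2
        return chrom, new_start, new_start + self.width


class DictSequenceExtractor:
    """Extractor: return the sequence of an interval from an in-memory genome."""

    def __init__(self, genome):
        self.genome = genome  # maps chromosome name to sequence string

    def extract(self, interval):
        chrom, start, end = interval
        return self.genome[chrom][start:end]


class ToyIntervalDataloader:
    """Dataloader: apply the transform, then the extractor, for each interval."""

    def __init__(self, intervals, extractor, transform):
        self.intervals = intervals
        self.extractor = extractor
        self.transform = transform

    def __len__(self):
        return len(self.intervals)

    def __getitem__(self, idx):
        interval = self.transform(self.intervals[idx])
        return {
            "inputs": self.extractor.extract(interval),
            "metadata": {"interval": interval},
        }


loader = ToyIntervalDataloader(
    intervals=[("chr1", 2, 8)],
    extractor=DictSequenceExtractor({"chr1": "ACGTACGTAC"}),
    transform=ResizeInterval(width=4),
)
example = loader[0]
```

Keeping the transform and the extractor separate means either one can be swapped (a different width, a different file type) without touching the dataloader.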
Contributing multiple very similar models with a template
To easily contribute model groups with multiple models of the same kind, you can now specify two files describing all the models:
model-template.yaml— template for
models.tsv— tab-separated file holding custom model variables
One row in models.tsv represents a single model and is used to populate model-template.yaml and construct model.yaml using the jinja2 templating language. This even allows you to write if statements in model-template.yaml. See the CpGenie model as an example.
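The row-to-file expansion can be illustrated with a short standard-library sketch. Note the swap: string.Template stands in for jinja2 here, so the placeholder syntax is $name rather than jinja2's and there are no if statements. The template fields, column names, and URLs are all made up for illustration and do not reflect the actual model.yaml schema.

```python
import csv
import io
from string import Template

# Hypothetical template text; a real model-template.yaml uses jinja2 syntax.
TEMPLATE = Template(
    "weights_url: $url\n"
    "weights_md5: $md5\n"
)

# Hypothetical models.tsv content: a header line, then one row per model.
MODELS_TSV = (
    "model\turl\tmd5\n"
    "cell_a\thttps://example.org/a.h5\taaa\n"
    "cell_b\thttps://example.org/b.h5\tbbb\n"
)


def render_models(template, tsv_text):
    """Return {model name: rendered text}, one entry per TSV row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Each row is a dict of column name to value and fills the placeholders.
    return {row["model"]: template.substitute(row) for row in reader}


rendered = render_models(TEMPLATE, MODELS_TSV)
```

Each entry of rendered corresponds to what would become one generated model.yaml, so adding a model to the group only requires adding a row.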
We now also test that the model's predictions match the expected ones. Here is the additional field in model.yaml:
The file specified under test.expect is an HDF5 file containing the input values and model predictions. You can generate this file by running either kipoi test <model> -o expect.h5 or kipoi predict ... -o expect.h5 --keep-inputs.
Note that this command is used to generate the file and has to be run only once. If model.yaml contains this field, the kipoi test <model> invocation will also test that the predictions still match.
Testing if the predictions match is extremely important as the deep learning frameworks are frequently releasing new versions and we have to make sure that the models stored using the older version still yield the same predictions. This becomes even more important once we start porting models from one framework to another via ONNX.
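Conceptually, such a check compares stored predictions with freshly computed ones up to a numerical tolerance, since small floating-point differences between framework versions are expected. The sketch below is an illustration under that assumption only: predictions_match is a hypothetical helper, and the tolerances are arbitrary values, not the ones Kipoi uses.

```python
import math


def predictions_match(expected, actual, rel_tol=1e-5, abs_tol=1e-8):
    """Return True if two flat lists of predictions agree within tolerance."""
    if len(expected) != len(actual):
        return False
    return all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for e, a in zip(expected, actual)
    )


# Tiny numerical noise passes, a real change in the prediction does not.
same = predictions_match([0.1, 0.9], [0.1 + 1e-9, 0.9])
changed = predictions_match([0.1, 0.9], [0.2, 0.9])
```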
Common conda environments
We now also provide a set of hand-curated conda environments suitable for multiple model groups. These environments can be installed through kipoi env create. Run the following two commands to install two common environments covering almost all the models in Kipoi:
kipoi env create shared/envs/kipoi-py3-keras1.2
kipoi env create shared/envs/kipoi-py3-keras2
You can see the list of models covered by these two environments here. For each model, you can get the appropriate environment name by running kipoi env get <model>.
This allows you to automatically activate the right environments in bash scripts or Snakemake rules:
source activate $(kipoi env get <model>)
If you just want to invoke a single kipoi command within the model's environment, you can instead get the absolute path to the kipoi executable of that environment:
$(kipoi env get_bin <model>) predict .... <model> -o file.tsv
We test that all the model predictions still match in this new common environment.
- Allow parametrizing custom models PR #245
- Keep track of the kipoi version required for the model source and display a warning if it has to be updated PR #377
- Allow reading yaml files with additional fields using an old kipoi version (e.g. only display a warning)
- Add option to disable automatic updates of the model repository. Use