How to upload your ML model to FathomNet

Eric Orenstein
Published in FathomNet
Oct 26, 2021

A major part of the FathomNet project is the ability to share and index many different automated models for analyzing underwater images. Sharing your model allows others to use it and build on it in their own work. In this article, we describe how to connect your model to FathomNet and ensure that other users can properly cite it. There are many ways to do this, but we recommend using Zenodo, an open science initiative built by the good folks at the European Organization for Nuclear Research (CERN) and the OpenAIRE project.

The process is very straightforward and will look pretty familiar to anyone who has used tools like Google Drive. There are three basic steps:

  1. Prepare your model and associated files for submission.
  2. Upload the model to Zenodo, write a description, fill out required fields, and secure a unique Digital Object Identifier.
  3. Connect your model to FathomNet on Zenodo and via our ModelZoo on GitHub.

As an example, we will walk through uploading our Benthic Supercategory Object Detector. The model is a version of RetinaNet fine-tuned on localized image data collected by the Monterey Bay Aquarium Research Institute (MBARI) in the Northeastern Pacific. ‘Supercategory’ refers to the 20 categories of interest formed by semantically grouping many taxonomic classes.

What is Zenodo?

Zenodo is an open access repository primarily built to support high energy physics research at the Large Hadron Collider in Switzerland. Experiments there generate petabytes of data that must be shared by lots of collaborators across international borders. The infrastructure they built is available to the scientific community writ large, allowing researchers around the world to upload up to 50 GB per dataset. Once uploaded, your data is stored in the CERN Data Center and given a Digital Object Identifier (DOI) so others can properly cite your work. All you need to do to use this amazing resource is create an account with your email address, GitHub user profile, or ORCID. Truly open science at its best.

Zenodo is far from the only way to host your model or generate a DOI associated with it. We prefer to use this specific service due to the data safeguards Zenodo has implemented and the ease with which individual models can connect to the FathomNet Zenodo community.

1. Prepare your model

Before you start the upload, gather everything you want to put in the DOI. You do not need to share absolutely everything associated with your model. At a minimum, you will want to include the model weights and any ancillary information a user needs to run the system. Here are the basic elements your DOI should include:

  • Model parameters — A file that dictates model behavior. Different learning packages and model types are saved with a variety of extensions and formats. PyTorch, for example, saves model weights as .pth or .pt serialized files. Any type is fine, but be sure to specify what is needed to run it.
  • Train/validation lists — Documents listing the images, ideally itemized as URLs or Universally Unique Identifiers, and associated annotations used for model training and evaluation.
  • Performance metrics — An image or array providing a snapshot of model performance on an independent validation set. A confusion matrix, for example, gives users an at-a-glance idea of how well a classification model works. Other metrics or visualizations might be more appropriate depending on the model type and target output.
  • Description — A short, high level explanation of what is contained in the repository. This should include information regarding the software used, how the model was trained, and how the annotated data is structured and accessed.
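If you do not already have a performance snapshot on hand, a confusion matrix is simple to compute from paired ground-truth and predicted labels. The following is a minimal sketch in plain Python; the class names and labels are made up for illustration and are not taken from our benthic model.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are the true class, columns are the predicted class."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    recalls = []
    for i, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls


# Hypothetical validation labels, for illustration only.
classes = ["fish", "urchin", "sponge"]
y_true = ["fish", "fish", "urchin", "sponge", "sponge", "urchin"]
y_pred = ["fish", "urchin", "urchin", "sponge", "fish", "urchin"]

matrix = confusion_matrix(y_true, y_pred, classes)
recall = per_class_recall(matrix)

for name, row, r in zip(classes, matrix, recall):
    print(f"{name:>8} {row} recall={r:.2f}")
```

Saving the matrix as an image or a small CSV alongside the weights gives users the at-a-glance view described above. For a detector, a metric such as mean average precision may be a better fit.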

There are many other files that might be relevant for your upload. For example, we uploaded benthic_label_map.json that describes the mapping between all the lowest taxonomic level annotations in the training data and the 20 final semantic classes.
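To give a sense of how such a label map might be used, here is a small sketch that stores a mapping as JSON and applies it to a list of annotations. The structure (a flat dictionary from fine-grained label to supercategory), the supercategory names, and the `remap` helper are all assumptions for illustration; they are not copied from the actual benthic_label_map.json.

```python
import json

# Hypothetical mapping from fine-grained labels to supercategories.
label_map = {
    "Strongylocentrotus fragilis": "Urchin",
    "Sebastolobus": "Fish",
    "Sebastes": "Fish",
    "Farrea": "Sponge",
}


def remap(labels, mapping, default="unmapped"):
    """Replace each fine-grained label with its supercategory."""
    return [mapping.get(label, default) for label in labels]


# Round-trip through JSON, as the file would be written and read back.
serialized = json.dumps(label_map, indent=2, sort_keys=True)
restored = json.loads(serialized)

annotations = ["Sebastes", "Farrea", "Sebastolobus", "Paragorgia arborea"]
supercategories = remap(annotations, restored)
print(supercategories)
```

Labels that are missing from the map are flagged rather than silently dropped, which makes gaps in the mapping easy to spot before you share it.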

2. Upload your model to Zenodo

With your Zenodo account and trained model ready to go, it is time to start uploading! Zenodo’s interface is well-designed and easy to use. Before you start creating DOIs, a word of caution: once you publish your upload, you will NOT be able to add or remove any files. Be sure to hit that ‘publish’ button only when you are good and ready.
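Because the file list is locked at publication, it can help to run a quick local check before you upload. The sketch below lists every file in a staging folder with its size and SHA-256 checksum and reports any expected file that is missing. The file names and the `build_manifest` helper are hypothetical, and the demo writes placeholder files to a temporary directory so that it runs anywhere.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large weights fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder, required):
    """Return (manifest, missing) for the files directly inside folder."""
    folder = Path(folder)
    missing = [name for name in required if not (folder / name).is_file()]
    manifest = {
        path.name: {"bytes": path.stat().st_size, "sha256": file_checksum(path)}
        for path in sorted(folder.iterdir())
        if path.is_file()
    }
    return manifest, missing


# Demo with placeholder files; point `staging` at your own folder instead.
with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp)
    (staging / "model_weights.pth").write_bytes(b"not real weights")
    (staging / "README.md").write_text("Describe the model here.\n")
    required = ["model_weights.pth", "README.md", "train_list.csv"]
    manifest, missing = build_manifest(staging, required)

print("missing:", missing)
for name, info in manifest.items():
    print(name, info["bytes"], info["sha256"][:12])
```

An empty `missing` list is a reasonable signal that the folder is ready to drag into the upload field.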

Drag and drop from your computer

We’ve all seen interfaces like this before. Select what you want from your local machine and drag it into the upload field.

Link to the FathomNet community

FathomNet has a community on Zenodo to help administrators and users keep track of available models. Simply type FathomNet in the ‘communities’ field and the community will drop down. Feel free to link your model to any other relevant communities. That might be a set of collaborators, an organization, or any other group that has a community on Zenodo.

Title, authors, and description

These are the important fields where you can fill potential users in on what specifically your model is. Basically, provide enough information that someone else could download everything from your DOI and use it fairly easily.

We gave the upload a descriptive name, linked it to the creators (in this case FathomNet founder Ben Woodward of CVisionAI with Lonny Lundsten and Eric Orenstein at MBARI), and wrote a description including a brief explanation of each file.

The ‘Version’ field is optional, but it is a good way to keep track of what changes if you update your DOI. Again, be aware that you are NOT able to add new files to a published record or delete old ones, but you can upload new versions (such as a better-performing set of model weights).

Select license

Zenodo gives you the option to select from a few different licenses.

We strongly recommend full open access rights for any model that you would like to associate with FathomNet. We understand that users might want to temporarily restrict access while awaiting publication and will consider embargoes on a case-by-case basis. We will not link to Closed Access contributions in the FathomNet Model Zoo. Please check out details regarding different flavors of Creative Commons licenses if you have any questions about what type is most appropriate for your work.

Other details

There are lots of optional bits and pieces at the end of the Zenodo upload form. You can link the work to specific grants, other DOIs, journals, etc. FathomNet does not require any of this information and we leave it at your discretion to fill out.
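It can be handy to draft the form fields offline before filling them in. The sketch below collects them in a plain dictionary and checks the basics, including the rule that closed-access uploads will not be linked in the Model Zoo. The key names, the `fathomnet` community identifier, the `cc-by-4.0` license string, and the version number are all assumptions for illustration; this is not Zenodo's API schema.

```python
# Access levels we would expect to be linkable; embargoes are case by case.
ALLOWED_ACCESS = {"open", "embargoed"}


def build_upload_form(title, authors, description, license_id,
                      version=None, communities=("fathomnet",), access="open"):
    """Collect the upload form fields in one dictionary and check the basics."""
    if not title.strip():
        raise ValueError("title is required")
    if not authors:
        raise ValueError("at least one author is required")
    if not description.strip():
        raise ValueError("description is required")
    if access not in ALLOWED_ACCESS:
        raise ValueError(f"access must be one of {sorted(ALLOWED_ACCESS)}")
    form = {
        "title": title.strip(),
        "authors": list(authors),
        "description": description.strip(),
        "license": license_id,
        "communities": list(communities),
        "access": access,
    }
    if version:
        form["version"] = version  # optional, as on the upload form
    return form


form = build_upload_form(
    title="Benthic Supercategory Object Detector",
    authors=[
        {"name": "Ben Woodward", "affiliation": "CVisionAI"},
        {"name": "Lonny Lundsten", "affiliation": "MBARI"},
        {"name": "Eric Orenstein", "affiliation": "MBARI"},
    ],
    description="RetinaNet fine-tuned on localized MBARI imagery; 20 supercategories.",
    license_id="cc-by-4.0",  # assumed identifier for an open license
    version="1.0",           # hypothetical version string
)
print(form["title"], form["access"], form["communities"])
```

Keeping a draft like this next to your files also gives you a record of exactly what you entered when it is time to upload a new version.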

3. Link your DOI to FathomNet

Once you click the ‘Publish’ button, you will be able to connect your work to the broader FathomNet project.

via Zenodo community

This happens automatically when you submit the DOI for publication. Zenodo will alert the FathomNet admins, who will make every effort to verify your submission in a timely manner.

via GitHub

Add your model to the FathomNet ModelZoo on GitHub. You can include a name, the architecture, habitat type, a brief description, and a link to the model itself. We prefer that all models submitted to the ModelZoo have an associated DOI to ensure proper attribution.
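As a rough sketch of what an entry could look like, the snippet below turns a DOI into its resolver link and formats the fields listed above as one row of a Markdown table. The row layout and the `model_zoo_row` helper are hypothetical, so check the ModelZoo repository for the format it actually uses. The DOI shown is a placeholder, not a real record.

```python
import re

# A DOI starts with "10.", a numeric registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(doi):
    """Strip common prefixes and return the bare DOI, e.g. 10.5281/zenodo.123."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a valid DOI: {doi!r}")
    return doi


def doi_url(doi):
    """Resolver link that redirects to the record's landing page."""
    return f"https://doi.org/{normalize_doi(doi)}"


def model_zoo_row(name, architecture, habitat, description, doi):
    """Format one model as a Markdown table row (hypothetical layout)."""
    bare = normalize_doi(doi)
    cells = [name, architecture, habitat, description, f"[{bare}]({doi_url(bare)})"]
    return "| " + " | ".join(cells) + " |"


row = model_zoo_row(
    name="Benthic Supercategory Object Detector",
    architecture="RetinaNet",
    habitat="Benthic",
    description="Detects 20 supercategories in MBARI imagery",
    doi="doi:10.5281/zenodo.0000000",  # placeholder DOI
)
print(row)
```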

Done and done

You have published and shared your model with the world! Thank you for participating in the FathomNet community. Be sure to check out our other Medium articles describing how to upload your annotated image data and use the FathomNet Python API to interact with the database. And, of course, stay tuned for loads more as we continue with beta testing.
