Managing Deep Learning Models

At SearchInk, the number of machine learning and deep learning models we generate is growing rapidly, as we run numerous experiments while simultaneously deploying selected models into production. At some point, we asked ourselves the following questions:

  • Which model should we deploy in production?
  • Where is the model stored?
  • Where are the parameters and evaluations of the model stored?

To answer these questions, we sketched out a solution that we believe is simple and scalable. Our thought process was as follows:

Fig 1: Thought process
  • We cannot store models in our git repositories because they are too large. The better alternative is to store them in cloud storage buckets; we currently use Google Cloud Storage buckets for this purpose, and we select models from there for our deployments.
  • We also need to store the metadata associated with the model, such as training/validation/test accuracy and loss, the dataset it was trained on, the hyper-parameters, and other metrics such as precision, recall, and ROC AUC, to name but a few. We decided to store this in MongoDB (a sketch of this registration step follows the list below).
  • The final part was a simple dashboard that provides an overview of all the models.
Fig 2: Our first version of the dashboard
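
To make the first two steps concrete, here is a minimal sketch of what a model registration step could look like, assuming the google-cloud-storage and pymongo client libraries. The bucket name, MongoDB URI, database/collection names, and the register_model helper are all illustrative placeholders, not our actual production code.

```python
from google.cloud import storage
from pymongo import MongoClient

BUCKET_NAME = "searchink-models"          # hypothetical bucket name
MONGO_URI = "mongodb://localhost:27017"   # hypothetical MongoDB endpoint


def register_model(local_path, model_name, version, dataset, hyperparams, metrics):
    """Upload a trained model to Cloud Storage and record its metadata in MongoDB."""
    # 1. Upload the serialized model file to a Google Cloud Storage bucket.
    blob_name = f"models/{model_name}/{version}/model.h5"
    bucket = storage.Client().bucket(BUCKET_NAME)
    bucket.blob(blob_name).upload_from_filename(local_path)

    # 2. Store the associated metadata in MongoDB so the dashboard can query it.
    client = MongoClient(MONGO_URI)
    client.model_registry.models.insert_one({
        "name": model_name,
        "version": version,
        "gcs_path": f"gs://{BUCKET_NAME}/{blob_name}",
        "dataset": dataset,
        "hyperparameters": hyperparams,   # e.g. {"lr": 1e-3, "batch_size": 32}
        "metrics": metrics,               # e.g. {"val_accuracy": 0.94, "roc_auc": 0.97}
    })
```

Keeping the Cloud Storage path inside the metadata document is the key design choice here: every record in MongoDB then points directly at a deployable artifact, so the dashboard and the deployment tooling share a single source of truth.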

It is common for a model in production to perform worse than a previously deployed one. With this framework, switching back to a previous model is a breeze.
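A rollback then amounts to a metadata query. The sketch below, under the same assumed schema and pymongo client as above, looks up the best-scoring version of a model while excluding the currently deployed one; the best_model_path helper and the example model name are hypothetical.

```python
from pymongo import MongoClient

MONGO_URI = "mongodb://localhost:27017"   # hypothetical endpoint, as above


def best_model_path(model_name, metric="val_accuracy", exclude_version=None):
    """Return the Cloud Storage path of the best-scoring model,
    optionally skipping the currently deployed version so we can roll back."""
    query = {"name": model_name}
    if exclude_version is not None:
        query["version"] = {"$ne": exclude_version}
    # Sort descending on the chosen metric and take the top document.
    doc = (MongoClient(MONGO_URI).model_registry.models
           .find_one(query, sort=[(f"metrics.{metric}", -1)]))
    return doc["gcs_path"] if doc else None


# Example: the current deployment (say v7) underperforms, so fetch the
# best prior version and point the deployment at its Cloud Storage path.
# path = best_model_path("invoice-classifier", exclude_version="v7")
```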

In forthcoming versions, we will add filters and additional evaluation metrics to the dashboard, which will help us compare models and inspect the results they generate. Alongside this, an admin console will be an even bigger win: it will let non-technical colleagues manage models based on various criteria and check how a model performs on sample datasets that they can upload themselves.