Implementing deep learning models for fast experimentation and production

On the Experience of Implementing Deep Learning Models

Improve your efficiency in both experimentation and production with deep learning models.

Kien Hao Tiet
Aviation Software Innovation


Photo by Pietro Jeng on Unsplash

This blog aims to show readers how we implement deep learning models in a way that improves our efficiency in both experimentation and production. Note: yes, not just one model but models.

I. Introduction

In recent years, deep learning (DL) has gained enormous attention, and the desire to integrate deep learning into existing products is higher than ever. At the same time, there are countless tutorials on different aspects of deep learning with different DL libraries or automatic differentiation libraries such as PyTorch, TensorFlow, etc. However, the reality is that there is a big gap between tutorials, or even research repos, and production code. In this blog, we lay out some practices that we believe will improve your efficiency, whether during experimentation or in production.

Disclaimer: We are not aiming to establish any standard for developing deep learning systems. All of the points below come from our experience. The solutions are not 100% ours; they are also inspired by many tutorials we came across. So if you have any opinions on these, please leave comments or responses; we would love to learn about different practices.

II. View the models differently

The first aspect we want to tackle is organizing the models. Let’s consider why this step is important. Say you are training models in a supervised learning style. Then you find that a large amount of unlabeled data is available, which is sweet for a semi-supervised learning setup. Moreover, most DL practitioners follow the trends, so you also want to implement and experiment with a contrastive learning setup. Even with simple scenarios like these, you can see that spending a decent amount of time planning at the beginning of the project will “give you wings” to switch between different learning setups and implement new ideas fast.

1. Trainers vs. models

In our opinion, we should distinguish between trainers and models to keep the implementation and its logic simple. First of all, what is a trainer, and what is a model? What are the differences?

The model, in some sense, is the architecture through which we run the data and apply computations. For example, Transformer is a model, and ResNet50 is a model. When we describe a model, we usually do not mention what the data looks like (except its type; e.g., images, paragraphs, etc.). All we care about is how we process an abstract batch of data. In short, the model takes a batch of data as input. It does not care how many batches are in the dataset, which optimizer is used, and so on. In general, the model is dedicated to defining the forward pass and backward pass only.

It is worth noting that the model usually returns a representation of the inputs. For example, in supervised learning, the representation of the inputs is the output of the layer before the softmax. In semi-supervised learning, on the other hand, the representation is an n-dimensional vector for the supervised part and another n-dimensional vector used to measure the difference on the unlabeled part. Setting things up this way makes it easier to switch between setups.
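To make this concrete, a minimal PyTorch sketch of such a model might look like the following; the class name and dimensions are illustrative, not our actual code.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """A model that only defines the forward pass."""

    def __init__(self, in_dim: int = 128, rep_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, rep_dim), nn.ReLU())

    def forward(self, batch: torch.Tensor) -> torch.Tensor:
        # The model maps a batch to its representation; it knows nothing
        # about dataloaders, optimizers, or the learning setup.
        return self.net(batch)
```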

The trainer, on the other hand, is the object that takes the train_dataloader, val_dataloader, test_dataloader, optimizer, scheduler, loss functions, etc. The purpose of a trainer is to connect the model with the other components of training a deep neural network, especially the data.
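A bare-bones trainer along these lines might look like this sketch, assuming a PyTorch-style model and dataloaders; the class body is illustrative.

```python
class Trainer:
    """Associates the model with everything around it: data, optimizer,
    scheduler, and loss. The model itself only ever sees batches."""

    def __init__(self, model, train_dataloader, val_dataloader,
                 optimizer, scheduler, loss_fn):
        self.model = model
        self.train_dataloader = train_dataloader
        self.val_dataloader = val_dataloader
        self.optimizer = optimizer
        self.scheduler = scheduler
        self.loss_fn = loss_fn

    def fit(self, epochs: int) -> None:
        for _ in range(epochs):
            for batch, labels in self.train_dataloader:
                self.optimizer.zero_grad()
                loss = self.loss_fn(self.model(batch), labels)
                loss.backward()
                self.optimizer.step()
            self.scheduler.step()
```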

The benefit of a clear distinction between models and trainers is that the code is split easily and explicitly, which lets us tune the model or change the optimizer independently. For example, look at the following code that uses PyTorch Lightning.

(Code screenshot captured from an external source; link in the original post.)
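Since the screenshot is not reproduced here, below is a typical PyTorch Lightning module of the kind the code showed, where the architecture and the training logic live in one class; the specific layers and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # The architecture (the "model") ...
        self.net = nn.Sequential(
            nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10)
        )

    def forward(self, x):
        return self.net(x.view(x.size(0), -1))

    # ... and the training logic (the "trainer" concerns) in the same class.
    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```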

As you can see, the model and the trainer are combined into one class. If you want to experiment with another architecture, you either need to create a similar class with a different architecture or split the trainer and the model as we suggest. Note that we are not against PyTorch Lightning; indeed, we use Lightning as part of our work. We just believe this split helps avoid boilerplate and scale the code.

2. Different trainers for different purposes

Another benefit of splitting the trainers and the models is shown below:

(Image by the author: the trainer hierarchy described below.)

We have the UniversalTrainer to define all the optimizers, schedulers, etc. that we may use in the course of training. Each trainer then has a different purpose. For instance, the SupervisedTrainer unfolds the dataloader into (data, labels) pairs. The ContrastiveTrainer organizes the dataloader into (data, positive samples, negative samples, [labels]); the labels are in brackets because they depend on which type of contrastive learning is used.

Then it does not matter which learning setup you use: the model does not need to change, as long as we feed a batch of data (labeled or unlabeled) into the model and expect a representation vector back.
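A sketch of that hierarchy follows; the class names come from the picture above, while the bodies are our assumptions.

```python
class UniversalTrainer:
    """Defines the optimizers, schedulers, etc. shared by every setup."""

    def __init__(self, model, dataloader, optimizer, loss_fn):
        self.model = model
        self.dataloader = dataloader
        self.optimizer = optimizer
        self.loss_fn = loss_fn

    def step(self, loss):
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

class SupervisedTrainer(UniversalTrainer):
    def fit(self):
        # Unfold the dataloader into (data, labels) pairs.
        for data, labels in self.dataloader:
            self.step(self.loss_fn(self.model(data), labels))

class ContrastiveTrainer(UniversalTrainer):
    def fit(self):
        # Organize the dataloader into (data, positives, negatives);
        # the model itself is unchanged across setups.
        for data, pos, neg in self.dataloader:
            self.step(self.loss_fn(self.model(data), self.model(pos),
                                   self.model(neg)))
```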

3. The composition design pattern

Let’s imagine that we want to build an application for a hospital where the data can be organized in two ways: we either run k-fold validation over all data points, or organize the folds at the patient level. The ETL or ELT pipeline is otherwise similar, so the practical way is to create a data-processing class that composes extract, transform, and load components. As the scenario shows, only the extract step differs between the two types of evaluation, so inheritance is not a good fit here. The composition design pattern is the better practice: each component is implemented independently and joined in the master script. Have a look at the pictures below.

Each step of the ETL pipeline is put into a different folder so that we can manage it and make changes quickly. Picture by the author.
Each pipeline can be declared like the one above using the composition design pattern. Image by the author.
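A minimal sketch of how such a pipeline can be composed; the step classes here are hypothetical stand-ins for the ones in the pictures.

```python
class SampleLevelExtract:
    """Extract step: folds are split over all data points."""
    def __call__(self, source):
        return list(source)

class PatientLevelExtract:
    """Extract step: folds are split at the patient level."""
    def __call__(self, source):
        return list(source)  # grouping by patient omitted for brevity

class Normalize:
    """Transform step."""
    def __call__(self, records):
        return records

class ToIterable:
    """Load step: wrap the records into an iterable object."""
    def __call__(self, records):
        return iter(records)

class DataPipeline:
    """Composes independently implemented E, T, and L components."""
    def __init__(self, extract, transform, load):
        self.extract, self.transform, self.load = extract, transform, load

    def run(self, source):
        return self.load(self.transform(self.extract(source)))

# Joined in the master script: swapping the extract step switches the
# evaluation scheme without touching the rest of the pipeline.
pipeline = DataPipeline(PatientLevelExtract(), Normalize(), ToIterable())
```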

4. Dependency Injection

In simple terms, dependency injection means passing dependencies into an object instead of initializing them directly inside the class. In the image above, you can see that we pass the data_config, args, and graphql_executor into the class instead of initializing each of the E, T, and L steps ourselves. The reason for dependency injection is the flexibility to pass in a different setup: our Dataset class does not need to worry about which concrete class to initialize. Moreover, when you introduce a change to the setup, you simply pass the new object into the class instead of manually changing its methods.
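As a hedged sketch, the Dataset class might receive its dependencies like this; the parameter names come from the text, while the class body and the helper calls in the comment are our assumptions.

```python
class Dataset:
    def __init__(self, data_config, args, graphql_executor):
        # Dependencies are injected rather than constructed here, so this
        # class never decides which concrete implementations it gets.
        self.data_config = data_config
        self.args = args
        self.graphql_executor = graphql_executor

# The caller wires everything together; changing the setup means passing
# different objects in, not editing the methods of Dataset. For example
# (hypothetical helpers):
#   dataset = Dataset(load_config("data.yaml"), parse_args(), GraphQLExecutor(url))
```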

III. Using a YAML file to control the workflow

It depends on the task, but the common workflow for a DL project is to extract the data from CSV files or a database, transform it into a desirable form, and load it into some iterable object (e.g., a DataLoader in PyTorch). From there, the iterable objects are fed into the models for training and evaluation.

As you can see, there are three main steps: organizing the data, training the model, and running the evaluation. While experimenting, we want to try different parameters for each step. Let’s consider two scenarios:

Scenario 1:

An example of the configuration:

(Image by the author: an example YAML configuration file.)
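Since the image is not reproduced here, a configuration along these lines might look like the following sketch; the keys and values are illustrative, not the exact file from the image.

```yaml
# Illustrative configuration; keys and values are examples only.
data:
  --train-test-split: 0.8      # or a leave-one-sample-out strategy
  --k-fold: 5
train:
  --learning-rate: 0.001
  --batch-size: 32
evaluate:
  --metrics: [accuracy, f1]
```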

For example, in one experiment we want to split the train set and test set by an 80–20 ratio. In another experiment, we want to try a leave-one-sample-out split for the train set and test set.

Note: The reason we prefix keys with “--” in the configuration file is to stay consistent when the values are parsed into API calls or command-line runs.

Scenario 2:

It also makes code and model version control easier. Since we only change values in the configuration file, the changes can be tracked through GitHub. Likewise, we can track a model version and its performance through this file instead of tracking the whole codebase.
