(Part I) Omni Channel Deployment Focused Machine Learning Workflow For On-Device Capture Quality Models

Socure
The Socure Technology Blog

--

By Edward Li, Senior Manager of Computer Vision

To make sure machine learning projects are successful, it is paramount to have the right workflow: one that maximizes team efficiency and iteration speed while eliminating blockers and potential problems as early as possible. In this two-part series, we will take a deep dive into the traditional machine learning workflow and then follow up with the changes our team has added to that traditional flow to facilitate our on-device capture quality model deployment.

To summarize:

  • In Part One, we will discuss the traditional machine learning workflow.
  • In Part Two, we’ll look at a modified Omni Channel workflow and dive deep into the changes we make to it to deploy our on-device capture quality model.

Part I: The Traditional Machine Learning Workflow

The traditional machine learning workflow (Fig. 1) consists of a few distinct stages, from project scoping to model deployment. There are some critical things that we consider during each stage:

1. Scoping the project

2. Dataset Curation

3. Model training and experimentation

4. Model deployment

Scoping the project

A project starts with understanding a problem and then scoping the work so that a solution is feasible given certain accuracy or performance constraints. There are many things to consider when scoping, including how to acquire data, the desired objective of the model, the length of the development cycle required, and how the model fits into the product. Project scoping is a balancing act between feasibility, timelines, and feature impact, so that we solve the problem that maximizes product value. For example, an ID detector that accurately finds the four corners of an ID-card-like object is a much easier task than a model that detects corners and discriminates between different ID types at the same time. Correctly scoping a project greatly reduces risk and improves iteration quality and speed.

Dataset Curation

After scoping, dataset curation is one of the most important stages of the machine learning workflow. In this stage we collect the necessary datasets and corresponding labels (if available). The training dataset directly determines your model’s performance and coverage, so having a representative, high-quality dataset directly affects how well the product performs. There are a few things to consider at this stage, depending on the specific situation. The first is data diversity: this is a very difficult problem, as datasets can inherently contain biases in their distribution, and such biases can cause models trained on those datasets to generalize poorly.
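As a minimal sketch of such a check (assuming each image carries simple metadata; the field names and values here are purely illustrative), tallying the distribution of a few metadata fields can surface obvious imbalances before any training starts:

    from collections import Counter

    # Hypothetical per-image metadata records; field names and values are illustrative.
    samples = [
        {"doc_type": "drivers_license", "device": "ios", "lighting": "indoor"},
        {"doc_type": "passport", "device": "android", "lighting": "outdoor"},
        # ... one record per collected image
    ]

    def distribution(records, field):
        """Return the fraction of records taking each value of a metadata field."""
        counts = Counter(r[field] for r in records)
        total = sum(counts.values())
        return {value: count / total for value, count in counts.items()}

    for field in ("doc_type", "device", "lighting"):
        print(field, distribution(samples, field))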

The second is label availability: it can be difficult to get labeled data when it is hard to label at an acceptable speed and quality (e.g., depth estimation). In other situations, it might not be feasible to label every data sample manually. In these cases, hard label mining can help create massive amounts of labels at the expense of quality (we will talk about this in a later blog post).

Finally, in this stage, it’s important to think about a test/validation dataset that is representative of real-world use cases and can be used to measure model performance and facilitate comparisons with previous techniques.
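One simple way to keep these held-out sets representative is stratified splitting. The sketch below uses scikit-learn’s train_test_split on placeholder data (the sample IDs and labels are illustrative):

    from sklearn.model_selection import train_test_split

    # Placeholder data: 100 sample IDs with two balanced classes (illustrative only).
    X = list(range(100))
    y = [i % 2 for i in range(100)]

    # Hold out 20% of the data, preserving the class ratio via stratification.
    X_train, X_temp, y_train, y_temp = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    # Split the held-out 20% evenly into validation and test sets.
    X_val, X_test, y_val, y_test = train_test_split(
        X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42)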

Model training and experimentation

Now that we have our dataset for training, we want to start running experiments to test our ideas. This process is highly iterative and can span months depending on the scope of the project. As such, we apply the scientific method at this stage (yes, the one from high school). Model development is the accumulation of multiple verified hypotheses, for example:

  • Can a ResNet-50 architecture learn to classify my document data?
  • Is Adam the best optimizer for my architecture?
  • What activation function will give me the best result?

Understandably, some questions are easier to answer than others, and certain intuitions guide us toward combinations of components that we know perform well together, such as convolution with batch normalization. However, the rule of thumb is to break experimentation down into small, simple chunks so that we can verify isolated changes to our model or training process (e.g., ELU instead of ReLU; ResNet-50 vs. ResNet-100; batch normalization before the convolution vs. after). Each of these changes is an experiment whose results are verified on the validation dataset from the dataset curation stage.
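To make the idea concrete, here is a small PyTorch sketch (the block structure and channel counts are illustrative, not our production architecture) where a single design choice can be toggled per experiment while everything else stays fixed:

    import torch.nn as nn

    def conv_block(in_ch, out_ch, activation=nn.ReLU, bn_before_conv=False):
        """A small block where one design choice at a time can be toggled."""
        conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        if bn_before_conv:
            layers = [nn.BatchNorm2d(in_ch), conv, activation()]
        else:
            layers = [conv, nn.BatchNorm2d(out_ch), activation()]
        return nn.Sequential(*layers)

    # Baseline vs. one isolated change (ReLU -> ELU); everything else is identical.
    baseline = conv_block(3, 64, activation=nn.ReLU)
    variant = conv_block(3, 64, activation=nn.ELU)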

Each experiment should have versioned code (a git commit, pull request, or merge request) and an entry in an experiment tracker, such as TensorBoard or Weights & Biases, that details the exact configs and hyperparameters, in addition to versioned datasets. We need to know the exact circumstances required to reproduce our model.
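A minimal tracking sketch, assuming Weights & Biases as the tracker (the project name, config values, and training stub below are all illustrative):

    import random
    import wandb

    def train_one_epoch():
        """Placeholder for the real training loop; returns a fake validation accuracy."""
        return random.random()

    # Log the exact configuration so the run can be tied back to versioned code and data.
    run = wandb.init(
        project="capture-quality",      # illustrative project name
        config={
            "architecture": "resnet50",
            "optimizer": "adam",
            "learning_rate": 3e-4,
            "batch_size": 64,
            "dataset_version": "v1.2",  # tag of the versioned dataset snapshot
        },
    )

    for epoch in range(10):
        wandb.log({"epoch": epoch, "val_accuracy": train_one_epoch()})

    run.finish()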

Additionally, hyperparameter optimization helps speed up the experimentation process. One additional tip I can offer: instead of training your first model candidate on the entire dataset, train it on a small dataset to see whether the candidate is suited to your type of problem. If so, scale the model/data iteratively until you converge on an acceptable model, making changes to the model architecture or training loop as necessary. This also helps us understand how our model scales with data.
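In PyTorch terms, that first pass might look like training on a random Subset of the data before scaling up; the dataset below is a random stand-in for the real one:

    import torch
    from torch.utils.data import Subset, TensorDataset

    # Stand-in for the real dataset: 10,000 random "images" with binary labels.
    full_dataset = TensorDataset(torch.randn(10_000, 3, 64, 64),
                                 torch.randint(0, 2, (10_000,)))

    # Start with a small random slice (here 5%) to sanity-check the model candidate.
    generator = torch.Generator().manual_seed(0)
    indices = torch.randperm(len(full_dataset), generator=generator)[:500]
    small_dataset = Subset(full_dataset, indices.tolist())

    # If results look promising, grow the subset (and/or the model) iteratively.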

If the validation performance of a model (accuracy, recall) is not good enough, iterate on that model. If you have exhausted your modeling experimentation without improvement, go back to the dataset curation step and improve your dataset to be more representative.

Models trained during the modeling stage can also be used to mine hard examples for labeling. During this iteration process, you can also adjust the scope of the project if needed.
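A sketch of that mining step for a classifier (the function and parameter names are illustrative): run the current model over an unlabeled pool and send the lowest-confidence samples to labelers.

    import torch

    def mine_hard_examples(model, unlabeled_loader, top_k=100):
        """Return indices of the unlabeled samples the model is least confident about.

        Assumes the loader iterates the unlabeled pool in a fixed order.
        """
        model.eval()
        confidences = []
        with torch.no_grad():
            for batch in unlabeled_loader:
                probs = torch.softmax(model(batch), dim=1)
                confidences.append(probs.max(dim=1).values)  # confidence of predicted class
        confidences = torch.cat(confidences)
        # The lowest-confidence samples are the "hard" ones worth labeling first.
        return torch.argsort(confidences)[:top_k]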

Model Deployment

Once a deployment candidate has been identified, deploy the model in accordance with best practices. If you encounter deployment issues or performance limitations of the model running in production, iterate on your model or dataset to address them.
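For an on-device model, deployment usually involves exporting the trained network to a portable artifact. The sketch below uses TorchScript tracing as one example (the backbone, input size, and file name are illustrative):

    import torch
    import torchvision

    # Stand-in for the trained capture quality model.
    model = torchvision.models.resnet18(weights=None)
    model.eval()

    # Trace with a representative input and save a portable artifact that mobile
    # runtimes (or a further converter) can consume.
    example_input = torch.randn(1, 3, 224, 224)
    traced = torch.jit.trace(model, example_input)
    traced.save("capture_quality.pt")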

Part II

In the following post, we’ll look at a modified Omni Channel workflow and dive deep into the changes we make to it to deploy our on-device capture quality model.
