(Part II) Omni-Channel Deployment-Focused Machine Learning Workflow for On-Device Capture Quality Models

By Edward Li, Senior Manager of Computer Vision

In Part I of this mini-series, we looked at what a traditional machine-learning (ML) workflow encompasses. Now, in Part II, we'll look at a modified Omni Channel workflow.

If our deployment targets are edge devices, such as web browsers or cell phones, the workflow must change to ensure that deployment to those devices is successful.

In this new flow there are two major additions to our traditional workflow:

1. Model conversion

2. On-device model benchmarking

Socure's document verification capture apps run computer vision models on device to check document and selfie capture quality, because the user experience should be consistent no matter the platform. Since our goal is omni-channel performance for these capture quality models, we must ensure that a trained model can be converted to a format each edge device can consume while delivering the same speed and accuracy across different edge devices. This is where our team has adapted the workflow to address these deployment risks as early as possible (see Figure 2).

Model conversion is a critical step in the ML workflow because different edge inference platforms have different implementation details that affect how specific layers and operations perform on device. One such example is depthwise convolution. In the past, certain frameworks converted a depthwise convolution into a stack of k×k×1 convolutions: the n×m×k input tensor is split into k n×m×1 tensors, each is convolved separately, and the results are concatenated back into an n′×m′×k output. Although this is numerically correct, it is slow on device, which can be suboptimal. Examples like this are fairly common during the deployment phase, and some layers are even unsupported in certain deployment frameworks.
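To make this concrete, here is a minimal PyTorch sketch contrasting a native depthwise convolution with the split-convolve-concatenate pattern described above. The layer sizes are purely illustrative, and this is not our production code:

```python
import torch
import torch.nn as nn

k = 8  # number of channels (illustrative)

# Native depthwise convolution: one grouped conv op, typically well optimized on device.
depthwise = nn.Conv2d(k, k, kernel_size=3, padding=1, groups=k, bias=False)

# The naive equivalent some converters have emitted in the past:
# split the n x m x k input into k single-channel tensors, run k separate
# k x k x 1 convolutions, then concatenate the results back together.
per_channel = nn.ModuleList(
    nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False) for _ in range(k)
)

def naive_depthwise(x: torch.Tensor) -> torch.Tensor:
    # x has shape (batch, k, n, m); one conv per channel, then concat on the channel axis.
    channels = torch.split(x, 1, dim=1)
    outputs = [conv(c) for conv, c in zip(per_channel, channels)]
    return torch.cat(outputs, dim=1)

x = torch.randn(1, k, 32, 32)
print(depthwise(x).shape, naive_depthwise(x).shape)  # both produce (1, 8, 32, 32)
```

Both formulations produce an output of the same shape, but the second one launches k small operations plus a split and a concat instead of a single fused kernel, which is where the on-device slowdown comes from.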

If we build our models without deployment in mind, we can end up redoing a lot of the experimentation when these issues arise. Therefore, to mitigate unforeseen conversion issues during deployment, we add model conversion to the workflow right after the model architecture candidate has been defined.

Even without any trained weights, we can fully test whether the architecture can be converted to our final edge format. Our team has built tools for converting from either TensorFlow or PyTorch (which is preferred, but more on that in another blog) into our target frameworks: Core ML (iOS), TensorFlow Lite (Android), and TensorFlow.js (web). This allows us to quickly adjust our model architecture, should there be any conversion issues or incompatibilities, before spending any resources on training.
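As a rough illustration of this kind of conversion smoke test, the sketch below pushes an untrained toy Keras model through the public converters for our three target formats (tf.lite.TFLiteConverter, coremltools, and tensorflowjs). The architecture and file names are placeholders, our in-house conversion tooling is not shown, and exact APIs vary by library version:

```python
import tensorflow as tf
import coremltools as ct
import tensorflowjs as tfjs

# An untrained architecture candidate: no weights are needed to smoke-test conversion.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.SeparableConv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# TensorFlow Lite (Android): convert and fail fast on unsupported ops.
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()
with open("candidate.tflite", "wb") as f:
    f.write(tflite_model)

# Core ML (iOS): coremltools can ingest the same Keras model.
mlmodel = ct.convert(model, convert_to="mlprogram")
mlmodel.save("candidate.mlpackage")

# TensorFlow.js (web): save in the tfjs layers format.
tfjs.converters.save_keras_model(model, "candidate_tfjs")
```

If a layer has no equivalent in one of the target formats, the corresponding converter raises an error right here, which is exactly the signal we want before any training resources are spent.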

So, if our model converts, you may be wondering how we make sure these models perform the same across our different channels. This is where on-device model benchmarking really helps. We add this stage to our workflow in two places, each with its own purpose.

The first is right after model conversion in the modeling phase, where we benchmark mostly for speed. Using AWS Device Farm and our in-house model benchmarking apps, we can benchmark the inference speed of different architecture candidates across a wide range of devices, as in the sketch that follows.
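The sketch below captures the spirit of such a speed benchmark: a warm-up followed by repeated, timed invocations of a converted TensorFlow Lite model. In practice the measurements come from our benchmarking apps running on real hardware in AWS Device Farm; the model path here is a placeholder:

```python
import time
import numpy as np
import tensorflow as tf

# Load a previously converted candidate model (placeholder path).
interpreter = tf.lite.Interpreter(model_path="candidate.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Random input matching the model's expected shape.
x = np.random.rand(*inp["shape"]).astype(np.float32)

# Warm up so one-time setup costs don't skew the numbers.
for _ in range(10):
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()

# Time repeated invocations and report the mean latency.
runs = 100
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
latency_ms = (time.perf_counter() - start) / runs * 1000
print(f"mean latency: {latency_ms:.2f} ms")
```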

This gives us a clear picture of the device coverage for each architecture candidate. Should a candidate be too slow on a certain framework, we re-evaluate it and make adjustments until we satisfy the runtime requirements.

The second place where we run on-device benchmarks is right before model deployment. By this stage we know that our models convert to the target edge platforms and that they are fast enough on device; what remains is to validate that the converted edge models retain the same accuracy/loss as the originally trained model checkpoint.

Many on-device frameworks apply quantization, reduced floating-point precision, and other optimizations and simplifications to a model's weights to make it run faster on device, and these can come at the cost of model accuracy. Therefore, before we ship the model, we run benchmarks across our validation dataset, using the previously mentioned benchmarking apps and AWS Device Farm, to validate that the inference model does not deteriorate in accuracy across the three inference frameworks or relative to our original training framework. If there are any problems, we go back to debugging the entire pipeline or retrain with modifications that address the issue.
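A simplified version of that parity check, for a single edge framework, might look like the following. The checkpoint, converted model, and validation data paths are placeholders, and the real validation compares all three inference frameworks against the training-framework outputs:

```python
import numpy as np
import tensorflow as tf

# Original training-framework checkpoint and its converted edge counterpart
# (file names are placeholders for illustration).
keras_model = tf.keras.models.load_model("checkpoint.keras")
interpreter = tf.lite.Interpreter(model_path="candidate.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def tflite_predict(batch: np.ndarray) -> np.ndarray:
    # Run the converted model one sample at a time and stack the outputs.
    preds = []
    for sample in batch:
        interpreter.set_tensor(inp["index"], sample[np.newaxis].astype(np.float32))
        interpreter.invoke()
        preds.append(interpreter.get_tensor(out["index"])[0])
    return np.stack(preds)

x_val = np.load("x_val.npy")  # placeholder validation inputs
y_ref = keras_model.predict(x_val, verbose=0)
y_edge = tflite_predict(x_val)

# Flag accuracy drift between the original and converted models.
max_abs_diff = np.max(np.abs(y_ref - y_edge))
agreement = np.mean(np.argmax(y_ref, axis=1) == np.argmax(y_edge, axis=1))
print(f"max |diff| = {max_abs_diff:.4f}, label agreement = {agreement:.3%}")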

After on-device validation is complete, we ship the model to our engineering teams to be included in our capture apps and SDKs, knowing that it is performant in both speed and accuracy.

Following this workflow and keeping track of datasets, code, model checkpoints, and model conversions allows us to deliver a model to our integration partners knowing that it performs well and carries as little risk as possible. At the same time, should it become necessary to retrain and update the model, we can easily do so with the best set of hyperparameters and configs. Doing the work the right way makes it easier not just for us but for colleagues in other functional orgs as well.

Building ML models is a complicated and risky process, and iteration speed is the key to producing high-quality, production-grade models that satisfy both speed and accuracy targets. Doing it the right way, with the right workflow and tooling, can greatly derisk the process. Being deployment-focused and thinking about the end result will greatly improve the throughput of your ML teams and ensure successful delivery of your ML models.

In the last part of this series, we’ll dive deep into changes we make to the Omni Channel workflow to deploy our on-device capture quality model.
