Running your Deep Learning models in a browser using TensorFlow.js and ONNX.js
Today we will discuss how to launch semantic segmentation and style transfer models in your browser using TensorFlow.js and ONNX.js.
The purpose of this article is to determine whether relatively large models can run in a browser, both on a PC and on a mobile device.
Semantic segmentation is a classic computer vision problem: it takes some raw data as input (e.g., 2D images) and converts it into a mask with regions of interest highlighted. Many use the term full-pixel semantic segmentation, where each pixel in an image is assigned a class ID depending on which object of interest it belongs to.
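To make the "class ID per pixel" idea concrete, here is a minimal framework-free sketch of turning a segmentation network's per-pixel class scores into a class-ID mask with an argmax. The function name and the flat `[height * width * numClasses]` layout are illustrative assumptions, not part of any specific model's API:

```javascript
// Toy sketch: turn per-pixel class scores into a class-ID mask via argmax.
// `scores` is assumed to be a flat array of shape [height * width * numClasses],
// a common layout for a segmentation network's output.
function scoresToClassMask(scores, height, width, numClasses) {
  const mask = new Uint8Array(height * width);
  for (let p = 0; p < height * width; p++) {
    let best = 0;
    for (let c = 1; c < numClasses; c++) {
      if (scores[p * numClasses + c] > scores[p * numClasses + best]) best = c;
    }
    mask[p] = best;
  }
  return mask;
}

// 1x2 "image", 3 classes: pixel 0 -> class 2, pixel 1 -> class 0
const mask = scoresToClassMask([0.1, 0.2, 0.7, 0.8, 0.1, 0.1], 1, 2, 3);
```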
I chose to go with the Fast-SCNN architecture as a semantic segmentation model, because “it yields a mean intersection over union (mIoU) of 68.0% at 123.5 frames per second (fps) on a modern GPU (Nvidia Titan Xp (Pascal)) using full (1024×2048px) resolution applied on Cityscapes” according to this article. 120 fps at such a high resolution is an amazing result, which means that the network should be pretty lightweight, which, as it turns out, it is. It uses techniques common to fast semantic segmentation models, such as depthwise separable convolutions and a two-branch design, to improve inference time while sustaining prediction accuracy.
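A quick back-of-the-envelope calculation shows why depthwise separable convolutions keep such networks light. The parameter counts below (biases ignored) follow the standard formulas for a k×k convolution from Cin to Cout channels; the helper names are mine, not from Fast-SCNN:

```javascript
// Parameters of a standard k x k convolution, Cin -> Cout channels.
const standardConvParams = (k, cin, cout) => k * k * cin * cout;

// Depthwise separable version: a k x k depthwise convolution
// followed by a 1 x 1 pointwise convolution.
const separableConvParams = (k, cin, cout) => k * k * cin + cin * cout;

// Example: a 3x3 convolution with 128 -> 128 channels
const std = standardConvParams(3, 128, 128);  // 147456
const sep = separableConvParams(3, 128, 128); // 17536 (~8.4x fewer)
```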
Let’s define a style transfer as a process of modifying the style of an image while still preserving its content.
Given an input image and a style image, we can compute an output image with the original content but a new style. This is how input, style and output images might look combined.
At first, I tried using this PyTorch implementation of style transfer and converted it into the ONNX format. This implementation replaces some PyTorch operations, such as reflection padding and interpolation, with sequences of basic operations, because ONNX.js currently doesn’t support them. The problem occurred when I tried launching this model in a mobile browser: it didn’t work, which led me to switch to a TensorFlow implementation of a different style transfer architecture and to run everything in tf.js. By the way, tf.js also currently doesn’t implement reflection padding, which is why I replaced it with a regular zero padding, which tf.js supports.
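To see what that substitution changes, here is a framework-free sketch of the two padding modes on a 1-D signal (the function names are mine; `reflectPad1d` mirrors without repeating the edge sample, matching PyTorch's ReflectionPad behaviour):

```javascript
// Zero padding: extend the signal with zeros on both sides.
function zeroPad1d(xs, n) {
  return [...Array(n).fill(0), ...xs, ...Array(n).fill(0)];
}

// Reflection padding: mirror the signal at its edges,
// without repeating the edge sample itself.
function reflectPad1d(xs, n) {
  const left = xs.slice(1, n + 1).reverse();
  const right = xs.slice(xs.length - n - 1, xs.length - 1).reverse();
  return [...left, ...xs, ...right];
}

zeroPad1d([1, 2, 3, 4], 2);    // [0, 0, 1, 2, 3, 4, 0, 0]
reflectPad1d([1, 2, 3, 4], 2); // [3, 2, 1, 2, 3, 4, 3, 2]
```

Zero padding introduces artificial dark borders that reflection padding avoids, which is why swapping one for the other can subtly change output quality near image edges.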
All the models were converted in Python with tensorflowjs version 0.8.6 using this guide:

tensorflowjs_converter --input_format=keras \
    path/to/my_model.h5 \
    path/to/tfjs_target_dir

tensorflowjs_converter --input_format=tf_frozen_model \
    --output_node_names='output_node_name' \
    path/to/frozen_model.pb \
    path/to/tfjs_target_dir
Let’s get through the main workflow of launching these two models together. First, we should get our input:
Then we resize the image to the model’s input size, normalize it, run the prediction, and resize the result back to the size of our initial image:
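The resize and normalize steps can be sketched without any framework (in tf.js itself you would typically use tensor ops instead). The code below assumes a flat RGBA buffer like the one canvas `getImageData` returns, uses nearest-neighbour resampling for simplicity, and maps channels from [0, 255] to [-1, 1]; the exact normalisation depends on how the model was trained:

```javascript
// Nearest-neighbour resize of a flat RGBA pixel buffer.
function resizeNearest(src, srcW, srcH, dstW, dstH) {
  const dst = new Uint8ClampedArray(dstW * dstH * 4);
  for (let y = 0; y < dstH; y++) {
    const sy = Math.min(srcH - 1, Math.floor(y * srcH / dstH));
    for (let x = 0; x < dstW; x++) {
      const sx = Math.min(srcW - 1, Math.floor(x * srcW / dstW));
      for (let c = 0; c < 4; c++) {
        dst[(y * dstW + x) * 4 + c] = src[(sy * srcW + sx) * 4 + c];
      }
    }
  }
  return dst;
}

// Map a channel value from [0, 255] to [-1, 1].
const normalize = (v) => v / 127.5 - 1;
```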
Threshold our mask:
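A minimal sketch of the thresholding step, assuming the network emits one probability per pixel for the target class (the function name and the 0.5 default are mine):

```javascript
// Every pixel whose probability exceeds `threshold`
// becomes part of the binary mask.
function thresholdMask(probs, threshold = 0.5) {
  return probs.map((p) => (p > threshold ? 1 : 0));
}

thresholdMask([0.1, 0.9, 0.6, 0.3]); // [0, 1, 1, 0]
```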
The process of running style transfer on top of semantic segmentation is pretty similar to the one described above. However, we should blend our style transfer output with source image data according to our segmentation mask:
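A framework-free sketch of that blending step, assuming flat RGBA buffers (as returned by canvas `getImageData`) and a binary mask with one entry per pixel; the function name is hypothetical:

```javascript
// Where the mask is 1, take the stylised pixel; elsewhere keep the source pixel.
function blendByMask(source, stylised, mask) {
  const out = new Uint8ClampedArray(source.length);
  for (let p = 0; p < mask.length; p++) {
    const from = mask[p] ? stylised : source;
    for (let c = 0; c < 4; c++) out[p * 4 + c] = from[p * 4 + c];
  }
  return out;
}
```

With a soft (non-binary) mask, the same idea becomes a per-pixel linear interpolation, `mask * stylised + (1 - mask) * source`, which gives smoother edges at the cost of a little extra arithmetic.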
In the end, we recursively call our function to process the next frame from the webcam:
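The recursion pattern can be sketched with an injectable scheduler, so the same loop works both in the browser (where the scheduler would be `requestAnimationFrame` or `setTimeout`) and in a synchronous test; the names `runLoop` and `processVideoFrame` are illustrative:

```javascript
// Per-frame loop: process a frame, then schedule the next iteration
// until the stop condition becomes true.
function runLoop(processFrame, schedule, shouldStop) {
  const step = () => {
    processFrame();
    if (!shouldStop()) schedule(step);
  };
  step();
}

// Hypothetical browser usage:
// runLoop(processVideoFrame, requestAnimationFrame, () => stopped);
```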
Here we segment laptops and apply the style transfer to them.
If we run our models in the browser, we get the following results:
1.3 fps using the WebGL backend: 141 ms for segmentation (with full pre/post-processing), 576 ms for one style transfer iteration (with full pre/post-processing); the rest of the time is spent getting images into and out of the canvas.
Hopefully, this article gave you some insight into the process of deploying models in a browser, as well as into the struggles you might bump into along the way.