Our last article (Part I) showed how to run object detection in a WebAssembly runtime called WasmEdge. This Part II article shows you how to decode the TensorFlow Lite model’s outputs.

I found a nice Single Shot Detector (SSD) model whose outputs are split into separate sections, i.e. bounding box coordinates, categories of the objects detected, scores of the detected objects and so forth. This is perfect for demonstrating how to interpret TensorFlow Lite outputs. It is a little different from the last article; this time we are detecting food objects.

Let’s get started!

Looking at this SSD model, we can see that there is a single input: a single image (320px by 320px) in the format of RGB pixels.

Our Rust source code can easily provide the model with this format. Here we have a function called detect which accepts an image as a byte array. We load the byte array as an image object, resize the image to 320px by 320px, and then flatten the image into a byte array of RGB values.

pub fn detect(image_data: &[u8]) -> Vec<u8> {
    // Load image
    let mut img = image::load_from_memory(image_data).unwrap();
    // Resize image to the model's 320 x 320 input size
    let resized = image::imageops::thumbnail(&img, 320, 320);
    // Flatten image into a byte array of RGB values
    let mut flat_img: Vec<u8> = Vec::new();
    for rgb in resized.pixels() {
        flat_img.push(rgb[0]);
        flat_img.push(rgb[1]);
        flat_img.push(rgb[2]);
    }

Next we load the TensorFlow Lite model, start a new TensorFlow Lite session in WasmEdge (which talks directly to the TensorFlow C API), and then add an input placeholder (as per the model properties).

The input is called serving_default_images:0

// Load tflite model
let model_data: &[u8] = include_bytes!("/media/nvme/model.tflite");
// Start a new TensorFlow Lite session in WasmEdge
let mut session = wasmedge_tensorflow_interface::Session::new(
    model_data,
    wasmedge_tensorflow_interface::ModelType::TensorFlowLite,
);
// Add input: 1 image of 320 x 320 pixels with 3 channels (RGB)
session.add_input("serving_default_images:0", &flat_img, &[1, 320, 320, 3]);

The outputs also need to be added to the session before we can run it.

Here is the Rust code to add outputs.

// Locations of detected boxes
session.add_output("StatefulPartitionedCall:3");
// Categories of the objects detected
session.add_output("StatefulPartitionedCall:2");
// Scores of the detected boxes
session.add_output("StatefulPartitionedCall:1");
// The number of detected boxes
session.add_output("StatefulPartitionedCall:0");

At this point we can run the object detection session:

session.run();

After the session has run, we collect the outputs and load them into Rust vectors:

let res0: Vec<f32> = session.get_output("StatefulPartitionedCall:3");
let res1: Vec<f32> = session.get_output("StatefulPartitionedCall:2");
let res2: Vec<f32> = session.get_output("StatefulPartitionedCall:1");
let res3: Vec<f32> = session.get_output("StatefulPartitionedCall:0");
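As an aside, the res3 vector (the number of detected boxes) is not used in the parsing code below; we simply walk the whole res0 vector instead. If you preferred, you could use it to bound the loop. Here is a minimal sketch, assuming the count comes back as a single f32 value:

// res3 holds a single value: how many boxes the model actually detected
let num_boxes = res3[0] as usize;
for i in 0..num_boxes {
    println!("Detection {} has a score of {}", i, res2[i]);
}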

Parsing the results is next; essentially we are making sense of the numerical values in the above vectors. The ultimate goal is to draw and label the highest scoring objects which were detected.

// Parse results.
let mut iter = 0;
let mut box_vec: Vec<[f32; 4]> = Vec::new();
let mut label_vec: Vec<u8> = Vec::new();
while (iter * 4) < res0.len() {
    // Check that the detection is high ranking
    if res2[iter] >= 0.3 {
        let x1 = res0[4 * iter + 1] * 512.;
        let y1 = res0[4 * iter] * 512.;
        let x2 = res0[4 * iter + 3] * 512.;
        let y2 = res0[4 * iter + 2] * 512.;
        if x1 > 0.0 && x2 > 0.0 && y1 > 0.0 && y2 > 0.0 {
            box_vec.push([x1, y1, x2, y2]);
            // Record this box's category so label_vec stays in step with box_vec
            label_vec.push(res1[iter] as u8);
        }
    }
    iter += 1;
}
// `start` is a std::time::Instant created earlier (not shown in this snippet)
println!("Parsed results in ... {:?}", start.elapsed());

In the above code, you can see that we iterate through the res0 vector. There are 4 f32 values per box; these relate to the two x and the two y coordinates (as the indices in the code show, the model emits them in the order y1, x1, y2, x2).

There is one line in there which relates to the labelling of the object; as you can see, we push a value to label_vec each time we process a whole box.

label_vec.push(res1[iter] as u8);

More on labelling later.

A quick word on image shape. You may notice that we are multiplying the bounding box coordinate values by 512. This is because the original image is actually 512px by 512px.

The model’s input requires that we resize the original image to 320px by 320px; this is fine for processing. However, from an end user’s point of view we want to return the original image to the caller. The model outputs normalized coordinates (values between 0 and 1), so we perform the * 512 multiplication to map the boxes back onto the original image’s shape.

You could perform this scaling dynamically instead (we have hard-coded it here for demonstration purposes) by capturing the image height and width when you first create the image object.

let image_height: f32 = img.height() as f32;
let image_width: f32 = img.width() as f32;

You would then apply these height and width values when parsing the output, like this (x values are multiplied by the width and y values are multiplied by the height):

let x1 = res0[4 * iter + 1] * image_width;
let y1 = res0[4 * iter] * image_height;
let x2 = res0[4 * iter + 3] * image_width;
let y2 = res0[4 * iter + 2] * image_height;

At this point we have a new vector called box_vec which holds the box coordinates to be drawn onto the original image.

Before we move on to drawing the boxes, I should mention that we briefly accessed the res2 result above. This res2 vector holds the scores for each detected box, as described in the model properties.

The actual output from res2 looks like this:

StatefulPartitionedCall:1[0.8125, 0.72265625, 0.5859375, 0.45703125, 0.4296875, 0.36328125, 0.36328125, 0.3203125, 0.3125, 0.28515625, 0.26171875, 0.25, 0.23828125, 0.23828125, 0.23046875, 0.22265625, 0.1796875, 0.1796875, 0.1796875, 0.171875, 0.16796875, 0.16015625, 0.1484375, 0.1484375, 0.1484375]

As you can see, whilst iterating through the potential box coordinates, we have introduced a lower bound threshold of 0.3. This means that we will not process any bounding box coordinates for objects with scores lower than 0.3 (with the scores above, only the first nine detections pass this threshold):

if res2[iter] >= 0.3

Drawing the boxes is done by iterating through the box_vec vector like this; the line is green, i.e. R, G, B, A is set to 0, 255, 0, 0:

let line = Pixel::from_slice(&[0, 255, 0, 0]);
for i in 0..box_vec.len() {
    let xy = box_vec[i];
    let x1: i32 = xy[0] as i32;
    let y1: i32 = xy[1] as i32;
    let x2: i32 = xy[2] as i32;
    let y2: i32 = xy[3] as i32;
    let rect = Rect::at(x1, y1).of_size((x2 - x1) as u32, (y2 - y1) as u32);
    draw_hollow_rect_mut(&mut img, rect, *line);

For the labelling, we choose a font:

let font = Vec::from(include_bytes!("DejaVuSans.ttf") as &[u8]);
let font = Font::try_from_vec(font).unwrap();

The labelling is done with a function called draw_text_mut (from the Rust imageproc crate). This allows us to pass in a few values such as:

  • a mutable image (the original image object)
  • text color
  • x and y coordinates
  • the scale of the text (text size)
  • the preferred font
  • the text
let scale = Scale {
    x: ((x2 - x1) / 6) as f32,
    y: ((x2 - x1) / 6) as f32,
};
println!("Drawing label at x: {:?} and y: {:?}", xy[0], xy[1]);
// Draw a grey drop shadow first (offset by a pixel or two), then the green text on top
draw_text_mut(&mut img, Rgba([50u8, 50u8, 50u8, 0u8]), (x1 + 1) as u32, (y1 + 2) as u32, scale, &font, text);
draw_text_mut(&mut img, Rgba([0u8, 255u8, 0u8, 0u8]), x1 as u32, y1 as u32, scale, &font, text);
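Note that the text variable passed to draw_text_mut above is the human-readable label for the box. The model only gives us a numeric category index (which we stored in label_vec), so that index needs to be mapped to a name. Here is a minimal sketch of how such a lookup could work; the LABELS list and the label_text helper are purely illustrative, as the real label map depends on the classes your model was trained on:

// Hypothetical label map: category index -> name (replace with your model's own classes)
const LABELS: [&str; 3] = ["tomato", "salad", "carrot"];

// Illustrative helper: turn a category index from label_vec into display text
fn label_text(category: u8) -> &'static str {
    LABELS.get(category as usize).copied().unwrap_or("unknown")
}

// Inside the drawing loop you would then write something like:
// let text = label_text(label_vec[i]);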

I decided to write some code to dynamically size the text, i.e. base the text’s size on the width (x values) of the object. I also noticed that coloured text is hard to read on some surfaces, so a drop shadow was introduced by simply writing the same text in grey whilst offsetting the label’s coordinates by just 1 or 2 pixels.
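One last piece of the detect function: its signature returns a Vec<u8>, so once the boxes and labels have been drawn we need to encode the annotated image back into a byte array for the caller. Here is a minimal sketch of that final step, assuming the image crate’s JPEG encoder (the exact API names can vary a little between image crate versions, and the quality of 80 is an arbitrary choice):

// Encode the annotated image as JPEG bytes; this Vec<u8> becomes the detect function's return value
let mut buf: Vec<u8> = Vec::new();
img.write_to(&mut std::io::Cursor::new(&mut buf), image::ImageOutputFormat::Jpeg(80)).unwrap();
buf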

Let’s go ahead and run all of this Rust source code on WasmEdge and see the results.

Using WasmEdge

WasmEdge is a cloud native WebAssembly runtime for edge computing created by Second State Inc.

WasmEdge is the first WebAssembly runtime to become a Cloud Native Computing Foundation (CNCF) project. WasmEdge enables serverless functions to be embedded into any software platform; everything from the cloud’s edge to SaaS and automobiles.

Please see this repository to ensure you have the appropriate dependencies.

Then, obtain the source code (which we have prewritten) and compile it in readiness for WasmEdge execution:

git clone https://github.com/second-state/wasm-learning.git
cd wasm-learning/faas/tflite_ssd
rustwasmc build

WasmEdge can be run on many different platforms. Today, for demonstration purposes, we will be running this object detection from inside Node.js.

// Import file system library
const fs = require('fs');
// Create WasmEdge instance
const wasmedge = require("wasmedge-extensions");
// Use this first time (initial call)
const path = "/media/nvme/yolo/wasm-learning/faas/tflite_ssd/pkg/tflite_ssd_lib_bg.wasm";
var vm = new wasmedge.VM(path, { args:process.argv, env:process.env, preopens:{"/": "/tmp"} });
// AOT path
var aot_path = "/media/nvme/aot_file.so";
// If you want to, please go ahead and make an aot file
vm.Compile(aot_path);
// Use this after the first time (subsequent calls)
var vm_aot = new wasmedge.VM(aot_path, { EnableAOT:true, args:process.argv, env:process.env, preopens:{"/": "/tmp"} });
// Open image
var img_src = fs.readFileSync("/media/nvme/image.png");
// Run function
var return_value = vm_aot.RunUint8Array("detect", img_src);
// Write image to disk
fs.writeFileSync("res.jpg", return_value);

I went ahead and downloaded a random image of tomatoes growing to test this model. The idea is never to use images that the model has been trained on, for obvious reasons. Here is the result. It is also worth noting that this model was only trained with 25 images, which is a very small training data set; you would typically use thousands of images to train a model thoroughly.

Just to make sure that we can actually detect a salad, I randomly downloaded an image of a salad from the web, and voilà.

Conclusion

There are many different models to choose from, and each can have very different inputs and outputs depending on what its designer has built. It is very important to find out how a given model’s output should be parsed and decoded. If you are interested in learning more about running compute-intensive tasks on a WebAssembly runtime, and taking advantage of both the safety and speed that WasmEdge provides, please leave a comment and/or check out the GitHub repositories for WasmEdge and Second State.
