Machine Learning Software Development Kit: an introduction and main functions

Dmitry Kiselev
Deelvin Machine Learning
6 min read · Jan 12, 2021

Today our Deelvin team would like to present the Machine Learning Software Development Kit (ML SDK). It is a library that addresses a range of common ML-related problems. The first version of the C API is deliberately simple, but it will be developed further.

ML SDK is a product for solving computer vision tasks.

Currently, ML SDK allows one to:

  • Detect ads in video streams or any frame sequences
  • Detect or perform segmentation of video compression algorithm artifacts
  • Perform human image semantic segmentation (and you can watch our demo here)
  • Upscale low resolution images using machine learning
  • Detect various objects and their coordinates.

This article serves two purposes: it is both a press release and a short manual for the product. It demonstrates how ML SDK works and describes all of its functions.

The following notes describe a general API structure:

  1. Each of the problems above has a “module” dedicated to it, and each module has a short name that serves as a prefix. For example, perseg is the prefix for the person segmentation module.
  2. Usually, for any kind of user-controlled object there are two functions, <module-name>_*_create_<object> and <module-name>_*_delete_<object> , with the obvious intended purpose.
  3. For “detectors”, “segmentators” and the like there are <module-name>_*_predict functions. They provide an interface to run model inference with convenient pre- and post-processing.
  4. All functions work synchronously. Some modules additionally provide <module-name>_*_async functions, which offer an asynchronous interface to their synchronous counterparts (via user callbacks).
  5. ML SDK uses its own structures as inputs and outputs, but they mimic a familiar layout. For example, an image is laid out like the frame planes of AVFrame in FFmpeg, plus two user fields.

Showcase of work with ML SDK

As an example, work with the person segmentation module is demonstrated below. More detailed information about human image semantic segmentation can be found here.

// Create the segmentator and check whether creation failed
CVQPersonSegmentator segmentator = perseg_create_segmentator();
if (!segmentator) {
    fprintf(stderr, "%s\n", perseg_errmsg());
    exit(-1);
}
// Create an Image struct and fill it
struct Image image;

image.image_data.width = 1920;
image.image_data.height = 1080;
int32_t stride[1] = {1920 * 3};
image.image_data.stride = stride;
image.image_data.colorspace = PIXEL_FORMAT_RGB24;

// Raw unformatted RGB24 bytes
// This buffer is defined and loaded somewhere else
//uint8_t* buffer[1] = {(uint8_t*)malloc(1920 * 1080 * 3)};
image.image_data.buffer = buffer;
// Just create the mask; no need to know its internal structure
// or what exact size is needed
struct Mask* mask = perseg_create_mask(image.image_data.width, image.image_data.height);

// Prediction
// Note that the user owns all the arguments:
// nothing is allocated inside that needs to be cleaned up later
perseg_predict(segmentator, &image, mask);
// But since the user owns the allocated storage, the user must take care of it
//free(buffer[0]);
perseg_delete_mask(mask);
perseg_delete_segmentator(segmentator);

This is a snippet from a person segmentation code sample with all unnecessary parts omitted. A word of warning is in order: ML SDK does not support any codecs or picture formats out of the box. This was done purposefully; decoding is left to the user.

The images below demonstrate ML SDK in action. ML SDK takes the left image as input and produces the person segmentation output visualized in the right image. In the second image, the person’s contours are clearly outlined in white against the black background. ML SDK did a good job separating the hair as well as other fine details of the image.

Person segmentation output visualization

A short summary and the list of all functions

Content detection (vqcd)

Content detection takes one image from a sequence as its input and returns a structure with the predicted type of content (payload/ad) and the number of frames classified as that type. It may also silently consume frames until it is confident about the content type and the number of frames in the current batch.

The functions vqcd_create_detector and vqcd_delete_detector, obviously, create and delete this detector. vqcd_predict synchronously processes the next frame in the sequence, but it may do nothing for a given frame. In addition to the output structure, it returns a boolean value indicating whether the current image triggered inference. If vqcd_predict returns false , the output is meaningless (it was not touched at all).

This module supports an asynchronous mode of operation. vqcd_init_async and vqcd_abort_async start and stop a separate worker thread (the thread also stops automatically when the detector is destroyed). vqcd_predict_async is the asynchronous analogue of vqcd_predict , but it uses the user callback passed to vqcd_init_async to notify the user about its prediction. vqcd_lock_context_async and vqcd_unlock_context_async , as their names suggest, lock and unlock the user-context mutex.

Artifact detection/segmentation (artdet)

Originally, this module was intended to handle only artifact detection, but later it was deemed reasonable to support segmentation as well. It now performs only the latter, which can also be used to emulate the former, albeit more slowly.

There are several segmentators in this module. The first is a simple one: it takes an image as its input and returns a mask (a one-channel / grayscale image) highlighting every artifact found in the image.

The functions artdet_create_segmentator and artdet_delete_segmentator create and delete it, respectively. artdet_predict runs inference. artdet_create_mask should be used to create an appropriate mask, with artdet_delete_mask to clean it up later. artdet_errno and artdet_errmsg can be used to find out what kind of error has occurred.

The second kind of segmentation can pinpoint different types of visual artifacts, returning separate masks for noise, block, and blur artifacts.

Again, artdet_multiple_create_segmentator and artdet_multiple_delete_segmentator manage the element. artdet_multiple_predict runs inference. artdet_multiple_create_mask_array and artdet_delete_mask_array manage the mask array, which holds the result of the computation. artdet_multiple_errno and artdet_multiple_errmsg report a possible error.

Last but not least, there are three segmentators that produce mask arrays for artifacts specific to a particular algorithm type: AVC, MJPEG, Wavelet, or HEVC. Their main differences lie in accuracy and in the presence or absence of probability labels.

Several other important functions are the following.

To create and delete:

  • artdet_multiple_2_create_segmentator and artdet_multiple_2_delete_segmentator
  • artdet_multiple_w_labels_create_segmentator and artdet_multiple_w_labels_delete_segmentator
  • artdet_multiple_w_labels_2_create_segmentator and artdet_multiple_w_labels_2_delete_segmentator

To make a prediction:

  • artdet_multiple_2_predict
  • artdet_multiple_w_labels_predict
  • artdet_multiple_w_labels_2_predict

To manage mask arrays:

  • artdet_multiple_2_create_mask_array
  • artdet_multiple_w_labels_create_mask_array
  • artdet_multiple_w_labels_2_create_mask_array
  • artdet_delete_mask_array

To understand errors:

  • artdet_multiple_2_errno and artdet_multiple_2_errmsg
  • artdet_multiple_w_labels_errno and artdet_multiple_w_labels_errmsg
  • artdet_multiple_w_labels_2_errno and artdet_multiple_w_labels_2_errmsg

Person segmentation (perseg)

It takes an image as its input and returns a mask that highlights all humans in the input picture. The mask format may seem unusual: it contains only 0x00 and 0x01 values.

The interface of this module is straightforward. perseg_create_segmentator and perseg_delete_segmentator manage the segmentator. perseg_create_mask and perseg_delete_mask manage masks. perseg_predict runs inference. perseg_errno and perseg_errmsg inspect errors that have occurred.

Video super resolution (VSR)

VSR takes an image as input and gives an image as output. Its purpose is to produce a higher-resolution picture using machine learning. The current ratio is 4:1 both vertically and horizontally, which means the output image contains 16 times as many pixels.

The interface of VSR is simple too. vsr_create_upscaler and vsr_delete_upscaler create and delete the upscaler. vsr_alloc_upscaled_image and vsr_delete_upscaled_image manage the upscaled image. The upscaled image's dimensions are derived from the input image, so the user does not need to calculate sizes or allocate memory manually. vsr_errno and vsr_errmsg are there to identify the cause of an error.

Object detection (objdetect)

This one detects objects. It takes an image as its input and returns a variable number of boxes to the user. Each box is a four-number description of a rectangle (x, y, width, height) with a class label attached.

Also, because the number of boxes is unknown in advance, this module is the only one that allocates memory on the heap and hands ownership of it to the user.

objdetect_create_detector and objdetect_delete_detector manage the detector. objdetect_create_box_array and objdetect_delete_box_array manage the array of boxes. The user may not notice, but something is off here: one of the box array structure's fields points to memory allocated by the library, which is why the matching delete function must always be called. objdetect_predict runs inference. objdetect_errno and objdetect_errmsg look up information about an error, should one occur.

Conclusion

Right now, ML SDK is available as a trial version upon request. It is under active development, and its API may change. Follow its development on the Deelvin website. You can see ML SDK live using ML Player. For inquiries, please contact our Deelvin team.
