Machine Learning Software Development Kit: an introduction and main functions
Today our Deelvin team would like to present the Machine Learning Software Development Kit (ML SDK), a library that solves a range of common ML-related problems. The first version of the C API is deliberately simple, if not primitive, but it will undergo further development in the future.
ML SDK is a product to solve tasks in the field of computer vision.
Currently, ML SDK allows one to:
- Detect ads in video streams or any frame sequences
- Detect or perform segmentation of video compression algorithm artifacts
- Perform human image semantic segmentation (and you can watch our demo here)
- Upscale low resolution images using machine learning
- Detect various objects and their coordinates.
This article serves two purposes: it is a press release and a short manual of the product. The article will demonstrate how ML SDK works and describe all its functions.
The following notes describe a general API structure:
- Each of the above problems has a "module" related to it, and each such module has a short name, which serves as a prefix. For example, `perseg` is the prefix for the person segmentation module.
- Usually, for any kind of user-controlled object there exist two functions, `<module-name>_*_create_<object>` and `<module-name>_*_delete_<object>`, with the obvious intended purpose.
- For "detectors", "segmentators" and the like there are `<module-name>_*_predict` functions. They provide an interface to model inference with some convenient pre- and post-processing.
- Every function works synchronously. Optionally, a module may provide several `<module-name>_*_async` functions, asynchronous counterparts that report results through user callbacks.
- ML SDK uses its own structures as input and output to functions, but they mimic a familiar layout. For example, an image is laid out like the frame planes of AVFrame in FFmpeg, plus two user fields.
Showcase of work with ML SDK
As an example, work with person segmentation module will be demonstrated. More detailed info about human image semantic segmentation can be found here.
// Create segmentator and check whether creation has failed
CVQPersonSegmentator segmentator = perseg_create_segmentator();
if (!segmentator) {
    fprintf(stderr, "%s\n", perseg_errmsg());
    exit(-1);
}

// Create an Image struct and fill it with a 1920x1080 RGB24 frame
struct Image image;
image.image_data.width = 1920;
image.image_data.height = 1080;
int32_t stride[1] = {1920 * 3};
image.image_data.stride = stride;
image.image_data.colorspace = PIXEL_FORMAT_RGB24;

// Raw unformatted RGB24 bytes
// In a real application this buffer is filled with pixel data elsewhere
uint8_t* buffer[1] = {(uint8_t*)malloc(1920 * 1080 * 3)};
image.image_data.buffer = buffer;

// Just create the mask; no need to think about its internal structure
// or what exact size is needed
struct Mask* mask = perseg_create_mask(image.image_data.width, image.image_data.height);

// Prediction
// Notice that the user owns all arguments:
// nothing is allocated inside, so nothing extra needs to be cleaned up later
perseg_predict(segmentator, &image, mask);

// But since the user owns the allocated storage, the user must take care of it
free(buffer[0]);
perseg_delete_mask(mask);
perseg_delete_segmentator(segmentator);
This is a snippet from a person segmentation code sample with all unnecessary details omitted. A word of warning is in place: ML SDK does not support any kind of codec or picture format out of the box; this is a deliberate design choice.
The images below demonstrate how ML SDK functions. ML SDK takes the left image as an input and produces person segmentation output visualized with the right image. In the second image the person’s contours are clearly outlined in white against the black background. ML SDK did a good job separating the hair as well as all other details of the image.
A short summary and the list of all functions
Content detection (vqcd)
Content detection takes one image from a sequence as its input, and it can return a structure with the predicted type of content (payload/ad) and the number of frames that fall into that type. It may also silently consume frames until it is sure about the content type and the number of frames in the current batch.
The functions `vqcd_create_detector` and `vqcd_delete_detector` obviously create and delete this detector. `vqcd_predict` synchronously processes the next frame in the sequence, but it may do nothing. In addition to the output structure, there is a boolean return value that indicates whether the current image has triggered inference. If `vqcd_predict` returned `false`, the output is meaningless (and was not touched at all).
This element has an asynchronous mode of operation. `vqcd_init_async` and `vqcd_abort_async` can be used to start and stop a separate thread (it also stops automatically when the detector is destroyed). `vqcd_predict_async` is an analogue of `vqcd_predict`, but it uses the user callback provided to `vqcd_init_async` to notify the user about its prediction. `vqcd_lock_context_async` and `vqcd_unlock_context_async`, as one can infer from their names, lock and unlock the user context mutex.
Artifact detection/segmentation (artdet)
Originally, this module was intended to handle only artifact detection, but later it was deemed reasonable to support segmentation as well. Now it performs only the latter, which can also be used to emulate the former, albeit more slowly.
There are several segmentators in this module. The first is a simple one. It takes an image as its input and returns a mask (a one-channel, grayscale image) that highlights every artifact it has found in the image.
The functions `artdet_create_segmentator` and `artdet_delete_segmentator` create and delete it, respectively. `artdet_predict` runs inference. `artdet_create_mask` should be used to create an appropriate mask, with `artdet_delete_mask` to clean it up later. `artdet_errno` and `artdet_errmsg` can be used to find out which kind of error has occurred.
The second kind of segmentation is able to pinpoint different types of visual artifacts. It returns separate masks for noise, block and blur artifacts.
Again, `artdet_multiple_create_segmentator` and `artdet_multiple_delete_segmentator` manage the element. `artdet_multiple_predict` runs inference. `artdet_multiple_create_mask_array` and `artdet_delete_mask_array` manage the mask array that holds the result of the computation. `artdet_multiple_errno` and `artdet_multiple_errmsg` describe a possible error.
Last, but not least, there are three segmentators that produce mask arrays for artifacts specific to a particular algorithm type: AVC, MJPEG, Wavelet, or HEVC. Their main differences lie in accuracy and in the presence or absence of probability labels.
Several other important functions are the following.
To create and delete:
- `artdet_multiple_2_create_segmentator` and `artdet_multiple_2_delete_segmentator`
- `artdet_multiple_w_labels_create_segmentator` and `artdet_multiple_w_labels_delete_segmentator`
- `artdet_multiple_w_labels_2_create_segmentator` and `artdet_multiple_w_labels_2_delete_segmentator`
To make a prediction:
- `artdet_multiple_2_predict`
- `artdet_multiple_w_labels_predict`
- `artdet_multiple_w_labels_2_predict`
To manage mask arrays:
- `artdet_multiple_2_create_mask_array`
- `artdet_multiple_w_labels_create_mask_array`
- `artdet_multiple_w_labels_2_create_mask_array`
- `artdet_delete_mask_array`
To understand errors:
- `artdet_multiple_2_errno` and `artdet_multiple_2_errmsg`
- `artdet_multiple_w_labels_errno` and `artdet_multiple_w_labels_errmsg`
- `artdet_multiple_w_labels_2_errno` and `artdet_multiple_w_labels_2_errmsg`
Person segmentation (perseg)
It takes an image as its input and returns a mask that highlights all humans in the input picture. The mask format may seem unusual: it contains only 0x00's and 0x01's.
The interface of this module is straightforward: `perseg_create_segmentator` and `perseg_delete_segmentator` manage the segmentator, `perseg_create_mask` and `perseg_delete_mask` manage masks, `perseg_predict` runs inference, and `perseg_errno` and `perseg_errmsg` inspect errors that have occurred.
Video super resolution (VSR)
VSR takes an image as input and produces an image as output. Its purpose is to obtain a higher-resolution picture using machine learning. The current ratio is 4:1 both vertically and horizontally, which means the output image contains 16 times as many pixels.
The interface of VSR is simple too: `vsr_create_upscaler` and `vsr_delete_upscaler` create and delete the upscaler, while `vsr_alloc_upscaled_image` and `vsr_delete_upscaled_image` manage the upscaled image. The upscaled image is derived from the input one, so the user does not need to calculate sizes or allocate memory. `vsr_errno` and `vsr_errmsg` identify the cause of an error.
Object detection (objdetect)
This one detects objects. It takes an image as its input and returns a variable number of boxes, each box being a four-number description of a rectangle (x, y, width, height) with a class label attached.
Also, because the number of boxes is unknown in advance, this module is the only one that allocates memory on the heap and hands ownership of it to the user.
`objdetect_create_detector` and `objdetect_delete_detector` manage the detector, while `objdetect_create_box_array` and `objdetect_delete_box_array` manage the array of boxes. At first glance nothing seems off, but there is a subtlety: one of the box array structure fields is assigned a memory address allocated by the library. `objdetect_predict` runs inference. `objdetect_errno` and `objdetect_errmsg` look up information about the error, should one occur.
Conclusion
Right now ML SDK is available as a trial version upon request. It is under active development, and its API may change. Follow its development on the Deelvin website. You can see ML SDK in action using ML Player. For inquiries, please contact our Deelvin team.