Client-side AI processing in video applications

Published in Vectorly

May 18, 2021

You can now use AI to process images/video in real-time on everyday devices

AI has been used in image and video processing for some time, but only in the last few years has it been used for real-time video applications on everyday consumer devices.

It started with face detection in camera apps and AR filters in Snapchat and Messenger. More recently, video-conferencing platforms have released virtual backgrounds and background noise removal.

Developer tools

There are even developer tools for AI-based video post-processing. A few examples include:

FaceAR: SDK for adding AR filters to any app
BodyPix: JavaScript library for background segmentation
NVIDIA Maxine: SDK for various AI filters on desktop
Anime4K: GPU shaders for AI-based and non-AI image/video upscaling algorithms

Use cases

So far, the main use cases for client-side AI video processing have been in video conferencing, including:

Virtual Backgrounds & Background Blur
AR Filters
Video De-noising
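Virtual backgrounds and background blur usually end with the same step: a segmentation model (BodyPix is one example) marks which pixels belong to the person, and that mask is used to blend the camera frame with a background. The sketch below shows only that blending step, out = m·frame + (1 − m)·background per colour channel. It assumes RGBA frames stored as `Uint8ClampedArray` (the layout of `ImageData.data` in a browser canvas) and a mask with one value in [0, 1] per pixel (a binary 0/1 mask works too); `compositeWithMask` is a hypothetical helper, and producing the mask is out of scope.

```javascript
// Hypothetical helper: blend a camera frame over a background using a person mask.
// frame, background: RGBA bytes (4 per pixel), same size, e.g. ImageData.data.
// mask: one value per pixel in [0, 1]; 1 = person (keep frame), 0 = background.
function compositeWithMask(frame, background, mask) {
  const pixelCount = mask.length;
  if (frame.length !== pixelCount * 4 || background.length !== pixelCount * 4) {
    throw new Error("frame, background and mask sizes do not match");
  }
  const out = new Uint8ClampedArray(frame.length);
  for (let p = 0; p < pixelCount; p++) {
    const m = mask[p];
    const i = p * 4;
    // Linear blend of R, G, B: m * frame + (1 - m) * background.
    for (let c = 0; c < 3; c++) {
      out[i + c] = m * frame[i + c] + (1 - m) * background[i + c];
    }
    out[i + 3] = 255; // output pixels are fully opaque
  }
  return out;
}

// Usage: a 2-pixel frame where pixel 0 is the person and pixel 1 is background.
const frame = new Uint8ClampedArray([200, 100, 50, 255, 200, 100, 50, 255]);
const background = new Uint8ClampedArray([0, 0, 255, 255, 0, 0, 255, 255]);
const mask = new Float32Array([1, 0]);
const result = compositeWithMask(frame, background, mask);
console.log(Array.from(result).join(", ")); // 200, 100, 50, 255, 0, 0, 255, 255
```

For background blur, the same function applies with `background` set to a blurred copy of the frame. Soft mask values between 0 and 1 give smoother edges around hair and shoulders than a hard cut-off.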

Zoom, Google Meet and Microsoft Teams were quick to deploy these features, and in the post-COVID work-from-home era, many startups are beginning to integrate them as well.

A second set of use cases I've seen revolves around video streaming, specifically using client-side AI upscaling to improve video quality for the end user. This comes in the form of:

Smart TVs
Libraries for real-time AI upscaling (like Anime4K or Vectorly)

The goal is to provide higher-quality (e.g. 4K) video where no 4K original exists.
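To put the upscaling task in numbers, assume (for illustration only; no source resolution is specified here) a 1080p stream displayed at 4K UHD, 3840 × 2160. Each dimension doubles, so the upscaler has to output four pixels for every source pixel, three of which it must infer. At an assumed 30 frames per second, all of that work has to fit in roughly 33 ms per frame:

```latex
\frac{3840 \times 2160}{1920 \times 1080} = \frac{8\,294\,400}{2\,073\,600} = 4,
\qquad
\frac{1000\ \text{ms}}{30\ \text{frames}} \approx 33.3\ \text{ms per frame}
```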

Problem

Client-side AI processing seems to be on the horizon, especially for WebRTC video conferencing.

As companies and projects start to think about it, there are a few options to consider:

Build this stuff yourself
Use open source projects like BodyPix and Anime4K
Use existing SDKs like FaceAR and NVIDIA Maxine

In the long run, the do-it-yourself option would be a collective waste of effort if thousands of companies each invested in building more or less the same feature.

Open source projects like BodyPix and Anime4K are fantastic, but they still require some setup and configuration, even for basic use cases like virtual backgrounds.

NVIDIA Maxine seems the closest to an "AI filters SDK"; however, at least for now, there is no web or mobile implementation.

Streamlining AI filters

Most of these tools are doing the same thing:

input video → [filter] → output video
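One minimal way to model this shape in JavaScript is to treat a filter as a function from one frame to another and run it over each frame in turn. The sketch assumes a hypothetical frame object `{ width, height, data }`, with `data` holding RGBA bytes as in a canvas `ImageData`. The grayscale filter (Rec. 601 luma weights) is only a stand-in for an AI model, not one of the tools listed above, and `applyFilter` is an illustrative name.

```javascript
// A frame: { width, height, data } with RGBA bytes (4 per pixel), like ImageData.
// A filter: any function (frame) => frame. A real AI filter would run a model here.

// Stand-in filter: grayscale using Rec. 601 luma weights.
function grayscaleFilter(frame) {
  const data = new Uint8ClampedArray(frame.data.length);
  for (let i = 0; i < data.length; i += 4) {
    const y = 0.299 * frame.data[i] + 0.587 * frame.data[i + 1] + 0.114 * frame.data[i + 2];
    data[i] = data[i + 1] = data[i + 2] = y;
    data[i + 3] = frame.data[i + 3]; // keep the original alpha
  }
  return { width: frame.width, height: frame.height, data };
}

// input video -> [filter] -> output video, one frame at a time.
function* applyFilter(frames, filter) {
  for (const frame of frames) {
    yield filter(frame);
  }
}

// Usage: two 1x1 frames (white, then black) passed through the filter.
const input = [
  { width: 1, height: 1, data: new Uint8ClampedArray([255, 255, 255, 255]) },
  { width: 1, height: 1, data: new Uint8ClampedArray([0, 0, 0, 255]) },
];
for (const out of applyFilter(input, grayscaleFilter)) {
  console.log(Array.from(out.data).join(", "));
}
```

Using a generator keeps the per-frame loop lazy, which matches how live video arrives: one frame at a time, with no need to hold the whole stream in memory.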

It would make sense to have something streamlined, where you could just choose a filter and a video to apply it to:

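// 'ai-filters' and VirtualBackground are illustrative names for a hypothetical library
import VirtualBackground from 'ai-filters';
const video = document.getElementById('video-track');
// Replace the background of the video with a user-supplied image
const backgroundFilter = new VirtualBackground(video, {
  image: 'user-xyz-background.png'
});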

This is just an example. In practice, such a library could handle multiple filters, provide lower-level (frame-by-frame) control, and include tools for monitoring performance and compute usage.
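Handling multiple filters and monitoring performance could be sketched as a chain that runs each filter in order on every frame and records how long each frame took, compared against a real-time budget (about 33 ms per frame at an assumed 30 fps). `FilterChain`, its methods and the `{ width, height, data }` frame shape are hypothetical names for illustration, not the API of any existing SDK. The clock is injectable so that timing can be tested deterministically; by default it uses `performance.now()`, available in browsers and in Node 16+.

```javascript
// Hypothetical chain of frame filters with simple per-frame timing.
// A filter is any function (frame) => frame.
class FilterChain {
  constructor(filters = [], { fps = 30, now = () => performance.now() } = {}) {
    this.filters = [...filters];
    this.budgetMs = 1000 / fps; // time available per frame for real-time output
    this.now = now;             // clock returning milliseconds
    this.frameTimesMs = [];
  }

  add(filter) {
    this.filters.push(filter);
    return this; // allow chaining: chain.add(a).add(b)
  }

  // Run every filter on one frame, in order, and record the elapsed time.
  process(frame) {
    const start = this.now();
    let result = frame;
    for (const filter of this.filters) {
      result = filter(result);
    }
    this.frameTimesMs.push(this.now() - start);
    return result;
  }

  // Summary of the timings recorded so far.
  stats() {
    const n = this.frameTimesMs.length;
    const total = this.frameTimesMs.reduce((a, b) => a + b, 0);
    const overBudget = this.frameTimesMs.filter((t) => t > this.budgetMs).length;
    return { frames: n, avgMs: n ? total / n : 0, overBudget };
  }
}

// Usage: invert colours (leaving alpha alone) twice, which should be a no-op.
const invert = (f) => ({ ...f, data: f.data.map((v, i) => (i % 4 === 3 ? v : 255 - v)) });
const chain = new FilterChain().add(invert).add(invert);
const frame = { width: 1, height: 1, data: new Uint8ClampedArray([10, 20, 30, 255]) };
const out = chain.process(frame);
console.log(Array.from(out.data).join(", ")); // 10, 20, 30, 255
console.log(chain.stats());
```

Counting frames that exceed the budget, rather than only the average, matters for video: a filter that is fast on average but occasionally slow still causes visible stutter.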

The general idea would be a common pipeline for incorporating AI filters into a video processing workflow, with easy access to a repository of available filters.
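At its simplest, the "repository of available filters" could be a registry that maps filter names to factory functions, so an application asks for a filter by name and options instead of wiring up a model itself. This is a hypothetical sketch: `FilterRegistry`, `register`, `create` and the `"brightness"` filter are invented for illustration. A real repository would also need to download model weights and handle versioning, which is omitted here.

```javascript
// Hypothetical registry mapping filter names to factories.
// A factory takes an options object and returns a filter: (frame) => frame.
class FilterRegistry {
  constructor() {
    this.factories = new Map();
  }

  register(name, factory) {
    if (this.factories.has(name)) {
      throw new Error(`filter "${name}" is already registered`);
    }
    this.factories.set(name, factory);
  }

  list() {
    return [...this.factories.keys()];
  }

  create(name, options = {}) {
    const factory = this.factories.get(name);
    if (!factory) {
      throw new Error(`unknown filter "${name}"`);
    }
    return factory(options);
  }
}

const registry = new FilterRegistry();

// Stand-in filter: adds a constant to R, G and B (a real entry would wrap a model).
// Uint8ClampedArray clamps results above 255.
registry.register("brightness", ({ amount = 0 }) => (frame) => ({
  ...frame,
  data: frame.data.map((v, i) => (i % 4 === 3 ? v : v + amount)),
}));

// Usage: look up a filter by name and apply it to a 1x1 frame.
const brighten = registry.create("brightness", { amount: 40 });
const frame = { width: 1, height: 1, data: new Uint8ClampedArray([10, 100, 240, 255]) };
console.log(registry.list().join(", ")); // brightness
console.log(Array.from(brighten(frame).data).join(", ")); // 50, 140, 255, 255
```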

Open source

Such a library would need to be able to deploy existing models from TensorFlow/PyTorch, as well as enable organizations to create, train and deploy their own custom models.

To me, such a project would make most sense as an open-core project, with a core open source library connected to a centrally managed repository.

I believe this could be monetized, and become self-supporting, through:

A hosted, centrally managed repository of AI models, with support and regular updates
Paid features, such as SaaS tools for training models on custom content

Thoughts?

While this is highly exploratory, we've decided to build an SDK for doing exactly this. If you're interested, please let us know here!

If there's any interest in contributing to this as an open source project, feel free to reach out at sam@vectorly.io.

If you have any other thoughts or feedback, feel free to comment or ping me via email!

-Sam

Written by Sam