Developing a productivity enhancement tool powered by on-device AI

Silviu Tudor Serban
Intel Software Innovators
6 min read · Feb 20, 2019

Bepro.ai is a PC productivity enhancement tool that we are developing to help people who spend a significant amount of time working on their computers. Our overarching goal is to grant professionals a previously unattainable level of self-assessment and provide the means to improve their performance and focus.

Getting more done in less time translates into more free time to spend with family and friends, to pursue a hobby or to simply recharge.

Key user benefits

1. Improved time tracking — gaining a more granular view of how much effort a specific project or task really requires

2. Timeline of application usage — achieving a deeper understanding of the user’s workflow

3. Detection of productivity breaking points — identifying events that hurt daily throughput

4. Emotional state recognition — making sure the user is focused and in a state of well-being

In this article we will focus on the elements of Bepro.ai that allow it to capture richer user data while doubling down on performance and privacy. We will cover features, implementation building blocks and deployment of the Intel OpenVINO Toolkit and Intel Myriad X VPUs for optimized AI inference.

Smart Discover

Smart Discover is the component that governs the process of gathering and processing real-time data from multiple inputs via three modules.

The first of these modules is App Screen Time, which can provide significant insight for the user: it offers a wealth of data about the user’s application usage and breaks down the amount of time spent in each individual app.
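
As a simplified illustration (not Bepro.ai’s actual implementation), per-app screen time can be derived by sampling the foreground application at a fixed interval and accumulating durations per app:

```python
from collections import defaultdict

def aggregate_screen_time(samples, interval_s=5):
    """Accumulate per-app screen time from periodic foreground-app samples.

    samples: iterable of app names, one per polling interval. On Windows
    the name could come from an API such as GetForegroundWindow; obtaining
    it is outside the scope of this sketch.
    """
    totals = defaultdict(int)
    for app in samples:
        totals[app] += interval_s
    return dict(totals)

# Example: 12 samples taken at a 5-second interval
samples = ["chrome"] * 7 + ["word"] * 4 + ["chrome"]
print(aggregate_screen_time(samples))  # {'chrome': 40, 'word': 20}
```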

App Screen Time does not, however, provide much in terms of context and that’s where our Computer Vision modules come into play.

Extracting richer data with Computer Vision

The Content Visual Understanding and Facial Emotion Analytics modules use Computer Vision to attain a new dimension of correlated data points:

  • The Content Visual Understanding module uses object recognition models to identify what is shown on the user’s screen.
  • The Facial Emotion Analytics module uses a combination of face detection, facial landmarks and emotion recognition models to provide a better understanding of the user’s physical and mental state.
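
At a high level, the facial analysis is a cascade: detect faces first, then run the landmark and emotion models on each detected face. The sketch below shows that control flow only; the three model functions are hypothetical stubs standing in for real inference calls, not the toolkit’s API:

```python
# Hypothetical sketch of the facial analysis cascade. The three model
# functions below are stubs standing in for deep learning inference calls.

def detect_faces(frame):
    # Stub: a face detection model would return bounding boxes here.
    return [(10, 10, 60, 60)]

def extract_landmarks(frame, box):
    # Stub: a landmarks model would return facial keypoints here.
    return [(25, 30), (45, 30)]

def classify_emotion(frame, box):
    # Stub: an emotion recognition model would return a label here.
    return "happy"

def analyze_frame(frame):
    """Run the full cascade on one frame and collect per-face results."""
    results = []
    for box in detect_faces(frame):
        results.append({
            "box": box,
            "landmarks": extract_landmarks(frame, box),
            "emotion": classify_emotion(frame, box),
        })
    return results

print(analyze_frame(frame=None))
```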

A short example of how the correlated data fits together

Before getting to the technical implementation details, let’s take a simple use case. The App Screen Time module can tell us that, for a given interval, the user is browsing the web with Google Chrome. That is very generic and carries little meaning on its own. However, once we add the information from the Content Visual Understanding and Facial Emotion Analytics modules, the picture becomes much clearer: the user is browsing the web with Google Chrome, the screen content is dominated by cats and the user’s facial expression reveals happiness.

Given this level of information, it becomes much easier to appraise productivity levels and to infer what the user is actually focusing on during a given period (in this use case: watching cat videos).
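
The correlation step itself can be sketched as joining the three modules’ outputs for one interval into a single record (illustrative only; field names are hypothetical):

```python
def correlate(app, content_labels, emotion):
    """Join the outputs of App Screen Time, Content Visual Understanding
    and Facial Emotion Analytics for one time interval."""
    activity = f"{content_labels[0]} in {app}" if content_labels else app
    return {
        "app": app,                 # from App Screen Time
        "content": content_labels,  # from Content Visual Understanding
        "emotion": emotion,         # from Facial Emotion Analytics
        "likely_activity": activity,
    }

record = correlate("Google Chrome", ["cat"], "happy")
print(record["likely_activity"])  # cat in Google Chrome
```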

Building Computer Vision components with Intel OpenVINO Toolkit

When we began work on Bepro.ai we decided on a no-compromise approach to performance and data safety, so our solution was to have everything run locally on the user’s computer.

By building with the Intel OpenVINO Toolkit we got access to a robust set of Computer Vision building blocks together with comprehensive tools for high-performance Deep Learning inference.

Introduction to Intel OpenVINO Toolkit

OpenVINO is a platform for computer vision inference and deep neural network optimization which focuses on high-performance AI deployment from Edge to Cloud and is compatible with both Linux and Windows.

It provides optimized calls for OpenCV and OpenVX, and a common API for heterogeneous inference execution across a wide range of computer vision accelerators: CPU, GPU, VPU and FPGA.

A complete set of guides for getting up and running with OpenVINO is available here: https://software.intel.com/en-us/openvino-toolkit/documentation/get-started

Furthermore, an open model zoo containing pre-trained models, demos and a downloader tool for public models is provided with the distro and at the following repository: https://github.com/opencv/open_model_zoo

Implementing Content Visual Understanding and Facial Emotion Analytics with the OpenVINO Toolkit

A very straightforward way to implement Content Visual Understanding is to use a public pre-trained image classification model such as SqueezeNet https://github.com/DeepScale/SqueezeNet.

The OpenVINO deployment tools include a model downloader that simplifies obtaining the SqueezeNet Caffe model. The Model Optimizer then converts the Caffe model into a representation compatible with the Inference Engine.

cd C:\Intel\computer_vision_sdk\deployment_tools\model_downloader

python downloader.py --name "squeezenet1.1" --output_dir "C:\models"

cd C:\Intel\computer_vision_sdk\deployment_tools\model_optimizer

python mo.py --input_model "C:\models\classification\squeezenet\1.1\caffe\squeezenet1.1.caffemodel" --output_dir "C:\models_optimized\FP16" --data_type FP16

python mo.py --input_model "C:\models\classification\squeezenet\1.1\caffe\squeezenet1.1.caffemodel" --output_dir "C:\models_optimized\FP32" --data_type FP32

The classification_sample from the inference engine code samples works as a good starting point for running the model and recognizing objects in images.

:: Run the model on the CPU
classification_sample.exe -i image.jpg -m "C:\models_optimized\FP32\squeezenet1.1.xml" -d CPU

:: Run the FP16 model on the Neural Compute Stick
classification_sample.exe -i image.jpg -m "C:\models_optimized\FP16\squeezenet1.1.xml" -d MYRIAD
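
The sample reports the most likely classes for the input image. The post-processing it performs on the network’s raw output can be sketched as a softmax followed by a top-k selection (a simplified stand-in, not the sample’s actual code; the labels are made up):

```python
import math

def top_k(logits, labels, k=3):
    """Softmax the raw network outputs and return the k most likely labels
    with their probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]

# Toy example with three hypothetical classes
print(top_k([2.0, 0.5, 1.0], ["cat", "dog", "keyboard"], k=2))
```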

Next, the interactive_face_detection_demo from the inference engine samples can be used as a foundation for implementing Facial Emotion Analytics, as it provides functionality to run multiple facial analysis models and display results in real-time.

interactive_face_detection_demo.exe -m face-detection-adas-0001.xml -m_em emotions-recognition-retail-0003.xml -d CPU

For Windows users, the samples can be built with MS Visual Studio 2015/2017 using the create_msvc_201x_solution scripts located in C:\Intel\computer_vision_sdk\inference_engine\samples.

A complete walk-through for setting up OpenVINO on Windows is available here: https://software.intel.com/en-us/articles/OpenVINO-Install-Windows.

Intel Neural Compute Stick 2 Technology: Accelerating Deep Learning Model Inference

The Intel Neural Compute Stick 2 is a high-performance, low-power vision processing unit with compute cores and a dedicated hardware accelerator for deep neural network inference.

The Intel Neural Compute Stick 2 can be deployed to improve inference performance with a minimal power consumption footprint.

The interactive_face_detection_demo command below provides an example of running inference concurrently on two processing units, in this case the CPU and the NCS2 (MYRIAD): face detection and landmarks run on the CPU, while emotion recognition and head-pose estimation run on the NCS2.

interactive_face_detection_demo.exe -m face-detection-adas-0001.xml -m_em emotions-recognition-retail-0003.xml -m_lm facial-landmarks-35-adas-0001.xml -m_hp head-pose-estimation-adas-0001.xml -d CPU -d_em MYRIAD -d_lm CPU -d_hp MYRIAD

Use the following link to learn more and get up and running with the Intel Neural Compute Stick 2: https://software.intel.com/en-us/neural-compute-stick

Conclusion

In this article we’ve discussed key advantages of using the Intel OpenVINO Toolkit as a building block for developing our solution.

We strongly believe in the benefits of running AI locally, particularly in terms of performance, latency and, most importantly, data privacy.

Make sure to stay tuned and follow our progress on Intel Developer Mesh: https://devmesh.intel.com/projects/bepro-ai

About the developers

Leveraging a background in Computer Vision, Artificial Intelligence and Internet of Things, Helios Vision develops forward-thinking projects such as HELIOS and ASTRO.
