Computer Vision Tools And Libraries

For Applied AI Research And Development

Editorial @ TRN
The Research Nest
9 min read · Apr 27, 2020


Ever wondered how a machine or a robot can recognize images? It is all thanks to a multidisciplinary field of research called computer vision. But what exactly is computer vision? Just as human eyes help us see and react to the world around us, in a machine, deep learning and machine learning algorithms work together with hardware components like cameras and sensors to achieve the same. Computer vision enables a machine to perceive, classify, recognize, and react to objects around it.

Researchers have developed a wide range of tools and software libraries to power applications and projects with such computer vision capabilities. Here, we cover some of the most popular ones. The focus of this article is to familiarize you with these technologies and point you to the documentation that will help you learn how to use them in your own applications.

With that being said, let’s move on to our list!

OpenVINO

Nearly 80 percent of commercial PCs ship with Intel processors. That is not a shocking fact, but it does mean there is roughly an 80 percent chance that the machine you are using right now runs an Intel processor of some kind.

The question is, why Intel? Well, why not? They are pretty good at what they do: building great processors.

But processors are not the only thing Intel works on. Intel has also pushed ahead on the software side by creating a toolkit for a whole stream of AI applications, mainly centered on computer vision.

Intel released its OpenVINO toolkit on May 16, 2018. It was written in C++ and Python.

VINO stands for Visual Inference and Neural network Optimization. Those are some big terms, but honestly, if you know what they mean, you will understand what the toolkit does at a glance.

So, let’s start. Inference means using a model, usually a trained one, to make predictions. Visual inference means using a trained model to make predictions on visual data, as in computer vision. Next, neural network optimization: this is pretty straightforward, it means optimizing the neural network you initially trained so that it runs efficiently at prediction time.

That’s it. That’s all you need to know to understand what OpenVINO does. It takes a trained model, optimizes it, and provides tools to use that model for fast visual inference. Simple.

So, what can you do with it?

So here’s the catch: OpenVINO always asks you for a pre-trained model. It is not a toolkit for developing or coding your own models; it comes in at a much later step. The toolkit does ship with many pre-trained models under its “Model Zoo” component, which you can use directly. But it is not a model development toolkit. It is a model optimization and deployment toolkit, from an application point of view.

So, you may use this toolkit for:

  • Optimizing trained models.
  • Deploying those models seamlessly, anywhere, and on virtually any platform.
  • Picking a ready-made model from the Model Zoo that works best for your use case.
  • Using the inference engine to get faster inferences from your model (a minimal sketch follows this list).
  • Using the bundled, optimized build of OpenCV for lower-level image processing.
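
To make the last two points concrete, here is a minimal sketch of running a pre-trained model with OpenVINO’s Python inference engine. It assumes the 2020-era inference_engine API and hypothetical model and image files; newer releases expose a different openvino.runtime API.

```python
# Minimal sketch: running a pre-trained IR model with OpenVINO's Python
# inference engine (API as of the 2020-era releases).
import cv2
from openvino.inference_engine import IECore

ie = IECore()
# "model.xml" / "model.bin" stand in for an IR pair produced by the Model Optimizer
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="CPU")

input_name = next(iter(net.input_info))
n, c, h, w = net.input_info[input_name].input_data.shape

image = cv2.imread("input.jpg")                       # hypothetical input image
blob = cv2.resize(image, (w, h)).transpose(2, 0, 1)   # HWC -> CHW
result = exec_net.infer({input_name: blob.reshape(n, c, h, w)})
print(result)
```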

Excited to try it yourself? Head down here to get started.

OpenCV

If we talk about computer vision libraries and don’t talk about OpenCV, it would be a shame. OpenCV is an image processing library originally developed at Intel, with later support from Willow Garage (formerly a robotics research lab) and Itseez (since acquired by Intel). It has been in the industry since 2000. The library is written mainly in C++, with some older modules in C.

Unlike OpenVINO, OpenCV has a far wider range of applications and use cases. It covers the steps before model training and deployment, with a strong emphasis on image processing.

OpenCV can be used for:

  • Pre-processing tasks such as scaling, noise removal, and other formatting operations on images and video (a short sketch follows this list).
  • Applying and adapting the 2,500+ pre-optimized algorithms included in the library to your own use cases.
  • Developing state-of-the-art models in computer vision as well as machine learning.
  • Building applications across many categories: face detection and recognition, object detection and tracking, 3D model extraction, and almost any other application you can think of.
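
For the pre-processing bullet, here is a minimal sketch of typical OpenCV pre-processing on a single image; the file names are placeholders.

```python
# Minimal sketch: common OpenCV pre-processing steps on a single image.
import cv2

image = cv2.imread("input.jpg")                     # hypothetical input file
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)      # convert to grayscale
resized = cv2.resize(gray, (224, 224))              # scale to a fixed size
denoised = cv2.GaussianBlur(resized, (5, 5), 0)     # simple noise removal
normalized = denoised.astype("float32") / 255.0     # scale pixel values to [0, 1]
cv2.imwrite("preprocessed.png", (normalized * 255).astype("uint8"))
```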

OpenCV is amazing because of its vast community of over 47,000 contributors, who have an answer for almost any problem you may encounter. It even has its own Q&A forum.

It is used not only by great start-ups but also by big tech giants like Google, Yahoo, and Microsoft.

One of the many cool use cases of OpenCV is facial recognition. OpenCV’s extensive, function-rich library gives you the tools to perform the pre-processing steps seamlessly along with the detection algorithms themselves. Here you can use not only an object detection algorithm but also an object tracker to follow a face across a video stream. OpenCV even has functions that make it easy to set up and test a model on a live stream as well as on a pre-recorded video.
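
As a concrete illustration, here is a minimal sketch of face detection on a webcam stream using the Haar cascade that ships with OpenCV. This shows the classic detector only, not a full recognition pipeline.

```python
# Minimal sketch: face detection on a webcam stream with OpenCV's bundled
# Haar cascade (press 'q' to quit).
import cv2

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

cap = cv2.VideoCapture(0)  # 0 = default webcam; a video file path also works
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```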

Ready to get started? Head down here and punch in your first model.

Cloud Service Platforms

If we talk about computer vision frameworks and leave out the tech giants, Google might not even show this article on its search engine. So, let’s please them.

Google has its Vision AI offering, which primarily includes two products: AutoML Vision and the Vision API.

AutoML Vision is the tool that makes custom computer vision almost embarrassingly easy. It has a simple GUI that you can use to train your own custom CV models. What do you need to do? Not much. Just upload the images and select a model to train. It’s that simple!
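
The Vision API, by contrast, is the programmatic route to Google’s pre-trained models. Below is a minimal sketch using the google-cloud-vision Python client; the image path and credential setup are assumptions, and class names vary slightly between client library versions.

```python
# Minimal sketch: label detection with the Cloud Vision API's Python client.
# Assumes GOOGLE_APPLICATION_CREDENTIALS is set and "photo.jpg" exists locally.
# (Class names shown match google-cloud-vision 2.x; older 1.x clients use
# vision.types.Image instead of vision.Image.)
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, round(label.score, 2))
```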

Not just Google: Amazon has its own Amazon Rekognition tool, which helps you add deep learning powered image and video analysis to your applications easily.
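
A comparable minimal sketch with Rekognition, using the boto3 Python SDK; the image file, region, and configured AWS credentials are assumptions.

```python
# Minimal sketch: label detection with Amazon Rekognition via boto3.
# Assumes AWS credentials are configured and "photo.jpg" exists locally.
import boto3

client = boto3.client("rekognition", region_name="us-east-1")
with open("photo.jpg", "rb") as f:
    response = client.detect_labels(Image={"Bytes": f.read()}, MaxLabels=10)

for label in response["Labels"]:
    print(f'{label["Name"]}: {label["Confidence"]:.1f}%')
```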

Microsoft has its Azure cloud services, through which it offers the Computer Vision API to process and analyze images and build computer vision capabilities in the cloud.

IBM has two CV engines of its own, namely Watson Visual Recognition and PowerAI Vision.

Watson Visual Recognition can be used to quickly analyze images and video for classification and other ML-related tasks, while PowerAI Vision enables you to train highly accurate models for your custom applications with no deep learning expertise.

All of these cloud solutions are built to let you develop and deploy computer vision models easily, without much technical expertise. The catch? You need to pay for the services (though many do have free tiers for experimentation and minor use cases).

These services are used by millions of developers to build all kinds of applications, including the companies’ own products built on top of these technologies. It looks like no tech giant is backing out of the race to provide computer vision services.

You can check the links above to get started with these products; their documentation covers almost anything you want to do with them.

NASA’s Vision Workbench (VW)

We rarely think about it, but NASA very much needs image processing libraries, and NASA being NASA, it prefers to use a library of its own.

Hence, the Intelligent Systems Division at NASA’s Ames Research Center developed the Vision Workbench library, which is written primarily in C++.

Unlike the other libraries, it is not aimed at state-of-the-art model performance and development; rather, it forms a base for applied research in computer vision.

VW was developed for the analysis and enhancement of space imagery, as well as for research purposes and robotics.

VW is mainly used for:

  • Image analysis tasks
  • Image enhancement and development functions
  • Projection transformations of images and map-projection files (primarily for space imagery)
  • Building concise, compressed models that can be deployed on space robotics hardware

To try it out you can easily get started from its official documentation.

Nvidia VisionWorks

We have all heard of Nvidia for providing the most awesome GPUs for training computer vision models faster. But Nvidia also has its own library for developing computer vision applications, named VisionWorks.

With VisionWorks you can easily build computer vision pipelines using the toolkit’s simplified building blocks, which it calls “primitives” (Nvidia must have said: pun seriously intended).

By using VisionWorks you can develop some interesting applications, such as:

  • Robotics: localization algorithms and faster trackers.
  • Augmented reality applications: faster real-time rendering of objects.
  • Intelligent video analytics, using its analysis primitives.

One of the best implementations was an autonomous driving application built on a pipeline developed with Nvidia’s VisionWorks. It gave developers a direct CUDA vision programming interface, which made it easier to build faster object detectors, one of the key components of autonomous driving. Nvidia VisionWorks also provides a thread-safe API, which makes it easier and faster to track and analyze multiple scenes together.

The toolkit can help you unlock the full power of your Nvidia GPU for your computer vision applications.

You can get started with Nvidia VisionWorks here.

With that, we conclude our list. Remember, you don’t necessarily need in-depth knowledge of how these libraries work internally (especially the cloud services and ready-made models), beyond knowing how to integrate them with your application. Once you are clear about the inputs these libraries need and the outputs they can give, you will be able to identify where you can use them in practice.

At the same time, don’t get overwhelmed by all the choices you have. Pick one tool and explore it in depth based on your needs and requirements.


Editorial note-

This article was conceptualized by Aditya Vivek Thota and written by Dishant Parikh of The Research Nest.

Stay tuned for more such insightful content with a prime focus on artificial intelligence!

