Tools to help you dive into Computer Vision

Computer Vision is one of the fastest-growing fields in the industry and is gaining a lot of attention as it is gradually integrated into real-life applications, from social networks and mobile apps to self-driving cars.

While there are still open research problems to be solved in the field, many open source tools are available for developing research or industrial applications. Some areas, such as Image Processing, have mature, stable libraries such as OpenCV and BoofCV. Other areas, like Tracking and Video Stabilization, are still open for progress.

Here are the most popular libraries and tools used in the Computer Vision community nowadays:

1) Image Processing

Image processing is about applying mathematical operations to images or videos.
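
To make "mathematical operations" concrete, here is a minimal sketch in plain Python, with no library involved. The tiny image, the function names, and the threshold value are all made up for illustration; real libraries do the same kind of work on much larger arrays and far faster.

```python
# A grayscale image as rows of pixel intensities (0-255).
# These values are invented for illustration.
image = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
]


def threshold(img, t):
    """Point operation: each pixel becomes 255 if it exceeds t, else 0."""
    return [[255 if p > t else 0 for p in row] for row in img]


def box_blur(img):
    """Neighborhood operation: each pixel becomes the integer mean of its
    3x3 neighborhood, using only neighbors that fall inside the image."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) // len(vals))
        out.append(row)
    return out


binary = threshold(image, 128)   # separates the bright square from the background
blurred = box_blur(image)        # softens the edges of the square
```

Thresholding looks at one pixel at a time, while blurring looks at a pixel's neighbors. Most classic image processing operations fall into one of these two families.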

OpenCV
The most popular and best-documented library for general-purpose image processing. It is released under a BSD license and is therefore free for both academic and commercial use. It has C++, C, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android.

BoofCV
An open source Java library for real-time computer vision and robotics applications. Written from scratch for ease of use and high performance. Its functionality covers a wide range of subjects, including optimized low-level image processing routines, camera calibration, feature detection/tracking, structure-from-motion, and recognition. It has been released under an Apache 2.0 license for both academic and commercial use.

NASA Vision Workbench
A general-purpose image processing and computer vision library developed by the Autonomous Systems and Robotics (ASR) Area in the Intelligent Systems Division at the NASA Ames Research Center. The Vision Workbench (VW) has been publicly released under the terms of the NASA Open Source Software Agreement (NOSA) and is implemented in C++.

SimpleCV
An open source framework for building computer vision applications. It gives you access to several high-powered computer vision libraries, such as OpenCV, without having to first learn about bit depths, file formats, color spaces, buffer management, eigenvalues, or matrix versus bitmap storage.

2) OCR

Optical Character Recognition is about converting images to text.

Tesseract
Released under the Apache 2.0 license, Tesseract has Unicode (UTF-8) support and can recognize more than 100 languages out of the box. It can also be trained to recognize other languages.

3) Machine Learning Tools

Machine learning is about using algorithms to analyze data and derive insights from it.

Dlib
A modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real-world problems.

SciPy
An open source Python library for scientific and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.

4) Deep Learning Tools

Deep learning is a branch of machine learning whose algorithms are applied in successive layers, where each layer uses the output of the previous one.
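
The "each layer uses the output of the previous one" idea can be sketched in a few lines of plain Python. This is only an illustration: the weights, biases, and layer sizes are invented, and a real framework would also handle training, which is left out here.

```python
def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum of the inputs per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """A common non-linearity: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


def forward(x, layers):
    """Feed x through the layers in order; each layer consumes the
    output of the one before it."""
    for weights, biases in layers:
        x = relu(dense(x, weights, biases))
    return x


# Hypothetical two-layer network (weights chosen by hand, not learned).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # layer 1: 2 inputs -> 2 units
    ([[1.0, 1.0]], [-1.0]),                   # layer 2: 2 inputs -> 1 unit
]
output = forward([2.0, 1.0], layers)
```

The libraries below do essentially this, but on large multi-dimensional arrays, on GPUs, and with automatic computation of the gradients needed to learn the weights.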

TensorFlow
A library for numerical computation using data flow graphs, originally developed by researchers and engineers on the Google Brain team within Google’s Machine Intelligence research organization to conduct machine learning and deep neural network research.

Theano
A Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.

Caffe
A deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors, and is released under the BSD 2-Clause license.

Torch
A scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation.

Keras
A high-level neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation.

5) Segmentation

Segmentation is about partitioning a digital image into multiple segments to help analyze and identify the objects in the image.

SLIC Superpixels
An algorithm that clusters pixels in the combined five-dimensional color and image plane space to efficiently generate compact, nearly uniform superpixels.
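
The "five-dimensional" space is three color values plus the pixel's x and y position. As a rough sketch of the idea, the snippet below computes the combined color-plus-position distance in the form commonly given for SLIC and assigns each pixel to its nearest cluster center. It is a single, simplified assignment step: the full algorithm also limits the search to a window around each center and iterates, and the sample values and function names here are assumptions made for illustration.

```python
import math


def slic_distance(pixel, center, S, m):
    """Distance between two (l, a, b, x, y) points.

    S is the spacing of the initial grid of cluster centers and m is the
    compactness weight: a larger m makes position count for more.
    """
    d_color = math.sqrt(sum((p - c) ** 2 for p, c in zip(pixel[:3], center[:3])))
    d_space = math.sqrt(sum((p - c) ** 2 for p, c in zip(pixel[3:], center[3:])))
    return math.sqrt(d_color ** 2 + (d_space / S) ** 2 * m ** 2)


def assign(pixels, centers, S, m):
    """Return, for each pixel, the index of the closest center."""
    return [min(range(len(centers)),
                key=lambda k: slic_distance(p, centers[k], S, m))
            for p in pixels]


# Two hypothetical centers: a dark one at x=0 and a bright one at x=10.
centers = [(20, 0, 0, 0, 0), (80, 0, 0, 10, 0)]
# A bright pixel that sits slightly closer to the dark center in space.
pixels = [(78, 0, 0, 4, 0)]

by_color = assign(pixels, centers, S=10, m=10)       # color dominates
by_position = assign(pixels, centers, S=10, m=1000)  # position dominates
```

With a small `m` the pixel joins the center with the similar color; with a very large `m` it joins the nearer one, which is how the compactness of the resulting superpixels is controlled.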

6) Multi-View Geometry

Multi-View Geometry (MVG) is about understanding the structure of the real world given several images of the same scene.
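
The simplest multi-view case is two side-by-side cameras looking in the same direction (a rectified stereo pair). A point that appears at a different horizontal position in each image can be placed in depth with the standard relation depth = focal length × baseline / disparity. The sketch below assumes that idealized setup; the function name and the numbers are illustrative only.

```python
def depth_from_disparity(focal_px, baseline_m, x_left, x_right):
    """Depth of a point seen by two rectified cameras.

    focal_px   -- focal length in pixels (assumed equal for both cameras)
    baseline_m -- distance between the camera centers, in meters
    x_left, x_right -- horizontal pixel coordinate of the same point
                       in the left and right image
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("a point in front of the cameras needs a positive disparity")
    return focal_px * baseline_m / disparity


# Hypothetical measurements: the point shifts 20 pixels between the views.
depth = depth_from_disparity(focal_px=700, baseline_m=0.5, x_left=420, x_right=400)
```

Nearby points shift a lot between the two views and distant points shift very little, which is why depth is inversely proportional to disparity. General multi-view libraries handle the much harder case where the camera positions themselves are unknown.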

OpenMVG
A library for computer vision scientists, targeted especially at the Multiple View Geometry community, released under the Mozilla Public License Version 2.0.

7) Visual Odometry

Visual Odometry is about determining the position and orientation of a camera (or the robot or vehicle carrying it) by analyzing the images it captures.

LIBVISO2
A very fast cross-platform (Linux, Windows) C++ library with MATLAB wrappers for computing the 6 DOF motion of a moving mono/stereo camera.
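
A visual odometry library typically reports the motion between consecutive frames, and the overall pose is obtained by chaining those motions together. The sketch below shows that chaining step only, simplified to 2D (x, y, heading) rather than the full 6 DOF, and with made-up motion estimates standing in for what would come out of the image analysis.

```python
import math


def compose(pose, motion):
    """Apply a frame-to-frame motion to a pose.

    pose   -- (x, y, theta) in the world frame, theta in radians
    motion -- (dx, dy, dtheta) expressed in the camera's own frame
    """
    x, y, theta = pose
    dx, dy, dtheta = motion
    return (x + dx * math.cos(theta) - dy * math.sin(theta),
            y + dx * math.sin(theta) + dy * math.cos(theta),
            theta + dtheta)


def integrate(motions, start=(0.0, 0.0, 0.0)):
    """Chain the motions and return the pose after each frame."""
    poses = [start]
    for motion in motions:
        poses.append(compose(poses[-1], motion))
    return poses


# Hypothetical estimates: move 1 unit forward while turning 90 degrees
# left, then move 1 unit forward again.
motions = [(1.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0)]
trajectory = integrate(motions)
final_pose = trajectory[-1]
```

Because every pose is built on the previous one, small errors in each estimate add up over time. This drift is the main practical limitation of odometry.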

8) Scene Reconstruction

Scene Reconstruction is about computing a 3D model of a scene.

VisualSFM
A GUI application for 3D reconstruction using structure from motion (SfM), written in C++.

MeshLab
An open source system for processing and editing 3D triangular meshes. It provides a set of tools for editing, cleaning, healing, inspecting, rendering, texturing and converting meshes. It offers features for processing raw data produced by 3D digitization tools/devices and for preparing models for 3D printing.

Bundler
A structure-from-motion (SfM) system for unordered image collections (for instance, images from the Internet), written in C and C++ and distributed under the GNU General Public License.

9) Video Tracking

Video Tracking is the process of locating one or more moving objects using a camera video stream.

OpenTL
A general-purpose library for markerless tracking that provides a user-friendly high-level application programming interface (API) for the widest variety of methods and applications. It is implemented in C++ and provides multi-threading and GPU acceleration for real-time efficiency.

10) Video Stabilization

Video stabilization is the process of removing undesirable shakes and jitters from a video stream.
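
One common way to think about it: estimate the camera's path over time, smooth that path, and shift each frame by the difference between the two. The sketch below shows only the smoothing and correction steps on a one-dimensional path. The path values and the moving-average filter are assumptions chosen for illustration, not the method of any particular library.

```python
def moving_average(values, radius):
    """Smooth a sequence by averaging each value with its neighbors
    within `radius` positions (fewer neighbors at the ends)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


# Hypothetical horizontal camera position per frame: a steady pan to
# the right with hand shake on top of it.
raw_path = [0.0, 2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

smooth_path = moving_average(raw_path, radius=1)

# Shift to apply to each frame so it follows the smooth path instead.
corrections = [s - r for s, r in zip(smooth_path, raw_path)]
```

The intended pan is kept while the frame-to-frame jumps are reduced. A larger radius gives a steadier result but requires larger corrections, which in practice means cropping more of the frame.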

vid.stab
A video stabilization library that can be plugged into FFmpeg and Transcode. Released under the GPL.