Computer Vision and Image Processing

Richard Sheng

Published in

The Startup

9 min read · Dec 22, 2020

Though these terms are related and often used interchangeably, the concepts are different. Here’s how…

Photo by Camila Quintero Franco on Unsplash

The Age of Selfie Filters

If you have used Instagram, or any photo sharing app, you have likely seen or used an image filter. These filters are enabled through Image Processing techniques. In popular social apps, you have probably also come across the ability to modify your live selfie, such as placing bunny ears or swapping faces with someone. These entertaining and delightful experiences are powered by a branch of Artificial Intelligence most often referred to as Computer Vision, which allows computers to make sense of digital images. Image Processing and Computer Vision are different concepts, but they very much go hand-in-hand. This article aims to provide an overview of both concepts and how they are used.

What is Image Processing?

Image processing falls into two categories: analog image processing and digital image processing. Analog image processing covers techniques applied to photographs, printouts, and other hard copies of images. In contrast, digital image processing manipulates a digital image with algorithms in order to extract information from it.

The input to an image processing task is an image. Analog image processing always requires an image as input, whereas digital image processing may operate on images or on information associated with an image, such as features or bounding boxes. Image processing is typically used for the following purposes.

Image visualization: representing the processed data as visual output for better understanding, mainly for objects that are not easy to detect in an image.
Quality improvement: improving the image through sharpening and restoration.
Image search: retrieving the source of an image through a query run by an image search engine.
Classification: distinguishing different objects and locating their positions in an image.

Essential Steps in Digital Image Processing

1. Image Acquisition

Typically, image acquisition involves capturing an image with a sensor such as a camera. If the sensor output is not already digital, it is converted to digital form using an analog-to-digital converter. This step also includes pre-processing, such as image scaling.
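
As a rough illustration of the scaling step mentioned above, here is a minimal sketch of nearest-neighbor resizing with NumPy. The function name resize_nearest and the tiny test image are made up for this example; in practice a library such as OpenCV or scikit-image would normally handle resizing.

```python
import numpy as np

def resize_nearest(image, new_height, new_width):
    """Resize a 2-D grayscale image with nearest-neighbor sampling."""
    height, width = image.shape
    # Map every output row/column back to a source row/column.
    rows = np.arange(new_height) * height // new_height
    cols = np.arange(new_width) * width // new_width
    return image[rows[:, None], cols]

# Hypothetical 4x4 image whose pixel values are simply 0..15.
image = np.arange(16, dtype=np.uint8).reshape(4, 4)
small = resize_nearest(image, 2, 2)
print(small)
```

Nearest-neighbor sampling is the simplest choice and can look blocky; smoother interpolation methods exist but need more code.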

2. Image Enhancement

Image enhancement is the manipulation of an image to make it more suitable for a specific task. In practice, this means filtering operations such as noise removal, contrast adjustment, brightness adjustment, and sharpening, which improve the quality of the originally captured image.

Deblurring of an image with the use of image enhancement technique (Wiener Filter). Source
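
Two of the enhancement operations named in this step, brightness and contrast adjustment, can be sketched in a few lines of NumPy. This is only an illustration under simple assumptions (8-bit grayscale input, a linear contrast stretch); the function names are invented for the example.

```python
import numpy as np

def adjust_brightness(image, offset):
    """Add a constant to every pixel, clipping to the valid 8-bit range."""
    widened = image.astype(np.int16) + offset  # avoid uint8 wrap-around
    return np.clip(widened, 0, 255).astype(np.uint8)

def stretch_contrast(image):
    """Linearly rescale intensities so they span the full 0..255 range."""
    low, high = int(image.min()), int(image.max())
    if high == low:  # flat image: nothing to stretch
        return image.copy()
    scaled = (image.astype(np.float64) - low) * 255.0 / (high - low)
    return np.round(scaled).astype(np.uint8)

# Hypothetical low-contrast 2x2 image.
image = np.array([[100, 110], [120, 150]], dtype=np.uint8)
brighter = adjust_brightness(image, 120)
stretched = stretch_contrast(image)
print(brighter)
print(stretched)
```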

3. Image Restoration

Image restoration improves the appearance of a degraded image by using mathematical or probabilistic models of the degradation. A typical example is the reduction of blurring in an image.

4. Color Image Processing

Color image processing is the extraction of features from an image with a color-based approach.
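
One common color-based feature is a coarse color histogram. The sketch below assumes an RGB image stored as a height x width x 3 array of 8-bit values; the function name and the choice of four bins per channel are arbitrary choices for illustration.

```python
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """Return a normalized, flattened RGB histogram as a feature vector."""
    # Quantize each 0..255 channel value into one of `bins_per_channel` bins.
    quantized = image.astype(np.int64) * bins_per_channel // 256
    r, g, b = quantized[..., 0], quantized[..., 1], quantized[..., 2]
    # Combine the three bin indices into a single index per pixel.
    index = (r * bins_per_channel + g) * bins_per_channel + b
    counts = np.bincount(index.ravel(), minlength=bins_per_channel ** 3)
    return counts / counts.sum()

# Hypothetical 2x2 image: two pure-red pixels and two pure-blue pixels.
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[0, :] = [255, 0, 0]
image[1, :] = [0, 0, 255]
features = color_histogram(image)
print(np.nonzero(features)[0], features[features > 0])
```

Two images with similar color distributions produce similar vectors, which is what makes the histogram usable as a feature.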

5. Wavelet and Multi-Resolution Processing

Wavelet and multi-resolution processing represents an image at several resolutions. This representation is commonly used for image data compression.

An illustration of Wavelet transformation (Haar Wavelet). Source
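
To make the Haar wavelet idea concrete, here is a minimal sketch of one transform level applied to a single row of pixels. It uses the unnormalized averaging form (pairwise averages and half-differences), which is one of several conventions; the function names are invented for the example.

```python
def haar_step(signal):
    """One level of the unnormalized Haar transform on an even-length sequence."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]  # half-resolution approximation
    details = [(a - b) / 2 for a, b in pairs]   # what the approximation loses
    return averages, details

def inverse_haar_step(averages, details):
    """Rebuild the original sequence from averages and details."""
    signal = []
    for avg, det in zip(averages, details):
        signal.extend([avg + det, avg - det])
    return signal

row = [9, 7, 3, 5]  # hypothetical row of pixel intensities
averages, details = haar_step(row)
restored = inverse_haar_step(averages, details)
print(averages, details, restored)
```

The averages are a lower-resolution version of the row, and small detail values can be stored coarsely or dropped, which is where the link to compression comes from.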

6. Compression

Compression reduces the storage space required to save an image or the bandwidth required to transmit it. Image compression covers the techniques that reduce and adjust image size while degrading quality as little as possible.
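
As a toy example of the idea, the sketch below implements run-length encoding, a simple lossless scheme picked here only because it is short. It is not how formats such as JPEG work, and lossy techniques that trade a little quality for size are considerably more involved. The function names are made up for the illustration.

```python
def rle_encode(pixels):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [tuple(run) for run in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the pixel sequence."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels

row = [255, 255, 255, 0, 0, 255]  # hypothetical row of a black-and-white image
encoded = rle_encode(row)
print(encoded)
print(rle_decode(encoded) == row)
```

Run-length encoding only helps when an image has long runs of identical pixels; on noisy images it can make the data larger.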

7. Morphological Processing

Morphological processing extracts the image components that are useful for describing the shape of a particular object. Typical morphological operations include erosion and dilation.

Morphological operation results (imtophat transform) Source
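
Erosion and dilation can be sketched directly with NumPy. This version assumes a binary image and a 3x3 square structuring element, and it treats everything outside the image as background; the helper names are invented for this example, and libraries such as OpenCV and scikit-image provide more general versions.

```python
import numpy as np

def _neighborhoods(binary):
    """Stack the nine 3x3-shifted copies of the image (background padding)."""
    padded = np.pad(binary, 1, constant_values=False)
    height, width = binary.shape
    return np.stack([
        padded[dy:dy + height, dx:dx + width]
        for dy in range(3) for dx in range(3)
    ])

def erode(binary):
    """Keep a pixel only if its whole 3x3 neighborhood is foreground."""
    return _neighborhoods(binary).all(axis=0)

def dilate(binary):
    """Set a pixel if any pixel in its 3x3 neighborhood is foreground."""
    return _neighborhoods(binary).any(axis=0)

# Hypothetical 5x5 image containing a 3x3 square of foreground pixels.
image = np.zeros((5, 5), dtype=bool)
image[1:4, 1:4] = True
eroded = erode(image)
dilated = dilate(image)
print(eroded.astype(int))
print(dilated.astype(int))
```

Erosion shrinks the square to its single center pixel, while dilation grows it by one pixel in every direction.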

8. Segmentation

Image segmentation is one of the necessary procedures in image processing and partitions the image into multiple segments. It makes it possible to locate objects in an image and identify their boundaries. Importantly, more accurate segmentation generally leads to more accurate recognition and classification.

Segmentation of regions according to color values, shapes, and textures. Source
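
A very simple form of segmentation is to threshold a grayscale image and then group the bright pixels into connected regions. The sketch below does this with a breadth-first flood fill and 4-connectivity; the threshold value, the function name, and the tiny image are assumptions made for the example, and real images usually need more robust methods.

```python
from collections import deque

def label_regions(image, threshold):
    """Threshold a grayscale image, then label 4-connected bright regions."""
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]  # 0 means background
    next_label = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] <= threshold or labels[y][x]:
                continue
            next_label += 1
            labels[y][x] = next_label
            queue = deque([(y, x)])
            while queue:  # flood fill one region
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] > threshold and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label

# Hypothetical image with two bright objects on a dark background.
image = [
    [200, 210, 10, 10],
    [205, 10, 10, 180],
    [10, 10, 10, 190],
]
labels, count = label_regions(image, threshold=128)
for row in labels:
    print(row)
```

Each nonzero label marks one segment, so the label map gives both the location of every object and, along its edges, the object boundary.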

9. Representation and Description

Representation is concerned with expressing the output of segmentation as a boundary or a region. Boundary representations capture shape characteristics such as corners, while regional representations capture properties such as texture or skeletal shape.

Description, on the other hand, is most commonly known as feature selection and is responsible for extracting meaningful information from an image. The extracted information helps differentiate one class of objects from another accurately.
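
As an illustration of description, the sketch below computes three simple descriptors for one segmented region: its area, its centroid, and its bounding box. It assumes the region is given as a non-empty boolean mask; the function name and the dictionary keys are chosen for this example only.

```python
import numpy as np

def describe_region(mask):
    """Compute basic shape descriptors for a non-empty boolean region mask."""
    ys, xs = np.nonzero(mask)  # coordinates of all region pixels
    return {
        "area": int(ys.size),
        "centroid": (float(ys.mean()), float(xs.mean())),
        # (top row, left column, bottom row, right column), inclusive
        "bounding_box": (int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())),
    }

# Hypothetical region: a 3x3 block inside a 5x6 image.
mask = np.zeros((5, 6), dtype=bool)
mask[1:4, 2:5] = True
features = describe_region(mask)
print(features)
```

Numbers like these are the kind of compact information a classifier can use to tell object classes apart.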

10. Object Recognition / Image Labeling

Object recognition is the process of assigning a label to an object based on its description, for classification purposes. This is a very important step for Computer Vision. To train models, a large enough corpus of images needs to be processed and labeled so that the Computer Vision model can be used to detect similar objects in other images.

Example of tagging/labeling of image. Source

Many companies now offer data labeling services, such as ClickWorker, CloudFactory, etc.

Computer Vision

As described above, Image Processing generally refers to the application of algorithms to images. The purpose of such algorithms is often to improve the quality of the image or to alter it for a different visual effect. However, Image Processing is also very important for preparing images for Computer Vision models, for example by applying segmentation or labeling known objects.

Computer Vision generally refers to the technologies involved in allowing computers to make sense of images. The most common application of this is image recognition, a process that enables the identification of objects and image features. Image recognition is used in numerous applications today, such as medical imaging, security surveillance, facial recognition, and the identification of logos and buildings, to name a few. However, for these models to work, the images first need to be labeled, segmented, or put through the other processing steps mentioned above.

Today, Computer Vision applications have achieved tremendous success, and some of the most notable use cases are outlined below:

Defect Inspection

Image recognition has benefited manufacturing. Its primary task there is to identify defective items during the manufacturing process. The ability to quickly examine thousands of items on the assembly line speeds up inspection and makes operations more efficient.

Image Classification

Image classification is perhaps the most crucial part of image recognition and has been the subject of much research. In recent years, several studies have explored assisting doctors in finding regions of interest for detecting and predicting a particular disease. Image classification has also been a critical contributor in e-commerce, enhancing the user experience with quick search. It categorizes images according to their content, and it is part of most of the recommendation systems and image retrieval engines that we use today.

Autonomous Driving

Autonomous driving has yet to reach its full potential and be approved for broad commercial use. However, incorporating image recognition has already made capabilities such as pedestrian detection and stopping at a stop sign possible.

Robotics

Image recognition is part of many robotics projects, where robots are trained to identify objects for better navigation and to detect objects that may lie in their path.

Text Detection

Text detection is yet another promising contribution of image recognition. It makes it possible to detect text and characters in an image, such as a photograph that includes a street sign or a traffic sign. Cloud Vision by Google is one of the prominent services in the field of text detection.

Facial Recognition

Advances in AI have made facial recognition practical. From securing devices to surveillance, facial recognition is in strong demand due to its potential. However, several experts are questioning the privacy aspects of the technology, and like every technology it has limitations. Implemented properly, facial recognition techniques can support essential services such as traffic and city surveillance.

e-Commerce

Shoppers can now search for similar products by uploading images of products they already have, or of products they want to find complementary styles for. This requires transforming the image into a visual embedding, from which the recommendations are either products similar to the one uploaded or products known to be complementary.
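
Once images are turned into embeddings, finding similar products reduces to comparing vectors. The sketch below ranks a small catalog by cosine similarity to a query embedding. The three-dimensional vectors and product names are invented for illustration; real embeddings come from a trained model and have hundreds of dimensions, and the function name is hypothetical.

```python
import numpy as np

def most_similar(query, catalog, top_k=2):
    """Rank catalog items by cosine similarity to the query embedding."""
    names = list(catalog)
    matrix = np.array([catalog[name] for name in names], dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    scores = matrix @ query / norms
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return [(names[i], float(scores[i])) for i in order]

# Made-up embeddings standing in for the output of an image model.
catalog = {
    "red sneaker": [0.9, 0.1, 0.0],
    "blue sneaker": [0.7, 0.0, 0.6],
    "leather belt": [0.0, 1.0, 0.1],
}
results = most_similar([1.0, 0.0, 0.0], catalog)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Recommending complementary products works the same way, except that the lookup runs against items known to pair well rather than items that look alike.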

Getting Started with Image Recognition Models

Some of the most valuable packages to utilize for Computer Vision and Image Processing include:

imutils
OpenCV
Dlib
Scikit-learn
Scikit-image
TensorFlow
Keras
MXNet
Fastai
Pytesseract
PyTorchCV

You can also explore prebuilt cloud services by using:

Thanks to the continued rapid evolution of open source software, Data Scientists and engineers can get started with Computer Vision easily. To get started with an Image Recognition model detecting cats and dogs, please refer to this article.

A note on Convolutional Neural Networks

It’s difficult to talk about Computer Vision without acknowledging the importance of Convolutional Neural Networks. A convolutional neural network (CNN) is a type of multi-layer neural network, and neural networks have been the most effective technique for computer vision. By sharing a small set of filter weights across the whole image, a CNN needs far fewer parameters and less computation than a fully connected network of comparable size. Its design is loosely inspired by how the brain interprets images. In the last few years, deep learning has contributed significantly to image recognition, pattern learning, and classification, improving image processing systems. Medical imaging is one of the prominent areas where CNNs have been successful, and a large body of research has applied various CNN models to medical imaging problems.

Typically, a CNN consists of the following.

Input Layer
Convolutional Layers
Max Pooling Layers
Activation Layers (ReLU)
SoftMax Layer

The weights in a CNN architecture are trained on the dataset for a specific problem, and each neuron produces an output from its inputs. Convolutional layers apply convolutional filters, which activate on specific features of the image. The hidden layers pass feature maps forward; these are flattened into a single vector before the fully connected layers, the last of which acts as the output layer that produces the image’s classification results. During training, the loss function is computed on the output of this final fully connected layer.

Pooling layers, on the other hand, downsample the feature maps, which reduces the amount of computation and the number of parameters that later layers have to learn. ReLU enables faster training of the network. The softmax function converts the final outputs into class probabilities for classification. Several CNN-based architectures are used for different image processing tasks, such as Mask R-CNN, Faster R-CNN, AlexNet, and GoogLeNet. These models have been widely used in academic research.

Convolutional Neural Network Architecture. Source
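
To show how the layers listed above fit together, here is a minimal sketch of a single forward pass written in plain NumPy. Everything in it is an assumption for illustration: the 6x6 random "image", the hand-picked edge filter, and the untrained random weights of the final layer. A real model would learn its filters and weights with a framework such as TensorFlow, Keras, or PyTorch.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation, which deep learning libraries call convolution."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def relu(x):
    """Replace negative activations with zero."""
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2  # drop an odd trailing row/column
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = np.exp(logits - logits.max())  # subtract max for numerical stability
    return shifted / shifted.sum()

rng = np.random.default_rng(0)
image = rng.random((6, 6))                           # stand-in grayscale input
kernel = np.array([[1.0, 0.0, -1.0]] * 3)            # hand-picked vertical-edge filter
features = max_pool2x2(relu(conv2d(image, kernel)))  # 6x6 -> 4x4 -> 2x2
weights = rng.normal(size=(2, features.size))        # untrained two-class output layer
probabilities = softmax(weights @ features.ravel())
print(features.shape, probabilities)
```

Because the weights are random, the two probabilities carry no meaning here; the point is only the order of operations from input to class scores.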

Conclusion

Hopefully, next time you do a face swap or apply a cool image filter, you will appreciate the sophistication of Computer Vision models and the incredible engineering it took to make it all happen in real time. Moreover, aside from creating delightful experiences, such technologies create tremendous value in life-saving solutions in the fields of medicine and the sciences. As the fields of Computer Vision and, more generally, Artificial Intelligence continue to evolve, it will be exciting to see what new applications we develop.

Richard Sheng is the Global Director of Data Science & Analytics at Z-Tech, part of Anheuser-Busch InBev, bringing data-driven technology solutions to small businesses around the world. Richard has 12+ years of experience developing data products for startups and Fortune 500 companies.