Published in UnitX AI Magazine

Applications of Deep Learning for real-time Object Detection

The global computer vision market was valued at $27.3 billion in 2019 and is projected to grow at a CAGR of 19% from 2020 to 2027 [1]. Object detection is one of the core computer vision tasks, with a broad range of industrial applications such as:

  • Cancer detection in radiology-based images [Healthcare]
  • Detection of manufacturing defects, factory floor surveillance [Manufacturing]
  • Detection of seat belts, parking in restricted areas [Public Safety]
  • Stock level analysis and inventory management [Retail]

What is object detection?

There are four types of visual recognition tasks in computer vision. First, image classification assigns a label to an image as a whole, for example labelling a picture of a farm as containing cows. Second, object detection not only labels each cow but also locates it with a bounding box. Third, semantic segmentation predicts a label for each pixel of an image without differentiating between objects that share the same label. Fourth, instance segmentation labels each pixel and also separates individual objects that share the same label, so each cow gets its own mask.

Different types of computer vision tasks
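
To make the difference between the four tasks concrete, here is a minimal Python sketch of what each one might output for a toy 4x4 image containing two cows. The data layout, the `box_from_instance` helper and all values are assumptions made for illustration; real libraries use their own formats.

```python
# Toy 4x4 "image" with two cows. All values are made up for illustration.

# 1. Image classification: a single label for the whole image.
classification = "cow"

# 2. Object detection: one label and one bounding box per object.
#    Boxes are (x_min, y_min, x_max, y_max), with the max edge exclusive.
detections = [
    {"label": "cow", "box": (0, 0, 2, 2)},
    {"label": "cow", "box": (2, 2, 4, 4)},
]

# 3. Semantic segmentation: one class per pixel. Both cows share the
#    label "cow", so they cannot be told apart from this mask alone.
semantic_mask = [
    ["cow", "cow", "bg", "bg"],
    ["cow", "cow", "bg", "bg"],
    ["bg", "bg", "cow", "cow"],
    ["bg", "bg", "cow", "cow"],
]

# 4. Instance segmentation: each pixel also carries an instance id
#    (0 = background), so every cow gets its own mask.
instance_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]


def box_from_instance(mask, instance_id):
    """Return the tight bounding box around one instance (hypothetical helper)."""
    rows = [r for r, row in enumerate(mask) if instance_id in row]
    cols = [c for row in mask for c, value in enumerate(row) if value == instance_id]
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


# An instance mask contains enough information to recover the detection boxes.
recovered_boxes = [box_from_instance(instance_mask, i) for i in (1, 2)]
```

In this toy case the boxes recovered from the instance mask match the detection output, which shows that instance segmentation carries strictly more information than detection.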

What is Deep Learning?

Deep learning is a subset of machine learning that can process data from a very wide variety of sources. Compared to traditional machine learning, it requires less data preprocessing by humans and can often produce more accurate predictions. In deep learning, interconnected layers of software-based calculators known as neurons form a neural network. Because these neurons are stacked in many layers, the network is called a “deep” neural network. The network ingests data and processes it through each layer in turn, with each layer learning increasingly complex features of the data.

Once a deep neural network has learned how to make determinations from input data correctly, it can use what it has learned to make determinations about new data. For example, once it learns what an object looks like, it can recognize the object in a new image. In other words, a deep neural network that has learned how to recognize cows can quickly detect cows in new images.

How a “neural network”, a.k.a. “AI model”, works: The network processes signals by sending them through a network of nodes analogous to neurons. Signals pass from one node to another along links. “Learning” improves the outcome by adjusting the weights that amplify or damp the signal each link carries. Nodes are typically arranged in a series of layers, hence a “deep” neural network. Image from M. Mitchell Waldrop, PNAS, 2019, 116(4)
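
The idea in the caption above can be sketched in a few lines of plain Python: a signal passes through layers of nodes, and “learning” nudges the link weights to reduce the error. This is only an illustration under simplifying assumptions. The function names, the tiny 2-2-1 network and its starting weights are hypothetical, and the weight update uses a finite-difference gradient estimate for brevity, whereas real frameworks use backpropagation.

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """One layer: every node takes a weighted sum of its inputs, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Send the signal through each layer in order."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal


def loss(inputs, target, layers):
    """Squared error between the single network output and the target."""
    return (network_forward(inputs, layers)[0] - target) ** 2


def train_step(inputs, target, layers, lr=0.5, eps=1e-5):
    """Adjust every link weight against a finite-difference gradient estimate.

    Biases are left fixed to keep the sketch short.
    """
    base = loss(inputs, target, layers)
    grads = []
    for weights, _ in layers:
        layer_grads = []
        for node_weights in weights:
            row = []
            for i in range(len(node_weights)):
                node_weights[i] += eps  # probe the effect of this weight
                row.append((loss(inputs, target, layers) - base) / eps)
                node_weights[i] -= eps  # restore it
            layer_grads.append(row)
        grads.append(layer_grads)
    for (weights, _), layer_grads in zip(layers, grads):
        for node_weights, row in zip(weights, layer_grads):
            for i, g in enumerate(row):
                node_weights[i] -= lr * g


# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, 0.0]),
    ([[0.7, -0.2]], [0.0]),
]
inputs, target = [1.0, 0.5], 1.0

loss_before = loss(inputs, target, layers)
for _ in range(20):
    train_step(inputs, target, layers)
loss_after = loss(inputs, target, layers)
```

After the training steps, `loss_after` is smaller than `loss_before`: the adjusted weights carry the signal closer to the desired output.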

Technical detail: How does deep learning for object detection work?

Sequence of tasks involved in object detection
Use of a deep neural network for object detection

Recent trends in applications of deep learning for object detection

Overall, the accuracy and performance of state-of-the-art deep learning models reported in 2019 are significantly higher than those of previous years. Higher accuracy has a profound impact on the application of the technology in medical imaging as well as in surveillance systems. Improved performance means that results can be inferred much faster on modern edge-based computing systems, paving the way for applications such as real-time drone-based video analytics.

Specifically, the new improvements to deep learning models came by way of the following advancements:

  1. Face Detection Mean Average Precision went above 90%

Face detection is the computer vision problem of detecting human faces in images, and it is the first step towards applications such as face verification, face alignment and facial recognition. Face detection differs from generic object detection in two ways. First, the range of object scales is larger and blurring is more common. Second, face detection has a single target class and depends strongly on the structural characteristics of the face.

WIDER FACE is currently the most commonly used benchmark for evaluating face detection algorithms. The high variance of face scales and the large number of faces per image make WIDER FACE the hardest benchmark for face detection, and results are reported at three difficulty levels (easy, medium and hard).

In 2019, PyramidBox++ [2], VIM-FD [3], ISRN [4], RetinaFace [5], AInnoFace [6] and RefineFace [7] all reported mAP scores greater than 90% at the easy, medium and hard levels. This is a significant improvement over previous years.
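
For readers unfamiliar with the metric, average precision (AP) summarizes a detector's precision-recall curve in a single number, and mAP is the mean of AP over classes. The sketch below shows one common way to compute AP (all-point interpolation). It assumes that each detection has already been matched against the ground truth and marked as a true or false positive. The function name and the sample detections are hypothetical, and benchmark toolkits may differ in the details.

```python
def average_precision(scored_hits, num_ground_truth):
    """Area under the interpolated precision-recall curve.

    scored_hits: list of (confidence, is_true_positive) pairs, one per detection.
    num_ground_truth: number of annotated objects (e.g. faces) in the dataset.
    """
    if num_ground_truth == 0 or not scored_hits:
        return 0.0

    # Walk through the detections from most to least confident.
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    true_positives = 0
    for rank, (_, is_hit) in enumerate(ranked, start=1):
        if is_hit:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)

    # Interpolate: precision at a recall level is the best precision
    # achieved at that recall level or any higher one.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangles under the interpolated curve.
    ap = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


# Hypothetical detector output: two real faces, three detections.
sample = [(0.9, True), (0.8, False), (0.7, True)]
ap = average_precision(sample, num_ground_truth=2)
print(round(ap, 4))  # 0.8333
```

A detector whose every detection is correct and which finds every face scores an AP of 1.0, so a reported score above 90% means that few faces are missed and few false alarms are ranked above true detections.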

  2. Recent trends in pedestrian detection

CityPersons is a new and challenging benchmark for pedestrian detection. The dataset is split into subsets according to the height and visibility level of the pedestrians, so it can evaluate detectors more comprehensively. In 2019, the APD model reported a 30% improvement in detection performance over 2018 [8].
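
Splitting a benchmark by height and visibility can be sketched as a simple filter over the annotations. In the Python sketch below, the subset names, the thresholds and the annotation fields are hypothetical placeholders chosen for illustration; they are not the official CityPersons definitions.

```python
# Hypothetical subset rules based on pedestrian height (in pixels) and
# visibility (fraction of the body that is not occluded, from 0 to 1).
# The thresholds are placeholders, not the official benchmark values.
SUBSETS = {
    "large": lambda p: p["height"] >= 75 and p["visibility"] >= 0.65,
    "small": lambda p: 50 <= p["height"] < 75 and p["visibility"] >= 0.65,
    "heavily_occluded": lambda p: p["height"] >= 50 and p["visibility"] < 0.65,
}


def split_by_subset(pedestrians):
    """Group annotated pedestrians by the subsets whose rule they satisfy."""
    return {
        name: [p["id"] for p in pedestrians if rule(p)]
        for name, rule in SUBSETS.items()
    }


# Made-up annotations for three pedestrians.
pedestrians = [
    {"id": "a", "height": 120, "visibility": 0.9},
    {"id": "b", "height": 60, "visibility": 0.8},
    {"id": "c", "height": 90, "visibility": 0.3},
]

groups = split_by_subset(pedestrians)
```

Evaluating a detector separately on each group shows whether it struggles mainly with small pedestrians or with occluded ones, which a single overall score would hide.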

These and many other applications of object detection can help enterprises with various types of monitoring and analysis. ‘Object Detection in Videos’ is the first topic covered in our upcoming workshop, “Applied AI for Beginners”, where we will train engineers from enterprises in the basics of AI along with hands-on implementation.

To participate in our future e-workshops and learn more about AI products & solutions for your enterprise (in any industry), please visit unitx.io & feel free to contact us.

#ComputerVision #ArtificialIntelligence #MachineLearning


[1] X. Wu, D. Sahoo, S.C.H. Hoi, Recent advances in deep learning for object detection, Neurocomputing 396 (2020).

[2] Z. Li, X. Tang, J. Han, J. Liu, PyramidBox++: high performance detector for finding tiny face, 2019. arXiv:1904.00386.

[3] Y. Zhang, X. Xu, X. Liu, Robust and high performance face detector, 2019. arXiv:1901.02350.

[4] S. Zhang, R. Zhu, X. Wang, H. Shi, T. Fu, S. Wang, T. Mei, S.Z. Li, Improved selective refinement network for face detection, 2019. arXiv:1901.06651.

[5] J. Deng, J. Guo, Y. Zhou, J. Yu, I. Kotsia, S. Zafeiriou, RetinaFace: single-stage dense face localisation in the wild, 2019. arXiv:1905.00641.

[6] F. Zhang, X. Fan, G. Ai, J. Song, Y. Qin, J. Wu, Accurate face detection for high performance, 2019. arXiv:1905.01585.

[7] S. Zhang, C. Chi, Z. Lei, S.Z. Li, RefineFace: refinement neural network for high performance face detection, 2019. arXiv:1909.04376.

[8] Y. Hu, J. Xie, J. Zhang, L. Lin, Y. Li, S.C. Hoi, Attribute-aware pedestrian detection in a crowd, 2019. arXiv:1910.09188.




UnitX provides AI solutions, products & services for different industries

Kiran Narayanan


Founder & CEO at UnitX, Science & Tech Enthusiast, Hobby Chef, Traveller
