Computer Vision: What Can I Do with It?

Akindele Michael
Jan 20, 2022

Computer Vision, according to IBM, is a field of artificial intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos and other visual inputs, and to take actions or make recommendations based on that information. Pretty straightforward, right?

Computer Vision is now a well-established field, and interest from engineers, scientists, enthusiasts and even entrepreneurs around the world keeps growing quickly. Maybe that's because of the problems it can now solve, or could solve in the near future, that were once considered humanly impossible. Computer Vision can solve some of our problems to a large extent, and every day people ask, "What can I do with this technology?" To know what we can innovate with it, though, we first need an idea of the kinds of tasks it can handle.

Face image from iStockphoto

Computer Vision Tasks

The following are some tasks you can use computer vision for:

Object Recognition/classification

Its problem statement could be something like: given an image A, does A contain an orange? Here we are searching the image for an orange, which really sums up object recognition: a computer vision technique for identifying objects in images or videos.
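
To make the orange question concrete, here is a minimal sketch of the final decision step only. It assumes a trained classifier has already produced one raw score per class for the image. The labels, the scores and the 0.5 threshold are made up for illustration and do not come from any real model.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def contains_object(scores, labels, target, threshold=0.5):
    """Answer 'does the image contain `target`?' from classifier scores."""
    probs = softmax(scores)
    prob = probs[labels.index(target)]
    return prob >= threshold, prob

# Hypothetical scores a classifier might output for image A (assumed values).
labels = ["apple", "orange", "banana"]
scores = [1.0, 3.0, 0.5]

found, prob = contains_object(scores, labels, "orange")
print(found, round(prob, 3))  # True 0.821
```

In a real system the scores would come from a trained network; everything after that point looks much like this.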

Object Localization (detection)

Here, we have to know whether an object is present and also where it is in the image or video. This technique can be used to pinpoint the position of an object. Object localization refers to identifying the location of one or more objects in an image and drawing a bounding box around their extent. It is also used in human detection.
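
A bounding box is usually just four numbers. One common way to judge how well a predicted box matches the true one is Intersection over Union (IoU): the overlap area divided by the combined area. The sketch below assumes boxes are written as (x_min, y_min, x_max, y_max) in pixels, and the two example boxes are invented for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at 0 when the boxes do not overlap.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical predicted and ground-truth boxes (assumed values).
predicted = (10, 10, 50, 50)
ground_truth = (30, 30, 70, 70)

print(round(iou(predicted, ground_truth), 3))  # 0.143
```

An IoU of 1.0 means the boxes match exactly, and 0.0 means they do not overlap at all.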

Semantic Segmentation

This is where the algorithms start to look scary, as if they were thinking for themselves. Here, the computer knows which class each pixel in an image belongs to, and therefore where each class appears. It is the task of assigning a class label to every pixel in the image. In that sense, it understands what it sees.
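
"A class label for every pixel" can be sketched in a few lines. This example assumes a segmentation model has already produced one score map per class, and it simply picks the highest-scoring class at each pixel. The tiny 2x2 "image", the class names and the scores are all made up for illustration.

```python
def segment(score_maps, class_names):
    """Label each pixel with the class whose score is highest at that pixel.

    score_maps[c][y][x] is the score of class c at pixel (x, y).
    """
    height = len(score_maps[0])
    width = len(score_maps[0][0])
    label_map = []
    for y in range(height):
        row = []
        for x in range(width):
            # Index of the class with the largest score at this pixel.
            best = max(range(len(score_maps)), key=lambda c: score_maps[c][y][x])
            row.append(class_names[best])
        label_map.append(row)
    return label_map

# Hypothetical per-class score maps for a 2x2 image (assumed values).
class_names = ["sky", "road"]
score_maps = [
    [[0.9, 0.8], [0.2, 0.1]],  # scores for "sky"
    [[0.1, 0.2], [0.8, 0.9]],  # scores for "road"
]

label_map = segment(score_maps, class_names)
print(label_map)  # [['sky', 'sky'], ['road', 'road']]
```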

Face recognition

Heard of it before? I bet so. Should I say it is the act of an algorithm detecting the required face? This technology has been common in many devices for some years now. Machine Learning Mastery describes it as the problem of identifying and verifying people in a photograph by their face. It has been claimed that deep learning algorithms are now better than humans at this task. Could you perfectly count all the people attending a football match by hand?
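
The "verifying" part is often framed as comparing two face embeddings, which are lists of numbers a model produces for each face. Here is a minimal sketch of that comparison. It assumes the embeddings already exist; the three-number vectors and the 0.8 threshold are invented for illustration, and real embeddings are much longer.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_person(emb_a, emb_b, threshold=0.8):
    """Treat two faces as the same person if their embeddings are similar enough."""
    return cosine_similarity(emb_a, emb_b) >= threshold

# Hypothetical embeddings (assumed values, not from a real face model).
enrolled = [0.9, 0.1, 0.4]        # the face stored on the device
probe_same = [0.85, 0.15, 0.38]   # a new photo of the same person
probe_other = [0.1, 0.9, 0.2]     # a photo of someone else

print(same_person(enrolled, probe_same))   # True
print(same_person(enrolled, probe_other))  # False
```

The threshold is a design choice: raising it makes the check stricter, at the cost of rejecting more genuine matches.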

Under face recognition, we have facial expression recognition, which is used to infer how a person is feeling from an image or video. It is used in self-driving cars to tell whether a person is too fatigued to handle driving; this is called fatigue detection. In this area you can also find things like open-universe face identification, lip reading and counting.

Visual Business Recognition

Visual Business Recognition. Image from the Center for Research in Computer Vision, University of Central Florida

A problem statement may sound like: given a video, the algorithm should detect and describe the businesses at the place shown. This is a very convenient computer vision task. A business recognition system can automatically identify businesses in an image and retrieve additional relevant information such as reviews, ratings, and similar nearby businesses. It involves web image matching and text processing.

Action Recognition

University Defence Research Collaboration LSSCN Consortium Demo video presented by Dr. Ioannis Kaloskampis

Given a video, what actions are being taken? This is used to determine what kind of activity a person is carrying out. Action recognition can also be used in video surveillance and monitoring.

Cross-view action synthesis

Given an input video of an actor performing some action, the aim is to synthesize a video of the same action seen from a novel view, with the help of an appearance prior. That is, we try to recreate the video from a different viewpoint.

Object tracking

This is an application of deep learning where the program takes an initial set of object detections, assigns a unique ID to each of them, and then tracks the detected objects as they move from frame to frame in a video. The title of the task says it all.

It comes in many forms: video tracking, visual tracking (which estimates the future position of a visual target that was initialized without the rest of the video being available), image tracking and object-tracking cameras.
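
The idea of giving each detection a unique ID and following it across frames can be sketched with a very simple nearest-centroid tracker. This is an illustrative toy, not how any particular deep learning tracker works. It assumes a detector has already given us the centre point of each object in every frame, and the `CentroidTracker` class, its `max_distance` parameter and the coordinates are all hypothetical.

```python
import math

class CentroidTracker:
    """Toy tracker: match each new detection to the nearest known object."""

    def __init__(self, max_distance=50.0):
        self.next_id = 0                  # ID to hand out to the next new object
        self.objects = {}                 # object ID -> last known (x, y) centre
        self.max_distance = max_distance  # farther than this counts as a new object

    def update(self, centroids):
        """Take this frame's detections and return {object ID: (x, y)}."""
        assigned = {}
        unmatched = dict(self.objects)    # objects not yet matched in this frame
        for c in centroids:
            best_id, best_dist = None, self.max_distance
            for obj_id, pos in unmatched.items():
                d = math.hypot(c[0] - pos[0], c[1] - pos[1])
                if d <= best_dist:
                    best_id, best_dist = obj_id, d
            if best_id is None:
                # Nothing close enough: treat it as a new object.
                best_id = self.next_id
                self.next_id += 1
            else:
                del unmatched[best_id]    # each object is matched at most once
            assigned[best_id] = c
        # Objects with no detection in this frame are simply dropped.
        self.objects = assigned
        return assigned

tracker = CentroidTracker()
frame1 = tracker.update([(10, 10), (100, 100)])   # two new objects: IDs 0 and 1
frame2 = tracker.update([(102, 101), (12, 11)])   # both moved a little
frame3 = tracker.update([(300, 300)])             # far from both: new ID 2
print(frame1, frame2, frame3)
```

Even though the detections in the second frame arrive in a different order, each object keeps the ID it was given in the first frame.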

A sample video where Computer Vision could have done the work faster (The video is long!)

In Conclusion

Computer Vision, as we now have an idea, is a very powerful tool, even stronger than humans in some areas. It can do a lot more than has been shown here. I am here to tell you there is a lot we can do with these example tasks by incorporating them into our ideas, jobs and businesses, just as Amazon has done.

Let’s continue to design and build intelligent systems! Cheers.
