From autonomous driving to social media search algorithms, Computer Vision seems to be intertwined with many of the most disruptive technology trends of our time. But what it is and how it works remained a mystery to me, much like the magic behind other trending buzzwords such as artificial intelligence or blockchain.
Being at Motius has allowed me to dig a little deeper and learn a great deal about the connections and differences between the underlying technologies driving these global tech trends. Here’s how my first real look at Computer Vision went:
An attempt at untying the Gordian Knot of computer vision and the technologies involved.
Coming from a management background, I recently sat down with one of the experts for everything Computer Vision here at Motius. The goal: get a decent grasp of the topic, including what it is, how it works, and what it is used for. I wanted to be able to summarize the most relevant points and explain to a non-expert what the heck it’s all about.
The plan was simple, and I was confident I could grasp the concepts and ideas with her help. But what was supposed to take only a meeting and a few hours of research ended up being at least a two-week process. It was a constant back and forth of asking questions, reading articles, and wondering whether I now understood more or less about the topic.
Easy — not so easy?
Grasping the overall goal of Computer Vision was easy: to enable machines to understand and use visual information in the form of images, videos, or real-time camera feeds at or beyond the level of the human visual system. What was nearly impossible was drawing clear lines between the terms and technologies surrounding the buzzword.
From image processing, computer vision, machine learning, and deep learning to recognition, segmentation, classification, 3D reconstruction, and many more, the sheer number of new terms was already overwhelming.
By the way: we have summarized all the general information you need to know in a Motius 101 OnePager — you can find it here.
But what really made me question my sanity was the fact that a single task can be approached in several different ways, some of which are considered computer vision and others not. And which is which changes from author to author.
Most articles about Computer Vision applications mention some form of image processing and machine learning, but often use those terms interchangeably with computer vision itself.
That’s when I started wondering:
Are there specific methods unique to Computer Vision that are neither image processing nor machine learning? Or is Computer Vision only a buzzword for systems that combine the two?
So, how did I end up understanding Computer Vision?
There is an enormous variety of tasks that can be solved by Computer Vision processes, including object recognition, motion detection, or 3D modeling. While they can all be grouped under the term Computer Vision, they can differ greatly in the actual technologies and methods used within them.
Many Computer Vision systems nowadays rely on deep learning approaches and use convolutional neural networks to, for example, recognize different objects within images. Others don’t require machine learning at all.
So there seem to be two equally valid definitions of what Computer Vision is:
- Any system or process related to computational processing, analyzing, and understanding of visual information regardless of the specific technologies involved.
- Specific algorithms and methods used within Computer Vision systems that cannot be placed into other technological categories like image processing or machine learning.
The first definition is the big-picture one, where anything that relates to, or uses, computational processes based on visual information is called Computer Vision. The decisive factor is the ultimate objective or task the system tries to solve. If it is in line with the general objective, it can be called Computer Vision regardless of the specific underlying technologies.
The second one is tougher. Many modern applications rely heavily on deep learning techniques that combine, or eliminate the need for, certain processes altogether due to their effectiveness, accuracy, and comparably easy setup. So separating machine learning from other steps and technologies within the overall process can be difficult.
On top of that, whether a technique counts as pure image processing or as something specific to Computer Vision is a matter of opinion.
At this level, too, differentiating based on tasks helps clear things up.
Basic enhancements of the raw input, like rotation, filters, or contrast changes, belong to the category of image processing, since both their input and their output are images. They make the image easier to work with in later steps of the Computer Vision process, but they generate no interpretation, content-related information, or understanding on their own. Still, some form of image processing is used in pretty much every Computer Vision system to improve the effectiveness of further steps.
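To make the "image in, image out" idea concrete, here is a minimal sketch in plain Python. It assumes a grayscale image stored as a list of rows with pixel values from 0 to 255. The function names `stretch_contrast` and `rotate_90_clockwise` are made up for illustration and do not come from any particular library; real systems would typically use a dedicated imaging library instead.

```python
def stretch_contrast(image):
    """Rescale pixel values linearly so they span the full 0-255 range."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A completely flat image has no contrast to stretch; return a copy.
        return [row[:] for row in image]
    # Integer arithmetic keeps every result an exact value between 0 and 255.
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


# A dull, low-contrast 2x2 image (hypothetical sample data).
dull = [
    [100, 110],
    [120, 150],
]

enhanced = stretch_contrast(dull)
rotated = rotate_90_clockwise(dull)

print(enhanced)  # [[0, 51], [102, 255]]
print(rotated)   # [[120, 100], [150, 110]]
```

Both functions take an image and hand back an image. Nothing in the output says anything about what the picture shows.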
Processes that generate information about an image based purely on technical aspects, like the distance of an object from the lens or the size and location of features such as lines or corners, count as Computer Vision techniques.
The crucial difference from image processing methods is the output, which is not an image but information about the contents of one. These more traditional techniques are still used extensively in localization and mapping tasks, which form the basis of applications such as autonomous vehicles.
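As a rough illustration of "image in, information out", the sketch below finds where a bright object sits in a small grayscale image and how large it is. It is a deliberately simple stand-in for the feature detection used in real systems. The function name `locate_bright_object`, the threshold of 128, and the sample frame are all assumptions made for this example.

```python
def locate_bright_object(image, threshold=128):
    """Return the position and size of the region brighter than `threshold`.

    The result is a dictionary of measurements, not an image.
    Returns None if no pixel exceeds the threshold.
    """
    coords = [
        (r, c)
        for r, row in enumerate(image)
        for c, pixel in enumerate(row)
        if pixel > threshold
    ]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return {
        "top": min(rows),
        "left": min(cols),
        "height": max(rows) - min(rows) + 1,
        "width": max(cols) - min(cols) + 1,
        "centroid": (sum(rows) / len(rows), sum(cols) / len(cols)),
    }


# A 4x5 frame with one bright 2x2 blob (hypothetical sample data).
frame = [
    [0,   0,   0, 0, 0],
    [0, 200, 220, 0, 0],
    [0, 210, 230, 0, 0],
    [0,   0,   0, 0, 0],
]

info = locate_bright_object(frame)
print(info)
```

The function reports where something is and how big it is, but it has no idea what that something is. That gap is where machine learning comes in.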
Processes that go beyond the technical information of an image and aim for a deeper understanding of its contents, such as which objects it shows and what they are, are based on machine learning algorithms.
Recognition and classification tasks are among the most well-known examples. This step of generating a much deeper understanding of the visual input offers the most possibilities, and deep neural networks in particular are proving highly effective compared to more traditional models. This step has become something of a mainstream understanding of what Computer Vision can achieve, even though the key processes are machine learning based and Computer Vision solves many other tasks with or without machine learning.
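To show what "learning from examples" means at the smallest possible scale, here is a sketch of a nearest-centroid classifier in plain Python. This is a much simpler technique than the deep neural networks mentioned above and is only meant to illustrate the principle: the program is given labeled images and derives its own decision rule from them. The function names, the labels, and the tiny 2x2 training images are all invented for this example.

```python
def flatten(image):
    """Turn a 2D image into a flat list of pixel values."""
    return [p for row in image for p in row]


def train_nearest_centroid(examples):
    """Learn one average image (centroid) per label from (image, label) pairs."""
    sums, counts = {}, {}
    for image, label in examples:
        vec = flatten(image)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in sums[label]]
        for label in sums
    }


def classify(model, image):
    """Return the label whose centroid is closest to the image."""
    vec = flatten(image)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical training data: 2x2 images showing a horizontal or vertical bar.
training = [
    ([[255, 255], [0, 0]], "horizontal"),
    ([[250, 240], [10, 5]], "horizontal"),
    ([[255, 0], [255, 0]], "vertical"),
    ([[245, 10], [250, 0]], "vertical"),
]

model = train_nearest_centroid(training)
prediction = classify(model, [[230, 20], [240, 15]])
print(prediction)  # vertical
```

Nobody wrote a rule saying "a bright left column means vertical". The rule comes out of the examples, and swapping in different training data changes what the program recognizes without touching the code.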
Want to know what it takes to use computer vision in your project? One of the results of my talks with our experts is the following cheat sheet.
So, to recap:
- Image processing tasks focus on enhancing an image but will only yield an improved image as the output.
- Computer Vision tasks provide technical information about the image such as distances of specific objects in relation to other objects or the sensor itself.
- Machine learning is its own field focused on enabling machines to use data to improve autonomously instead of a computer engineer programming every change by hand.
- In Computer Vision-related applications, machine learning is used to generate a deeper understanding of an image’s contents and their relation to one another, like a cat sitting in a tree because a dog is underneath it.
So, the term Computer Vision can represent the general goal of enabling machines to understand and use visual information. But although all applications work towards some part of that goal, not every Computer Vision application is the same, and the underlying technologies can be significantly more complex than just “a computer vision algorithm.”