What does it take to create a real-world computer vision project?


Computer vision (also known as machine vision), along with Natural Language Processing (NLP), is one of the most intriguing but also most complex subfields of artificial intelligence.

At Finlabs we regularly apply these technologies in our work and, to keep up with the rapidly evolving landscape, constantly grow our knowledge in this area. In doing so, we have learned a great deal about what it takes to use these technologies effectively and what is involved in applying computer vision to a real-world project.

There are many factors to consider when laying the foundation for these projects: which camera to select, where to place the computation of the computer vision algorithms, which libraries and frameworks will be needed, and so on. Computer vision is a demanding field that requires a lot of skill and know-how from developers, but it is one that will certainly not leave you bored.

There has been a large amount of high-quality research conducted on computer vision technologies, and the Machine Vision Group in Oulu, Finland, is a highly recognized research group worldwide whose expertise spans nearly 40 years. To give an example of their impact in the field, the Local Binary Pattern (LBP) method was originally developed in Oulu; nowadays, LBP is widely used, often with OpenCV and Python. Computer vision algorithms and methodologies are constantly evolving as scientific papers are published in international conferences and journals. Private consultancies also conduct research on computer vision to solve demanding practical problems.
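To make the idea concrete, here is a minimal sketch of the basic 8-neighbour LBP operator in Python with NumPy. This is only the per-pixel building block; in practice you would use a library implementation such as scikit-image's `local_binary_pattern`, or compute LBP histograms over whole images for texture classification.

```python
import numpy as np

def lbp_code(patch):
    """Compute the basic 8-neighbour Local Binary Pattern code for a
    3x3 grayscale patch: each neighbour is compared with the centre
    pixel and the resulting bits are packed into one byte."""
    center = patch[1, 1]
    # Clockwise neighbour order starting from the top-left pixel.
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:       # neighbour at least as bright -> bit = 1
            code |= 1 << bit
    return code

patch = np.array([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]], dtype=np.uint8)
print(lbp_code(patch))            # neighbours 60, 90, 80, 70 set bits 3-6
```

Because the code depends only on which neighbours are brighter than the centre, it is invariant to monotonic lighting changes, which is a large part of why LBP became so popular for texture and face analysis.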

The science is there, and you have many options for choosing the algorithms and methodologies.

But how do you implement a fully functional computer vision product to be fully utilized by customers, in practice?

In general, many computer vision use cases can be implemented with just a laptop, Python, and OpenCV. That said, there is a wide variety of use cases, and each one is highly different; you should not rely on a single set of requirements.
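As a taste of how little is needed to get started, the sketch below does simple frame differencing, the building block behind many laptop-scale motion-detection demos. In a real project the frames would come from `cv2.VideoCapture(0)`; here two tiny synthetic NumPy arrays stand in for a webcam feed.

```python
import numpy as np

def motion_mask(prev_frame, frame, threshold=25):
    """Flag pixels whose grayscale intensity changed by more than
    `threshold` between two consecutive frames (simple frame
    differencing)."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

# Two tiny synthetic grayscale frames standing in for a camera feed.
prev_frame = np.zeros((4, 4), dtype=np.uint8)
frame = prev_frame.copy()
frame[1:3, 1:3] = 200             # a "moving object" appears
mask = motion_mask(prev_frame, frame)
print(int(mask.sum()))            # number of changed pixels -> 4
```

The cast to a signed integer type before subtracting matters: subtracting `uint8` arrays directly would wrap around instead of producing negative differences.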

Check out this awesome tutorial on how to implement a COVID-19 social distancing detector using OpenCV, deep learning, and computer vision.

What kind of use cases are there for computer vision?

One of the benefits of computer vision is that the impact of a solution can be easily noticed and explained, which ultimately leads to cost savings and an increased return on investment (ROI). For example, a football club may analyze players' movement from video data to understand what kind of foot control the players have, what their running technique is during the game, how they pass the ball, their involvement in kicks, and so on. The main idea in analyzing football video data is to gather information that helps design and adapt training, and that helps coaches and scouts make correct recruitment decisions: what are the players like, and what characteristics define them on the football field?
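As an illustration of this kind of analysis, suppose a detector and tracker have already produced per-frame (x, y) pixel positions for one player. Converting that track into running speeds is then simple geometry; the frame rate and pixel-to-metre scale below are made-up values that would in reality come from the camera setup and pitch calibration.

```python
import math

def speeds_from_track(positions, fps=25, metres_per_pixel=0.05):
    """Convert a per-frame (x, y) pixel track of one player into
    instantaneous speeds in metres per second."""
    speeds = []
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        pixel_dist = math.hypot(x1 - x0, y1 - y0)
        speeds.append(pixel_dist * metres_per_pixel * fps)
    return speeds

# Hypothetical three-frame track of one player in image coordinates.
track = [(100, 100), (104, 103), (110, 111)]
print([round(s, 2) for s in speeds_from_track(track)])  # [6.25, 12.5]
```

Real systems add smoothing and a proper homography from image to pitch coordinates, but the core of "movement analytics" is exactly this kind of per-frame geometry on top of a tracker's output.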


Factories and manufacturers may also detect flaws in production data. If the video of a production line shows defective products, an alarm can be raised and a human gatekeeper can come to the rescue. The camera may inspect various indicators and combine them to make decisions about safety, machine breakdown, or product quality, and a process could be stopped or controlled fully automatically. In a waste incineration facility, it is possible to inspect the quality of incoming waste and remove unwanted material.
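A minimal sketch of this inspection idea, under the assumption that defects show up as dark blemishes on an otherwise bright product surface; a real system would use a trained model rather than a fixed brightness threshold.

```python
import numpy as np

def is_defective(product_image, dark_fraction_limit=0.10):
    """Flag a grayscale product image as defective when more than
    `dark_fraction_limit` of its pixels are darker than the expected
    surface brightness (a crude stand-in for a trained inspection
    model)."""
    dark_fraction = (product_image < 100).mean()
    return bool(dark_fraction > dark_fraction_limit)

good = np.full((8, 8), 200, dtype=np.uint8)   # uniform bright surface
bad = good.copy()
bad[0:3, 0:3] = 10                            # dark blemish: 9 of 64 pixels
print(is_defective(good), is_defective(bad))  # False True
```

In production this boolean would feed the alarm or line-stop logic described above, with the gatekeeper reviewing only the frames that were flagged.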

Facial recognition is another solution provided by computer vision technology, and it is already present in mobile devices, where users can be recognized. In the world of television and live broadcasting, however, applying computer vision is challenging: incorporating computer vision algorithms into broadcast servers quickly interferes with the TV broadcasts.

Egoscue, a company specializing in non-medical pain treatment with the Egoscue method, uses computer vision software to analyze client postures and create customized workout plans for each individual client. It is also possible to track a client's progress in improving their posture with computer vision.
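Posture analysis of this kind typically starts from body keypoints produced by a pose estimator such as MediaPipe or OpenPose. As an illustrative sketch (the keypoints below are hypothetical), the angle at a joint can be computed from three keypoints:

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) formed by points a-b-c, e.g. the
    knee angle computed from hip, knee, and ankle keypoints."""
    ang1 = math.atan2(a[1] - b[1], a[0] - b[0])
    ang2 = math.atan2(c[1] - b[1], c[0] - b[0])
    deg = abs(math.degrees(ang1 - ang2))
    return 360 - deg if deg > 180 else deg    # keep angle in [0, 180]

# Hypothetical hip, knee, and ankle keypoints in image coordinates.
hip, knee, ankle = (0, 0), (0, 10), (10, 10)
print(joint_angle(hip, knee, ankle))          # 90.0
```

Tracking how such joint angles change across sessions is one simple way progress in posture could be quantified.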

As you can see, there is a wide variety of use cases for computer vision. All depend on the exact business case, local environment, weather conditions, network environment, and available equipment.

How to select a camera?

Selecting a camera is one of the most important aspects of any computer vision project, and the selection is defined by the exact use case. One could select a cell phone camera, a dedicated hardware camera, or a ready-made off-the-shelf camera, such as a security camera or a traffic camera. In the last case, only the video stream is taken for further processing. The main benefit of a camera compared to plain Internet of Things (IoT) sensors is that you can retrieve multifaceted video data that can be used for a variety of purposes.

It is extremely important to know how to capture pictures and video of sufficient quality so that producing your data does not become too costly. There are also environmental challenges: a factory, for example, may have a lot of dust, and the camera may need to be installed on the ceiling, making adjustments hard. Outdoor cameras and computers have to withstand the local climate. Finally, setting up the correct lighting conditions along with the camera is of utmost importance for a successful computer vision project.


Where to place the computer vision algorithms?

This naturally depends on the specific use case. Imagine having a computer vision solution employed in a very harsh environment in which there are lots of variables involved, and the network connections are highly unstable. In such a dynamic environment, having computer vision algorithms located in the cloud is usually not possible due to the high network bandwidth demand. In this case, the computer vision algorithms must be placed into the camera device itself.

However, in a very stable environment where network conditions are predictable and reliable, the computer vision algorithms may be placed in the cloud, such as AWS or Azure.

A hybrid solution is to use the cloud environment primarily for algorithm hosting and keep the device as a backup for situations in which network conditions become problematic and unstable.
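A sketch of that fallback logic in Python; `cloud_infer` and `edge_infer` are hypothetical stand-ins for a real cloud endpoint and an on-device model, and the latency budget is an assumed value.

```python
import time

CLOUD_TIMEOUT_S = 0.5   # hypothetical latency budget for a cloud round trip

def analyse_frame(frame, cloud_infer, edge_infer):
    """Try the cloud model first; fall back to the on-device model
    when the network is too slow or unavailable."""
    start = time.monotonic()
    try:
        result = cloud_infer(frame)
        if time.monotonic() - start <= CLOUD_TIMEOUT_S:
            return ("cloud", result)
    except ConnectionError:
        pass                       # network down: fall through to the edge
    return ("edge", edge_infer(frame))

# Stub inference functions standing in for real models.
def cloud_down(frame):
    raise ConnectionError("no route to host")

def edge_model(frame):
    return "person"

print(analyse_frame("frame-001", cloud_down, edge_model))
```

Returning the source of each result ("cloud" or "edge") alongside the prediction also makes it easy to monitor how often the fallback path is actually taken.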

What kind of libraries, frameworks, and programming languages should you use?

When it comes to programming languages, Python and C++ are the most popular and common in computer vision. Python is a very handy language for learning the basics of computer vision, while C++ is the de facto language in environments where speed is of utmost importance, such as embedded systems.

There is a long list of frameworks and libraries for computer vision. OpenCV is a well-known library available for both Python and C++, and it can serve as a basis for learning different computer vision methodologies and algorithms. Keras, TensorFlow, and TensorFlow Lite can be used for training neural networks. In some real-world scenarios, however, solutions need to be developed from the ground up using Graphical Processing Unit (GPU) frameworks such as Nvidia CUDA or the Jetson Nano SDK (which builds on CUDA). There are also various iOS and Android libraries for computer vision on mobile devices; notable ones include ARKit and ARCore.

In addition to libraries and frameworks, Azure Kinect DK is a Developer kit (DK) used with the Azure Kinect camera, which includes advanced AI sensors for building computer vision and speech models. Furthermore, the sensors and the powerful Software Development Kits (SDKs) can be connected to Azure cognitive services.


What kind of computer vision companies are there?

Companies utilizing computer vision come from a variety of industries such as automotive, sports, and autonomous driving, to name a few. SportLogiq is a Canadian-based company specializing in ice hockey, soccer, and American football sports insights through its fully automated, real-time, multi-camera player tracking system. Univrses is a Swedish company based in Stockholm that is developing a platform called 3DAI City, deploying camera units on public vehicles operating throughout cities. The images are processed by Univrses' proprietary algorithms to derive meaningful data relevant to the successful functioning of the city, which can be used for smart city monitoring.

There are also numerous companies working on autonomous driving. One Pittsburgh-based company builds self-driving technology: the software, hardware, maps, and cloud-support infrastructure that power self-driving vehicles. Waymo, meanwhile, began as the Google Self-Driving Car Project in 2009.


What is the general workflow of a computer vision project for a customer?

The most important aspect of a computer vision project is to understand the customer's business case. What is the exact return on investment? What is the general problem area, and what are the key performance indicators? The second especially important aspect is camera selection. What is the most suitable camera for the business case: a cell phone camera, a fixed camera, or an off-the-shelf camera? Determining this requires research into different camera models, their technical specifications, and their characteristics.

Following that, a decision should be made on where to place the computation of the computer vision algorithms. This is again dependent on the exact business case. The computation can be placed into the camera itself, into a cloud environment such as AWS or Azure, or into a private server.

The choice of programming language then depends on where the computation is placed. If the computer vision algorithms run in the cloud, there are more options for the language. In some cases, however, the computations must reside in the camera itself, meaning that computation speed is of utmost importance and fewer languages are suitable. Consequently, many important decisions need to be made before selecting the programming language, libraries, and frameworks.

Once a decision on the language is made, you can start to look at the different libraries and frameworks that are available. A best practice is to look for ready-made implementations of computer vision algorithms; if an implementation is not available, it is up to the developers to write their own. Notable libraries and frameworks to start looking at include the following:

OpenCV, Cuda, ARkit, ARCore, Unity AR Foundation, TensorRT, Keras, Pytorch, SimpleCV, AWS Rekognition, Azure Cognitive Services, and AWS SageMaker, to name a few.

Finally, once the aforementioned dependencies have been settled, the actual development work and training of the machine-learning or deep-learning algorithms can start.

Where can you learn the basics of computer vision?

There are many courses that focus on computer vision utilizing Python, C++, and OpenCV for learning the basic methodologies and algorithms.

A very popular website with numerous computer vision blog posts and tutorials is PyImageSearch. They also offer books and crash courses, but the crown jewel of the site is the PyImageSearch Gurus course: an excellent practical computer vision course aimed at beginners, with in-depth instructions on how to install all the necessary software on a laptop. I highly recommend this course as a basis for practical computer vision work.

Another website for learning computer vision offers courses in both Python and C++, including one that uses PyTorch in an Azure environment. Finally, Udacity offers a nanodegree program, "Become a Computer Vision Expert".

In conclusion…

Hopefully, this is a helpful high-level overview of the practical steps involved in establishing and undertaking a computer vision project. The workflow of a fully functional computer vision product includes many small and large technical details not mentioned in this article. All in all, computer vision is a demanding and complex field that requires strong skills and knowledge from developers. Finding a solution is not always straightforward, and careful inspection and research are needed before the actual hard development can begin.

There are a plethora of different use cases for computer vision, and the field is constantly evolving. New algorithms are being developed. New frameworks and software libraries are published. And different kinds of services arise as more is explored and discovered. Developers are thus required to have an up-to-date multidisciplinary set of skills in order to excel.


Ultimately, learning computer vision is a marathon, not a 100-meter dash, and is a field that will definitely not leave you bored!

Olli Mämmelä is a Senior AI Engineer and our master of data science and machine learning at Finlabs.

He is an integral part of the Finlabs computer vision team where his passion for learning and development takes our products and services to another level. His specialties include algorithm design and development, machine learning, data analysis, and programming.

Olli has a Master's degree in Information Engineering and a Ph.D. in Telecommunications Engineering. He has more than 10 years of experience in international and national research projects at the University of Oulu and VTT.



